* [PATCH 0/2] block: Fix deadlock when merging requests with BFQ @ 2021-05-20 22:33 Jan Kara 2021-05-20 22:33 ` [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() Jan Kara 2021-05-20 22:33 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara 0 siblings, 2 replies; 18+ messages in thread From: Jan Kara @ 2021-05-20 22:33 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Jan Kara Hello, This patch series fixes a lockdep complaint and a possible deadlock that can happen when blk_mq_sched_try_insert_merge() merges and frees a request. Honza ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() 2021-05-20 22:33 [PATCH 0/2] block: Fix deadlock when merging requests with BFQ Jan Kara @ 2021-05-20 22:33 ` Jan Kara 2021-05-21 0:42 ` Ming Lei 2021-05-20 22:33 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara 1 sibling, 1 reply; 18+ messages in thread From: Jan Kara @ 2021-05-20 22:33 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Jan Kara Most of the merging happens at the bio level. There should not be much merging happening at the request level anymore. Furthermore, if we have backmerged a request into the previous one, the chances of being able to merge the result into an even earlier request are slim - that could succeed only if requests were inserted in 2 1 3 order. Merging multiple requests in elv_attempt_insert_merge() will be difficult to handle once we want to pass requests to be freed back to the caller of blk_mq_sched_try_insert_merge(). So just remove the possibility of merging multiple requests in elv_attempt_insert_merge(). Signed-off-by: Jan Kara <jack@suse.cz> --- block/elevator.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/block/elevator.c b/block/elevator.c index 440699c28119..098f4bd226f5 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -350,12 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req, * we can append 'rq' to an existing request, so we can throw 'rq' away * afterwards. * - * Returns true if we merged, false otherwise + * Returns true if we merged, false otherwise. */ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) { struct request *__rq; - bool ret; if (blk_queue_nomerges(q)) return false; @@ -369,21 +368,13 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) if (blk_queue_noxmerges(q)) return false; - ret = false; /* * See if our hash lookup can find a potential backmerge. 
*/ - while (1) { - __rq = elv_rqhash_find(q, blk_rq_pos(rq)); - if (!__rq || !blk_attempt_req_merge(q, __rq, rq)) - break; - - /* The merged request could be merged with others, try again */ - ret = true; - rq = __rq; - } - - return ret; + __rq = elv_rqhash_find(q, blk_rq_pos(rq)); + if (!__rq || !blk_attempt_req_merge(q, __rq, rq)) + return false; + return true; } void elv_merged_request(struct request_queue *q, struct request *rq, -- 2.26.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() 2021-05-20 22:33 ` [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() Jan Kara @ 2021-05-21 0:42 ` Ming Lei 2021-05-21 11:53 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Ming Lei @ 2021-05-21 0:42 UTC (permalink / raw) To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote: > Most of the merging happens at bio level. There should not be much > merging happening at request level anymore. Furthermore if we backmerged > a request to the previous one, the chances to be able to merge the > result to even previous request are slim - that could succeed only if > requests were inserted in 2 1 3 order. Merging more requests in Right, but some workloads do have this kind of pattern. Take qemu IO emulation as an example: it can often be thought of as a single job doing native aio and direct io with a high queue depth. The IO originates from one VM but may come from multiple jobs inside the VM, so bio merging may not hit much because of IO emulation timing (virtio-scsi/blk's MQ, or IO interleaved from multiple jobs via the SQ transport), but request merging can really make a difference; see the recent patch in the following link: https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t > elv_attempt_insert_merge() will be difficult to handle when we want to > pass requests to free back to the caller of > blk_mq_sched_try_insert_merge(). So just remove the possibility of > merging multiple requests in elv_attempt_insert_merge(). This change will cause a regression. Thanks, Ming ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() 2021-05-21 0:42 ` Ming Lei @ 2021-05-21 11:53 ` Jan Kara 2021-05-21 13:12 ` Ming Lei 0 siblings, 1 reply; 18+ messages in thread From: Jan Kara @ 2021-05-21 11:53 UTC (permalink / raw) To: Ming Lei; +Cc: Jan Kara, Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente On Fri 21-05-21 08:42:16, Ming Lei wrote: > On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote: > > Most of the merging happens at bio level. There should not be much > > merging happening at request level anymore. Furthermore if we backmerged > > a request to the previous one, the chances to be able to merge the > > result to even previous request are slim - that could succeed only if > > requests were inserted in 2 1 3 order. Merging more requests in > > Right, but some workload has this kind of pattern. > > For example of qemu IO emulation, it often can be thought as single job, > native aio, direct io with high queue depth. IOs is originated from one VM, but > may be from multiple jobs in the VM, so bio merge may not hit much because of IO > emulation timing(virtio-scsi/blk's MQ, or IO can be interleaved from multiple > jobs via the SQ transport), but request merge can really make a difference, see > recent patch in the following link: > > https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t Oh, request merging definitely does make a difference. But the elevator hash & merge logic I'm modifying here is used only by BFQ and MQ-DEADLINE AFAICT. And these IO schedulers will already call blk_mq_sched_try_merge() from their .bio_merge handler, which gets called from blk_mq_submit_bio(). So all the merging that can happen in the code I remove should have already happened. Or am I missing something? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() 2021-05-21 11:53 ` Jan Kara @ 2021-05-21 13:12 ` Ming Lei 2021-05-21 13:44 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Ming Lei @ 2021-05-21 13:12 UTC (permalink / raw) To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente On Fri, May 21, 2021 at 01:53:54PM +0200, Jan Kara wrote: > On Fri 21-05-21 08:42:16, Ming Lei wrote: > > On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote: > > > Most of the merging happens at bio level. There should not be much > > > merging happening at request level anymore. Furthermore if we backmerged > > > a request to the previous one, the chances to be able to merge the > > > result to even previous request are slim - that could succeed only if > > > requests were inserted in 2 1 3 order. Merging more requests in > > > > Right, but some workload has this kind of pattern. > > > > For example of qemu IO emulation, it often can be thought as single job, > > native aio, direct io with high queue depth. IOs is originated from one VM, but > > may be from multiple jobs in the VM, so bio merge may not hit much because of IO > > emulation timing(virtio-scsi/blk's MQ, or IO can be interleaved from multiple > > jobs via the SQ transport), but request merge can really make a difference, see > > recent patch in the following link: > > > > https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t > > Oh, request merging definitely does make a difference. But the elevator > hash & merge logic I'm modifying here is used only by BFQ and MQ-DEADLINE > AFAICT. And these IO schedulers will already call blk_mq_sched_try_merge() > from their \.bio_merge handler which gets called from blk_mq_submit_bio(). > So all the merging that can happen in the code I remove should have already > happened. Or am I missing something? 
There might be at least two reasons: 1) when .bio_merge() is called, some requests are still held in the plug list, so the bio may not be merged with requests in the scheduler queue; when the plug list is flushed and these requests are inserted into the scheduler queue, we have to try to merge them further 2) only blk_mq_sched_try_insert_merge() is capable of doing aggressive request merging: when req A is merged into req B, the function will continue trying to merge req B with other in-queue requests until no further merge can be done; neither blk_mq_sched_try_merge() nor blk_attempt_plug_merge() can do such aggressive request merging. Thanks, Ming ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() 2021-05-21 13:12 ` Ming Lei @ 2021-05-21 13:44 ` Jan Kara 0 siblings, 0 replies; 18+ messages in thread From: Jan Kara @ 2021-05-21 13:44 UTC (permalink / raw) To: Ming Lei; +Cc: Jan Kara, Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente On Fri 21-05-21 21:12:14, Ming Lei wrote: > On Fri, May 21, 2021 at 01:53:54PM +0200, Jan Kara wrote: > > On Fri 21-05-21 08:42:16, Ming Lei wrote: > > > On Fri, May 21, 2021 at 12:33:52AM +0200, Jan Kara wrote: > > > > Most of the merging happens at bio level. There should not be much > > > > merging happening at request level anymore. Furthermore if we backmerged > > > > a request to the previous one, the chances to be able to merge the > > > > result to even previous request are slim - that could succeed only if > > > > requests were inserted in 2 1 3 order. Merging more requests in > > > > > > Right, but some workload has this kind of pattern. > > > > > > For example of qemu IO emulation, it often can be thought as single job, > > > native aio, direct io with high queue depth. IOs is originated from one VM, but > > > may be from multiple jobs in the VM, so bio merge may not hit much because of IO > > > emulation timing(virtio-scsi/blk's MQ, or IO can be interleaved from multiple > > > jobs via the SQ transport), but request merge can really make a difference, see > > > recent patch in the following link: > > > > > > https://lore.kernel.org/linux-block/3f61e939-d95a-1dd1-6870-e66795cfc1b1@suse.de/T/#t > > > > Oh, request merging definitely does make a difference. But the elevator > > hash & merge logic I'm modifying here is used only by BFQ and MQ-DEADLINE > > AFAICT. And these IO schedulers will already call blk_mq_sched_try_merge() > > from their \.bio_merge handler which gets called from blk_mq_submit_bio(). > > So all the merging that can happen in the code I remove should have already > > happened. Or am I missing something? 
> > There might be at least two reasons: > > 1) when .bio_merge() is called, some requests are kept in plug list, so > the bio may not be merged to requests in scheduler queue; when flushing plug > list and inserts these requests to scheduler queue, we have to try to > merge them further Oh, right, I forgot that the plug list already stores requests, not bios. > 2) only blk_mq_sched_try_insert_merge() is capable of doing aggressive > request merge, such as, when req A is merged to req B, the function will > continue to try to merge req B with other in-queue requests, until no > any further merge can't be done; neither blk_mq_sched_try_merge() nor > blk_attempt_plug_merge can do such aggressive request merge. Yes, fair point. I was thinking only about a few requests but if the request sequence is like 0 2 4 6 ... 2n 1 3 5 7 ... 2n+1, then bio merging will result in 'n' requests while request merging will be able to get it down to 1 request. I'll keep the recursive merge and pass back a list of requests to free instead. Thanks for the explanations! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-20 22:33 [PATCH 0/2] block: Fix deadlock when merging requests with BFQ Jan Kara 2021-05-20 22:33 ` [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() Jan Kara @ 2021-05-20 22:33 ` Jan Kara 2021-05-21 0:57 ` Ming Lei 1 sibling, 1 reply; 18+ messages in thread From: Jan Kara @ 2021-05-20 22:33 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Jan Kara Lockdep complains about lock inversion between ioc->lock and bfqd->lock: bfqd -> ioc: put_io_context+0x33/0x90 -> ioc->lock grabbed blk_mq_free_request+0x51/0x140 blk_put_request+0xe/0x10 blk_attempt_req_merge+0x1d/0x30 elv_attempt_insert_merge+0x56/0xa0 blk_mq_sched_try_insert_merge+0x4b/0x60 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed blk_mq_sched_insert_requests+0xd6/0x2b0 blk_mq_flush_plug_list+0x154/0x280 blk_finish_plug+0x40/0x60 ext4_writepages+0x696/0x1320 do_writepages+0x1c/0x80 __filemap_fdatawrite_range+0xd7/0x120 sync_file_range+0xac/0xf0 ioc->bfqd: bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed put_io_context_active+0x78/0xb0 -> ioc->lock grabbed exit_io_context+0x48/0x50 do_exit+0x7e9/0xdd0 do_group_exit+0x54/0xc0 To avoid this inversion we change blk_mq_sched_try_insert_merge() to not free the merged request but rather leave that up to the caller, similarly to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure to free the request after dropping bfqd->lock. As a nice consequence, this also makes locking rules in bfq_finish_requeue_request() more consistent. 
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: Jan Kara <jack@suse.cz> --- block/bfq-iosched.c | 20 +++++++------------- block/blk-merge.c | 19 ++++++++----------- block/blk.h | 2 +- block/mq-deadline.c | 4 +++- 4 files changed, 19 insertions(+), 26 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index acd1f881273e..4afdf0b93124 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -2317,9 +2317,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio, ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free); + spin_unlock_irq(&bfqd->lock); if (free) blk_mq_free_request(free); - spin_unlock_irq(&bfqd->lock); return ret; } @@ -5933,6 +5933,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, spin_lock_irq(&bfqd->lock); if (blk_mq_sched_try_insert_merge(q, rq)) { spin_unlock_irq(&bfqd->lock); + blk_put_request(rq); return; } @@ -6376,6 +6377,7 @@ static void bfq_finish_requeue_request(struct request *rq) { struct bfq_queue *bfqq = RQ_BFQQ(rq); struct bfq_data *bfqd; + unsigned long flags; /* * rq either is not associated with any icq, or is an already @@ -6393,18 +6395,12 @@ static void bfq_finish_requeue_request(struct request *rq) rq->io_start_time_ns, rq->cmd_flags); + spin_lock_irqsave(&bfqd->lock, flags); if (likely(rq->rq_flags & RQF_STARTED)) { - unsigned long flags; - - spin_lock_irqsave(&bfqd->lock, flags); - if (rq == bfqd->waited_rq) bfq_update_inject_limit(bfqd, bfqq); bfq_completed_request(bfqq, bfqd); - bfq_finish_requeue_request_body(bfqq); - - spin_unlock_irqrestore(&bfqd->lock, flags); } else { /* * Request rq may be still/already in the scheduler, @@ -6414,18 +6410,16 @@ static void bfq_finish_requeue_request(struct request *rq) * inconsistencies in the time interval from the end * of this function to the start of the deferred work. * This situation seems to occur only in process - * context, as a consequence of a merge. 
In the - * current version of the code, this implies that the - * lock is held. + * context, as a consequence of a merge. */ - if (!RB_EMPTY_NODE(&rq->rb_node)) { bfq_remove_request(rq->q, rq); bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags); } - bfq_finish_requeue_request_body(bfqq); } + bfq_finish_requeue_request_body(bfqq); + spin_unlock_irqrestore(&bfqd->lock, flags); /* * Reset private fields. In case of a requeue, this allows diff --git a/block/blk-merge.c b/block/blk-merge.c index 4d97fb6dd226..1398b52a24b4 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q, return NULL; } -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, - struct request *next) +/* + * Try to merge 'next' into 'rq'. Return true if the merge happened, false + * otherwise. The caller is responsible for freeing 'next' if the merge + * happened. + */ +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, + struct request *next) { - struct request *free; - - free = attempt_merge(q, rq, next); - if (free) { - blk_put_request(free); - return 1; - } - - return 0; + return attempt_merge(q, rq, next); } bool blk_rq_merge_ok(struct request *rq, struct bio *bio) diff --git a/block/blk.h b/block/blk.h index 8b3591aee0a5..99ef4f7e7a70 100644 --- a/block/blk.h +++ b/block/blk.h @@ -225,7 +225,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *, void __blk_queue_split(struct bio **bio, unsigned int *nr_segs); int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs); -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, struct request *next); unsigned int blk_recalc_rq_segments(struct request *rq); void blk_rq_set_mixed_merge(struct request *rq); diff --git a/block/mq-deadline.c b/block/mq-deadline.c index 
8eea2cbf2bf4..64dd78005ae6 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -494,8 +494,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, */ blk_req_zone_write_unlock(rq); - if (blk_mq_sched_try_insert_merge(q, rq)) + if (blk_mq_sched_try_insert_merge(q, rq)) { + blk_put_request(rq); return; + } trace_block_rq_insert(rq); -- 2.26.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-20 22:33 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara @ 2021-05-21 0:57 ` Ming Lei 2021-05-21 3:29 ` Khazhy Kumykov 0 siblings, 1 reply; 18+ messages in thread From: Ming Lei @ 2021-05-21 0:57 UTC (permalink / raw) To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote: > Lockdep complains about lock inversion between ioc->lock and bfqd->lock: > > bfqd -> ioc: > put_io_context+0x33/0x90 -> ioc->lock grabbed > blk_mq_free_request+0x51/0x140 > blk_put_request+0xe/0x10 > blk_attempt_req_merge+0x1d/0x30 > elv_attempt_insert_merge+0x56/0xa0 > blk_mq_sched_try_insert_merge+0x4b/0x60 > bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed We could move blk_put_request() into the scheduler code, then the lock inversion is avoided. So far only mq-deadline and bfq call into blk_mq_sched_try_insert_merge(), and this change should be small. Thanks, Ming ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-21 0:57 ` Ming Lei @ 2021-05-21 3:29 ` Khazhy Kumykov 2021-05-21 6:54 ` Ming Lei 0 siblings, 1 reply; 18+ messages in thread From: Khazhy Kumykov @ 2021-05-21 3:29 UTC (permalink / raw) To: Ming Lei; +Cc: Jan Kara, Jens Axboe, linux-block, Paolo Valente On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote: > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote: > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock: > > > > bfqd -> ioc: > > put_io_context+0x33/0x90 -> ioc->lock grabbed > > blk_mq_free_request+0x51/0x140 > > blk_put_request+0xe/0x10 > > blk_attempt_req_merge+0x1d/0x30 > > elv_attempt_insert_merge+0x56/0xa0 > > blk_mq_sched_try_insert_merge+0x4b/0x60 > > bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed > > We could move blk_put_request() into scheduler code, then the lock > inversion is avoided. So far only mq-deadline and bfq calls into > blk_mq_sched_try_insert_merge(), and this change should be small. We'd potentially be putting multiple requests if we keep the recursive merge. Could we move the backmerge loop to the schedulers, perhaps? > > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-21 3:29 ` Khazhy Kumykov @ 2021-05-21 6:54 ` Ming Lei 2021-05-21 12:05 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Ming Lei @ 2021-05-21 6:54 UTC (permalink / raw) To: Khazhy Kumykov; +Cc: Jan Kara, Jens Axboe, linux-block, Paolo Valente On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote: > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote: > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock: > > > > > > bfqd -> ioc: > > > put_io_context+0x33/0x90 -> ioc->lock grabbed > > > blk_mq_free_request+0x51/0x140 > > > blk_put_request+0xe/0x10 > > > blk_attempt_req_merge+0x1d/0x30 > > > elv_attempt_insert_merge+0x56/0xa0 > > > blk_mq_sched_try_insert_merge+0x4b/0x60 > > > bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed > > > > We could move blk_put_request() into scheduler code, then the lock > > inversion is avoided. So far only mq-deadline and bfq calls into > > blk_mq_sched_try_insert_merge(), and this change should be small. > > We'd potentially be putting multiple requests if we keep the recursive merge. Oh, we can still pass a list to hold all the requests to be freed, then free them all afterwards in the scheduler code. Thanks, Ming ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-21 6:54 ` Ming Lei @ 2021-05-21 12:05 ` Jan Kara 2021-05-21 13:36 ` Ming Lei 0 siblings, 1 reply; 18+ messages in thread From: Jan Kara @ 2021-05-21 12:05 UTC (permalink / raw) To: Ming Lei; +Cc: Khazhy Kumykov, Jan Kara, Jens Axboe, linux-block, Paolo Valente On Fri 21-05-21 14:54:09, Ming Lei wrote: > On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote: > > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote: > > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock: > > > > > > > > bfqd -> ioc: > > > > put_io_context+0x33/0x90 -> ioc->lock grabbed > > > > blk_mq_free_request+0x51/0x140 > > > > blk_put_request+0xe/0x10 > > > > blk_attempt_req_merge+0x1d/0x30 > > > > elv_attempt_insert_merge+0x56/0xa0 > > > > blk_mq_sched_try_insert_merge+0x4b/0x60 > > > > bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed > > > > > > We could move blk_put_request() into scheduler code, then the lock > > > inversion is avoided. So far only mq-deadline and bfq calls into > > > blk_mq_sched_try_insert_merge(), and this change should be small. > > > > We'd potentially be putting multiple requests if we keep the recursive merge. > > Oh, we still can pass a list to hold all requests to be freed, then free > them all outside in scheduler code. If we cannot really get rid of the recursive merge (not yet convinced), this is also an option I've considered. I was wondering what we could use in struct request to attach the request to a list, but it seems the .merged_requests handlers already remove the request from the queuelist, so we should be fine using that. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-21 12:05 ` Jan Kara @ 2021-05-21 13:36 ` Ming Lei 2021-05-21 13:47 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Ming Lei @ 2021-05-21 13:36 UTC (permalink / raw) To: Jan Kara; +Cc: Khazhy Kumykov, Jens Axboe, linux-block, Paolo Valente On Fri, May 21, 2021 at 02:05:51PM +0200, Jan Kara wrote: > On Fri 21-05-21 14:54:09, Ming Lei wrote: > > On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote: > > > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote: > > > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock: > > > > > > > > > > bfqd -> ioc: > > > > > put_io_context+0x33/0x90 -> ioc->lock grabbed > > > > > blk_mq_free_request+0x51/0x140 > > > > > blk_put_request+0xe/0x10 > > > > > blk_attempt_req_merge+0x1d/0x30 > > > > > elv_attempt_insert_merge+0x56/0xa0 > > > > > blk_mq_sched_try_insert_merge+0x4b/0x60 > > > > > bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed > > > > > > > > We could move blk_put_request() into scheduler code, then the lock > > > > inversion is avoided. So far only mq-deadline and bfq calls into > > > > blk_mq_sched_try_insert_merge(), and this change should be small. > > > > > > We'd potentially be putting multiple requests if we keep the recursive merge. > > > > Oh, we still can pass a list to hold all requests to be freed, then free > > them all outside in scheduler code. > > If we cannot really get rid of the recursive merge (not yet convinced), > this is also an option I've considered. I was afraid what can we use in > struct request to attach request to a list but it seems .merged_requests > handlers remove the request from the queuelist already so we should be fine > using that. The request has been removed from scheduler queue, and safe to free, so it is safe to be held in one temporary list. 
Thanks, Ming ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-21 13:36 ` Ming Lei @ 2021-05-21 13:47 ` Jan Kara 0 siblings, 0 replies; 18+ messages in thread From: Jan Kara @ 2021-05-21 13:47 UTC (permalink / raw) To: Ming Lei; +Cc: Jan Kara, Khazhy Kumykov, Jens Axboe, linux-block, Paolo Valente On Fri 21-05-21 21:36:05, Ming Lei wrote: > On Fri, May 21, 2021 at 02:05:51PM +0200, Jan Kara wrote: > > On Fri 21-05-21 14:54:09, Ming Lei wrote: > > > On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote: > > > > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote: > > > > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock: > > > > > > > > > > > > bfqd -> ioc: > > > > > > put_io_context+0x33/0x90 -> ioc->lock grabbed > > > > > > blk_mq_free_request+0x51/0x140 > > > > > > blk_put_request+0xe/0x10 > > > > > > blk_attempt_req_merge+0x1d/0x30 > > > > > > elv_attempt_insert_merge+0x56/0xa0 > > > > > > blk_mq_sched_try_insert_merge+0x4b/0x60 > > > > > > bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed > > > > > > > > > > We could move blk_put_request() into scheduler code, then the lock > > > > > inversion is avoided. So far only mq-deadline and bfq calls into > > > > > blk_mq_sched_try_insert_merge(), and this change should be small. > > > > > > > > We'd potentially be putting multiple requests if we keep the recursive merge. > > > > > > Oh, we still can pass a list to hold all requests to be freed, then free > > > them all outside in scheduler code. > > > > If we cannot really get rid of the recursive merge (not yet convinced), > > this is also an option I've considered. I was afraid what can we use in > > struct request to attach request to a list but it seems .merged_requests > > handlers remove the request from the queuelist already so we should be fine > > using that. 
> > The request has been removed from scheduler queue, and safe to free, > so it is safe to be held in one temporary list. Not quite, there's still the ->finish_request hook that will be called from blk_mq_free_request() on the request, and e.g. BFQ performs quite a lot of cleanup there. But yes, at least queuelist seems to be available for reuse here. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/2 v2] block: Fix deadlock when merging requests with BFQ @ 2021-05-24 10:04 Jan Kara 2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Jan Kara @ 2021-05-24 10:04 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Ming Lei, Jan Kara Hello, This patch series fixes a lockdep complaint and a possible deadlock that can happen when blk_mq_sched_try_insert_merge() merges and frees a request. Changes since v1: * Removed the patch disabling recursive request merging during request insertion * Modified BFQ to clean up the merged request already in its .merge_requests handler * Added code to handle multiple requests that need freeing after being merged Honza ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock 2021-05-24 10:04 [PATCH 0/2 v2] block: Fix deadlock when merging requests with BFQ Jan Kara @ 2021-05-24 10:04 ` Jan Kara 2021-05-25 0:29 ` Ming Lei 2021-05-28 9:33 ` Paolo Valente 0 siblings, 2 replies; 18+ messages in thread From: Jan Kara @ 2021-05-24 10:04 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Ming Lei, Jan Kara Lockdep complains about lock inversion between ioc->lock and bfqd->lock: bfqd -> ioc: put_io_context+0x33/0x90 -> ioc->lock grabbed blk_mq_free_request+0x51/0x140 blk_put_request+0xe/0x10 blk_attempt_req_merge+0x1d/0x30 elv_attempt_insert_merge+0x56/0xa0 blk_mq_sched_try_insert_merge+0x4b/0x60 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed blk_mq_sched_insert_requests+0xd6/0x2b0 blk_mq_flush_plug_list+0x154/0x280 blk_finish_plug+0x40/0x60 ext4_writepages+0x696/0x1320 do_writepages+0x1c/0x80 __filemap_fdatawrite_range+0xd7/0x120 sync_file_range+0xac/0xf0 ioc->bfqd: bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed put_io_context_active+0x78/0xb0 -> ioc->lock grabbed exit_io_context+0x48/0x50 do_exit+0x7e9/0xdd0 do_group_exit+0x54/0xc0 To avoid this inversion we change blk_mq_sched_try_insert_merge() to not free the merged request but rather leave that up to the caller, similarly to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure to free all the merged requests after dropping bfqd->lock. 
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: Jan Kara <jack@suse.cz> --- block/bfq-iosched.c | 6 ++++-- block/blk-merge.c | 19 ++++++++----------- block/blk-mq-sched.c | 5 +++-- block/blk-mq-sched.h | 3 ++- block/blk-mq.h | 11 +++++++++++ block/blk.h | 2 +- block/elevator.c | 11 ++++++++--- block/mq-deadline.c | 5 ++++- include/linux/elevator.h | 3 ++- 9 files changed, 43 insertions(+), 22 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 50a29fdf51da..5e076396b588 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -2317,9 +2317,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio, ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free); + spin_unlock_irq(&bfqd->lock); if (free) blk_mq_free_request(free); - spin_unlock_irq(&bfqd->lock); return ret; } @@ -5933,14 +5933,16 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, struct bfq_queue *bfqq; bool idle_timer_disabled = false; unsigned int cmd_flags; + LIST_HEAD(free); #ifdef CONFIG_BFQ_GROUP_IOSCHED if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio) bfqg_stats_update_legacy_io(q, rq); #endif spin_lock_irq(&bfqd->lock); - if (blk_mq_sched_try_insert_merge(q, rq)) { + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { spin_unlock_irq(&bfqd->lock); + blk_mq_free_requests(&free); return; } diff --git a/block/blk-merge.c b/block/blk-merge.c index 4d97fb6dd226..1398b52a24b4 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q, return NULL; } -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, - struct request *next) +/* + * Try to merge 'next' into 'rq'. Return true if the merge happened, false + * otherwise. The caller is responsible for freeing 'next' if the merge + * happened. 
+ */ +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, + struct request *next) { - struct request *free; - - free = attempt_merge(q, rq, next); - if (free) { - blk_put_request(free); - return 1; - } - - return 0; + return attempt_merge(q, rq, next); } bool blk_rq_merge_ok(struct request *rq, struct bio *bio) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 714e678f516a..bf0a3dec8226 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -400,9 +400,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, return ret; } -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq) +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, + struct list_head *free) { - return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq); + return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free); } EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge); diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h index 5b18ab915c65..8b70de4b8d23 100644 --- a/block/blk-mq-sched.h +++ b/block/blk-mq-sched.h @@ -11,7 +11,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio, unsigned int nr_segs, struct request **merged_request); bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, unsigned int nr_segs); -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq); +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, + struct list_head *free); void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx); diff --git a/block/blk-mq.h b/block/blk-mq.h index 81a775171be7..20ef743a3ff6 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -301,6 +301,17 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q, return NULL; } +/* Free all requests on the list */ +static inline void blk_mq_free_requests(struct list_head *list) +{ 
+ while (!list_empty(list)) { + struct request *rq = list_entry_rq(list->next); + + list_del_init(&rq->queuelist); + blk_mq_free_request(rq); + } +} + /* * For shared tag users, we track the number of currently active users * and attempt to provide a fair share of the tag depth for each of them. diff --git a/block/blk.h b/block/blk.h index 8b3591aee0a5..99ef4f7e7a70 100644 --- a/block/blk.h +++ b/block/blk.h @@ -225,7 +225,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *, void __blk_queue_split(struct bio **bio, unsigned int *nr_segs); int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs); -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, struct request *next); unsigned int blk_recalc_rq_segments(struct request *rq); void blk_rq_set_mixed_merge(struct request *rq); diff --git a/block/elevator.c b/block/elevator.c index 440699c28119..62e9c672da7c 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -350,9 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req, * we can append 'rq' to an existing request, so we can throw 'rq' away * afterwards. * - * Returns true if we merged, false otherwise + * Returns true if we merged, false otherwise. 'free' will contain all + * requests that need to be freed. */ -bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) +bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq, + struct list_head *free) { struct request *__rq; bool ret; @@ -363,8 +365,10 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) /* * First try one-hit cache. 
*/ - if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) + if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) { + list_add(&rq->queuelist, free); return true; + } if (blk_queue_noxmerges(q)) return false; @@ -378,6 +382,7 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) if (!__rq || !blk_attempt_req_merge(q, __rq, rq)) break; + list_add(&rq->queuelist, free); /* The merged request could be merged with others, try again */ ret = true; rq = __rq; diff --git a/block/mq-deadline.c b/block/mq-deadline.c index 8eea2cbf2bf4..7136262819f1 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -487,6 +487,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, struct request_queue *q = hctx->queue; struct deadline_data *dd = q->elevator->elevator_data; const int data_dir = rq_data_dir(rq); + LIST_HEAD(free); /* * This may be a requeue of a write request that has locked its @@ -494,8 +495,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, */ blk_req_zone_write_unlock(rq); - if (blk_mq_sched_try_insert_merge(q, rq)) + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { + blk_mq_free_requests(&free); return; + } trace_block_rq_insert(rq); diff --git a/include/linux/elevator.h b/include/linux/elevator.h index dcb2f9022c1d..1a5965174f5b 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -117,7 +117,8 @@ extern void elv_merge_requests(struct request_queue *, struct request *, struct request *); extern void elv_merged_request(struct request_queue *, struct request *, enum elv_merge); -extern bool elv_attempt_insert_merge(struct request_queue *, struct request *); +extern bool elv_attempt_insert_merge(struct request_queue *, struct request *, + struct list_head *); extern struct request *elv_former_request(struct request_queue *, struct request *); extern struct request *elv_latter_request(struct request_queue *, struct request *); -- 
2.26.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
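[Editor's note: the core of the fix above is a pattern worth seeing in isolation: collect requests that become redundant while holding the scheduler lock, and free them only after the lock is dropped, because freeing takes another lock in the inverted order. The following is a minimal userspace sketch of that pattern — `struct item`, `try_insert_merge()`, and the `queue_locked` flag are all hypothetical stand-ins, not kernel API; the assert models what lockdep would flag.]

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy model of the fix, not kernel code: 'queue_locked' stands in for
 * holding bfqd->lock. item_free() models blk_mq_free_request(), which
 * may take ioc->lock internally, so the model asserts it is never
 * called while the queue lock is held -- the inversion lockdep
 * complained about.
 */
struct item { struct item *next; };

static int queue_locked;
static int freed_count;

static void item_free(struct item *it)
{
	assert(!queue_locked);	/* would invert ioc->lock vs bfqd->lock */
	free(it);
	freed_count++;
}

/* Pretend the merge always succeeds: 'it' becomes redundant and is
 * parked on the caller-provided list instead of being freed here. */
static int try_insert_merge(struct item *it, struct item **to_free)
{
	it->next = *to_free;
	*to_free = it;
	return 1;
}

static int insert_request(struct item *it)
{
	struct item *to_free = NULL;
	int merged;

	queue_locked = 1;		/* like spin_lock_irq(&bfqd->lock) */
	merged = try_insert_merge(it, &to_free);
	queue_locked = 0;		/* like spin_unlock_irq(&bfqd->lock) */

	/* The point of the patch: free only after dropping the lock. */
	while (to_free) {
		struct item *next = to_free->next;

		item_free(to_free);
		to_free = next;
	}
	return merged;
}
```

The same shape appears in both `bfq_insert_request()` and `dd_insert_request()` in the patch: a local `LIST_HEAD(free)`, the merge attempt under the lock, and `blk_mq_free_requests(&free)` after unlocking.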
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
@ 2021-05-25  0:29 ` Ming Lei
  2021-05-28  9:33 ` Paolo Valente
  1 sibling, 0 replies; 18+ messages in thread
From: Ming Lei @ 2021-05-25 0:29 UTC (permalink / raw)
To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente

On Mon, May 24, 2021 at 12:04:16PM +0200, Jan Kara wrote:
> Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
>
> bfqd -> ioc:
>  put_io_context+0x33/0x90 -> ioc->lock grabbed
>  blk_mq_free_request+0x51/0x140
>  blk_put_request+0xe/0x10
>  blk_attempt_req_merge+0x1d/0x30
>  elv_attempt_insert_merge+0x56/0xa0
>  blk_mq_sched_try_insert_merge+0x4b/0x60
>  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
>  blk_mq_sched_insert_requests+0xd6/0x2b0
>  blk_mq_flush_plug_list+0x154/0x280
>  blk_finish_plug+0x40/0x60
>  ext4_writepages+0x696/0x1320
>  do_writepages+0x1c/0x80
>  __filemap_fdatawrite_range+0xd7/0x120
>  sync_file_range+0xac/0xf0
>
> ioc->bfqd:
>  bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
>  put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
>  exit_io_context+0x48/0x50
>  do_exit+0x7e9/0xdd0
>  do_group_exit+0x54/0xc0
>
> To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
> free the merged request but rather leave that upto the caller similarly
> to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
> to free all the merged requests after dropping bfqd->lock.
>
> Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
  2021-05-25  0:29 ` Ming Lei
@ 2021-05-28  9:33 ` Paolo Valente
  1 sibling, 0 replies; 18+ messages in thread
From: Paolo Valente @ 2021-05-28 9:33 UTC (permalink / raw)
To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Ming Lei

> On 24 May 2021, at 12:04, Jan Kara <jack@suse.cz> wrote:
>
> Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
>
> bfqd -> ioc:
>  put_io_context+0x33/0x90 -> ioc->lock grabbed
>  blk_mq_free_request+0x51/0x140
>  blk_put_request+0xe/0x10
>  blk_attempt_req_merge+0x1d/0x30
>  elv_attempt_insert_merge+0x56/0xa0
>  blk_mq_sched_try_insert_merge+0x4b/0x60
>  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
>  blk_mq_sched_insert_requests+0xd6/0x2b0
>  blk_mq_flush_plug_list+0x154/0x280
>  blk_finish_plug+0x40/0x60
>  ext4_writepages+0x696/0x1320
>  do_writepages+0x1c/0x80
>  __filemap_fdatawrite_range+0xd7/0x120
>  sync_file_range+0xac/0xf0
>
> ioc->bfqd:
>  bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
>  put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
>  exit_io_context+0x48/0x50
>  do_exit+0x7e9/0xdd0
>  do_group_exit+0x54/0xc0
>
> To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
> free the merged request but rather leave that upto the caller similarly
> to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
> to free all the merged requests after dropping bfqd->lock.
>

I see you added a (short) loop.
Apart from that, Acked-by: Paolo Valente <paolo.valente@linaro.org> Thanks, Paolo > Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") > Signed-off-by: Jan Kara <jack@suse.cz> > --- > block/bfq-iosched.c | 6 ++++-- > block/blk-merge.c | 19 ++++++++----------- > block/blk-mq-sched.c | 5 +++-- > block/blk-mq-sched.h | 3 ++- > block/blk-mq.h | 11 +++++++++++ > block/blk.h | 2 +- > block/elevator.c | 11 ++++++++--- > block/mq-deadline.c | 5 ++++- > include/linux/elevator.h | 3 ++- > 9 files changed, 43 insertions(+), 22 deletions(-) > > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > index 50a29fdf51da..5e076396b588 100644 > --- a/block/bfq-iosched.c > +++ b/block/bfq-iosched.c > @@ -2317,9 +2317,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio, > > ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free); > > + spin_unlock_irq(&bfqd->lock); > if (free) > blk_mq_free_request(free); > - spin_unlock_irq(&bfqd->lock); > > return ret; > } > @@ -5933,14 +5933,16 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, > struct bfq_queue *bfqq; > bool idle_timer_disabled = false; > unsigned int cmd_flags; > + LIST_HEAD(free); > > #ifdef CONFIG_BFQ_GROUP_IOSCHED > if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio) > bfqg_stats_update_legacy_io(q, rq); > #endif > spin_lock_irq(&bfqd->lock); > - if (blk_mq_sched_try_insert_merge(q, rq)) { > + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { > spin_unlock_irq(&bfqd->lock); > + blk_mq_free_requests(&free); > return; > } > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 4d97fb6dd226..1398b52a24b4 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q, > return NULL; > } > > -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, > - struct request *next) > +/* > + * Try to merge 'next' into 'rq'. 
Return true if the merge happened, false > + * otherwise. The caller is responsible for freeing 'next' if the merge > + * happened. > + */ > +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, > + struct request *next) > { > - struct request *free; > - > - free = attempt_merge(q, rq, next); > - if (free) { > - blk_put_request(free); > - return 1; > - } > - > - return 0; > + return attempt_merge(q, rq, next); > } > > bool blk_rq_merge_ok(struct request *rq, struct bio *bio) > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 714e678f516a..bf0a3dec8226 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -400,9 +400,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, > return ret; > } > > -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq) > +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, > + struct list_head *free) > { > - return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq); > + return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free); > } > EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge); > > diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h > index 5b18ab915c65..8b70de4b8d23 100644 > --- a/block/blk-mq-sched.h > +++ b/block/blk-mq-sched.h > @@ -11,7 +11,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio, > unsigned int nr_segs, struct request **merged_request); > bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, > unsigned int nr_segs); > -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq); > +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, > + struct list_head *free); > void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx); > void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx); > > diff --git a/block/blk-mq.h b/block/blk-mq.h > index 81a775171be7..20ef743a3ff6 100644 > --- 
a/block/blk-mq.h > +++ b/block/blk-mq.h > @@ -301,6 +301,17 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q, > return NULL; > } > > +/* Free all requests on the list */ > +static inline void blk_mq_free_requests(struct list_head *list) > +{ > + while (!list_empty(list)) { > + struct request *rq = list_entry_rq(list->next); > + > + list_del_init(&rq->queuelist); > + blk_mq_free_request(rq); > + } > +} > + > /* > * For shared tag users, we track the number of currently active users > * and attempt to provide a fair share of the tag depth for each of them. > diff --git a/block/blk.h b/block/blk.h > index 8b3591aee0a5..99ef4f7e7a70 100644 > --- a/block/blk.h > +++ b/block/blk.h > @@ -225,7 +225,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *, > void __blk_queue_split(struct bio **bio, unsigned int *nr_segs); > int ll_back_merge_fn(struct request *req, struct bio *bio, > unsigned int nr_segs); > -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, > +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, > struct request *next); > unsigned int blk_recalc_rq_segments(struct request *rq); > void blk_rq_set_mixed_merge(struct request *rq); > diff --git a/block/elevator.c b/block/elevator.c > index 440699c28119..62e9c672da7c 100644 > --- a/block/elevator.c > +++ b/block/elevator.c > @@ -350,9 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req, > * we can append 'rq' to an existing request, so we can throw 'rq' away > * afterwards. > * > - * Returns true if we merged, false otherwise > + * Returns true if we merged, false otherwise. 'free' will contain all > + * requests that need to be freed. 
> */ > -bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) > +bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq, > + struct list_head *free) > { > struct request *__rq; > bool ret; > @@ -363,8 +365,10 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) > /* > * First try one-hit cache. > */ > - if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) > + if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) { > + list_add(&rq->queuelist, free); > return true; > + } > > if (blk_queue_noxmerges(q)) > return false; > @@ -378,6 +382,7 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) > if (!__rq || !blk_attempt_req_merge(q, __rq, rq)) > break; > > + list_add(&rq->queuelist, free); > /* The merged request could be merged with others, try again */ > ret = true; > rq = __rq; > diff --git a/block/mq-deadline.c b/block/mq-deadline.c > index 8eea2cbf2bf4..7136262819f1 100644 > --- a/block/mq-deadline.c > +++ b/block/mq-deadline.c > @@ -487,6 +487,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, > struct request_queue *q = hctx->queue; > struct deadline_data *dd = q->elevator->elevator_data; > const int data_dir = rq_data_dir(rq); > + LIST_HEAD(free); > > /* > * This may be a requeue of a write request that has locked its > @@ -494,8 +495,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, > */ > blk_req_zone_write_unlock(rq); > > - if (blk_mq_sched_try_insert_merge(q, rq)) > + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { > + blk_mq_free_requests(&free); > return; > + } > > trace_block_rq_insert(rq); > > diff --git a/include/linux/elevator.h b/include/linux/elevator.h > index dcb2f9022c1d..1a5965174f5b 100644 > --- a/include/linux/elevator.h > +++ b/include/linux/elevator.h > @@ -117,7 +117,8 @@ extern void elv_merge_requests(struct request_queue *, struct request *, > struct request 
*); > extern void elv_merged_request(struct request_queue *, struct request *, > enum elv_merge); > -extern bool elv_attempt_insert_merge(struct request_queue *, struct request *); > +extern bool elv_attempt_insert_merge(struct request_queue *, struct request *, > + struct list_head *); > extern struct request *elv_former_request(struct request_queue *, struct request *); > extern struct request *elv_latter_request(struct request_queue *, struct request *); > > -- > 2.26.2 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/2 v3] block: Fix deadlock when merging requests with BFQ
@ 2021-05-28 12:30 Jan Kara
  2021-05-28 12:30 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2021-05-28 12:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Ming Lei, Paolo Valente, Jan Kara

Hello,

This patch series fixes a lockdep complaint and a possible deadlock that can
happen when blk_mq_sched_try_insert_merge() merges and frees a request.
Jens, can you please merge the series? Thanks!

Changes since v2:
* Added reviewed-by tags

Changes since v1:
* Removed the patch disabling recursive request merging during request insertion
* Modified BFQ to clean up merged requests already in its .merge_requests handler
* Added code to handle multiple requests that need freeing after being merged

								Honza

^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-05-28 12:30 [PATCH 0/2 v3] block: Fix deadlock when merging requests with BFQ Jan Kara
@ 2021-05-28 12:30 ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2021-05-28 12:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Ming Lei, Paolo Valente, Jan Kara

Lockdep complains about lock inversion between ioc->lock and bfqd->lock:

bfqd -> ioc:
 put_io_context+0x33/0x90 -> ioc->lock grabbed
 blk_mq_free_request+0x51/0x140
 blk_put_request+0xe/0x10
 blk_attempt_req_merge+0x1d/0x30
 elv_attempt_insert_merge+0x56/0xa0
 blk_mq_sched_try_insert_merge+0x4b/0x60
 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
 blk_mq_sched_insert_requests+0xd6/0x2b0
 blk_mq_flush_plug_list+0x154/0x280
 blk_finish_plug+0x40/0x60
 ext4_writepages+0x696/0x1320
 do_writepages+0x1c/0x80
 __filemap_fdatawrite_range+0xd7/0x120
 sync_file_range+0xac/0xf0

ioc->bfqd:
 bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
 put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
 exit_io_context+0x48/0x50
 do_exit+0x7e9/0xdd0
 do_group_exit+0x54/0xc0

To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
free the merged request but rather leave that up to the caller, similarly
to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
to free all the merged requests after dropping bfqd->lock.
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> --- block/bfq-iosched.c | 6 ++++-- block/blk-merge.c | 19 ++++++++----------- block/blk-mq-sched.c | 5 +++-- block/blk-mq-sched.h | 3 ++- block/blk-mq.h | 11 +++++++++++ block/blk.h | 2 +- block/elevator.c | 11 ++++++++--- block/mq-deadline.c | 5 ++++- include/linux/elevator.h | 3 ++- 9 files changed, 43 insertions(+), 22 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 50a29fdf51da..5e076396b588 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -2317,9 +2317,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio, ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free); + spin_unlock_irq(&bfqd->lock); if (free) blk_mq_free_request(free); - spin_unlock_irq(&bfqd->lock); return ret; } @@ -5933,14 +5933,16 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, struct bfq_queue *bfqq; bool idle_timer_disabled = false; unsigned int cmd_flags; + LIST_HEAD(free); #ifdef CONFIG_BFQ_GROUP_IOSCHED if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio) bfqg_stats_update_legacy_io(q, rq); #endif spin_lock_irq(&bfqd->lock); - if (blk_mq_sched_try_insert_merge(q, rq)) { + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { spin_unlock_irq(&bfqd->lock); + blk_mq_free_requests(&free); return; } diff --git a/block/blk-merge.c b/block/blk-merge.c index 4d97fb6dd226..1398b52a24b4 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q, return NULL; } -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, - struct request *next) +/* + * Try to merge 'next' into 'rq'. Return true if the merge happened, false + * otherwise. 
The caller is responsible for freeing 'next' if the merge + * happened. + */ +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, + struct request *next) { - struct request *free; - - free = attempt_merge(q, rq, next); - if (free) { - blk_put_request(free); - return 1; - } - - return 0; + return attempt_merge(q, rq, next); } bool blk_rq_merge_ok(struct request *rq, struct bio *bio) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 714e678f516a..bf0a3dec8226 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -400,9 +400,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, return ret; } -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq) +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, + struct list_head *free) { - return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq); + return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free); } EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge); diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h index 5b18ab915c65..8b70de4b8d23 100644 --- a/block/blk-mq-sched.h +++ b/block/blk-mq-sched.h @@ -11,7 +11,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio, unsigned int nr_segs, struct request **merged_request); bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, unsigned int nr_segs); -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq); +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, + struct list_head *free); void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx); diff --git a/block/blk-mq.h b/block/blk-mq.h index 81a775171be7..20ef743a3ff6 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -301,6 +301,17 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q, return NULL; } +/* Free all requests on the list 
*/ +static inline void blk_mq_free_requests(struct list_head *list) +{ + while (!list_empty(list)) { + struct request *rq = list_entry_rq(list->next); + + list_del_init(&rq->queuelist); + blk_mq_free_request(rq); + } +} + /* * For shared tag users, we track the number of currently active users * and attempt to provide a fair share of the tag depth for each of them. diff --git a/block/blk.h b/block/blk.h index 8b3591aee0a5..99ef4f7e7a70 100644 --- a/block/blk.h +++ b/block/blk.h @@ -225,7 +225,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *, void __blk_queue_split(struct bio **bio, unsigned int *nr_segs); int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs); -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq, struct request *next); unsigned int blk_recalc_rq_segments(struct request *rq); void blk_rq_set_mixed_merge(struct request *rq); diff --git a/block/elevator.c b/block/elevator.c index 440699c28119..62e9c672da7c 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -350,9 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req, * we can append 'rq' to an existing request, so we can throw 'rq' away * afterwards. * - * Returns true if we merged, false otherwise + * Returns true if we merged, false otherwise. 'free' will contain all + * requests that need to be freed. */ -bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) +bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq, + struct list_head *free) { struct request *__rq; bool ret; @@ -363,8 +365,10 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) /* * First try one-hit cache. 
*/ - if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) + if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) { + list_add(&rq->queuelist, free); return true; + } if (blk_queue_noxmerges(q)) return false; @@ -378,6 +382,7 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq) if (!__rq || !blk_attempt_req_merge(q, __rq, rq)) break; + list_add(&rq->queuelist, free); /* The merged request could be merged with others, try again */ ret = true; rq = __rq; diff --git a/block/mq-deadline.c b/block/mq-deadline.c index 8eea2cbf2bf4..7136262819f1 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -487,6 +487,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, struct request_queue *q = hctx->queue; struct deadline_data *dd = q->elevator->elevator_data; const int data_dir = rq_data_dir(rq); + LIST_HEAD(free); /* * This may be a requeue of a write request that has locked its @@ -494,8 +495,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, */ blk_req_zone_write_unlock(rq); - if (blk_mq_sched_try_insert_merge(q, rq)) + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { + blk_mq_free_requests(&free); return; + } trace_block_rq_insert(rq); diff --git a/include/linux/elevator.h b/include/linux/elevator.h index dcb2f9022c1d..1a5965174f5b 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -117,7 +117,8 @@ extern void elv_merge_requests(struct request_queue *, struct request *, struct request *); extern void elv_merged_request(struct request_queue *, struct request *, enum elv_merge); -extern bool elv_attempt_insert_merge(struct request_queue *, struct request *); +extern bool elv_attempt_insert_merge(struct request_queue *, struct request *, + struct list_head *); extern struct request *elv_former_request(struct request_queue *, struct request *); extern struct request *elv_latter_request(struct request_queue *, struct request *); -- 
2.26.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
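[Editor's note: the `blk_mq_free_requests()` helper introduced above is a straightforward drain of an intrusive list. Here is a standalone userspace sketch of the same loop, with just enough of the kernel's `list_head` machinery reimplemented to run on its own — `struct request`, `request_free()`, and `free_requests()` are hypothetical stand-ins, not the kernel definitions.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive circular doubly linked list, kernel-style. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical request stub: only the queuelist linkage matters here. */
struct request { struct list_head queuelist; };

/* Recover the containing request from its embedded list node. */
#define list_entry_rq(ptr) \
	((struct request *)((char *)(ptr) - offsetof(struct request, queuelist)))

static int free_count;

static void request_free(struct request *rq)
{
	free(rq);
	free_count++;
}

/* Same drain loop as the patch's blk_mq_free_requests(): pop entries
 * off the head until the list is empty, freeing each one. */
static void free_requests(struct list_head *list)
{
	while (!list_empty(list)) {
		struct request *rq = list_entry_rq(list->next);

		list_del_init(&rq->queuelist);
		request_free(rq);
	}
}
```

Because the list head lives on the caller's stack (`LIST_HEAD(free)` in the patch), the callers in bfq and mq-deadline can run this drain safely after dropping their scheduler lock.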
* [PATCH 0/2 v4] block: Fix deadlock when merging requests with BFQ
@ 2021-06-23  9:36 Jan Kara
  2021-06-23  9:36 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2021-06-23 9:36 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara

Hello,

This patch series fixes a lockdep complaint and a possible deadlock that can
happen when blk_mq_sched_try_insert_merge() merges and frees a request.
Jens, can you please merge the series? It seems to have slipped through last
time when I posted it. Thanks!

Changes since v3:
* Rebased on top of for-next branch in axboe/linux-block.git

Changes since v2:
* Added reviewed-by tags

Changes since v1:
* Removed the patch disabling recursive request merging during request insertion
* Modified BFQ to clean up merged requests already in its .merge_requests handler
* Added code to handle multiple requests that need freeing after being merged

								Honza

^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-06-23  9:36 [PATCH 0/2 v4] block: Fix deadlock when merging requests with BFQ Jan Kara
@ 2021-06-23  9:36 ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2021-06-23 9:36 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara, Ming Lei

Lockdep complains about lock inversion between ioc->lock and bfqd->lock:

bfqd -> ioc:
 put_io_context+0x33/0x90 -> ioc->lock grabbed
 blk_mq_free_request+0x51/0x140
 blk_put_request+0xe/0x10
 blk_attempt_req_merge+0x1d/0x30
 elv_attempt_insert_merge+0x56/0xa0
 blk_mq_sched_try_insert_merge+0x4b/0x60
 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
 blk_mq_sched_insert_requests+0xd6/0x2b0
 blk_mq_flush_plug_list+0x154/0x280
 blk_finish_plug+0x40/0x60
 ext4_writepages+0x696/0x1320
 do_writepages+0x1c/0x80
 __filemap_fdatawrite_range+0xd7/0x120
 sync_file_range+0xac/0xf0

ioc->bfqd:
 bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
 put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
 exit_io_context+0x48/0x50
 do_exit+0x7e9/0xdd0
 do_group_exit+0x54/0xc0

To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
free the merged request but rather leave that up to the caller, similarly
to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
to free all the merged requests after dropping bfqd->lock.
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> --- block/bfq-iosched.c | 6 ++++-- block/blk-merge.c | 19 ++++++++----------- block/blk-mq-sched.c | 5 +++-- block/blk-mq-sched.h | 3 ++- block/blk-mq.h | 11 +++++++++++ block/blk.h | 2 +- block/elevator.c | 11 ++++++++--- block/mq-deadline-main.c | 5 ++++- include/linux/elevator.h | 3 ++- 9 files changed, 43 insertions(+), 22 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 9433d38e486c..727955918563 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -2345,9 +2345,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio, ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free); + spin_unlock_irq(&bfqd->lock); if (free) blk_mq_free_request(free); - spin_unlock_irq(&bfqd->lock); return ret; } @@ -5969,14 +5969,16 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, struct bfq_queue *bfqq; bool idle_timer_disabled = false; unsigned int cmd_flags; + LIST_HEAD(free); #ifdef CONFIG_BFQ_GROUP_IOSCHED if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio) bfqg_stats_update_legacy_io(q, rq); #endif spin_lock_irq(&bfqd->lock); - if (blk_mq_sched_try_insert_merge(q, rq)) { + if (blk_mq_sched_try_insert_merge(q, rq, &free)) { spin_unlock_irq(&bfqd->lock); + blk_mq_free_requests(&free); return; } diff --git a/block/blk-merge.c b/block/blk-merge.c index 4d97fb6dd226..1398b52a24b4 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q, return NULL; } -int blk_attempt_req_merge(struct request_queue *q, struct request *rq, - struct request *next) +/* + * Try to merge 'next' into 'rq'. Return true if the merge happened, false + * otherwise. 
The caller is responsible for freeing 'next' if the merge
+ * happened.
+ */
+bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+			   struct request *next)
 {
-	struct request *free;
-
-	free = attempt_merge(q, rq, next);
-	if (free) {
-		blk_put_request(free);
-		return 1;
-	}
-
-	return 0;
+	return attempt_merge(q, rq, next);
 }
 
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 2403a5c2b053..c838d81ac058 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -399,9 +399,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 	return ret;
 }
 
-bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq)
+bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
+				   struct list_head *free)
 {
-	return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq);
+	return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free);
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge);
 
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index aff037cfd8e7..5246ae040704 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -13,7 +13,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs);
-bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
+bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
+				   struct list_head *free);
 void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 4b1ca7b7bbeb..d08779f77a26 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -302,6 +302,17 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q,
 	return NULL;
 }
 
+/* Free all requests on the list */
+static inline void blk_mq_free_requests(struct list_head *list)
+{
+	while (!list_empty(list)) {
+		struct request *rq = list_entry_rq(list->next);
+
+		list_del_init(&rq->queuelist);
+		blk_mq_free_request(rq);
+	}
+}
+
 /*
  * For shared tag users, we track the number of currently active users
  * and attempt to provide a fair share of the tag depth for each of them.
diff --git a/block/blk.h b/block/blk.h
index d3fa47af3607..9ad3c7ee6ba0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -224,7 +224,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *,
 void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
-int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
 		struct request *next);
 unsigned int blk_recalc_rq_segments(struct request *rq);
 void blk_rq_set_mixed_merge(struct request *rq);
diff --git a/block/elevator.c b/block/elevator.c
index 85d0d4adbb64..52ada14cfe45 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -350,9 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
  * we can append 'rq' to an existing request, so we can throw 'rq' away
  * afterwards.
  *
- * Returns true if we merged, false otherwise
+ * Returns true if we merged, false otherwise. 'free' will contain all
+ * requests that need to be freed.
  */
-bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
+bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq,
+			      struct list_head *free)
 {
 	struct request *__rq;
 	bool ret;
@@ -363,8 +365,10 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
 	/*
 	 * First try one-hit cache.
 	 */
-	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
+	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) {
+		list_add(&rq->queuelist, free);
 		return true;
+	}
 
 	if (blk_queue_noxmerges(q))
 		return false;
@@ -378,6 +382,7 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
 		if (!__rq || !blk_attempt_req_merge(q, __rq, rq))
 			break;
 
+		list_add(&rq->queuelist, free);
 		/* The merged request could be merged with others, try again */
 		ret = true;
 		rq = __rq;
diff --git a/block/mq-deadline-main.c b/block/mq-deadline-main.c
index 4815e536091f..9db6da9ef4c6 100644
--- a/block/mq-deadline-main.c
+++ b/block/mq-deadline-main.c
@@ -719,6 +719,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
 	struct dd_blkcg *blkcg;
+	LIST_HEAD(free);
 
 	lockdep_assert_held(&dd->lock);
 
@@ -742,8 +743,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	WARN_ON_ONCE(rq->elv.priv[0]);
 	rq->elv.priv[0] = blkcg;
 
-	if (blk_mq_sched_try_insert_merge(q, rq))
+	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
+		blk_mq_free_requests(&free);
 		return;
+	}
 
 	trace_block_rq_insert(rq);
 
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 783ecb3cb77a..ef9ceead3db1 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -117,7 +117,8 @@ extern void elv_merge_requests(struct request_queue *, struct request *,
 				struct request *);
 extern void elv_merged_request(struct request_queue *, struct request *,
 		enum elv_merge);
-extern bool elv_attempt_insert_merge(struct request_queue *, struct request *);
+extern bool elv_attempt_insert_merge(struct request_queue *, struct request *,
+				     struct list_head *);
 extern struct request *elv_former_request(struct request_queue *, struct request *);
 extern struct request *elv_latter_request(struct request_queue *, struct request *);
 void elevator_init_mq(struct request_queue *q);
-- 
2.26.2
end of thread, other threads:[~2021-06-23  9:36 UTC | newest]

Thread overview: 18+ messages -- links below jump to the message on this page:
2021-05-20 22:33 [PATCH 0/2] block: Fix deadlock when merging requests with BFQ Jan Kara
2021-05-20 22:33 ` [PATCH 1/2] block: Do not merge recursively in elv_attempt_insert_merge() Jan Kara
2021-05-21  0:42   ` Ming Lei
2021-05-21 11:53     ` Jan Kara
2021-05-21 13:12       ` Ming Lei
2021-05-21 13:44         ` Jan Kara
2021-05-20 22:33 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
2021-05-21  0:57   ` Ming Lei
2021-05-21  3:29     ` Khazhy Kumykov
2021-05-21  6:54       ` Ming Lei
2021-05-21 12:05   ` Jan Kara
2021-05-21 13:36     ` Ming Lei
2021-05-21 13:47       ` Jan Kara
2021-05-24 10:04 [PATCH 0/2 v2] block: Fix deadlock when merging requests with BFQ Jan Kara
2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
2021-05-25  0:29   ` Ming Lei
2021-05-28  9:33   ` Paolo Valente
2021-05-28 12:30 [PATCH 0/2 v3] block: Fix deadlock when merging requests with BFQ Jan Kara
2021-05-28 12:30 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
2021-06-23  9:36 [PATCH 0/2 v4] block: Fix deadlock when merging requests with BFQ Jan Kara
2021-06-23  9:36 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara