linux-block.vger.kernel.org archive mirror
* [PATCH 0/2 v2] block: Fix deadlock when merging requests with BFQ
@ 2021-05-24 10:04 Jan Kara
  2021-05-24 10:04 ` [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged() Jan Kara
  2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
  0 siblings, 2 replies; 8+ messages in thread
From: Jan Kara @ 2021-05-24 10:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Ming Lei, Jan Kara

Hello,

This patch series fixes a lockdep complaint and a possible deadlock that
can happen when blk_mq_sched_try_insert_merge() merges and frees a request.
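
The inversion is the classic ABBA pattern. Schematically (a simplified
interleaving, not the literal call chains; those are quoted from the
lockdep splat in patch 2/2):

	CPU0 (plug flush, insert + merge)    CPU1 (task exit)
	---------------------------------    ----------------
	bfq_insert_requests()
	  spin_lock(&bfqd->lock)
	                                     put_io_context_active()
	                                       spin_lock(&ioc->lock)
	  blk_mq_free_request()
	    put_io_context()
	      spin_lock(&ioc->lock)   /* blocks */
	                                     bfq_exit_icq()
	                                       spin_lock(&bfqd->lock)  /* blocks */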

Changes since v1:
* Remove patch disabling recursive request merging during request insertion
* Modified BFQ to clean up the merged request already in its .merge_requests handler
* Added code to handle multiple requests that need freeing after being merged

								Honza


* [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged()
  2021-05-24 10:04 [PATCH 0/2 v2] block: Fix deadlock when merging requests with BFQ Jan Kara
@ 2021-05-24 10:04 ` Jan Kara
  2021-05-28  9:29   ` Paolo Valente
  2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Kara @ 2021-05-24 10:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Ming Lei, Jan Kara

Currently, bfq does very little in bfq_requests_merged() and handles all
the request cleanup in bfq_finish_requeue_request() called from
blk_mq_free_request(). That is currently safe only because
blk_mq_free_request() is called shortly after bfq_requests_merged()
while bfqd->lock is still held. However, to fix a lock inversion between
bfqd->lock and ioc->lock, we need to call blk_mq_free_request() after
dropping bfqd->lock. That would mean that an already merged request could
be seen by other processes inside bfq queues and possibly dispatched to
the device, which is wrong. So move the cleanup of the request from
bfq_finish_requeue_request() to bfq_requests_merged().
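
Schematically, with patch 2/2 applied the freeing moves out from under
bfqd->lock, and without this patch the following window would open (a
simplified sketch; bfq_requests_merged() is really invoked through the
elevator .merge_requests hook rather than called directly):

	spin_lock_irq(&bfqd->lock);
	bfq_requests_merged(q, rq, next);  /* 'next' merged into 'rq' */
	spin_unlock_irq(&bfqd->lock);
	/*
	 * Window: 'next' would still sit in BFQ's RB trees here, so another
	 * context could find it in the bfq queues and dispatch the already
	 * merged request ...
	 */
	blk_mq_free_request(next);         /* ... before this cleanup runs */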

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 41 +++++++++++++----------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index acd1f881273e..50a29fdf51da 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2405,7 +2405,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		*next_bfqq = bfq_init_rq(next);
 
 	if (!bfqq)
-		return;
+		goto remove;
 
 	/*
 	 * If next and rq belong to the same bfq_queue and next is older
@@ -2428,6 +2428,14 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		bfqq->next_rq = rq;
 
 	bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
+remove:
+	/* Merged request may be in the IO scheduler. Remove it. */
+	if (!RB_EMPTY_NODE(&next->rb_node)) {
+		bfq_remove_request(next->q, next);
+		if (next_bfqq)
+			bfqg_stats_update_io_remove(bfqq_group(next_bfqq),
+						    next->cmd_flags);
+	}
 }
 
 /* Must be called with bfqq != NULL */
@@ -6376,6 +6384,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
+	unsigned long flags;
 
 	/*
 	 * rq either is not associated with any icq, or is an already
@@ -6393,39 +6402,15 @@ static void bfq_finish_requeue_request(struct request *rq)
 					     rq->io_start_time_ns,
 					     rq->cmd_flags);
 
+	spin_lock_irqsave(&bfqd->lock, flags);
 	if (likely(rq->rq_flags & RQF_STARTED)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&bfqd->lock, flags);
-
 		if (rq == bfqd->waited_rq)
 			bfq_update_inject_limit(bfqd, bfqq);
 
 		bfq_completed_request(bfqq, bfqd);
-		bfq_finish_requeue_request_body(bfqq);
-
-		spin_unlock_irqrestore(&bfqd->lock, flags);
-	} else {
-		/*
-		 * Request rq may be still/already in the scheduler,
-		 * in which case we need to remove it (this should
-		 * never happen in case of requeue). And we cannot
-		 * defer such a check and removal, to avoid
-		 * inconsistencies in the time interval from the end
-		 * of this function to the start of the deferred work.
-		 * This situation seems to occur only in process
-		 * context, as a consequence of a merge. In the
-		 * current version of the code, this implies that the
-		 * lock is held.
-		 */
-
-		if (!RB_EMPTY_NODE(&rq->rb_node)) {
-			bfq_remove_request(rq->q, rq);
-			bfqg_stats_update_io_remove(bfqq_group(bfqq),
-						    rq->cmd_flags);
-		}
-		bfq_finish_requeue_request_body(bfqq);
 	}
+	bfq_finish_requeue_request_body(bfqq);
+	spin_unlock_irqrestore(&bfqd->lock, flags);
 
 	/*
 	 * Reset private fields. In case of a requeue, this allows
-- 
2.26.2



* [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-05-24 10:04 [PATCH 0/2 v2] block: Fix deadlock when merging requests with BFQ Jan Kara
  2021-05-24 10:04 ` [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged() Jan Kara
@ 2021-05-24 10:04 ` Jan Kara
  2021-05-25  0:29   ` Ming Lei
  2021-05-28  9:33   ` Paolo Valente
  1 sibling, 2 replies; 8+ messages in thread
From: Jan Kara @ 2021-05-24 10:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Paolo Valente, Ming Lei, Jan Kara

Lockdep complains about lock inversion between ioc->lock and bfqd->lock:

bfqd -> ioc:
 put_io_context+0x33/0x90 -> ioc->lock grabbed
 blk_mq_free_request+0x51/0x140
 blk_put_request+0xe/0x10
 blk_attempt_req_merge+0x1d/0x30
 elv_attempt_insert_merge+0x56/0xa0
 blk_mq_sched_try_insert_merge+0x4b/0x60
 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
 blk_mq_sched_insert_requests+0xd6/0x2b0
 blk_mq_flush_plug_list+0x154/0x280
 blk_finish_plug+0x40/0x60
 ext4_writepages+0x696/0x1320
 do_writepages+0x1c/0x80
 __filemap_fdatawrite_range+0xd7/0x120
 sync_file_range+0xac/0xf0

ioc -> bfqd:
 bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
 put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
 exit_io_context+0x48/0x50
 do_exit+0x7e9/0xdd0
 do_group_exit+0x54/0xc0

To avoid this inversion, we change blk_mq_sched_try_insert_merge() to not
free the merged request but rather leave that up to the caller, similarly
to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
to free all the merged requests after dropping bfqd->lock.
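
The resulting caller pattern (a schematic sketch mirroring the
bfq_insert_request() hunk below):

	LIST_HEAD(free);

	spin_lock_irq(&bfqd->lock);
	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
		spin_unlock_irq(&bfqd->lock);
		/* ioc->lock is taken only here, outside bfqd->lock */
		blk_mq_free_requests(&free);
		return;
	}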

Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c      |  6 ++++--
 block/blk-merge.c        | 19 ++++++++-----------
 block/blk-mq-sched.c     |  5 +++--
 block/blk-mq-sched.h     |  3 ++-
 block/blk-mq.h           | 11 +++++++++++
 block/blk.h              |  2 +-
 block/elevator.c         | 11 ++++++++---
 block/mq-deadline.c      |  5 ++++-
 include/linux/elevator.h |  3 ++-
 9 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 50a29fdf51da..5e076396b588 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2317,9 +2317,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
 
+	spin_unlock_irq(&bfqd->lock);
 	if (free)
 		blk_mq_free_request(free);
-	spin_unlock_irq(&bfqd->lock);
 
 	return ret;
 }
@@ -5933,14 +5933,16 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct bfq_queue *bfqq;
 	bool idle_timer_disabled = false;
 	unsigned int cmd_flags;
+	LIST_HEAD(free);
 
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
 	if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio)
 		bfqg_stats_update_legacy_io(q, rq);
 #endif
 	spin_lock_irq(&bfqd->lock);
-	if (blk_mq_sched_try_insert_merge(q, rq)) {
+	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		spin_unlock_irq(&bfqd->lock);
+		blk_mq_free_requests(&free);
 		return;
 	}
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 4d97fb6dd226..1398b52a24b4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q,
 	return NULL;
 }
 
-int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
-			  struct request *next)
+/*
+ * Try to merge 'next' into 'rq'. Return true if the merge happened, false
+ * otherwise. The caller is responsible for freeing 'next' if the merge
+ * happened.
+ */
+bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+			   struct request *next)
 {
-	struct request *free;
-
-	free = attempt_merge(q, rq, next);
-	if (free) {
-		blk_put_request(free);
-		return 1;
-	}
-
-	return 0;
+	return attempt_merge(q, rq, next);
 }
 
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 714e678f516a..bf0a3dec8226 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -400,9 +400,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 	return ret;
 }
 
-bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq)
+bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
+				   struct list_head *free)
 {
-	return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq);
+	return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free);
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge);
 
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 5b18ab915c65..8b70de4b8d23 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -11,7 +11,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs);
-bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
+bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
+				   struct list_head *free);
 void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 81a775171be7..20ef743a3ff6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -301,6 +301,17 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q,
 	return NULL;
 }
 
+/* Free all requests on the list */
+static inline void blk_mq_free_requests(struct list_head *list)
+{
+	while (!list_empty(list)) {
+		struct request *rq = list_entry_rq(list->next);
+
+		list_del_init(&rq->queuelist);
+		blk_mq_free_request(rq);
+	}
+}
+
 /*
  * For shared tag users, we track the number of currently active users
  * and attempt to provide a fair share of the tag depth for each of them.
diff --git a/block/blk.h b/block/blk.h
index 8b3591aee0a5..99ef4f7e7a70 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -225,7 +225,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *,
 void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
-int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
 				struct request *next);
 unsigned int blk_recalc_rq_segments(struct request *rq);
 void blk_rq_set_mixed_merge(struct request *rq);
diff --git a/block/elevator.c b/block/elevator.c
index 440699c28119..62e9c672da7c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -350,9 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
  * we can append 'rq' to an existing request, so we can throw 'rq' away
  * afterwards.
  *
- * Returns true if we merged, false otherwise
+ * Returns true if we merged, false otherwise. 'free' will contain all
+ * requests that need to be freed.
  */
-bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
+bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq,
+			      struct list_head *free)
 {
 	struct request *__rq;
 	bool ret;
@@ -363,8 +365,10 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
 	/*
 	 * First try one-hit cache.
 	 */
-	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
+	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) {
+		list_add(&rq->queuelist, free);
 		return true;
+	}
 
 	if (blk_queue_noxmerges(q))
 		return false;
@@ -378,6 +382,7 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
 		if (!__rq || !blk_attempt_req_merge(q, __rq, rq))
 			break;
 
+		list_add(&rq->queuelist, free);
 		/* The merged request could be merged with others, try again */
 		ret = true;
 		rq = __rq;
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 8eea2cbf2bf4..7136262819f1 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -487,6 +487,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const int data_dir = rq_data_dir(rq);
+	LIST_HEAD(free);
 
 	/*
 	 * This may be a requeue of a write request that has locked its
@@ -494,8 +495,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	 */
 	blk_req_zone_write_unlock(rq);
 
-	if (blk_mq_sched_try_insert_merge(q, rq))
+	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
+		blk_mq_free_requests(&free);
 		return;
+	}
 
 	trace_block_rq_insert(rq);
 
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index dcb2f9022c1d..1a5965174f5b 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -117,7 +117,8 @@ extern void elv_merge_requests(struct request_queue *, struct request *,
 			       struct request *);
 extern void elv_merged_request(struct request_queue *, struct request *,
 		enum elv_merge);
-extern bool elv_attempt_insert_merge(struct request_queue *, struct request *);
+extern bool elv_attempt_insert_merge(struct request_queue *, struct request *,
+				     struct list_head *);
 extern struct request *elv_former_request(struct request_queue *, struct request *);
 extern struct request *elv_latter_request(struct request_queue *, struct request *);
 
-- 
2.26.2



* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
@ 2021-05-25  0:29   ` Ming Lei
  2021-05-28  9:33   ` Paolo Valente
  1 sibling, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-25  0:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Paolo Valente

On Mon, May 24, 2021 at 12:04:16PM +0200, Jan Kara wrote:
> Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> 
> bfqd -> ioc:
>  put_io_context+0x33/0x90 -> ioc->lock grabbed
>  blk_mq_free_request+0x51/0x140
>  blk_put_request+0xe/0x10
>  blk_attempt_req_merge+0x1d/0x30
>  elv_attempt_insert_merge+0x56/0xa0
>  blk_mq_sched_try_insert_merge+0x4b/0x60
>  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
>  blk_mq_sched_insert_requests+0xd6/0x2b0
>  blk_mq_flush_plug_list+0x154/0x280
>  blk_finish_plug+0x40/0x60
>  ext4_writepages+0x696/0x1320
>  do_writepages+0x1c/0x80
>  __filemap_fdatawrite_range+0xd7/0x120
>  sync_file_range+0xac/0xf0
> 
> ioc -> bfqd:
>  bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
>  put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
>  exit_io_context+0x48/0x50
>  do_exit+0x7e9/0xdd0
>  do_group_exit+0x54/0xc0
> 
> To avoid this inversion, we change blk_mq_sched_try_insert_merge() to not
> free the merged request but rather leave that up to the caller, similarly
> to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
> to free all the merged requests after dropping bfqd->lock.
> 
> Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks, 
Ming



* Re: [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged()
  2021-05-24 10:04 ` [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged() Jan Kara
@ 2021-05-28  9:29   ` Paolo Valente
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Valente @ 2021-05-28  9:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Ming Lei



> On 24 May 2021, at 12:04, Jan Kara <jack@suse.cz> wrote:
> 
> Currently, bfq does very little in bfq_requests_merged() and handles all
> the request cleanup in bfq_finish_requeue_request() called from
> blk_mq_free_request(). That is currently safe only because
> blk_mq_free_request() is called shortly after bfq_requests_merged()
> while bfqd->lock is still held. However, to fix a lock inversion between
> bfqd->lock and ioc->lock, we need to call blk_mq_free_request() after
> dropping bfqd->lock. That would mean that an already merged request could
> be seen by other processes inside bfq queues and possibly dispatched to
> the device, which is wrong. So move the cleanup of the request from
> bfq_finish_requeue_request() to bfq_requests_merged().
> 

I no longer even remembered why I had to handle that deferred
removal in bfq_finish_requeue_request() :)

Your solution seems very clean to me.

Acked-by: Paolo Valente <paolo.valente@linaro.org>

> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> block/bfq-iosched.c | 41 +++++++++++++----------------------------
> 1 file changed, 13 insertions(+), 28 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index acd1f881273e..50a29fdf51da 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2405,7 +2405,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
> 		*next_bfqq = bfq_init_rq(next);
> 
> 	if (!bfqq)
> -		return;
> +		goto remove;
> 
> 	/*
> 	 * If next and rq belong to the same bfq_queue and next is older
> @@ -2428,6 +2428,14 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
> 		bfqq->next_rq = rq;
> 
> 	bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
> +remove:
> +	/* Merged request may be in the IO scheduler. Remove it. */
> +	if (!RB_EMPTY_NODE(&next->rb_node)) {
> +		bfq_remove_request(next->q, next);
> +		if (next_bfqq)
> +			bfqg_stats_update_io_remove(bfqq_group(next_bfqq),
> +						    next->cmd_flags);
> +	}
> }
> 
> /* Must be called with bfqq != NULL */
> @@ -6376,6 +6384,7 @@ static void bfq_finish_requeue_request(struct request *rq)
> {
> 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
> 	struct bfq_data *bfqd;
> +	unsigned long flags;
> 
> 	/*
> 	 * rq either is not associated with any icq, or is an already
> @@ -6393,39 +6402,15 @@ static void bfq_finish_requeue_request(struct request *rq)
> 					     rq->io_start_time_ns,
> 					     rq->cmd_flags);
> 
> +	spin_lock_irqsave(&bfqd->lock, flags);
> 	if (likely(rq->rq_flags & RQF_STARTED)) {
> -		unsigned long flags;
> -
> -		spin_lock_irqsave(&bfqd->lock, flags);
> -
> 		if (rq == bfqd->waited_rq)
> 			bfq_update_inject_limit(bfqd, bfqq);
> 
> 		bfq_completed_request(bfqq, bfqd);
> -		bfq_finish_requeue_request_body(bfqq);
> -
> -		spin_unlock_irqrestore(&bfqd->lock, flags);
> -	} else {
> -		/*
> -		 * Request rq may be still/already in the scheduler,
> -		 * in which case we need to remove it (this should
> -		 * never happen in case of requeue). And we cannot
> -		 * defer such a check and removal, to avoid
> -		 * inconsistencies in the time interval from the end
> -		 * of this function to the start of the deferred work.
> -		 * This situation seems to occur only in process
> -		 * context, as a consequence of a merge. In the
> -		 * current version of the code, this implies that the
> -		 * lock is held.
> -		 */
> -
> -		if (!RB_EMPTY_NODE(&rq->rb_node)) {
> -			bfq_remove_request(rq->q, rq);
> -			bfqg_stats_update_io_remove(bfqq_group(bfqq),
> -						    rq->cmd_flags);
> -		}
> -		bfq_finish_requeue_request_body(bfqq);
> 	}
> +	bfq_finish_requeue_request_body(bfqq);
> +	spin_unlock_irqrestore(&bfqd->lock, flags);
> 
> 	/*
> 	 * Reset private fields. In case of a requeue, this allows
> -- 
> 2.26.2
> 



* Re: [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock
  2021-05-24 10:04 ` [PATCH 2/2] blk: Fix lock inversion between ioc lock and bfqd lock Jan Kara
  2021-05-25  0:29   ` Ming Lei
@ 2021-05-28  9:33   ` Paolo Valente
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Valente @ 2021-05-28  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, linux-block, Khazhy Kumykov, Ming Lei



> On 24 May 2021, at 12:04, Jan Kara <jack@suse.cz> wrote:
> 
> Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> 
> bfqd -> ioc:
> put_io_context+0x33/0x90 -> ioc->lock grabbed
> blk_mq_free_request+0x51/0x140
> blk_put_request+0xe/0x10
> blk_attempt_req_merge+0x1d/0x30
> elv_attempt_insert_merge+0x56/0xa0
> blk_mq_sched_try_insert_merge+0x4b/0x60
> bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
> blk_mq_sched_insert_requests+0xd6/0x2b0
> blk_mq_flush_plug_list+0x154/0x280
> blk_finish_plug+0x40/0x60
> ext4_writepages+0x696/0x1320
> do_writepages+0x1c/0x80
> __filemap_fdatawrite_range+0xd7/0x120
> sync_file_range+0xac/0xf0
> 
> ioc -> bfqd:
> bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
> put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
> exit_io_context+0x48/0x50
> do_exit+0x7e9/0xdd0
> do_group_exit+0x54/0xc0
> 
> To avoid this inversion, we change blk_mq_sched_try_insert_merge() to not
> free the merged request but rather leave that up to the caller, similarly
> to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
> to free all the merged requests after dropping bfqd->lock.
> 

I see you added a (short) loop. Apart from that,
Acked-by: Paolo Valente <paolo.valente@linaro.org>

Thanks,
Paolo

> Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> block/bfq-iosched.c      |  6 ++++--
> block/blk-merge.c        | 19 ++++++++-----------
> block/blk-mq-sched.c     |  5 +++--
> block/blk-mq-sched.h     |  3 ++-
> block/blk-mq.h           | 11 +++++++++++
> block/blk.h              |  2 +-
> block/elevator.c         | 11 ++++++++---
> block/mq-deadline.c      |  5 ++++-
> include/linux/elevator.h |  3 ++-
> 9 files changed, 43 insertions(+), 22 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 50a29fdf51da..5e076396b588 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2317,9 +2317,9 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
> 
> 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
> 
> +	spin_unlock_irq(&bfqd->lock);
> 	if (free)
> 		blk_mq_free_request(free);
> -	spin_unlock_irq(&bfqd->lock);
> 
> 	return ret;
> }
> @@ -5933,14 +5933,16 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> 	struct bfq_queue *bfqq;
> 	bool idle_timer_disabled = false;
> 	unsigned int cmd_flags;
> +	LIST_HEAD(free);
> 
> #ifdef CONFIG_BFQ_GROUP_IOSCHED
> 	if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio)
> 		bfqg_stats_update_legacy_io(q, rq);
> #endif
> 	spin_lock_irq(&bfqd->lock);
> -	if (blk_mq_sched_try_insert_merge(q, rq)) {
> +	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
> 		spin_unlock_irq(&bfqd->lock);
> +		blk_mq_free_requests(&free);
> 		return;
> 	}
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 4d97fb6dd226..1398b52a24b4 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -846,18 +846,15 @@ static struct request *attempt_front_merge(struct request_queue *q,
> 	return NULL;
> }
> 
> -int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
> -			  struct request *next)
> +/*
> + * Try to merge 'next' into 'rq'. Return true if the merge happened, false
> + * otherwise. The caller is responsible for freeing 'next' if the merge
> + * happened.
> + */
> +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
> +			   struct request *next)
> {
> -	struct request *free;
> -
> -	free = attempt_merge(q, rq, next);
> -	if (free) {
> -		blk_put_request(free);
> -		return 1;
> -	}
> -
> -	return 0;
> +	return attempt_merge(q, rq, next);
> }
> 
> bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 714e678f516a..bf0a3dec8226 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -400,9 +400,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
> 	return ret;
> }
> 
> -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq)
> +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
> +				   struct list_head *free)
> {
> -	return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq);
> +	return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free);
> }
> EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge);
> 
> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
> index 5b18ab915c65..8b70de4b8d23 100644
> --- a/block/blk-mq-sched.h
> +++ b/block/blk-mq-sched.h
> @@ -11,7 +11,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> 		unsigned int nr_segs, struct request **merged_request);
> bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
> 		unsigned int nr_segs);
> -bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
> +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
> +				   struct list_head *free);
> void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
> void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
> 
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 81a775171be7..20ef743a3ff6 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -301,6 +301,17 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q,
> 	return NULL;
> }
> 
> +/* Free all requests on the list */
> +static inline void blk_mq_free_requests(struct list_head *list)
> +{
> +	while (!list_empty(list)) {
> +		struct request *rq = list_entry_rq(list->next);
> +
> +		list_del_init(&rq->queuelist);
> +		blk_mq_free_request(rq);
> +	}
> +}
> +
> /*
>  * For shared tag users, we track the number of currently active users
>  * and attempt to provide a fair share of the tag depth for each of them.
> diff --git a/block/blk.h b/block/blk.h
> index 8b3591aee0a5..99ef4f7e7a70 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -225,7 +225,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *,
> void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
> int ll_back_merge_fn(struct request *req, struct bio *bio,
> 		unsigned int nr_segs);
> -int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
> +bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
> 				struct request *next);
> unsigned int blk_recalc_rq_segments(struct request *rq);
> void blk_rq_set_mixed_merge(struct request *rq);
> diff --git a/block/elevator.c b/block/elevator.c
> index 440699c28119..62e9c672da7c 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -350,9 +350,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
>  * we can append 'rq' to an existing request, so we can throw 'rq' away
>  * afterwards.
>  *
> - * Returns true if we merged, false otherwise
> + * Returns true if we merged, false otherwise. 'free' will contain all
> + * requests that need to be freed.
>  */
> -bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
> +bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq,
> +			      struct list_head *free)
> {
> 	struct request *__rq;
> 	bool ret;
> @@ -363,8 +365,10 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
> 	/*
> 	 * First try one-hit cache.
> 	 */
> -	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
> +	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) {
> +		list_add(&rq->queuelist, free);
> 		return true;
> +	}
> 
> 	if (blk_queue_noxmerges(q))
> 		return false;
> @@ -378,6 +382,7 @@ bool elv_attempt_insert_merge(struct request_queue *q, struct request *rq)
> 		if (!__rq || !blk_attempt_req_merge(q, __rq, rq))
> 			break;
> 
> +		list_add(&rq->queuelist, free);
> 		/* The merged request could be merged with others, try again */
> 		ret = true;
> 		rq = __rq;
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 8eea2cbf2bf4..7136262819f1 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -487,6 +487,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> 	struct request_queue *q = hctx->queue;
> 	struct deadline_data *dd = q->elevator->elevator_data;
> 	const int data_dir = rq_data_dir(rq);
> +	LIST_HEAD(free);
> 
> 	/*
> 	 * This may be a requeue of a write request that has locked its
> @@ -494,8 +495,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> 	 */
> 	blk_req_zone_write_unlock(rq);
> 
> -	if (blk_mq_sched_try_insert_merge(q, rq))
> +	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
> +		blk_mq_free_requests(&free);
> 		return;
> +	}
> 
> 	trace_block_rq_insert(rq);
> 
> diff --git a/include/linux/elevator.h b/include/linux/elevator.h
> index dcb2f9022c1d..1a5965174f5b 100644
> --- a/include/linux/elevator.h
> +++ b/include/linux/elevator.h
> @@ -117,7 +117,8 @@ extern void elv_merge_requests(struct request_queue *, struct request *,
> 			       struct request *);
> extern void elv_merged_request(struct request_queue *, struct request *,
> 		enum elv_merge);
> -extern bool elv_attempt_insert_merge(struct request_queue *, struct request *);
> +extern bool elv_attempt_insert_merge(struct request_queue *, struct request *,
> +				     struct list_head *);
> extern struct request *elv_former_request(struct request_queue *, struct request *);
> extern struct request *elv_latter_request(struct request_queue *, struct request *);
> 
> -- 
> 2.26.2
> 



* [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged()
  2021-06-23  9:36 [PATCH 0/2 v4] block: Fix deadlock when merging requests with BFQ Jan Kara
@ 2021-06-23  9:36 ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2021-06-23  9:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara

Currently, bfq does very little in bfq_requests_merged() and handles all
the request cleanup in bfq_finish_requeue_request() called from
blk_mq_free_request(). That is currently safe only because
blk_mq_free_request() is called shortly after bfq_requests_merged()
while bfqd->lock is still held. However, to fix a lock inversion between
bfqd->lock and ioc->lock, we need to call blk_mq_free_request() after
dropping bfqd->lock. That would mean that an already merged request could
be seen by other processes inside bfq queues and possibly dispatched to
the device, which is wrong. So move the cleanup of the request from
bfq_finish_requeue_request() to bfq_requests_merged().

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 41 +++++++++++++----------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index fedb0a8fd388..9433d38e486c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2433,7 +2433,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		*next_bfqq = bfq_init_rq(next);
 
 	if (!bfqq)
-		return;
+		goto remove;
 
 	/*
 	 * If next and rq belong to the same bfq_queue and next is older
@@ -2456,6 +2456,14 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		bfqq->next_rq = rq;
 
 	bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
+remove:
+	/* Merged request may be in the IO scheduler. Remove it. */
+	if (!RB_EMPTY_NODE(&next->rb_node)) {
+		bfq_remove_request(next->q, next);
+		if (next_bfqq)
+			bfqg_stats_update_io_remove(bfqq_group(next_bfqq),
+						    next->cmd_flags);
+	}
 }
 
 /* Must be called with bfqq != NULL */
@@ -6414,6 +6422,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
+	unsigned long flags;
 
 	/*
 	 * rq either is not associated with any icq, or is an already
@@ -6431,39 +6440,15 @@ static void bfq_finish_requeue_request(struct request *rq)
 					     rq->io_start_time_ns,
 					     rq->cmd_flags);
 
+	spin_lock_irqsave(&bfqd->lock, flags);
 	if (likely(rq->rq_flags & RQF_STARTED)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&bfqd->lock, flags);
-
 		if (rq == bfqd->waited_rq)
 			bfq_update_inject_limit(bfqd, bfqq);
 
 		bfq_completed_request(bfqq, bfqd);
-		bfq_finish_requeue_request_body(bfqq);
-
-		spin_unlock_irqrestore(&bfqd->lock, flags);
-	} else {
-		/*
-		 * Request rq may be still/already in the scheduler,
-		 * in which case we need to remove it (this should
-		 * never happen in case of requeue). And we cannot
-		 * defer such a check and removal, to avoid
-		 * inconsistencies in the time interval from the end
-		 * of this function to the start of the deferred work.
-		 * This situation seems to occur only in process
-		 * context, as a consequence of a merge. In the
-		 * current version of the code, this implies that the
-		 * lock is held.
-		 */
-
-		if (!RB_EMPTY_NODE(&rq->rb_node)) {
-			bfq_remove_request(rq->q, rq);
-			bfqg_stats_update_io_remove(bfqq_group(bfqq),
-						    rq->cmd_flags);
-		}
-		bfq_finish_requeue_request_body(bfqq);
 	}
+	bfq_finish_requeue_request_body(bfqq);
+	spin_unlock_irqrestore(&bfqd->lock, flags);
 
 	/*
 	 * Reset private fields. In case of a requeue, this allows
-- 
2.26.2



* [PATCH 1/2] bfq: Remove merged request already in bfq_requests_merged()
  2021-05-28 12:30 [PATCH 0/2 v3] block: Fix deadlock when merging requests with BFQ Jan Kara
@ 2021-05-28 12:30 ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2021-05-28 12:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Khazhy Kumykov, Ming Lei, Paolo Valente, Jan Kara

Currently, bfq does very little in bfq_requests_merged() and handles all
the request cleanup in bfq_finish_requeue_request() called from
blk_mq_free_request(). That is currently safe only because
blk_mq_free_request() is called shortly after bfq_requests_merged()
while bfqd->lock is still held. However, to fix a lock inversion between
bfqd->lock and ioc->lock, we need to call blk_mq_free_request() after
dropping bfqd->lock. That would mean that an already merged request could
be seen by other processes inside bfq queues and possibly dispatched to
the device, which is wrong. So move the cleanup of the request from
bfq_finish_requeue_request() to bfq_requests_merged().

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 41 +++++++++++++----------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index acd1f881273e..50a29fdf51da 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2405,7 +2405,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		*next_bfqq = bfq_init_rq(next);
 
 	if (!bfqq)
-		return;
+		goto remove;
 
 	/*
 	 * If next and rq belong to the same bfq_queue and next is older
@@ -2428,6 +2428,14 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		bfqq->next_rq = rq;
 
 	bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
+remove:
+	/* Merged request may be in the IO scheduler. Remove it. */
+	if (!RB_EMPTY_NODE(&next->rb_node)) {
+		bfq_remove_request(next->q, next);
+		if (next_bfqq)
+			bfqg_stats_update_io_remove(bfqq_group(next_bfqq),
+						    next->cmd_flags);
+	}
 }
 
 /* Must be called with bfqq != NULL */
@@ -6376,6 +6384,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
+	unsigned long flags;
 
 	/*
 	 * rq either is not associated with any icq, or is an already
@@ -6393,39 +6402,15 @@ static void bfq_finish_requeue_request(struct request *rq)
 					     rq->io_start_time_ns,
 					     rq->cmd_flags);
 
+	spin_lock_irqsave(&bfqd->lock, flags);
 	if (likely(rq->rq_flags & RQF_STARTED)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&bfqd->lock, flags);
-
 		if (rq == bfqd->waited_rq)
 			bfq_update_inject_limit(bfqd, bfqq);
 
 		bfq_completed_request(bfqq, bfqd);
-		bfq_finish_requeue_request_body(bfqq);
-
-		spin_unlock_irqrestore(&bfqd->lock, flags);
-	} else {
-		/*
-		 * Request rq may be still/already in the scheduler,
-		 * in which case we need to remove it (this should
-		 * never happen in case of requeue). And we cannot
-		 * defer such a check and removal, to avoid
-		 * inconsistencies in the time interval from the end
-		 * of this function to the start of the deferred work.
-		 * This situation seems to occur only in process
-		 * context, as a consequence of a merge. In the
-		 * current version of the code, this implies that the
-		 * lock is held.
-		 */
-
-		if (!RB_EMPTY_NODE(&rq->rb_node)) {
-			bfq_remove_request(rq->q, rq);
-			bfqg_stats_update_io_remove(bfqq_group(bfqq),
-						    rq->cmd_flags);
-		}
-		bfq_finish_requeue_request_body(bfqq);
 	}
+	bfq_finish_requeue_request_body(bfqq);
+	spin_unlock_irqrestore(&bfqd->lock, flags);
 
 	/*
 	 * Reset private fields. In case of a requeue, this allows
-- 
2.26.2


