Linux-Block Archive on lore.kernel.org
* [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests
@ 2021-04-06 21:49 Bart Van Assche
  2021-04-06 21:49 ` [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-06 21:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hi Jens,

This patch series fixes the race between iterating over requests and
freeing requests that has been reported by multiple different users over
the past two years. Please consider this patch series for kernel v5.13.

Thanks,

Bart.

Changes between v5 and v6:
- Fixed an additional race between iterating over tags and freeing scheduler
  requests that was spotted by Khazhy.
- Added two patches to fix the race conditions between updating the number of
  hardware queues and iterating over a tag set.

Changes between v4 and v5:
- Addressed Khazhy's review comments. Note: the changes made in v5 only
  affect behavior when CONFIG_PROVE_RCU=y.

Changes between v3 and v4:
- Fixed support for tag sets shared across hardware queues.
- Renamed blk_mq_wait_for_tag_readers() to blk_mq_wait_for_tag_iter().
- Removed the fourth argument of blk_mq_queue_tag_busy_iter() again.

Changes between v2 and v3:
- Converted the single v2 patch into a series of three patches.
- Switched from SRCU to a combination of RCU and semaphores.

Changes between v1 and v2:
- Reformatted patch description.
- Added Tested-by/Reviewed-by tags.
- Changed srcu_barrier() calls into synchronize_srcu() calls.

Bart Van Assche (5):
  blk-mq: Move the elevator_exit() definition
  blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  blk-mq: Fix races between iterating over requests and freeing requests
  blk-mq: Make it safe to use RCU to iterate over
    blk_mq_tag_set.tag_list
  blk-mq: Fix a race between blk_mq_update_nr_hw_queues() and iterating
    over tags

 block/blk-core.c          |  34 +++++++++-
 block/blk-mq-tag.c        | 128 ++++++++++++++++++++++++++++++++++----
 block/blk-mq-tag.h        |   6 +-
 block/blk-mq.c            |  31 ++++++---
 block/blk-mq.h            |   1 +
 block/blk.h               |  11 +---
 block/elevator.c          |   9 +++
 drivers/scsi/hosts.c      |  16 ++---
 drivers/scsi/ufs/ufshcd.c |   4 +-
 include/linux/blk-mq.h    |   2 +
 10 files changed, 202 insertions(+), 40 deletions(-)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition
  2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
@ 2021-04-06 21:49 ` Bart Van Assche
  2021-04-07 15:36   ` John Garry
  2021-04-13  7:44   ` Daniel Wagner
  2021-04-06 21:49 ` [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-06 21:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Shin'ichiro Kawasaki, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, John Garry, Khazhy Kumykov

Since elevator_exit() has only one caller, move its definition from
block/blk.h into block/elevator.c. Remove the inline keyword since modern
compilers are smart enough to decide when to inline functions that occur
in the same compilation unit.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Khazhy Kumykov <khazhy@google.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk.h      | 9 ---------
 block/elevator.c | 8 ++++++++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 8f4337c5a9e6..2ed6c684d63a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -199,15 +199,6 @@ void __elevator_exit(struct request_queue *, struct elevator_queue *);
 int elv_register_queue(struct request_queue *q, bool uevent);
 void elv_unregister_queue(struct request_queue *q);
 
-static inline void elevator_exit(struct request_queue *q,
-		struct elevator_queue *e)
-{
-	lockdep_assert_held(&q->sysfs_lock);
-
-	blk_mq_sched_free_requests(q);
-	__elevator_exit(q, e);
-}
-
 ssize_t part_size_show(struct device *dev, struct device_attribute *attr,
 		char *buf);
 ssize_t part_stat_show(struct device *dev, struct device_attribute *attr,
diff --git a/block/elevator.c b/block/elevator.c
index 293c5c81397a..4b20d1ab29cc 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -197,6 +197,14 @@ void __elevator_exit(struct request_queue *q, struct elevator_queue *e)
 	kobject_put(&e->kobj);
 }
 
+static void elevator_exit(struct request_queue *q, struct elevator_queue *e)
+{
+	lockdep_assert_held(&q->sysfs_lock);
+
+	blk_mq_sched_free_requests(q);
+	__elevator_exit(q, e);
+}
+
 static inline void __elv_rqhash_del(struct request *rq)
 {
 	hash_del(&rq->hash);


* [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
  2021-04-06 21:49 ` [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition Bart Van Assche
@ 2021-04-06 21:49 ` Bart Van Assche
  2021-04-07 16:57   ` John Garry
  2021-04-13  7:50   ` Daniel Wagner
  2021-04-06 21:49 ` [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-06 21:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Shin'ichiro Kawasaki, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, John Garry, Khazhy Kumykov

The next patch needs to know whether or not it is allowed to sleep inside
the tag iteration functions, so pass this context information to those
functions. I have reviewed all callers of the tag iteration functions to
verify these annotations, starting from the output of the following grep
command:

    git grep -nHE 'blk_mq_(all_tag|tagset_busy)_iter'

My conclusions from that analysis are as follows:
- Sleeping is allowed in the blk-mq-debugfs code that iterates over tags.
- Since the blk_mq_tagset_busy_iter() calls in the mtip32xx driver are
  preceded by a function that sleeps (blk_mq_quiesce_queue()), sleeping is
  safe in the context of the blk_mq_tagset_busy_iter() calls.
- The same reasoning also applies to the nbd driver.
- All blk_mq_tagset_busy_iter() calls in the NVMe drivers are followed by a
  call to a function that sleeps, so sleeping inside blk_mq_tagset_busy_iter()
  when called from the NVMe drivers is fine.
- scsi_host_busy(), scsi_host_complete_all_commands() and
  scsi_host_busy_iter() are used by multiple SCSI LLDs so analyzing whether
  or not these functions may sleep is hard. Instead of performing that
  analysis, make it safe to call these functions from atomic context.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Khazhy Kumykov <khazhy@google.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c        | 38 +++++++++++++++++++++++++++++++++-----
 block/blk-mq-tag.h        |  2 +-
 block/blk-mq.c            |  2 +-
 drivers/scsi/hosts.c      | 16 ++++++++--------
 drivers/scsi/ufs/ufshcd.c |  4 ++--
 include/linux/blk-mq.h    |  2 ++
 6 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..d8eaa38a1bd1 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -322,18 +322,19 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
 }
 
 /**
- * blk_mq_all_tag_iter - iterate over all requests in a tag map
+ * blk_mq_all_tag_iter_atomic - iterate over all requests in a tag map
  * @tags:	Tag map to iterate over.
  * @fn:		Pointer to the function that will be called for each
  *		request. @fn will be called as follows: @fn(rq, @priv,
  *		reserved) where rq is a pointer to a request. 'reserved'
  *		indicates whether or not @rq is a reserved request. Return
- *		true to continue iterating tags, false to stop.
+ *		true to continue iterating tags, false to stop. Must not
+ *		sleep.
  * @priv:	Will be passed as second argument to @fn.
  *
- * Caller has to pass the tag map from which requests are allocated.
+ * Does not sleep.
  */
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 		void *priv)
 {
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
@@ -348,6 +349,8 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  *		indicates whether or not @rq is a reserved request. Return
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
+ *
+ * May sleep.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
@@ -362,6 +365,31 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
+/**
+ * blk_mq_tagset_busy_iter_atomic - iterate over all started requests in a tag set
+ * @tagset:	Tag set to iterate over.
+ * @fn:		Pointer to the function that will be called for each started
+ *		request. @fn will be called as follows: @fn(rq, @priv,
+ *		reserved) where rq is a pointer to a request. 'reserved'
+ *		indicates whether or not @rq is a reserved request. Return
+ *		true to continue iterating tags, false to stop. Must not sleep.
+ * @priv:	Will be passed as second argument to @fn.
+ *
+ * Does not sleep.
+ */
+void blk_mq_tagset_busy_iter_atomic(struct blk_mq_tag_set *tagset,
+		busy_tag_iter_fn *fn, void *priv)
+{
+	int i;
+
+	for (i = 0; i < tagset->nr_hw_queues; i++) {
+		if (tagset->tags && tagset->tags[i])
+			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
+					      BT_TAG_ITER_STARTED);
+	}
+}
+EXPORT_SYMBOL(blk_mq_tagset_busy_iter_atomic);
+
 static bool blk_mq_tagset_count_completed_rqs(struct request *rq,
 		void *data, bool reserved)
 {
@@ -384,7 +412,7 @@ void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
 	while (true) {
 		unsigned count = 0;
 
-		blk_mq_tagset_busy_iter(tagset,
+		blk_mq_tagset_busy_iter_atomic(tagset,
 				blk_mq_tagset_count_completed_rqs, &count);
 		if (!count)
 			break;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..0290c308ece9 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -43,7 +43,7 @@ extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 		void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4d7c1caa439..5b170faa6318 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2483,7 +2483,7 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
 		.hctx	= hctx,
 	};
 
-	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
+	blk_mq_all_tag_iter_atomic(tags, blk_mq_has_request, &data);
 	return data.has_rq;
 }
 
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 697c09ef259b..f8863aa88642 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -573,8 +573,8 @@ int scsi_host_busy(struct Scsi_Host *shost)
 {
 	int cnt = 0;
 
-	blk_mq_tagset_busy_iter(&shost->tag_set,
-				scsi_host_check_in_flight, &cnt);
+	blk_mq_tagset_busy_iter_atomic(&shost->tag_set,
+				       scsi_host_check_in_flight, &cnt);
 	return cnt;
 }
 EXPORT_SYMBOL(scsi_host_busy);
@@ -672,8 +672,8 @@ static bool complete_all_cmds_iter(struct request *rq, void *data, bool rsvd)
  */
 void scsi_host_complete_all_commands(struct Scsi_Host *shost, int status)
 {
-	blk_mq_tagset_busy_iter(&shost->tag_set, complete_all_cmds_iter,
-				&status);
+	blk_mq_tagset_busy_iter_atomic(&shost->tag_set, complete_all_cmds_iter,
+				       &status);
 }
 EXPORT_SYMBOL_GPL(scsi_host_complete_all_commands);
 
@@ -694,11 +694,11 @@ static bool __scsi_host_busy_iter_fn(struct request *req, void *priv,
 /**
  * scsi_host_busy_iter - Iterate over all busy commands
  * @shost:	Pointer to Scsi_Host.
- * @fn:		Function to call on each busy command
+ * @fn:		Function to call on each busy command. Must not sleep.
  * @priv:	Data pointer passed to @fn
  *
  * If locking against concurrent command completions is required
- * ithas to be provided by the caller
+ * it has to be provided by the caller.
  **/
 void scsi_host_busy_iter(struct Scsi_Host *shost,
 			 bool (*fn)(struct scsi_cmnd *, void *, bool),
@@ -709,7 +709,7 @@ void scsi_host_busy_iter(struct Scsi_Host *shost,
 		.priv = priv,
 	};
 
-	blk_mq_tagset_busy_iter(&shost->tag_set, __scsi_host_busy_iter_fn,
-				&iter_data);
+	blk_mq_tagset_busy_iter_atomic(&shost->tag_set,
+				       __scsi_host_busy_iter_fn, &iter_data);
 }
 EXPORT_SYMBOL_GPL(scsi_host_busy_iter);
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c86760788c72..6d2f8f18e2a3 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1380,7 +1380,7 @@ static bool ufshcd_any_tag_in_use(struct ufs_hba *hba)
 	struct request_queue *q = hba->cmd_queue;
 	int busy = 0;
 
-	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_is_busy, &busy);
+	blk_mq_tagset_busy_iter_atomic(q->tag_set, ufshcd_is_busy, &busy);
 	return busy;
 }
 
@@ -6269,7 +6269,7 @@ static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
 	};
 
-	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	blk_mq_tagset_busy_iter_atomic(q->tag_set, ufshcd_compl_tm, &ci);
 	return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
 }
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2c473c9b8990..dfa0114a49fd 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -526,6 +526,8 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
+void blk_mq_tagset_busy_iter_atomic(struct blk_mq_tag_set *tagset,
+		busy_tag_iter_fn *fn, void *priv);
 void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);


* [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests
  2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
  2021-04-06 21:49 ` [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition Bart Van Assche
  2021-04-06 21:49 ` [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter Bart Van Assche
@ 2021-04-06 21:49 ` Bart Van Assche
  2021-04-07  0:02   ` Khazhy Kumykov
  2021-04-06 21:49 ` [PATCH v6 4/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2021-04-06 21:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Shin'ichiro Kawasaki, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, John Garry, Khazhy Kumykov

When multiple request queues share a tag set and when switching the I/O
scheduler for one of the request queues associated with that tag set, the
following race can happen:
* blk_mq_tagset_busy_iter() calls bt_tags_iter() and bt_tags_iter() assigns
  a pointer to a scheduler request to the local variable 'rq'.
* blk_mq_sched_free_requests() is called to free hctx->sched_tags.
* blk_mq_tagset_busy_iter() dereferences 'rq' and triggers a use-after-free.

Fix this race as follows:
* Use rcu_assign_pointer() and rcu_dereference() to access hctx->tags->rqs[].
  The performance impact of the assignments added to the hot path is minimal
  (about 1% according to one particular test).
* Protect hctx->tags->rqs[] reads with an RCU read-side lock or with a
  semaphore. Which mechanism is used depends on whether or not it is allowed
  to sleep and also on whether or not the callback function may sleep.
* Wait for all preexisting readers to finish before freeing scheduler
  requests.

Another race is as follows:
* blk_mq_sched_free_requests() is called to free hctx->sched_tags.
* blk_mq_queue_tag_busy_iter() iterates over the same tag set but for another
  request queue than the queue for which scheduler tags are being freed.
* bt_iter() examines rq->q after *rq has been freed.

Fix this race by protecting the rq->q read in bt_iter() with an RCU read
lock and by calling synchronize_rcu() before freeing scheduler tags.

Multiple users have reported use-after-free complaints similar to the
following (from https://lore.kernel.org/linux-block/1545261885.185366.488.camel@acm.org/ ):

BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
Read of size 8 at addr ffff88803b335240 by task fio/21412

CPU: 0 PID: 21412 Comm: fio Tainted: G        W         4.20.0-rc6-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x71/0x239
 kasan_report.cold.5+0x242/0x301
 __asan_load8+0x54/0x90
 bt_iter+0x86/0xf0
 blk_mq_queue_tag_busy_iter+0x373/0x5e0
 blk_mq_in_flight+0x96/0xb0
 part_in_flight+0x40/0x140
 part_round_stats+0x18e/0x370
 blk_account_io_start+0x3d7/0x670
 blk_mq_bio_to_request+0x19c/0x3a0
 blk_mq_make_request+0x7a9/0xcb0
 generic_make_request+0x41d/0x960
 submit_bio+0x9b/0x250
 do_blockdev_direct_IO+0x435c/0x4c70
 __blockdev_direct_IO+0x79/0x88
 ext4_direct_IO+0x46c/0xc00
 generic_file_direct_write+0x119/0x210
 __generic_file_write_iter+0x11c/0x280
 ext4_file_write_iter+0x1b8/0x6f0
 aio_write+0x204/0x310
 io_submit_one+0x9d3/0xe80
 __x64_sys_io_submit+0x115/0x340
 do_syscall_64+0x71/0x210

Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Khazhy Kumykov <khazhy@google.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c   | 34 ++++++++++++++++++++++++++++++-
 block/blk-mq-tag.c | 51 ++++++++++++++++++++++++++++++++++++++++------
 block/blk-mq-tag.h |  4 +++-
 block/blk-mq.c     | 21 +++++++++++++++----
 block/blk-mq.h     |  1 +
 block/blk.h        |  2 ++
 block/elevator.c   |  1 +
 7 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9bcdae93f6d4..ca7f833e25a8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -279,6 +279,36 @@ void blk_dump_rq_flags(struct request *rq, char *msg)
 }
 EXPORT_SYMBOL(blk_dump_rq_flags);
 
+/**
+ * blk_mq_wait_for_tag_iter - wait for preexisting tag iteration functions to finish
+ * @set: Tag set to wait on.
+ *
+ * Waits for preexisting calls of blk_mq_all_tag_iter(),
+ * blk_mq_tagset_busy_iter() and also for their atomic variants to finish
+ * accessing hctx->tags->rqs[]. New readers may start while this function is
+ * in progress or after this function has returned. Use this function to make
+ * sure that hctx->tags->rqs[] changes have become globally visible.
+ *
+ * Waits for preexisting blk_mq_queue_tag_busy_iter(q, fn, priv) calls to
+ * finish accessing requests associated with other request queues than 'q'.
+ */
+void blk_mq_wait_for_tag_iter(struct blk_mq_tag_set *set)
+{
+	struct blk_mq_tags *tags;
+	int i;
+
+	if (set->tags) {
+		for (i = 0; i < set->nr_hw_queues; i++) {
+			tags = set->tags[i];
+			if (!tags)
+				continue;
+			down_write(&tags->iter_rwsem);
+			up_write(&tags->iter_rwsem);
+		}
+	}
+	synchronize_rcu();
+}
+
 /**
  * blk_sync_queue - cancel any pending callbacks on a queue
  * @q: the queue
@@ -412,8 +442,10 @@ void blk_cleanup_queue(struct request_queue *q)
 	 * it is safe to free requests now.
 	 */
 	mutex_lock(&q->sysfs_lock);
-	if (q->elevator)
+	if (q->elevator) {
+		blk_mq_wait_for_tag_iter(q->tag_set);
 		blk_mq_sched_free_requests(q);
+	}
 	mutex_unlock(&q->sysfs_lock);
 
 	percpu_ref_exit(&q->q_usage_counter);
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d8eaa38a1bd1..a73e81d75fb8 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -209,14 +209,24 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
-	rq = tags->rqs[bitnr];
-
+	rcu_read_lock();
+	/*
+	 * The request 'rq' points at is protected by an RCU read lock until
+	 * its queue pointer has been verified and by q_usage_count while the
+	 * callback function is being invoked. See also the
+	 * percpu_ref_tryget() and blk_queue_exit() calls in
+	 * blk_mq_queue_tag_busy_iter().
+	 */
+	rq = rcu_dereference(tags->rqs[bitnr]);
 	/*
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx) {
+		rcu_read_unlock();
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
+	}
+	rcu_read_unlock();
 	return true;
 }
 
@@ -254,11 +264,17 @@ struct bt_tags_iter_data {
 	unsigned int flags;
 };
 
+/* Include reserved tags. */
 #define BT_TAG_ITER_RESERVED		(1 << 0)
+/* Only include started requests. */
 #define BT_TAG_ITER_STARTED		(1 << 1)
+/* Iterate over tags->static_rqs[] instead of tags->rqs[]. */
 #define BT_TAG_ITER_STATIC_RQS		(1 << 2)
+/* The callback function may sleep. */
+#define BT_TAG_ITER_MAY_SLEEP		(1 << 3)
 
-static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
+static bool __bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr,
+			   void *data)
 {
 	struct bt_tags_iter_data *iter_data = data;
 	struct blk_mq_tags *tags = iter_data->tags;
@@ -275,7 +291,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
 		rq = tags->static_rqs[bitnr];
 	else
-		rq = tags->rqs[bitnr];
+		rq = rcu_dereference_check(tags->rqs[bitnr],
+					   lockdep_is_held(&tags->iter_rwsem));
 	if (!rq)
 		return true;
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
@@ -284,6 +301,25 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	return iter_data->fn(rq, iter_data->data, reserved);
 }
 
+static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
+{
+	struct bt_tags_iter_data *iter_data = data;
+	struct blk_mq_tags *tags = iter_data->tags;
+	bool res;
+
+	if (iter_data->flags & BT_TAG_ITER_MAY_SLEEP) {
+		down_read(&tags->iter_rwsem);
+		res = __bt_tags_iter(bitmap, bitnr, data);
+		up_read(&tags->iter_rwsem);
+	} else {
+		rcu_read_lock();
+		res = __bt_tags_iter(bitmap, bitnr, data);
+		rcu_read_unlock();
+	}
+
+	return res;
+}
+
 /**
  * bt_tags_for_each - iterate over the requests in a tag map
  * @tags:	Tag map to iterate over.
@@ -357,10 +393,12 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 {
 	int i;
 
+	might_sleep();
+
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
-					      BT_TAG_ITER_STARTED);
+				BT_TAG_ITER_STARTED | BT_TAG_ITER_MAY_SLEEP);
 	}
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
@@ -544,6 +582,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	init_rwsem(&tags->iter_rwsem);
 
 	if (blk_mq_is_sbitmap_shared(flags))
 		return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 0290c308ece9..d1d73d7cc7df 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -17,9 +17,11 @@ struct blk_mq_tags {
 	struct sbitmap_queue __bitmap_tags;
 	struct sbitmap_queue __breserved_tags;
 
-	struct request **rqs;
+	struct request __rcu **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+	struct rw_semaphore iter_rwsem;
 };
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5b170faa6318..d6c9b655c0f5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -495,8 +495,10 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
-	if (rq->tag != BLK_MQ_NO_TAG)
+	if (rq->tag != BLK_MQ_NO_TAG) {
 		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
+		rcu_assign_pointer(hctx->tags->rqs[rq->tag], NULL);
+	}
 	if (sched_tag != BLK_MQ_NO_TAG)
 		blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
@@ -838,9 +840,20 @@ EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
+	struct request *rq;
+
 	if (tag < tags->nr_tags) {
-		prefetch(tags->rqs[tag]);
-		return tags->rqs[tag];
+		/*
+		 * Freeing tags happens with the request queue frozen so the
+		 * rcu dereference below is protected by the request queue
+		 * usage count. We can only verify that usage count after
+		 * having read the request pointer.
+		 */
+		rq = rcu_dereference_check(tags->rqs[tag], true);
+		WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && rq &&
+			     percpu_ref_is_zero(&rq->q->q_usage_counter));
+		prefetch(rq);
+		return rq;
 	}
 
 	return NULL;
@@ -1111,7 +1124,7 @@ static bool blk_mq_get_driver_tag(struct request *rq)
 		rq->rq_flags |= RQF_MQ_INFLIGHT;
 		__blk_mq_inc_active_requests(hctx);
 	}
-	hctx->tags->rqs[rq->tag] = rq;
+	rcu_assign_pointer(hctx->tags->rqs[rq->tag], rq);
 	return true;
 }
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3616453ca28c..9ccb1818303b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -226,6 +226,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
 	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
+	rcu_assign_pointer(hctx->tags->rqs[rq->tag], NULL);
 	rq->tag = BLK_MQ_NO_TAG;
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
diff --git a/block/blk.h b/block/blk.h
index 2ed6c684d63a..aadf0d82a028 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -185,6 +185,8 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
 void blk_account_io_start(struct request *req);
 void blk_account_io_done(struct request *req, u64 now);
 
+void blk_mq_wait_for_tag_iter(struct blk_mq_tag_set *set);
+
 /*
  * Internal elevator interface
  */
diff --git a/block/elevator.c b/block/elevator.c
index 4b20d1ab29cc..70a10e31b336 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -201,6 +201,7 @@ static void elevator_exit(struct request_queue *q, struct elevator_queue *e)
 {
 	lockdep_assert_held(&q->sysfs_lock);
 
+	blk_mq_wait_for_tag_iter(q->tag_set);
 	blk_mq_sched_free_requests(q);
 	__elevator_exit(q, e);
 }


* [PATCH v6 4/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list
  2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
                   ` (2 preceding siblings ...)
  2021-04-06 21:49 ` [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests Bart Van Assche
@ 2021-04-06 21:49 ` Bart Van Assche
  2021-04-06 21:49 ` [PATCH v6 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags Bart Van Assche
  2021-04-08  6:45 ` [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Shinichiro Kawasaki
  5 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-06 21:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Shin'ichiro Kawasaki, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, John Garry, Daniel Wagner

Since the next patch in this series will use RCU to iterate over tag_list,
make this safe. Note: call_rcu() is already used to free the request queue.
From blk-sysfs.c:

	call_rcu(&q->rcu_head, blk_free_queue_rcu);

See also:
* Commit 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over
  blk_mq_tag_set.tag_list"; v4.12).
* Commit 08c875cbf481 ("block: Use non _rcu version of list functions for
  tag_set_list"; v5.9).

Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d6c9b655c0f5..ee98ce03fdc9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2946,7 +2946,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	struct blk_mq_tag_set *set = q->tag_set;
 
 	mutex_lock(&set->tag_list_lock);
-	list_del(&q->tag_set_list);
+	list_del_rcu(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
@@ -2954,7 +2954,11 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_shared(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-	INIT_LIST_HEAD(&q->tag_set_list);
+	/*
+	 * Calling synchronize_rcu() and INIT_LIST_HEAD(&q->tag_set_list) is
+	 * not necessary since blk_mq_del_queue_tag_set() is only called from
+	 * blk_cleanup_queue().
+	 */
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,


* [PATCH v6 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags
  2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
                   ` (3 preceding siblings ...)
  2021-04-06 21:49 ` [PATCH v6 4/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
@ 2021-04-06 21:49 ` Bart Van Assche
  2021-04-07  0:04   ` Khazhy Kumykov
  2021-04-08  6:45 ` [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Shinichiro Kawasaki
  5 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2021-04-06 21:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Khazhy Kumykov,
	Martin K . Petersen, Shin'ichiro Kawasaki, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, John Garry

Serialize the tag set modifications performed by blk_mq_update_nr_hw_queues()
and the tag set iteration functions to prevent crashes caused by a tag set
being examined while it is being modified.

Reported-by: Khazhy Kumykov <khazhy@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Khazhy Kumykov <khazhy@google.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a73e81d75fb8..bb2b9d412c41 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -376,6 +376,31 @@ void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
 }
 
+/*
+ * Iterate over all request queues in a tag set, find the first queue with a
+ * non-zero usage count, increment its usage count and return the pointer to
+ * that request queue. This prevents blk_mq_update_nr_hw_queues() from
+ * modifying @set->nr_hw_queues while tags are being iterated over, since
+ * blk_mq_update_nr_hw_queues() only modifies @set->nr_hw_queues while the
+ * usage count of all queues associated with a tag set is zero.
+ */
+static struct request_queue *
+blk_mq_get_any_tagset_queue(struct blk_mq_tag_set *set)
+{
+	struct request_queue *q;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(q, &set->tag_list, tag_set_list) {
+		if (percpu_ref_tryget(&q->q_usage_counter)) {
+			rcu_read_unlock();
+			return q;
+		}
+	}
+	rcu_read_unlock();
+
+	return NULL;
+}
+
 /**
  * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set
  * @tagset:	Tag set to iterate over.
@@ -391,15 +416,22 @@ void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
 {
+	struct request_queue *q;
 	int i;
 
 	might_sleep();
 
+	q = blk_mq_get_any_tagset_queue(tagset);
+	if (!q)
+		return;
+
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
 				BT_TAG_ITER_STARTED | BT_TAG_ITER_MAY_SLEEP);
 	}
+
+	blk_queue_exit(q);
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
@@ -418,13 +450,20 @@ EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 void blk_mq_tagset_busy_iter_atomic(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
 {
+	struct request_queue *q;
 	int i;
 
+	q = blk_mq_get_any_tagset_queue(tagset);
+	if (!q)
+		return;
+
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
 					      BT_TAG_ITER_STARTED);
 	}
+
+	blk_queue_exit(q);
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter_atomic);
 


* Re: [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests
  2021-04-06 21:49 ` [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests Bart Van Assche
@ 2021-04-07  0:02   ` Khazhy Kumykov
  2021-04-07 21:54     ` Bart Van Assche
  2021-04-20 21:41     ` Bart Van Assche
  0 siblings, 2 replies; 23+ messages in thread
From: Khazhy Kumykov @ 2021-04-07  0:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry



On Tue, Apr 6, 2021 at 2:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> When multiple request queues share a tag set and when switching the I/O
> scheduler for one of the request queues associated with that tag set, the
> following race can happen:
> * blk_mq_tagset_busy_iter() calls bt_tags_iter() and bt_tags_iter() assigns
>   a pointer to a scheduler request to the local variable 'rq'.
> * blk_mq_sched_free_requests() is called to free hctx->sched_tags.
> * blk_mq_tagset_busy_iter() dereferences 'rq' and triggers a use-after-free.
>
> Fix this race as follows:
> * Use rcu_assign_pointer() and rcu_dereference() to access hctx->tags->rqs[].
>   The performance impact of the assignments added to the hot path is minimal
>   (about 1% according to one particular test).
> * Protect hctx->tags->rqs[] reads with an RCU read-side lock or with a
>   semaphore. Which mechanism is used depends on whether or not it is allowed
>   to sleep and also on whether or not the callback function may sleep.
> * Wait for all preexisting readers to finish before freeing scheduler
>   requests.
>
> Another race is as follows:
> * blk_mq_sched_free_requests() is called to free hctx->sched_tags.
> * blk_mq_queue_tag_busy_iter() iterates over the same tag set but for another
>   request queue than the queue for which scheduler tags are being freed.
> * bt_iter() examines rq->q after *rq has been freed.
>
> Fix this race by protecting the rq->q read in bt_iter() with an RCU read
> lock and by calling synchronize_rcu() before freeing scheduler tags.
>
> Multiple users have reported use-after-free complaints similar to the
> following (from https://lore.kernel.org/linux-block/1545261885.185366.488.camel@acm.org/ ):
>
> BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
> Read of size 8 at addr ffff88803b335240 by task fio/21412
>
> CPU: 0 PID: 21412 Comm: fio Tainted: G        W         4.20.0-rc6-dbg+ #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> Call Trace:
>  dump_stack+0x86/0xca
>  print_address_description+0x71/0x239
>  kasan_report.cold.5+0x242/0x301
>  __asan_load8+0x54/0x90
>  bt_iter+0x86/0xf0
>  blk_mq_queue_tag_busy_iter+0x373/0x5e0
>  blk_mq_in_flight+0x96/0xb0
>  part_in_flight+0x40/0x140
>  part_round_stats+0x18e/0x370
>  blk_account_io_start+0x3d7/0x670
>  blk_mq_bio_to_request+0x19c/0x3a0
>  blk_mq_make_request+0x7a9/0xcb0
>  generic_make_request+0x41d/0x960
>  submit_bio+0x9b/0x250
>  do_blockdev_direct_IO+0x435c/0x4c70
>  __blockdev_direct_IO+0x79/0x88
>  ext4_direct_IO+0x46c/0xc00
>  generic_file_direct_write+0x119/0x210
>  __generic_file_write_iter+0x11c/0x280
>  ext4_file_write_iter+0x1b8/0x6f0
>  aio_write+0x204/0x310
>  io_submit_one+0x9d3/0xe80
>  __x64_sys_io_submit+0x115/0x340
>  do_syscall_64+0x71/0x210
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Khazhy Kumykov <khazhy@google.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-core.c   | 34 ++++++++++++++++++++++++++++++-
>  block/blk-mq-tag.c | 51 ++++++++++++++++++++++++++++++++++++++++------
>  block/blk-mq-tag.h |  4 +++-
>  block/blk-mq.c     | 21 +++++++++++++++----
>  block/blk-mq.h     |  1 +
>  block/blk.h        |  2 ++
>  block/elevator.c   |  1 +
>  7 files changed, 102 insertions(+), 12 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 9bcdae93f6d4..ca7f833e25a8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -279,6 +279,36 @@ void blk_dump_rq_flags(struct request *rq, char *msg)
>  }
>  EXPORT_SYMBOL(blk_dump_rq_flags);
>
> +/**
> + * blk_mq_wait_for_tag_iter - wait for preexisting tag iteration functions to finish
> + * @set: Tag set to wait on.
> + *
> + * Waits for preexisting calls of blk_mq_all_tag_iter(),
> + * blk_mq_tagset_busy_iter() and also for their atomic variants to finish
> + * accessing hctx->tags->rqs[]. New readers may start while this function is
> + * in progress or after this function has returned. Use this function to make
> + * sure that hctx->tags->rqs[] changes have become globally visible.
> + *
> + * Waits for preexisting blk_mq_queue_tag_busy_iter(q, fn, priv) calls to
> + * finish accessing requests associated with other request queues than 'q'.
> + */
> +void blk_mq_wait_for_tag_iter(struct blk_mq_tag_set *set)
> +{
> +       struct blk_mq_tags *tags;
> +       int i;
> +
> +       if (set->tags) {
> +               for (i = 0; i < set->nr_hw_queues; i++) {
> +                       tags = set->tags[i];
> +                       if (!tags)
> +                               continue;
> +                       down_write(&tags->iter_rwsem);
> +                       up_write(&tags->iter_rwsem);
> +               }
> +       }
> +       synchronize_rcu();
> +}
> +
>  /**
>   * blk_sync_queue - cancel any pending callbacks on a queue
>   * @q: the queue
> @@ -412,8 +442,10 @@ void blk_cleanup_queue(struct request_queue *q)
>          * it is safe to free requests now.
>          */
>         mutex_lock(&q->sysfs_lock);
> -       if (q->elevator)
> +       if (q->elevator) {
> +               blk_mq_wait_for_tag_iter(q->tag_set);
>                 blk_mq_sched_free_requests(q);
> +       }
>         mutex_unlock(&q->sysfs_lock);
>
>         percpu_ref_exit(&q->q_usage_counter);
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d8eaa38a1bd1..a73e81d75fb8 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -209,14 +209,24 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>
>         if (!reserved)
>                 bitnr += tags->nr_reserved_tags;
> -       rq = tags->rqs[bitnr];
> -
> +       rcu_read_lock();
> +       /*
> +        * The request 'rq' points at is protected by an RCU read lock until
> +        * its queue pointer has been verified and by q_usage_count while the
> +        * callback function is being invoked. an See also the
extra "an"?
> +        * percpu_ref_tryget() and blk_queue_exit() calls in
> +        * blk_mq_queue_tag_busy_iter().
> +        */
> +       rq = rcu_dereference(tags->rqs[bitnr]);
>         /*
>          * We can hit rq == NULL here, because the tagging functions
>          * test and set the bit before assigning ->rqs[].
>          */
> -       if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
> +       if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx) {
> +               rcu_read_unlock();
>                 return iter_data->fn(hctx, rq, iter_data->data, reserved);
> +       }
> +       rcu_read_unlock();
>         return true;
>  }
>
> @@ -254,11 +264,17 @@ struct bt_tags_iter_data {
>         unsigned int flags;
>  };
>
> +/* Include reserved tags. */
>  #define BT_TAG_ITER_RESERVED           (1 << 0)
> +/* Only include started requests. */
>  #define BT_TAG_ITER_STARTED            (1 << 1)
> +/* Iterate over tags->static_rqs[] instead of tags->rqs[]. */
>  #define BT_TAG_ITER_STATIC_RQS         (1 << 2)
> +/* The callback function may sleep. */
> +#define BT_TAG_ITER_MAY_SLEEP          (1 << 3)
>
> -static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> +static bool __bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr,
> +                          void *data)
>  {
>         struct bt_tags_iter_data *iter_data = data;
>         struct blk_mq_tags *tags = iter_data->tags;
> @@ -275,7 +291,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>         if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
>                 rq = tags->static_rqs[bitnr];
>         else
> -               rq = tags->rqs[bitnr];
> +               rq = rcu_dereference_check(tags->rqs[bitnr],
> +                                          lockdep_is_held(&tags->iter_rwsem));
>         if (!rq)
>                 return true;
>         if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
> @@ -284,6 +301,25 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>         return iter_data->fn(rq, iter_data->data, reserved);
>  }
>
> +static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> +{
> +       struct bt_tags_iter_data *iter_data = data;
> +       struct blk_mq_tags *tags = iter_data->tags;
> +       bool res;
> +
> +       if (iter_data->flags & BT_TAG_ITER_MAY_SLEEP) {
> +               down_read(&tags->iter_rwsem);
> +               res = __bt_tags_iter(bitmap, bitnr, data);
> +               up_read(&tags->iter_rwsem);
> +       } else {
> +               rcu_read_lock();
> +               res = __bt_tags_iter(bitmap, bitnr, data);
> +               rcu_read_unlock();
> +       }
> +
> +       return res;
> +}
> +
>  /**
>   * bt_tags_for_each - iterate over the requests in a tag map
>   * @tags:      Tag map to iterate over.
> @@ -357,10 +393,12 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>  {
>         int i;
>
> +       might_sleep();
> +
>         for (i = 0; i < tagset->nr_hw_queues; i++) {
>                 if (tagset->tags && tagset->tags[i])
>                         __blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
> -                                             BT_TAG_ITER_STARTED);
> +                               BT_TAG_ITER_STARTED | BT_TAG_ITER_MAY_SLEEP);
>         }
>  }
>  EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
> @@ -544,6 +582,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>
>         tags->nr_tags = total_tags;
>         tags->nr_reserved_tags = reserved_tags;
> +       init_rwsem(&tags->iter_rwsem);
>
>         if (blk_mq_is_sbitmap_shared(flags))
>                 return tags;
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 0290c308ece9..d1d73d7cc7df 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -17,9 +17,11 @@ struct blk_mq_tags {
>         struct sbitmap_queue __bitmap_tags;
>         struct sbitmap_queue __breserved_tags;
>
> -       struct request **rqs;
> +       struct request __rcu **rqs;
>         struct request **static_rqs;
>         struct list_head page_list;
> +
> +       struct rw_semaphore iter_rwsem;
>  };
>
>  extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5b170faa6318..d6c9b655c0f5 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -495,8 +495,10 @@ static void __blk_mq_free_request(struct request *rq)
>         blk_crypto_free_request(rq);
>         blk_pm_mark_last_busy(rq);
>         rq->mq_hctx = NULL;
> -       if (rq->tag != BLK_MQ_NO_TAG)
> +       if (rq->tag != BLK_MQ_NO_TAG) {
>                 blk_mq_put_tag(hctx->tags, ctx, rq->tag);
> +               rcu_assign_pointer(hctx->tags->rqs[rq->tag], NULL);
> +       }
>         if (sched_tag != BLK_MQ_NO_TAG)
>                 blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
>         blk_mq_sched_restart(hctx);
> @@ -838,9 +840,20 @@ EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> +       struct request *rq;
> +
>         if (tag < tags->nr_tags) {
> -               prefetch(tags->rqs[tag]);
> -               return tags->rqs[tag];
> +               /*
> +                * Freeing tags happens with the request queue frozen so the
> +                * rcu dereference below is protected by the request queue
> +                * usage count. We can only verify that usage count after
> +                * having read the request pointer.
> +                */
> +               rq = rcu_dereference_check(tags->rqs[tag], true);
> +               WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && rq &&
> +                            percpu_ref_is_zero(&rq->q->q_usage_counter));
> +               prefetch(rq);
> +               return rq;
>         }
>
>         return NULL;
> @@ -1111,7 +1124,7 @@ static bool blk_mq_get_driver_tag(struct request *rq)
>                 rq->rq_flags |= RQF_MQ_INFLIGHT;
>                 __blk_mq_inc_active_requests(hctx);
>         }
> -       hctx->tags->rqs[rq->tag] = rq;
> +       rcu_assign_pointer(hctx->tags->rqs[rq->tag], rq);
>         return true;
>  }
>
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 3616453ca28c..9ccb1818303b 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -226,6 +226,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
>                                            struct request *rq)
>  {
>         blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
> +       rcu_assign_pointer(hctx->tags->rqs[rq->tag], NULL);
>         rq->tag = BLK_MQ_NO_TAG;
>
>         if (rq->rq_flags & RQF_MQ_INFLIGHT) {
> diff --git a/block/blk.h b/block/blk.h
> index 2ed6c684d63a..aadf0d82a028 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -185,6 +185,8 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
>  void blk_account_io_start(struct request *req);
>  void blk_account_io_done(struct request *req, u64 now);
>
> +void blk_mq_wait_for_tag_iter(struct blk_mq_tag_set *set);
> +
>  /*
>   * Internal elevator interface
>   */
> diff --git a/block/elevator.c b/block/elevator.c
> index 4b20d1ab29cc..70a10e31b336 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -201,6 +201,7 @@ static void elevator_exit(struct request_queue *q, struct elevator_queue *e)
>  {
>         lockdep_assert_held(&q->sysfs_lock);
>
> +       blk_mq_wait_for_tag_iter(q->tag_set);
>         blk_mq_sched_free_requests(q);
>         __elevator_exit(q, e);
>  }



* Re: [PATCH v6 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags
  2021-04-06 21:49 ` [PATCH v6 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags Bart Van Assche
@ 2021-04-07  0:04   ` Khazhy Kumykov
  0 siblings, 0 replies; 23+ messages in thread
From: Khazhy Kumykov @ 2021-04-07  0:04 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry



On Tue, Apr 6, 2021 at 2:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
>

series lgtm, thanks!

Reviewed-by: Khazhismel Kumykov <khazhy@google.com>



* Re: [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition
  2021-04-06 21:49 ` [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition Bart Van Assche
@ 2021-04-07 15:36   ` John Garry
  2021-04-20 21:30     ` Bart Van Assche
  2021-04-13  7:44   ` Daniel Wagner
  1 sibling, 1 reply; 23+ messages in thread
From: John Garry @ 2021-04-07 15:36 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

On 06/04/2021 22:49, Bart Van Assche wrote:
> Since elevator_exit() has only one caller, move its definition from
> block/blk.h into block/elevator.c. Remove the inline keyword since modern
> compilers are smart enough to decide when to inline functions that occur
> in the same compilation unit.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Khazhy Kumykov <khazhy@google.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/blk.h      | 9 ---------
>   block/elevator.c | 8 ++++++++
>   2 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk.h b/block/blk.h
> index 8f4337c5a9e6..2ed6c684d63a 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -199,15 +199,6 @@ void __elevator_exit(struct request_queue *, struct elevator_queue *);
>   int elv_register_queue(struct request_queue *q, bool uevent);
>   void elv_unregister_queue(struct request_queue *q);
>   
> -static inline void elevator_exit(struct request_queue *q,
> -		struct elevator_queue *e)
> -{
> -	lockdep_assert_held(&q->sysfs_lock);
> -
> -	blk_mq_sched_free_requests(q);
> -	__elevator_exit(q, e);
> -}
> -
>   ssize_t part_size_show(struct device *dev, struct device_attribute *attr,
>   		char *buf);
>   ssize_t part_stat_show(struct device *dev, struct device_attribute *attr,
> diff --git a/block/elevator.c b/block/elevator.c
> index 293c5c81397a..4b20d1ab29cc 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -197,6 +197,14 @@ void __elevator_exit(struct request_queue *q, struct elevator_queue *e)
>   	kobject_put(&e->kobj);
>   }
>   
> +static void elevator_exit(struct request_queue *q, struct elevator_queue *e)
> +{
> +	lockdep_assert_held(&q->sysfs_lock);
> +
> +	blk_mq_sched_free_requests(q);
> +	__elevator_exit(q, e);

To me, it seems odd that the double-underscore prefix symbol is public 
(__elevator_exit), while the companion symbol (elevator_exit) is private.

But it looks like a sensible change to move into the .c file anyway.

Thanks,
John

> +}
> +
>   static inline void __elv_rqhash_del(struct request *rq)
>   {
>   	hash_del(&rq->hash);
> .
> 



* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-06 21:49 ` [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter Bart Van Assche
@ 2021-04-07 16:57   ` John Garry
  2021-04-07 18:42     ` Bart Van Assche
  2021-04-13  7:50   ` Daniel Wagner
  1 sibling, 1 reply; 23+ messages in thread
From: John Garry @ 2021-04-07 16:57 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

On 06/04/2021 22:49, Bart Van Assche wrote:
> Since in the next patch knowledge is required of whether or not it is
> allowed to sleep inside the tag iteration functions, pass this context
> information to the tag iteration functions. I have reviewed all callers of
> tag iteration functions to verify these annotations by starting from the
> output of the following grep command:
> 
>      git grep -nHE 'blk_mq_(all_tag|tagset_busy)_iter'
> 
> My conclusions from that analysis are as follows:
> - Sleeping is allowed in the blk-mq-debugfs code that iterates over tags.
> - Since the blk_mq_tagset_busy_iter() calls in the mtip32xx driver are
>    preceded by a function that sleeps (blk_mq_quiesce_queue()), sleeping is
>    safe in the context of the blk_mq_tagset_busy_iter() calls.
> - The same reasoning also applies to the nbd driver.
> - All blk_mq_tagset_busy_iter() calls in the NVMe drivers are followed by a
>    call to a function that sleeps so sleeping inside blk_mq_tagset_busy_iter()
>    when called from the NVMe driver is fine.

Hi Bart,

> - scsi_host_busy(), scsi_host_complete_all_commands() and
>    scsi_host_busy_iter() are used by multiple SCSI LLDs so analyzing whether
>    or not these functions may sleep is hard. Instead of performing that
>    analysis, make it safe to call these functions from atomic context.

Please help me understand this solution. The background is that we are 
unsure whether the SCSI iter callback functions may sleep. So we use the 
blk_mq_all_tag_iter_atomic() iter, which tells us that we must not 
sleep. And internally, it uses the RCU read-lock protection mechanism, 
which relies on not sleeping. So it seems that we're making the SCSI 
iter functions safe in atomic context and, as such, relying on the iter 
callbacks not to sleep.

But if we call the SCSI iter function from non-atomic context and the 
iter callback may sleep, then that is a problem, right? We're still 
using rcu.

Thanks,
John


* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-07 16:57   ` John Garry
@ 2021-04-07 18:42     ` Bart Van Assche
  2021-04-08 12:48       ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2021-04-07 18:42 UTC (permalink / raw)
  To: John Garry, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

On 4/7/21 9:57 AM, John Garry wrote:
> On 06/04/2021 22:49, Bart Van Assche wrote:
>> Since in the next patch knowledge is required of whether or not it is
>> allowed to sleep inside the tag iteration functions, pass this context
>> information to the tag iteration functions. I have reviewed all 
>> callers of
>> tag iteration functions to verify these annotations by starting from the
>> output of the following grep command:
>>
>>      git grep -nHE 'blk_mq_(all_tag|tagset_busy)_iter'
>>
>> My conclusions from that analysis are as follows:
>> - Sleeping is allowed in the blk-mq-debugfs code that iterates over tags.
>> - Since the blk_mq_tagset_busy_iter() calls in the mtip32xx driver are
>>    preceded by a function that sleeps (blk_mq_quiesce_queue()), 
>> sleeping is
>>    safe in the context of the blk_mq_tagset_busy_iter() calls.
>> - The same reasoning also applies to the nbd driver.
>> - All blk_mq_tagset_busy_iter() calls in the NVMe drivers are followed 
>> by a
>>    call to a function that sleeps so sleeping inside 
>> blk_mq_tagset_busy_iter()
>>    when called from the NVMe driver is fine.
> 
> Hi Bart,
> 
>> - scsi_host_busy(), scsi_host_complete_all_commands() and
>>    scsi_host_busy_iter() are used by multiple SCSI LLDs so analyzing 
>> whether
>>    or not these functions may sleep is hard. Instead of performing that
>>    analysis, make it safe to call these functions from atomic context.
> 
> Please help me understand this solution. The background is that we are 
> unsure if the SCSI iters callback functions may sleep. So we use the 
> blk_mq_all_tag_iter_atomic() iter, which tells us that we must not 
> sleep. And internally, it uses rcu read lock protection mechanism, which 
> relies on not sleeping. So it seems that we're making the SCSI iter 
> functions safe in atomic context and, as such, relying on the iter 
> callbacks not to sleep.
> 
> But if we call the SCSI iter function from non-atomic context and the 
> iter callback may sleep, then that is a problem, right? We're still 
> using rcu.

Hi John,

Please take a look at the output of the following grep command:

git grep -nHEw 'blk_mq_tagset_busy_iter|scsi_host_busy_iter'\ drivers/scsi

Do you agree with me that it is safe to call all the callback functions 
passed to blk_mq_tagset_busy_iter() and scsi_host_busy_iter() from an 
atomic context?

Thanks,

Bart.


* Re: [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests
  2021-04-07  0:02   ` Khazhy Kumykov
@ 2021-04-07 21:54     ` Bart Van Assche
  2021-04-20 21:41     ` Bart Van Assche
  1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-07 21:54 UTC (permalink / raw)
  To: Khazhy Kumykov
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry

On 4/6/21 5:02 PM, Khazhy Kumykov wrote:
> On Tue, Apr 6, 2021 at 2:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
>> +       /*
>> +        * The request 'rq' points at is protected by an RCU read lock until
>> +        * its queue pointer has been verified and by q_usage_count while the
>> +        * callback function is being invoked. an See also the
 >
> extra "an"?

Thanks for having reported this. I will fix this before I repost this 
patch series.

Bart.


* Re: [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests
  2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
                   ` (4 preceding siblings ...)
  2021-04-06 21:49 ` [PATCH v6 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags Bart Van Assche
@ 2021-04-08  6:45 ` Shinichiro Kawasaki
  2021-04-20 21:55   ` Bart Van Assche
  5 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2021-04-08  6:45 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On Apr 06, 2021 / 14:49, Bart Van Assche wrote:
> Hi Jens,
> 
> This patch series fixes the race between iterating over requests and
> freeing requests that has been reported by multiple different users over
> the past two years. Please consider this patch series for kernel v5.13.
> 
> Thanks,
> 
> Bart.
> 
> Changes between v5 and v6:
> - Fixed an additional race between iterating over tags and freeing scheduler
>   requests that was spotted by Khazhy.
> - Added two patches to fix the race conditions between updating the number of
>   hardware queues and iterating over a tag set.
> 
> Changes between v4 and v5:
> - Addressed Khazhy's review comments. Note: the changes that have been made
>   in v5 only change behavior in case CONFIG_PROVE_RCU=y.
> 
> Changes between v3 and v4:
> - Fixed support for tag sets shared across hardware queues.
> - Renamed blk_mq_wait_for_tag_readers() into blk_mq_wait_for_tag_iter().
> - Removed the fourth argument of blk_mq_queue_tag_busy_iter() again.
> 
> Changes between v2 and v3:
> - Converted the single v2 patch into a series of three patches.
> - Switched from SRCU to a combination of RCU and semaphores.
> 
> Changes between v1 and v2:
> - Reformatted patch description.
> - Added Tested-by/Reviewed-by tags.
> - Changed srcu_barrier() calls into synchronize_srcu() calls.

I applied this v6 series on top of kernel v5.12-rc6 and tested again.
I also needed to apply another dependent fix patch [1] to avoid a conflict.

[1] https://marc.info/?l=linux-block&m=161545067909064&w=2

I confirmed this series fixes the use-after-free issue and observed no
regression in my test set. For the series, especially for patches 3-5,

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

-- 
Best Regards,
Shin'ichiro Kawasaki


* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-07 18:42     ` Bart Van Assche
@ 2021-04-08 12:48       ` John Garry
  2021-04-08 16:12         ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: John Garry @ 2021-04-08 12:48 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

On 07/04/2021 19:42, Bart Van Assche wrote:
>>
>>> - scsi_host_busy(), scsi_host_complete_all_commands() and
>>>    scsi_host_busy_iter() are used by multiple SCSI LLDs so analyzing 
>>> whether
>>>    or not these functions may sleep is hard. Instead of performing that
>>>    analysis, make it safe to call these functions from atomic context.
>>
>> Please help me understand this solution. The background is that we are 
>> unsure if the SCSI iters callback functions may sleep. So we use the 
>> blk_mq_all_tag_iter_atomic() iter, which tells us that we must not 
>> sleep. And internally, it uses rcu read lock protection mechanism, 
>> which relies on not sleeping. So it seems that we're making the SCSI 
>> iter functions being safe in atomic context, and, as such, rely on the 
>> iter callbacks not to sleep.
>>
>> But if we call the SCSI iter function from non-atomic context and the 
>> iter callback may sleep, then that is a problem, right? We're still 
>> using rcu.
> 

Hi Bart,

> 
> Please take a look at the output of the following grep command:
> 
> git grep -nHEw 'blk_mq_tagset_busy_iter|scsi_host_busy_iter' drivers/scsi
> 
> Do you agree with me that it is safe to call all the callback functions 
> passed to blk_mq_tagset_busy_iter() and scsi_host_busy_iter() from an 
> atomic context?
> 

The output I got from that command is at the bottom (with this patchset
applied, not mainline - not sure which you meant).

The following does not look atomic safe with the mutex usage:
drivers/block/nbd.c:819:        blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);

static bool nbd_clear_req(struct request *req, void *data, bool reserved)
{
	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);

	mutex_lock(&cmd->lock);
	cmd->status = BLK_STS_IOERR;
	mutex_unlock(&cmd->lock);

	blk_mq_complete_request(req);
	return true;
}

But blk_mq_tagset_busy_iter() uses the BT_TAG_ITER_MAY_SLEEP flag in your
series.

As for the fc, I am not sure. I assume that you would know more about 
this. My concern is

__nvme_fc_abort_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_fcp_op *op)
{
...

	ctrl->lport->ops->fcp_abort(&ctrl->lport->localport, ..);
}

Looking at many instances of the fcp_abort callback, they look atomic
safe, judging by the heavy use of spinlocks, but I am not certain. They
are quite complex.

Thanks,
John

block/blk-core.c:287: * blk_mq_tagset_busy_iter() and also for their atomic variants to finish
block/blk-mq-debugfs.c:418:	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy_rq,
block/blk-mq-tag.c:405: * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set
block/blk-mq-tag.c:416:void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
block/blk-mq-tag.c:436:EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
drivers/block/mtip32xx/mtip32xx.c:2685:	blk_mq_tagset_busy_iter(&dd->tags, mtip_queue_cmd, dd);
drivers/block/mtip32xx/mtip32xx.c:2690:	blk_mq_tagset_busy_iter(&dd->tags,
drivers/block/mtip32xx/mtip32xx.c:3800:	blk_mq_tagset_busy_iter(&dd->tags, mtip_no_dev_cleanup, dd);
drivers/block/nbd.c:819:        blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
drivers/nvme/host/core.c:392:	blk_mq_tagset_busy_iter(ctrl->tagset,
drivers/nvme/host/core.c:402:	blk_mq_tagset_busy_iter(ctrl->admin_tagset,
drivers/nvme/host/fc.c:2477:	blk_mq_tagset_busy_iter(&ctrl->tag_set,
drivers/nvme/host/fc.c:2500:	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
drivers/nvme/host/pci.c:2504:	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
drivers/nvme/host/pci.c:2505:	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
drivers/nvme/target/loop.c:420:	blk_mq_tagset_busy_iter(&ctrl->tag_set,
drivers/nvme/target/loop.c:430:	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
include/linux/blk-mq.h:527:void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,

* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-08 12:48       ` John Garry
@ 2021-04-08 16:12         ` Bart Van Assche
  2021-04-08 16:35           ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2021-04-08 16:12 UTC (permalink / raw)
  To: John Garry, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

On 4/8/21 5:48 AM, John Garry wrote:
> The following does not look atomic safe with the mutex usage:
> drivers/block/nbd.c:819:        blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
> 
> static bool nbd_clear_req(struct request *req, void *data, bool reserved)
> {
>     struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
> 
>     mutex_lock(&cmd->lock);
>     cmd->status = BLK_STS_IOERR;
>     mutex_unlock(&cmd->lock);
> 
>     blk_mq_complete_request(req);
>     return true;
> }
> 
> But blk_mq_tagset_busy_iter() uses the BT_TAG_ITER_MAY_SLEEP flag in your
> series.

I will mention the nbd driver in the commit message.

> As for the fc, I am not sure. I assume that you would know more about
> this. My concern is
> 
> __nvme_fc_abort_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_fcp_op *op)
> {
> ...
> 
>     ctrl->lport->ops->fcp_abort(&ctrl->lport->localport, ..);
> }
> 
> Looking at many instances of the fcp_abort callback, they look atomic
> safe, judging by the heavy use of spinlocks, but I am not certain. They
> are quite complex.

I have not tried to analyze whether or not it is safe to call
__nvme_fc_abort_op() from an atomic context. Instead I analyzed the
contexts from which this function is called, namely the two
blk_mq_tagset_busy_iter() calls in __nvme_fc_abort_outstanding_ios().
Both blk_mq_tagset_busy_iter() calls are followed by a call to a
function that may sleep. Hence it is safe to sleep inside the
blk_mq_tagset_busy_iter() calls from the nvme_fc code. I have not tried
to analyze whether it would be safe to change these
blk_mq_tagset_busy_iter() calls into blk_mq_tagset_busy_iter_atomic()
calls. Does this answer your question?

Bart.

* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-08 16:12         ` Bart Van Assche
@ 2021-04-08 16:35           ` John Garry
  0 siblings, 0 replies; 23+ messages in thread
From: John Garry @ 2021-04-08 16:35 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

Hi Bart,

>> But blk_mq_tagset_busy_iter() uses the BT_TAG_ITER_MAY_SLEEP flag in your
>> series.
> 
> I will mention the nbd driver in the commit message.
> 
>> As for the fc, I am not sure. I assume that you would know more about
>> this. My concern is
>>
>> __nvme_fc_abort_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_fcp_op *op)
>> {
>> ...
>>
>>      ctrl->lport->ops->fcp_abort(&ctrl->lport->localport, ..);
>> }
>>
>> Looking at many instances of the fcp_abort callback, they look atomic
>> safe, judging by the heavy use of spinlocks, but I am not certain. They
>> are quite complex.
> 
> I have not tried to analyze whether or not it is safe to call
> __nvme_fc_abort_op() from an atomic context. Instead I analyzed the
> contexts from which this function is called, namely the two
> blk_mq_tagset_busy_iter() calls in __nvme_fc_abort_outstanding_ios().
> Both blk_mq_tagset_busy_iter() calls are followed by a call to a
> function that may sleep. Hence it is safe to sleep inside the
> blk_mq_tagset_busy_iter() calls from the nvme_fc code. I have not tried
> to analyze whether it would be safe to change these
> blk_mq_tagset_busy_iter() calls into blk_mq_tagset_busy_iter_atomic()
> calls. Does this answer your question?

Yes, fine.

Thanks,
John

* Re: [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition
  2021-04-06 21:49 ` [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition Bart Van Assche
  2021-04-07 15:36   ` John Garry
@ 2021-04-13  7:44   ` Daniel Wagner
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Wagner @ 2021-04-13  7:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry, Khazhy Kumykov

On Tue, Apr 06, 2021 at 02:49:01PM -0700, Bart Van Assche wrote:
> Since elevator_exit() has only one caller, move its definition from
> block/blk.h into block/elevator.c. Remove the inline keyword since modern
> compilers are smart enough to decide when to inline functions that occur
> in the same compilation unit.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Khazhy Kumykov <khazhy@google.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daniel Wagner <dwagner@suse.de>


* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-06 21:49 ` [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter Bart Van Assche
  2021-04-07 16:57   ` John Garry
@ 2021-04-13  7:50   ` Daniel Wagner
  2021-04-20 21:35     ` Bart Van Assche
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2021-04-13  7:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry, Khazhy Kumykov

On Tue, Apr 06, 2021 at 02:49:02PM -0700, Bart Van Assche wrote:
> Since in the next patch knowledge is required of whether or not it is
> allowed to sleep inside the tag iteration functions, pass this context
> information to the tag iteration functions. I have reviewed all callers of
> tag iteration functions to verify these annotations by starting from the
> output of the following grep command:
> 
>     git grep -nHE 'blk_mq_(all_tag|tagset_busy)_iter'
> 
> My conclusions from that analysis are as follows:
> - Sleeping is allowed in the blk-mq-debugfs code that iterates over tags.
> - Since the blk_mq_tagset_busy_iter() calls in the mtip32xx driver are
>   preceded by a function that sleeps (blk_mq_quiesce_queue()), sleeping is
>   safe in the context of the blk_mq_tagset_busy_iter() calls.
> - The same reasoning also applies to the nbd driver.
> - All blk_mq_tagset_busy_iter() calls in the NVMe drivers are followed by a
>   call to a function that sleeps so sleeping inside blk_mq_tagset_busy_iter()
>   when called from the NVMe driver is fine.
> - scsi_host_busy(), scsi_host_complete_all_commands() and
>   scsi_host_busy_iter() are used by multiple SCSI LLDs so analyzing whether
>   or not these functions may sleep is hard. Instead of performing that
>   analysis, make it safe to call these functions from atomic context.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Khazhy Kumykov <khazhy@google.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Maybe you could also annotate blk_mq_all_tag_iter() with a
might_sleep(). This would help to find API abusers more easily.

Reviewed-by: Daniel Wagner <dwagner@suse.de>


* Re: [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition
  2021-04-07 15:36   ` John Garry
@ 2021-04-20 21:30     ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-20 21:30 UTC (permalink / raw)
  To: John Garry, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Khazhy Kumykov

On 4/7/21 8:36 AM, John Garry wrote:
> To me, it seems odd that the double-underscore prefix symbol is public
> (__elevator_exit), while the companion symbol (elevator_exit) is private.
> 
> But it looks a sensible change to bring into the c file anyway.

Hi John,

Does the above reply count as a Reviewed-by?

Thanks,

Bart.


* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-13  7:50   ` Daniel Wagner
@ 2021-04-20 21:35     ` Bart Van Assche
  2021-04-21  7:25       ` Daniel Wagner
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2021-04-20 21:35 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry, Khazhy Kumykov

On 4/13/21 12:50 AM, Daniel Wagner wrote:
> Maybe you could also annotate blk_mq_all_tag_iter() with a
> might_sleep(). This would help to find API abusers more easily.

Hmm ... my understanding is that blk_mq_all_tag_iter() does not sleep,
since that function does not sleep itself and since its only caller
passes a callback function that does not sleep?

Thanks,

Bart.

* Re: [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests
  2021-04-07  0:02   ` Khazhy Kumykov
  2021-04-07 21:54     ` Bart Van Assche
@ 2021-04-20 21:41     ` Bart Van Assche
  1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-20 21:41 UTC (permalink / raw)
  To: Khazhy Kumykov
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry

On 4/6/21 5:02 PM, Khazhy Kumykov wrote:
> On Tue, Apr 6, 2021 at 2:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
[ ... ]
>> +       /*
>> +        * The request 'rq' points at is protected by an RCU read lock until
>> +        * its queue pointer has been verified and by q_usage_count while the
>> +        * callback function is being invoked. an See also the
> extra "an"?

Indeed. I will remove it.

Thanks,

Bart.

* Re: [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests
  2021-04-08  6:45 ` [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Shinichiro Kawasaki
@ 2021-04-20 21:55   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2021-04-20 21:55 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On 4/7/21 11:45 PM, Shinichiro Kawasaki wrote:
> I applied this v6 series on top of the kernel v5.12-rc6 and tested again.
> I also needed to apply a dependent fix patch [1] to avoid a conflict.
> 
> [1] https://marc.info/?l=linux-block&m=161545067909064&w=2
> 
> I confirmed this series fixes the use-after-free issue, and observed no
> regression in my test set. For the series, especially for patches #3-5,
> 
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Hi Shin'ichiro,

Thanks again for the testing. You may want to know that this series has
been prepared and tested on top of Jens' for-next branch
(https://git.kernel.dk/cgit/linux-block/log/?h=for-next).

Bart.



* Re: [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
  2021-04-20 21:35     ` Bart Van Assche
@ 2021-04-21  7:25       ` Daniel Wagner
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Wagner @ 2021-04-21  7:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Shin'ichiro Kawasaki, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, John Garry, Khazhy Kumykov

On Tue, Apr 20, 2021 at 02:35:53PM -0700, Bart Van Assche wrote:
> On 4/13/21 12:50 AM, Daniel Wagner wrote:
> > Maybe you could also annotate blk_mq_all_tag_iter() with a
> > might_sleep(). This would help to find API abusers more easily.
> 
> Hmm ... my understanding is that blk_mq_all_tag_iter() does not sleep,
> since that function does not sleep itself and since its only caller
> passes a callback function that does not sleep?

Yes, you are right. I read the documentation and assumed a might_sleep()
would be missing.

Thread overview: 23+ messages
2021-04-06 21:49 [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Bart Van Assche
2021-04-06 21:49 ` [PATCH v6 1/5] blk-mq: Move the elevator_exit() definition Bart Van Assche
2021-04-07 15:36   ` John Garry
2021-04-20 21:30     ` Bart Van Assche
2021-04-13  7:44   ` Daniel Wagner
2021-04-06 21:49 ` [PATCH v6 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter Bart Van Assche
2021-04-07 16:57   ` John Garry
2021-04-07 18:42     ` Bart Van Assche
2021-04-08 12:48       ` John Garry
2021-04-08 16:12         ` Bart Van Assche
2021-04-08 16:35           ` John Garry
2021-04-13  7:50   ` Daniel Wagner
2021-04-20 21:35     ` Bart Van Assche
2021-04-21  7:25       ` Daniel Wagner
2021-04-06 21:49 ` [PATCH v6 3/5] blk-mq: Fix races between iterating over requests and freeing requests Bart Van Assche
2021-04-07  0:02   ` Khazhy Kumykov
2021-04-07 21:54     ` Bart Van Assche
2021-04-20 21:41     ` Bart Van Assche
2021-04-06 21:49 ` [PATCH v6 4/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
2021-04-06 21:49 ` [PATCH v6 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags Bart Van Assche
2021-04-07  0:04   ` Khazhy Kumykov
2021-04-08  6:45 ` [PATCH v6 0/5] blk-mq: Fix a race between iterating over requests and freeing requests Shinichiro Kawasaki
2021-04-20 21:55   ` Bart Van Assche
