Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V5 00/12] block: support bio based io polling
@ 2021-04-01  2:19 Ming Lei
  2021-04-01  2:19 ` [PATCH V5 01/12] block: add helper of blk_queue_poll Ming Lei
                   ` (13 more replies)
  0 siblings, 14 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

Hi Jens,

Add per-task io poll context for holding HIPRI blk-mq/underlying bios
queued from bio based driver's io submission context, and reuse one bio
padding field for storing 'cookie' returned from submit_bio() for these
bios. Also explicitly end these bios in poll context by adding two
new bio flags.

In this way, we needn't to poll all underlying hw queues any more,
which is implemented in Jeffle's patches. And we can just poll hw queues
in which there is HIPRI IO queued.

Usually io submission and io poll share same context, so the added io
poll context data is just like one stack variable, and the cost for
saving bios is cheap.

V5:
	- fix one use-after-free issue in case that polling is from another
	context: adds one new cookie of BLK_QC_T_NOT_READY for preventing
	this issue in patch 8/12
	- add reviewed-by & tested-by tag

V4:
	- cover one more test_bit(QUEUE_FLAG_POLL, ...) suggested by
	  Jeffle(01/12)
	- drop patch of 'block: add helper of blk_create_io_context'
	- add new helper of blk_create_io_poll_context() (03/12)
	- drain submission queues in exit_io_context(), suggested by
	  Jeffle(08/13)
	- considering shared io context case for blk_bio_poll_io_drain()
	(08/13)
	- fix one issue in blk_bio_poll_pack_groups() as suggested by
	Jeffle(08/13)
	- add reviewed-by tag
V3:
	- fix cookie returned for bio based driver, as suggested by Jeffle Xu
	- draining pending bios when submission context is exiting
	- patch style and comment fix, as suggested by Mike
	- allow poll context data to be NULL by always polling on submission queue
	- remove RFC, and reviewed-by

V2:
	- address queue depth scalability issue reported by Jeffle via bio
	group list. Reuse .bi_end_io for linking bios which share same
	.bi_end_io, and support 32 such groups in submit queue. With this way,
	the scalability issue caused by kfifio is solved. Before really
	ending bio, .bi_end_io is recovered from the group head.



Jeffle Xu (4):
  block/mq: extract one helper function polling hw queue
  block: add queue_to_disk() to get gendisk from request_queue
  block: add poll_capable method to support bio-based IO polling
  dm: support IO polling for bio-based dm device

Ming Lei (8):
  block: add helper of blk_queue_poll
  block: add one helper to free io_context
  block: create io poll context for submission and poll task
  block: add req flag of REQ_POLL_CTX
  block: add new field into 'struct bvec_iter'
  block: prepare for supporting bio_list via other link
  block: use per-task poll context to implement bio based io polling
  blk-mq: limit hw queues to be polled in each blk_poll()

 block/bio.c                   |   5 +
 block/blk-core.c              | 258 ++++++++++++++++++++++++++--
 block/blk-ioc.c               |  15 +-
 block/blk-mq.c                | 308 +++++++++++++++++++++++++++++++++-
 block/blk-sysfs.c             |  16 +-
 block/blk.h                   |  58 +++++++
 drivers/md/dm-table.c         |  24 +++
 drivers/md/dm.c               |  14 ++
 drivers/nvme/host/core.c      |   2 +-
 include/linux/bio.h           | 132 ++++++++-------
 include/linux/blk_types.h     |  30 +++-
 include/linux/blkdev.h        |   4 +
 include/linux/bvec.h          |   8 +
 include/linux/device-mapper.h |   1 +
 include/linux/iocontext.h     |   2 +
 include/trace/events/kyber.h  |   6 +-
 16 files changed, 788 insertions(+), 95 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 01/12] block: add helper of blk_queue_poll
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-01  2:19 ` [PATCH V5 02/12] block: add one helper to free io_context Ming Lei
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei, Chaitanya Kulkarni

There has been 3 users, and will be more, so add one such helper.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c         | 2 +-
 block/blk-mq.c           | 3 +--
 block/blk-sysfs.c        | 2 +-
 drivers/nvme/host/core.c | 2 +-
 include/linux/blkdev.h   | 1 +
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..a31371d55b9d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -836,7 +836,7 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		}
 	}
 
-	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (!blk_queue_poll(q))
 		bio->bi_opf &= ~REQ_HIPRI;
 
 	switch (bio_op(bio)) {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4d7c1caa439..63c81df3b8b5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3869,8 +3869,7 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	struct blk_mq_hw_ctx *hctx;
 	long state;
 
-	if (!blk_qc_t_valid(cookie) ||
-	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (!blk_qc_t_valid(cookie) || !blk_queue_poll(q))
 		return 0;
 
 	if (current->plug)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0f4f0c8a7825..db3268d41274 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -417,7 +417,7 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
 
 static ssize_t queue_poll_show(struct request_queue *q, char *page)
 {
-	return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page);
+	return queue_var_show(blk_queue_poll(q), page);
 }
 
 static ssize_t queue_poll_store(struct request_queue *q, const char *page,
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642be..34b8c78f88e0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -956,7 +956,7 @@ static void nvme_execute_rq_polled(struct request_queue *q,
 {
 	DECLARE_COMPLETION_ONSTACK(wait);
 
-	WARN_ON_ONCE(!test_bit(QUEUE_FLAG_POLL, &q->queue_flags));
+	WARN_ON_ONCE(!blk_queue_poll(q));
 
 	rq->cmd_flags |= REQ_HIPRI;
 	rq->end_io_data = &wait;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bc6bc8383b43..89a01850cf12 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -665,6 +665,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
+#define blk_queue_poll(q)	test_bit(QUEUE_FLAG_POLL, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 02/12] block: add one helper to free io_context
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
  2021-04-01  2:19 ` [PATCH V5 01/12] block: add helper of blk_queue_poll Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-01  2:19 ` [PATCH V5 03/12] block: create io poll context for submission and poll task Ming Lei
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

Prepare for putting bio poll queue into io_context, so add one helper
for free io_context.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-ioc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 57299f860d41..b0cde18c4b8c 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -17,6 +17,11 @@
  */
 static struct kmem_cache *iocontext_cachep;
 
+static inline void free_io_context(struct io_context *ioc)
+{
+	kmem_cache_free(iocontext_cachep, ioc);
+}
+
 /**
  * get_io_context - increment reference count to io_context
  * @ioc: io_context to get
@@ -129,7 +134,7 @@ static void ioc_release_fn(struct work_struct *work)
 
 	spin_unlock_irq(&ioc->lock);
 
-	kmem_cache_free(iocontext_cachep, ioc);
+	free_io_context(ioc);
 }
 
 /**
@@ -164,7 +169,7 @@ void put_io_context(struct io_context *ioc)
 	}
 
 	if (free_ioc)
-		kmem_cache_free(iocontext_cachep, ioc);
+		free_io_context(ioc);
 }
 
 /**
@@ -278,7 +283,7 @@ int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
 	    (task == current || !(task->flags & PF_EXITING)))
 		task->io_context = ioc;
 	else
-		kmem_cache_free(iocontext_cachep, ioc);
+		free_io_context(ioc);
 
 	ret = task->io_context ? 0 : -EBUSY;
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 03/12] block: create io poll context for submission and poll task
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
  2021-04-01  2:19 ` [PATCH V5 01/12] block: add helper of blk_queue_poll Ming Lei
  2021-04-01  2:19 ` [PATCH V5 02/12] block: add one helper to free io_context Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12 10:19   ` Christoph Hellwig
  2021-04-01  2:19 ` [PATCH V5 04/12] block: add req flag of REQ_POLL_CTX Ming Lei
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

Create per-task io poll context for both IO submission and poll task
if the queue is bio based and supports polling.

This io polling context includes two queues:

1) submission queue(sq) for storing HIPRI bio, written by submission task
   and read by poll task.
2) polling queue(pq) for holding data moved from sq, only used in poll
   context for running bio polling.

Following patches will support bio based io polling.

Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c          | 79 +++++++++++++++++++++++++++++++++------
 block/blk-ioc.c           |  1 +
 block/blk-mq.c            | 14 +++++++
 block/blk.h               | 38 +++++++++++++++++++
 include/linux/iocontext.h |  2 +
 5 files changed, 123 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a31371d55b9d..8a21a8c010a6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -792,6 +792,61 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
 	return BLK_STS_OK;
 }
 
+static inline struct blk_bio_poll_ctx *blk_get_bio_poll_ctx(void)
+{
+	struct io_context *ioc = current->io_context;
+
+	return ioc ? ioc->data : NULL;
+}
+
+static inline unsigned int bio_grp_list_size(unsigned int nr_grps)
+{
+	return sizeof(struct bio_grp_list) + nr_grps *
+		sizeof(struct bio_grp_list_data);
+}
+
+static void bio_poll_ctx_init(struct blk_bio_poll_ctx *pc)
+{
+	pc->sq = (void *)pc + sizeof(*pc);
+	pc->sq->max_nr_grps = BLK_BIO_POLL_SQ_SZ;
+
+	pc->pq = (void *)pc->sq + bio_grp_list_size(BLK_BIO_POLL_SQ_SZ);
+	pc->pq->max_nr_grps = BLK_BIO_POLL_PQ_SZ;
+
+	spin_lock_init(&pc->sq_lock);
+	spin_lock_init(&pc->pq_lock);
+}
+
+void bio_poll_ctx_alloc(struct io_context *ioc)
+{
+	struct blk_bio_poll_ctx *pc;
+	unsigned int size = sizeof(*pc) +
+		bio_grp_list_size(BLK_BIO_POLL_SQ_SZ) +
+		bio_grp_list_size(BLK_BIO_POLL_PQ_SZ);
+
+	pc = kzalloc(GFP_ATOMIC, size);
+	if (pc) {
+		bio_poll_ctx_init(pc);
+		if (cmpxchg(&ioc->data, NULL, (void *)pc))
+			kfree(pc);
+	}
+}
+
+static inline bool blk_queue_support_bio_poll(struct request_queue *q)
+{
+	return !queue_is_mq(q) && blk_queue_poll(q);
+}
+
+static inline void blk_bio_poll_preprocess(struct request_queue *q,
+		struct bio *bio)
+{
+	if (!(bio->bi_opf & REQ_HIPRI))
+		return;
+
+	if (!blk_queue_poll(q) || (!queue_is_mq(q) && !blk_get_bio_poll_ctx()))
+		bio->bi_opf &= ~REQ_HIPRI;
+}
+
 static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 {
 	struct block_device *bdev = bio->bi_bdev;
@@ -836,8 +891,19 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		}
 	}
 
-	if (!blk_queue_poll(q))
-		bio->bi_opf &= ~REQ_HIPRI;
+	/*
+	 * Various block parts want %current->io_context, so allocate it up
+	 * front rather than dealing with lots of pain to allocate it only
+	 * where needed. This may fail and the block layer knows how to live
+	 * with it.
+	 */
+	if (unlikely(!current->io_context))
+		create_task_io_context(current, GFP_ATOMIC, q->node);
+
+	if (blk_queue_support_bio_poll(q) && (bio->bi_opf & REQ_HIPRI))
+		blk_create_io_poll_context(q);
+
+	blk_bio_poll_preprocess(q, bio);
 
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
@@ -876,15 +942,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		break;
 	}
 
-	/*
-	 * Various block parts want %current->io_context, so allocate it up
-	 * front rather than dealing with lots of pain to allocate it only
-	 * where needed. This may fail and the block layer knows how to live
-	 * with it.
-	 */
-	if (unlikely(!current->io_context))
-		create_task_io_context(current, GFP_ATOMIC, q->node);
-
 	if (blk_throtl_bio(bio)) {
 		blkcg_bio_issue_init(bio);
 		return false;
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index b0cde18c4b8c..5574c398eff6 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -19,6 +19,7 @@ static struct kmem_cache *iocontext_cachep;
 
 static inline void free_io_context(struct io_context *ioc)
 {
+	kfree(ioc->data);
 	kmem_cache_free(iocontext_cachep, ioc);
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 63c81df3b8b5..1ada2c0e76b1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3852,6 +3852,17 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
+static int blk_bio_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+{
+	/*
+	 * Create poll queue for storing poll bio and its cookie from
+	 * submission queue
+	 */
+	blk_create_io_poll_context(q);
+
+	return 0;
+}
+
 /**
  * blk_poll - poll for IO completions
  * @q:  the queue
@@ -3875,6 +3886,9 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	if (current->plug)
 		blk_flush_plug_list(current->plug, false);
 
+	if (!queue_is_mq(q))
+		return blk_bio_poll(q, cookie, spin);
+
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
 
 	/*
diff --git a/block/blk.h b/block/blk.h
index 3b53e44b967e..35901cee709d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -357,4 +357,42 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
+/* Grouping bios that share same data into one list */
+struct bio_grp_list_data {
+	void *grp_data;
+
+	/* all bios in this list share same 'grp_data' */
+	struct bio_list list;
+};
+
+struct bio_grp_list {
+	unsigned int max_nr_grps, nr_grps;
+	struct bio_grp_list_data head[0];
+};
+
+struct blk_bio_poll_ctx {
+	spinlock_t sq_lock;
+	struct bio_grp_list *sq;
+
+	spinlock_t pq_lock;
+	struct bio_grp_list *pq;
+};
+
+#define BLK_BIO_POLL_SQ_SZ		16U
+#define BLK_BIO_POLL_PQ_SZ		(BLK_BIO_POLL_SQ_SZ * 2)
+
+void bio_poll_ctx_alloc(struct io_context *ioc);
+
+static inline void blk_create_io_poll_context(struct request_queue *q)
+{
+	struct io_context *ioc;
+
+	if (unlikely(!current->io_context))
+		create_task_io_context(current, GFP_ATOMIC, q->node);
+
+	ioc = current->io_context;
+	if (unlikely(ioc && !ioc->data))
+		bio_poll_ctx_alloc(ioc);
+}
+
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index 0a9dc40b7be8..f9a467571356 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -110,6 +110,8 @@ struct io_context {
 	struct io_cq __rcu	*icq_hint;
 	struct hlist_head	icq_list;
 
+	void			*data;
+
 	struct work_struct release_work;
 };
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 04/12] block: add req flag of REQ_POLL_CTX
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (2 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 03/12] block: create io poll context for submission and poll task Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-01  2:19 ` [PATCH V5 05/12] block: add new field into 'struct bvec_iter' Ming Lei
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

Add one req flag REQ_POLL_CTX which will be used in the following patch for
supporting bio based IO polling.

Exactly this flag can help us to do:

1) request flag is cloned in bio_fast_clone(), so if we mark one FS bio
as REQ_POLL_CTX, all bios cloned from this FS bio will be marked as
REQ_POLL_CTX too.

2) create per-task io polling context if the bio based queue supports
polling and the submitted bio is HIPRI. Per-task io poll context will be
created during submit_bio() before marking this HIPRI bio as REQ_POLL_CTX.
Then we can avoid to create such io polling context if one cloned bio with
REQ_POLL_CTX is submitted from another kernel context.

3) for supporting bio based io polling, we need to poll IOs from all
underlying queues of the bio device, this way help us to recognize which
IO needs to polled in bio based style, which will be applied in
following patch.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c          | 28 ++++++++++++++++++++++++++--
 include/linux/blk_types.h |  4 ++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 8a21a8c010a6..a777ba4fe06f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -840,11 +840,30 @@ static inline bool blk_queue_support_bio_poll(struct request_queue *q)
 static inline void blk_bio_poll_preprocess(struct request_queue *q,
 		struct bio *bio)
 {
+	bool mq;
+
 	if (!(bio->bi_opf & REQ_HIPRI))
 		return;
 
-	if (!blk_queue_poll(q) || (!queue_is_mq(q) && !blk_get_bio_poll_ctx()))
+	/*
+	 * Can't support bio based IO polling without per-task poll ctx
+	 *
+	 * We have created per-task io poll context, and mark this
+	 * bio as REQ_POLL_CTX, so: 1) if any cloned bio from this bio is
+	 * submitted from another kernel context, we won't create bio
+	 * poll context for it, and that bio can be completed by IRQ;
+	 * 2) If such bio is submitted from current context, we will
+	 * complete it via blk_poll(); 3) If driver knows that one
+	 * underlying bio allocated from driver is for FS bio, meantime
+	 * it is submitted in current context, driver can mark such bio
+	 * as REQ_HIPRI & REQ_POLL_CTX manually, so the bio can be completed
+	 * via blk_poll too.
+	 */
+	mq = queue_is_mq(q);
+	if (!blk_queue_poll(q) || (!mq && !blk_get_bio_poll_ctx()))
 		bio->bi_opf &= ~REQ_HIPRI;
+	else if (!mq)
+		bio->bi_opf |= REQ_POLL_CTX;
 }
 
 static noinline_for_stack bool submit_bio_checks(struct bio *bio)
@@ -900,7 +919,12 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	if (unlikely(!current->io_context))
 		create_task_io_context(current, GFP_ATOMIC, q->node);
 
-	if (blk_queue_support_bio_poll(q) && (bio->bi_opf & REQ_HIPRI))
+	/*
+	 * If REQ_POLL_CTX isn't set for this HIPRI bio, we think it
+	 * originated from FS and allocate io polling context.
+	 */
+	if (blk_queue_support_bio_poll(q) && (bio->bi_opf & REQ_HIPRI) &&
+			!(bio->bi_opf & REQ_POLL_CTX))
 		blk_create_io_poll_context(q);
 
 	blk_bio_poll_preprocess(q, bio);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..99160d588c2d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -394,6 +394,9 @@ enum req_flag_bits {
 
 	__REQ_HIPRI,
 
+	/* for marking IOs originated from same FS bio in same context */
+	__REQ_POLL_CTX,
+
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
@@ -418,6 +421,7 @@ enum req_flag_bits {
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
+#define REQ_POLL_CTX			(1ULL << __REQ_POLL_CTX)
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 05/12] block: add new field into 'struct bvec_iter'
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (3 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 04/12] block: add req flag of REQ_POLL_CTX Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12  9:26   ` Christoph Hellwig
  2021-04-01  2:19 ` [PATCH V5 06/12] block/mq: extract one helper function polling hw queue Ming Lei
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

There is a hole at the end of 'struct bvec_iter', so put a new field
here and we can save cookie returned from submit_bio() here for
supporting bio based polling.

This way can avoid to extend bio unnecessarily.

Meantime add two helpers to get/set this field.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk.h          | 10 ++++++++++
 include/linux/bvec.h |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index 35901cee709d..c1d8456656df 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -395,4 +395,14 @@ static inline void blk_create_io_poll_context(struct request_queue *q)
 		bio_poll_ctx_alloc(ioc);
 }
 
+static inline unsigned int bio_get_private_data(struct bio *bio)
+{
+	return bio->bi_iter.bi_private_data;
+}
+
+static inline void bio_set_private_data(struct bio *bio, unsigned int data)
+{
+	bio->bi_iter.bi_private_data = data;
+}
+
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ff832e698efb..547ad7526960 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -43,6 +43,14 @@ struct bvec_iter {
 
 	unsigned int            bi_bvec_done;	/* number of bytes completed in
 						   current bvec */
+
+	/*
+	 * There is a hole at the end of bvec_iter, add one new field to hold
+	 * something which isn't related with 'bvec_iter', so that we can
+	 * avoid extending bio. So far this new field is used for bio based
+	 * polling, we will store returning value of submit_bio() here.
+	 */
+	unsigned int		bi_private_data;
 };
 
 struct bvec_iter_all {
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 06/12] block/mq: extract one helper function polling hw queue
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (4 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 05/12] block: add new field into 'struct bvec_iter' Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12  9:29   ` Christoph Hellwig
  2021-04-01  2:19 ` [PATCH V5 07/12] block: prepare for supporting bio_list via other link Ming Lei
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

From: Jeffle Xu <jefflexu@linux.alibaba.com>

Extract the logic of polling one hw queue and related statistics
handling out as the helper function.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1ada2c0e76b1..0cb88c719916 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3852,6 +3852,19 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
+static inline int blk_mq_poll_hctx(struct request_queue *q,
+				   struct blk_mq_hw_ctx *hctx)
+{
+	int ret;
+
+	hctx->poll_invoked++;
+	ret = q->mq_ops->poll(hctx);
+	if (ret > 0)
+		hctx->poll_success++;
+
+	return ret;
+}
+
 static int blk_bio_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	/*
@@ -3908,11 +3921,8 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	do {
 		int ret;
 
-		hctx->poll_invoked++;
-
-		ret = q->mq_ops->poll(hctx);
+		ret = blk_mq_poll_hctx(q, hctx);
 		if (ret > 0) {
-			hctx->poll_success++;
 			__set_current_state(TASK_RUNNING);
 			return ret;
 		}
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 07/12] block: prepare for supporting bio_list via other link
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (5 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 06/12] block/mq: extract one helper function polling hw queue Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12 10:18   ` Christoph Hellwig
  2021-04-01  2:19 ` [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling Ming Lei
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

So far bio list helpers always use .bi_next to traverse the list, we
will support to link bios by other bio field.

Prepare for such support by adding a macro so that users can define
another helpers for linking bios by other bio field.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/bio.h | 132 +++++++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 64 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index d0246c92a6e8..619edd26a6c0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -608,75 +608,11 @@ static inline unsigned bio_list_size(const struct bio_list *bl)
 	return sz;
 }
 
-static inline void bio_list_add(struct bio_list *bl, struct bio *bio)
-{
-	bio->bi_next = NULL;
-
-	if (bl->tail)
-		bl->tail->bi_next = bio;
-	else
-		bl->head = bio;
-
-	bl->tail = bio;
-}
-
-static inline void bio_list_add_head(struct bio_list *bl, struct bio *bio)
-{
-	bio->bi_next = bl->head;
-
-	bl->head = bio;
-
-	if (!bl->tail)
-		bl->tail = bio;
-}
-
-static inline void bio_list_merge(struct bio_list *bl, struct bio_list *bl2)
-{
-	if (!bl2->head)
-		return;
-
-	if (bl->tail)
-		bl->tail->bi_next = bl2->head;
-	else
-		bl->head = bl2->head;
-
-	bl->tail = bl2->tail;
-}
-
-static inline void bio_list_merge_head(struct bio_list *bl,
-				       struct bio_list *bl2)
-{
-	if (!bl2->head)
-		return;
-
-	if (bl->head)
-		bl2->tail->bi_next = bl->head;
-	else
-		bl->tail = bl2->tail;
-
-	bl->head = bl2->head;
-}
-
 static inline struct bio *bio_list_peek(struct bio_list *bl)
 {
 	return bl->head;
 }
 
-static inline struct bio *bio_list_pop(struct bio_list *bl)
-{
-	struct bio *bio = bl->head;
-
-	if (bio) {
-		bl->head = bl->head->bi_next;
-		if (!bl->head)
-			bl->tail = NULL;
-
-		bio->bi_next = NULL;
-	}
-
-	return bio;
-}
-
 static inline struct bio *bio_list_get(struct bio_list *bl)
 {
 	struct bio *bio = bl->head;
@@ -686,6 +622,74 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
 	return bio;
 }
 
+#define BIO_LIST_HELPERS(_pre, link)					\
+									\
+static inline void _pre##_add(struct bio_list *bl, struct bio *bio)	\
+{									\
+	bio->bi_##link = NULL;						\
+									\
+	if (bl->tail)							\
+		bl->tail->bi_##link = bio;				\
+	else								\
+		bl->head = bio;						\
+									\
+	bl->tail = bio;							\
+}									\
+									\
+static inline void _pre##_add_head(struct bio_list *bl, struct bio *bio) \
+{									\
+	bio->bi_##link = bl->head;					\
+									\
+	bl->head = bio;							\
+									\
+	if (!bl->tail)							\
+		bl->tail = bio;						\
+}									\
+									\
+static inline void _pre##_merge(struct bio_list *bl, struct bio_list *bl2) \
+{									\
+	if (!bl2->head)							\
+		return;							\
+									\
+	if (bl->tail)							\
+		bl->tail->bi_##link = bl2->head;			\
+	else								\
+		bl->head = bl2->head;					\
+									\
+	bl->tail = bl2->tail;						\
+}									\
+									\
+static inline void _pre##_merge_head(struct bio_list *bl,		\
+				       struct bio_list *bl2)		\
+{									\
+	if (!bl2->head)							\
+		return;							\
+									\
+	if (bl->head)							\
+		bl2->tail->bi_##link = bl->head;			\
+	else								\
+		bl->tail = bl2->tail;					\
+									\
+	bl->head = bl2->head;						\
+}									\
+									\
+static inline struct bio *_pre##_pop(struct bio_list *bl)		\
+{									\
+	struct bio *bio = bl->head;					\
+									\
+	if (bio) {							\
+		bl->head = bl->head->bi_##link;				\
+		if (!bl->head)						\
+			bl->tail = NULL;				\
+									\
+		bio->bi_##link = NULL;					\
+	}								\
+									\
+	return bio;							\
+}									\
+
+BIO_LIST_HELPERS(bio_list, next);
+
 /*
  * Increment chain count for the bio. Make sure the CHAIN flag update
  * is visible before the raised count.
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (6 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 07/12] block: prepare for supporting bio_list via other link Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12  9:54   ` Christoph Hellwig
  2021-04-12 10:16   ` Christoph Hellwig
  2021-04-01  2:19 ` [PATCH V5 09/12] blk-mq: limit hw queues to be polled in each blk_poll() Ming Lei
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

Currently bio based IO polling needs to poll all hw queue blindly, this
way is very inefficient, and one big reason is that we can't pass any
bio submission result to blk_poll().

In IO submission context, track associated underlying bios by per-task
submission queue and store returned 'cookie' in
bio->bi_iter.bi_private_data, and return current->pid to caller of
submit_bio() for any bio based driver's IO, which is submitted from FS.

In IO poll context, the passed cookie tells us the PID of submission
context, then we can find bios from the per-task io pull context of
submission context. Moving bios from submission queue to poll queue of
the poll context, and keep polling until these bios are ended. Remove
bio from poll queue if the bio is ended. Add bio flags of BIO_DONE and
BIO_END_BY_POLL for such purpose.

In was found in Jeffle Xu's test that kfifo doesn't scale well for a
submission queue as queue depth is increased, so a new mechanism for
tracking bios is needed. So far bio's size is close to 2 cacheline size,
and it may not be accepted to add new field into bio for solving the
scalability issue by tracking bios via linked list, switch to bio group
list for tracking bio, the idea is to reuse .bi_end_io for linking bios
into a linked list for all sharing same .bi_end_io(call it bio group),
which is recovered before ending bio really, since BIO_END_BY_POLL is
added for enhancing this point. Usually .bi_end_bio is same for all
bios in same layer, so it is enough to provide very limited groups, such
as 16 or less for fixing the scalability issue.

Usually submission shares context with io poll. The per-task poll context
is just like stack variable, and it is cheap to move data between the two
per-task queues.

Also when the submission task is exiting, drain pending IOs in the context
until all are done.

Tested-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c               |   5 +
 block/blk-core.c          | 155 +++++++++++++++++++++++-
 block/blk-ioc.c           |   3 +
 block/blk-mq.c            | 244 +++++++++++++++++++++++++++++++++++++-
 block/blk.h               |  10 ++
 include/linux/blk_types.h |  26 +++-
 6 files changed, 439 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 26b7f721cda8..04c043dc60fc 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1402,6 +1402,11 @@ static inline bool bio_remaining_done(struct bio *bio)
  **/
 void bio_endio(struct bio *bio)
 {
+	/* BIO_END_BY_POLL has to be set before calling submit_bio */
+	if (bio_flagged(bio, BIO_END_BY_POLL)) {
+		bio_set_flag(bio, BIO_DONE);
+		return;
+	}
 again:
 	if (!bio_remaining_done(bio))
 		return;
diff --git a/block/blk-core.c b/block/blk-core.c
index a777ba4fe06f..760e7d96f258 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -805,6 +805,81 @@ static inline unsigned int bio_grp_list_size(unsigned int nr_grps)
 		sizeof(struct bio_grp_list_data);
 }
 
+static inline void *bio_grp_data(struct bio *bio)
+{
+	return bio->bi_poll;
+}
+
+/* add bio into bio group list, return true if it is added */
+static bool bio_grp_list_add(struct bio_grp_list *list, struct bio *bio)
+{
+	int i;
+	struct bio_grp_list_data *grp;
+
+	for (i = 0; i < list->nr_grps; i++) {
+		grp = &list->head[i];
+		if (grp->grp_data == bio_grp_data(bio)) {
+			__bio_grp_list_add(&grp->list, bio);
+			return true;
+		}
+	}
+
+	if (i == list->max_nr_grps)
+		return false;
+
+	/* create a new group */
+	grp = &list->head[i];
+	bio_list_init(&grp->list);
+	grp->grp_data = bio_grp_data(bio);
+	__bio_grp_list_add(&grp->list, bio);
+	list->nr_grps++;
+
+	return true;
+}
+
+static int bio_grp_list_find_grp(struct bio_grp_list *list, void *grp_data)
+{
+	int i;
+	struct bio_grp_list_data *grp;
+
+	for (i = 0; i < list->nr_grps; i++) {
+		grp = &list->head[i];
+		if (grp->grp_data == grp_data)
+			return i;
+	}
+
+	if (i < list->max_nr_grps) {
+		grp = &list->head[i];
+		bio_list_init(&grp->list);
+		return i;
+	}
+
+	return -1;
+}
+
+/* Move as many as possible groups from 'src' to 'dst' */
+void bio_grp_list_move(struct bio_grp_list *dst, struct bio_grp_list *src)
+{
+	int i, j, cnt = 0;
+	struct bio_grp_list_data *grp;
+
+	for (i = src->nr_grps - 1; i >= 0; i--) {
+		grp = &src->head[i];
+		j = bio_grp_list_find_grp(dst, grp->grp_data);
+		if (j < 0)
+			break;
+		if (bio_grp_list_grp_empty(&dst->head[j])) {
+			dst->head[j].grp_data = grp->grp_data;
+			dst->nr_grps++;
+		}
+		__bio_grp_list_merge(&dst->head[j].list, &grp->list);
+		bio_list_init(&grp->list);
+		cnt++;
+	}
+
+	src->nr_grps -= cnt;
+}
+
 static void bio_poll_ctx_init(struct blk_bio_poll_ctx *pc)
 {
 	pc->sq = (void *)pc + sizeof(*pc);
@@ -866,6 +941,47 @@ static inline void blk_bio_poll_preprocess(struct request_queue *q,
 		bio->bi_opf |= REQ_POLL_CTX;
 }
 
+static inline void blk_bio_poll_mark_queued(struct bio *bio, bool queued)
+{
+	/*
+	 * The bio has been added to per-task poll queue, mark it as
+	 * END_BY_POLL, so that this bio is always completed from
+	 * blk_poll() which is provided with cookied from this bio's
+	 * submission.
+	 */
+	if (!queued)
+		bio->bi_opf &= ~(REQ_HIPRI | REQ_POLL_CTX);
+	else
+		bio_set_flag(bio, BIO_END_BY_POLL);
+}
+
+static bool blk_bio_poll_prep_submit(struct io_context *ioc, struct bio *bio)
+{
+	struct blk_bio_poll_ctx *pc = ioc->data;
+	unsigned int queued;
+
+	/*
+	 * We rely on immutable .bi_end_io between blk-mq bio submission
+	 * and completion. However, bio crypt may update .bi_end_io during
+	 * submission, so simply don't support bio based polling for this
+	 * setting.
+	 */
+	if (likely(!bio_has_crypt_ctx(bio))) {
+		/* track this bio via bio group list */
+		spin_lock(&pc->sq_lock);
+		queued = bio_grp_list_add(pc->sq, bio);
+		blk_bio_poll_mark_queued(bio, queued);
+		if (queued)
+			bio_set_private_data(bio, BLK_QC_T_NOT_READY);
+		spin_unlock(&pc->sq_lock);
+	} else {
+		queued = false;
+		blk_bio_poll_mark_queued(bio, false);
+	}
+
+	return queued;
+}
+
 static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 {
 	struct block_device *bdev = bio->bi_bdev;
@@ -1024,7 +1140,7 @@ static blk_qc_t __submit_bio(struct bio *bio)
  * bio_list_on_stack[1] contains bios that were submitted before the current
  *	->submit_bio_bio, but that haven't been processed yet.
  */
-static blk_qc_t __submit_bio_noacct(struct bio *bio)
+static blk_qc_t __submit_bio_noacct_ctx(struct bio *bio, struct io_context *ioc)
 {
 	struct bio_list bio_list_on_stack[2];
 	blk_qc_t ret = BLK_QC_T_NONE;
@@ -1047,7 +1163,15 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 		bio_list_on_stack[1] = bio_list_on_stack[0];
 		bio_list_init(&bio_list_on_stack[0]);
 
-		ret = __submit_bio(bio);
+		if (ioc && queue_is_mq(q) && (bio->bi_opf & REQ_HIPRI)) {
+			bool queued = blk_bio_poll_prep_submit(ioc, bio);
+
+			ret = __submit_bio(bio);
+			if (queued)
+				bio_set_private_data(bio, ret);
+		} else {
+			ret = __submit_bio(bio);
+		}
 
 		/*
 		 * Sort new bios into those for a lower level and those for the
@@ -1073,6 +1197,33 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 	return ret;
 }
 
+static inline blk_qc_t __submit_bio_noacct_poll(struct bio *bio,
+		struct io_context *ioc)
+{
+	struct blk_bio_poll_ctx *pc = ioc->data;
+
+	__submit_bio_noacct_ctx(bio, ioc);
+
+	/* bio submissions queued to per-task poll context */
+	if (READ_ONCE(pc->sq->nr_grps))
+		return current->pid;
+
+	/* swapper's pid is 0, but it can't submit poll IO for us */
+	return BLK_QC_T_BIO_NONE;
+}
+
+static inline blk_qc_t __submit_bio_noacct(struct bio *bio)
+{
+	struct io_context *ioc = current->io_context;
+
+	if (ioc && ioc->data && (bio->bi_opf & REQ_HIPRI))
+		return __submit_bio_noacct_poll(bio, ioc);
+
+	__submit_bio_noacct_ctx(bio, NULL);
+
+	return BLK_QC_T_BIO_NONE;
+}
+
 static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
 {
 	struct bio_list bio_list[2] = { };
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 5574c398eff6..c1fd7c593a54 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -211,6 +211,9 @@ void exit_io_context(struct task_struct *task)
 	task->io_context = NULL;
 	task_unlock(task);
 
+	/* drain io poll submissions */
+	blk_bio_poll_io_drain(ioc);
+
 	atomic_dec(&ioc->nr_tasks);
 	put_io_context_active(ioc);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0cb88c719916..20bfc8c2d02e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3865,14 +3865,256 @@ static inline int blk_mq_poll_hctx(struct request_queue *q,
 	return ret;
 }
 
+static int blk_mq_poll_io(struct bio *bio)
+{
+	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	blk_qc_t cookie = bio_get_private_data(bio);
+	int ret = 0;
+
+	/* wait until the bio is submitted really */
+	if (!blk_qc_t_ready(cookie))
+		return 0;
+
+	if (!bio_flagged(bio, BIO_DONE) && blk_qc_t_valid(cookie)) {
+		struct blk_mq_hw_ctx *hctx =
+			q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
+
+		ret += blk_mq_poll_hctx(q, hctx);
+	}
+	return ret;
+}
+
+static int blk_bio_poll_and_end_io(struct bio_grp_list *grps)
+{
+	int ret = 0;
+	int i;
+
+	/*
+	 * Poll hw queue first.
+	 *
+	 * TODO: limit max poll times and make sure to not poll same
+	 * hw queue one more time.
+	 */
+	for (i = 0; i < grps->nr_grps; i++) {
+		struct bio_grp_list_data *grp = &grps->head[i];
+		struct bio *bio;
+
+		if (bio_grp_list_grp_empty(grp))
+			continue;
+
+		for (bio = grp->list.head; bio; bio = bio->bi_poll)
+			ret += blk_mq_poll_io(bio);
+	}
+
+	/* reap bios */
+	for (i = 0; i < grps->nr_grps; i++) {
+		struct bio_grp_list_data *grp = &grps->head[i];
+		struct bio *bio;
+		struct bio_list bl;
+
+		if (bio_grp_list_grp_empty(grp))
+			continue;
+
+		bio_list_init(&bl);
+
+		while ((bio = __bio_grp_list_pop(&grp->list))) {
+			if (bio_flagged(bio, BIO_DONE)) {
+				/* now recover original data */
+				bio->bi_poll = grp->grp_data;
+
+				/* clear BIO_END_BY_POLL and end me really */
+				bio_clear_flag(bio, BIO_END_BY_POLL);
+				bio_endio(bio);
+			} else {
+				__bio_grp_list_add(&bl, bio);
+			}
+		}
+		__bio_grp_list_merge(&grp->list, &bl);
+	}
+	return ret;
+}
+
+static void blk_bio_poll_pack_groups(struct bio_grp_list *grps)
+{
+	int i, j, k = 0;
+	int cnt = 0;
+
+	for (i = grps->nr_grps - 1; i >= 0; i--) {
+		struct bio_grp_list_data *grp = &grps->head[i];
+		struct bio_grp_list_data *hole = NULL;
+
+		if (bio_grp_list_grp_empty(grp)) {
+			cnt++;
+			continue;
+		}
+
+		for (j = k; j < i; j++) {
+			if (bio_grp_list_grp_empty(&grps->head[j])) {
+				hole = &grps->head[j];
+				break;
+			}
+		}
+		if (hole == NULL)
+			break;
+		*hole = *grp;
+		cnt++;
+		k = j;
+	}
+
+	grps->nr_grps -= cnt;
+}
+
+#define  MAX_BIO_GRPS_ON_STACK  8
+struct bio_grp_list_stack {
+	unsigned int max_nr_grps, nr_grps;
+	struct bio_grp_list_data head[MAX_BIO_GRPS_ON_STACK];
+};
+
+static int blk_bio_poll_io(struct io_context *submit_ioc,
+		struct io_context *poll_ioc)
+
+{
+	struct bio_grp_list_stack _bio_grps = {
+		.max_nr_grps	= ARRAY_SIZE(_bio_grps.head),
+		.nr_grps	= 0
+	};
+	struct bio_grp_list *bio_grps = (struct bio_grp_list *)&_bio_grps;
+	struct blk_bio_poll_ctx *submit_ctx = submit_ioc->data;
+	struct blk_bio_poll_ctx *poll_ctx = poll_ioc ?
+		poll_ioc->data : NULL;
+	int ret = 0;
+
+	/*
+	 * Move IO submission result from submission queue in submission
+	 * context to poll queue of poll context.
+	 */
+	if (READ_ONCE(submit_ctx->sq->nr_grps) > 0) {
+		spin_lock(&submit_ctx->sq_lock);
+		bio_grp_list_move(bio_grps, submit_ctx->sq);
+		spin_unlock(&submit_ctx->sq_lock);
+	}
+
+	/* merge new bios first, then start to poll bios from pq */
+	if (poll_ctx) {
+		spin_lock(&poll_ctx->pq_lock);
+		bio_grp_list_move(poll_ctx->pq, bio_grps);
+		bio_grp_list_move(bio_grps, poll_ctx->pq);
+		spin_unlock(&poll_ctx->pq_lock);
+	}
+
+	do {
+		ret += blk_bio_poll_and_end_io(bio_grps);
+		blk_bio_poll_pack_groups(bio_grps);
+
+		if (bio_grps->nr_grps) {
+			/*
+			 * move back, and keep polling until all can be
+			 * held in either poll queue or submission queue.
+			 */
+			if (poll_ctx) {
+				spin_lock(&poll_ctx->pq_lock);
+				bio_grp_list_move(poll_ctx->pq, bio_grps);
+				spin_unlock(&poll_ctx->pq_lock);
+			} else {
+				spin_lock(&submit_ctx->sq_lock);
+				bio_grp_list_move(submit_ctx->sq, bio_grps);
+				spin_unlock(&submit_ctx->sq_lock);
+			}
+		}
+	} while (bio_grps->nr_grps > 0);
+
+	return ret;
+}
+
+void blk_bio_poll_io_drain(struct io_context *submit_ioc)
+{
+	struct blk_bio_poll_ctx *submit_ctx = submit_ioc->data;
+
+	if (!submit_ctx)
+		return;
+
+	spin_lock(&submit_ctx->sq_lock);
+	while (READ_ONCE(submit_ctx->sq->nr_grps) > 0) {
+		blk_bio_poll_and_end_io(submit_ctx->sq);
+		blk_bio_poll_pack_groups(submit_ctx->sq);
+		cpu_relax();
+	}
+	spin_unlock(&submit_ctx->sq_lock);
+}
+
+static bool blk_bio_ioc_valid(struct task_struct *t)
+{
+	if (!t)
+		return false;
+
+	if (!t->io_context)
+		return false;
+
+	if (!t->io_context->data)
+		return false;
+
+	return true;
+}
+
+static int __blk_bio_poll(blk_qc_t cookie)
+{
+	struct io_context *poll_ioc = current->io_context;
+	pid_t pid;
+	struct task_struct *submit_task;
+	int ret;
+
+	pid = (pid_t)cookie;
+
+	/* io poll often share io submission context */
+	if (likely(current->pid == pid && blk_bio_ioc_valid(current)))
+		return blk_bio_poll_io(poll_ioc, poll_ioc);
+
+	submit_task = find_get_task_by_vpid(pid);
+	if (likely(blk_bio_ioc_valid(submit_task)))
+		ret = blk_bio_poll_io(submit_task->io_context, poll_ioc);
+	else
+		ret = 0;
+	if (likely(submit_task))
+		put_task_struct(submit_task);
+
+	return ret;
+}
+
 static int blk_bio_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
+	long state;
+
+	/* no need to poll */
+	if (cookie == BLK_QC_T_BIO_NONE)
+		return 0;
+
 	/*
 	 * Create poll queue for storing poll bio and its cookie from
 	 * submission queue
 	 */
 	blk_create_io_poll_context(q);
 
+	state = current->state;
+	do {
+		int ret;
+
+		ret = __blk_bio_poll(cookie);
+		if (ret > 0) {
+			__set_current_state(TASK_RUNNING);
+			return ret;
+		}
+
+		if (signal_pending_state(state, current))
+			__set_current_state(TASK_RUNNING);
+
+		if (current->state == TASK_RUNNING)
+			return 1;
+		if (ret < 0 || !spin)
+			break;
+		cpu_relax();
+	} while (!need_resched());
+
+	__set_current_state(TASK_RUNNING);
 	return 0;
 }
 
@@ -3893,7 +4135,7 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	struct blk_mq_hw_ctx *hctx;
 	long state;
 
-	if (!blk_qc_t_valid(cookie) || !blk_queue_poll(q))
+	if (!blk_queue_poll(q) || (queue_is_mq(q) && !blk_qc_t_valid(cookie)))
 		return 0;
 
 	if (current->plug)
diff --git a/block/blk.h b/block/blk.h
index c1d8456656df..c0216f2e6293 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -381,6 +381,7 @@ struct blk_bio_poll_ctx {
 #define BLK_BIO_POLL_SQ_SZ		16U
 #define BLK_BIO_POLL_PQ_SZ		(BLK_BIO_POLL_SQ_SZ * 2)
 
+void blk_bio_poll_io_drain(struct io_context *submit_ioc);
 void bio_poll_ctx_alloc(struct io_context *ioc);
 
 static inline void blk_create_io_poll_context(struct request_queue *q)
@@ -405,4 +406,13 @@ static inline void bio_set_private_data(struct bio *bio, unsigned int data)
 	bio->bi_iter.bi_private_data = data;
 }
 
+BIO_LIST_HELPERS(__bio_grp_list, poll);
+
+static inline bool bio_grp_list_grp_empty(struct bio_grp_list_data *grp)
+{
+	return bio_list_empty(&grp->list);
+}
+
+void bio_grp_list_move(struct bio_grp_list *dst, struct bio_grp_list *src);
+
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99160d588c2d..178af164ba2a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -235,7 +235,18 @@ struct bio {
 
 	struct bvec_iter	bi_iter;
 
-	bio_end_io_t		*bi_end_io;
+	union {
+		bio_end_io_t		*bi_end_io;
+		/*
+		 * bio based io polling needs to track bio via bio group
+		 * list which groups bios by their .bi_end_io, and original
+		 * .bi_end_io is saved into the group head. Will recover
+		 * .bi_end_io before really ending bio. BIO_END_BY_POLL
+		 * will make sure that this bio won't be ended before
+		 * recovering .bi_end_io.
+		 */
+		struct bio		*bi_poll;
+	};
 
 	void			*bi_private;
 #ifdef CONFIG_BLK_CGROUP
@@ -304,6 +315,9 @@ enum {
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
 	BIO_REMAPPED,
+	BIO_END_BY_POLL,	/* end by blk_bio_poll() explicitly */
+	/* set when bio can be ended, used for bio with BIO_END_BY_POLL */
+	BIO_DONE,
 	BIO_FLAG_LAST
 };
 
@@ -513,6 +527,16 @@ typedef unsigned int blk_qc_t;
 #define BLK_QC_T_NONE		-1U
 #define BLK_QC_T_SHIFT		16
 #define BLK_QC_T_INTERNAL	(1U << 31)
+/* only used for bio based submission, has to be defined as 0 */
+#define BLK_QC_T_BIO_NONE	0
+/* only used for bio based polling, not ready for polling */
+#define BLK_QC_T_NOT_READY	-2U
+
+/* not ready for bio based polling since this bio isn't submitted really */
+static inline bool blk_qc_t_ready(blk_qc_t cookie)
+{
+	return cookie != BLK_QC_T_NOT_READY;
+}
 
 static inline bool blk_qc_t_valid(blk_qc_t cookie)
 {
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 09/12] blk-mq: limit hw queues to be polled in each blk_poll()
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (7 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-01  2:19 ` [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue Ming Lei
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

Limit at most 8 queues are polled in each blk_pull(), avoid to
add extra latency when queue depth is high.

Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 79 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 55 insertions(+), 24 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20bfc8c2d02e..e2701d502e51 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3865,36 +3865,31 @@ static inline int blk_mq_poll_hctx(struct request_queue *q,
 	return ret;
 }
 
-static int blk_mq_poll_io(struct bio *bio)
-{
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
-	blk_qc_t cookie = bio_get_private_data(bio);
-	int ret = 0;
+#define POLL_HCTX_MAX_CNT 8
 
-	/* wait until the bio is submitted really */
-	if (!blk_qc_t_ready(cookie))
-		return 0;
+static bool blk_add_unique_hctx(struct blk_mq_hw_ctx **data, int *cnt,
+		struct blk_mq_hw_ctx *hctx)
+{
+	int i;
 
-	if (!bio_flagged(bio, BIO_DONE) && blk_qc_t_valid(cookie)) {
-		struct blk_mq_hw_ctx *hctx =
-			q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
+	for (i = 0; i < *cnt; i++) {
+		if (data[i] == hctx)
+			goto exit;
+	}
 
-		ret += blk_mq_poll_hctx(q, hctx);
+	if (i < POLL_HCTX_MAX_CNT) {
+		data[i] = hctx;
+		(*cnt)++;
 	}
-	return ret;
+ exit:
+	return *cnt == POLL_HCTX_MAX_CNT;
 }
 
-static int blk_bio_poll_and_end_io(struct bio_grp_list *grps)
+static void blk_build_poll_queues(struct bio_grp_list *grps,
+		struct blk_mq_hw_ctx **data, int *cnt)
 {
-	int ret = 0;
 	int i;
 
-	/*
-	 * Poll hw queue first.
-	 *
-	 * TODO: limit max poll times and make sure to not poll same
-	 * hw queue one more time.
-	 */
 	for (i = 0; i < grps->nr_grps; i++) {
 		struct bio_grp_list_data *grp = &grps->head[i];
 		struct bio *bio;
@@ -3902,11 +3897,31 @@ static int blk_bio_poll_and_end_io(struct bio_grp_list *grps)
 		if (bio_grp_list_grp_empty(grp))
 			continue;
 
-		for (bio = grp->list.head; bio; bio = bio->bi_poll)
-			ret += blk_mq_poll_io(bio);
+		for (bio = grp->list.head; bio; bio = bio->bi_poll) {
+			blk_qc_t  cookie;
+			struct blk_mq_hw_ctx *hctx;
+			struct request_queue *q;
+
+			if (bio_flagged(bio, BIO_DONE))
+				continue;
+
+			/* wait until the bio is submitted really */
+			cookie = bio_get_private_data(bio);
+			if (!blk_qc_t_ready(cookie) || !blk_qc_t_valid(cookie))
+				continue;
+
+			q = bio->bi_bdev->bd_disk->queue;
+			hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
+			if (blk_add_unique_hctx(data, cnt, hctx))
+				return;
+		}
 	}
+}
+
+static void blk_bio_poll_reap_ios(struct bio_grp_list *grps)
+{
+	int i;
 
-	/* reap bios */
 	for (i = 0; i < grps->nr_grps; i++) {
 		struct bio_grp_list_data *grp = &grps->head[i];
 		struct bio *bio;
@@ -3931,6 +3946,22 @@ static int blk_bio_poll_and_end_io(struct bio_grp_list *grps)
 		}
 		__bio_grp_list_merge(&grp->list, &bl);
 	}
+}
+
+static int blk_bio_poll_and_end_io(struct bio_grp_list *grps)
+{
+	int ret = 0;
+	int i;
+	struct blk_mq_hw_ctx *hctx[POLL_HCTX_MAX_CNT];
+	int cnt = 0;
+
+	blk_build_poll_queues(grps, hctx, &cnt);
+
+	for (i = 0; i < cnt; i++)
+		ret += blk_mq_poll_hctx(hctx[i]->queue, hctx[i]);
+
+	blk_bio_poll_reap_ios(grps);
+
 	return ret;
 }
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (8 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 09/12] blk-mq: limit hw queues to be polled in each blk_poll() Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12 12:52   ` Jens Axboe
  2021-04-01  2:19 ` [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling Ming Lei
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

From: Jeffle Xu <jefflexu@linux.alibaba.com>

Sometimes we need to get the corresponding gendisk from request_queue.

It is preferred that block drivers store private data in
gendisk->private_data rather than request_queue->queuedata, e.g. see:
commit c4a59c4e5db3 ("dm: stop using ->queuedata").

So if only request_queue is given, we need to get its corresponding
gendisk to get the private data stored in that gendisk.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/blkdev.h       | 2 ++
 include/trace/events/kyber.h | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 89a01850cf12..bfab74b45f15 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -686,6 +686,8 @@ static inline bool blk_account_rq(struct request *rq)
 	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
 	(dir), (attrs))
 
+#define queue_to_disk(q)	(dev_to_disk(kobj_to_dev((q)->kobj.parent)))
+
 static inline bool queue_is_mq(struct request_queue *q)
 {
 	return q->mq_ops;
diff --git a/include/trace/events/kyber.h b/include/trace/events/kyber.h
index c0e7d24ca256..f9802562edf6 100644
--- a/include/trace/events/kyber.h
+++ b/include/trace/events/kyber.h
@@ -30,7 +30,7 @@ TRACE_EVENT(kyber_latency,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		__entry->dev		= disk_devt(queue_to_disk(q));
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 		strlcpy(__entry->type, type, sizeof(__entry->type));
 		__entry->percentile	= percentile;
@@ -59,7 +59,7 @@ TRACE_EVENT(kyber_adjust,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		__entry->dev		= disk_devt(queue_to_disk(q));
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 		__entry->depth		= depth;
 	),
@@ -81,7 +81,7 @@ TRACE_EVENT(kyber_throttled,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		__entry->dev		= disk_devt(queue_to_disk(q));
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 	),
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (9 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-12  9:38   ` Christoph Hellwig
  2021-04-16  8:00   ` [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag Jeffle Xu
  2021-04-01  2:19 ` [PATCH V5 12/12] dm: support IO polling for bio-based dm device Ming Lei
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

From: Jeffle Xu <jefflexu@linux.alibaba.com>

This method can be used to check if bio-based device supports IO polling
or not. For mq devices, checking for hw queue in polling mode is
adequate, while the sanity check shall be implementation specific for
bio-based devices. For example, dm device needs to check if all
underlying devices are capable of IO polling.

Though bio-based device may have done the sanity check during the
device initialization phase, cacheing the result of this sanity check
(such as by cacheing in the queue_flags) may not work. Because for dm
devices, users could change the state of the underlying devices through
'/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
the cached result of the very beginning sanity check could be
out-of-date. Thus the sanity check needs to be done every time 'io_poll'
is to be modified.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-sysfs.c      | 14 +++++++++++---
 include/linux/blkdev.h |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index db3268d41274..c8e7e4af66cb 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -426,9 +426,17 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
 	unsigned long poll_on;
 	ssize_t ret;
 
-	if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
-	    !q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
-		return -EINVAL;
+	if (queue_is_mq(q)) {
+		if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
+		    !q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
+			return -EINVAL;
+	} else {
+		struct gendisk *disk = queue_to_disk(q);
+
+		if (!disk->fops->poll_capable ||
+		    !disk->fops->poll_capable(disk))
+			return -EINVAL;
+	}
 
 	ret = queue_var_store(&poll_on, page, count);
 	if (ret < 0)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bfab74b45f15..a46f975f2a2f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1881,6 +1881,7 @@ struct block_device_operations {
 	int (*report_zones)(struct gendisk *, sector_t sector,
 			unsigned int nr_zones, report_zones_cb cb, void *data);
 	char *(*devnode)(struct gendisk *disk, umode_t *mode);
+	bool (*poll_capable)(struct gendisk *disk);
 	struct module *owner;
 	const struct pr_ops *pr_ops;
 };
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH V5 12/12] dm: support IO polling for bio-based dm device
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (10 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling Ming Lei
@ 2021-04-01  2:19 ` Ming Lei
  2021-04-09 15:39 ` [PATCH V5 00/12] block: support bio based io polling Ming Lei
  2021-04-12  9:46 ` Christoph Hellwig
  13 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-01  2:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke,
	Ming Lei

From: Jeffle Xu <jefflexu@linux.alibaba.com>

IO polling is enabled when all underlying target devices are capable
of IO polling. The sanity check supports the stacked device model, in
which one dm device may be build upon another dm device. In this case,
the mapped device will check if the underlying dm target device
supports IO polling.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-table.c         | 24 ++++++++++++++++++++++++
 drivers/md/dm.c               | 14 ++++++++++++++
 include/linux/device-mapper.h |  1 +
 3 files changed, 39 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 95391f78b8d5..a8f3575fb118 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1509,6 +1509,12 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector)
 	return &t->targets[(KEYS_PER_NODE * n) + k];
 }
 
+static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev,
+				   sector_t start, sector_t len, void *data)
+{
+	return !blk_queue_poll(bdev_get_queue(dev->bdev));
+}
+
 /*
  * type->iterate_devices() should be called when the sanity check needs to
  * iterate and check all underlying data devices. iterate_devices() will
@@ -1559,6 +1565,11 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev,
 	return 0;
 }
 
+int dm_table_supports_poll(struct dm_table *t)
+{
+	return !dm_table_any_dev_attr(t, device_not_poll_capable, NULL);
+}
+
 /*
  * Check whether a table has no data devices attached using each
  * target's iterate_devices method.
@@ -2079,6 +2090,19 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 
 	dm_update_keyslot_manager(q, t);
 	blk_queue_update_readahead(q);
+
+	/*
+	 * Check for request-based device is remained to
+	 * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
+	 * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
+	 * devices supporting polling.
+	 */
+	if (__table_type_bio_based(t->type)) {
+		if (dm_table_supports_poll(t))
+			blk_queue_flag_set(QUEUE_FLAG_POLL, q);
+		else
+			blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
+	}
 }
 
 unsigned int dm_table_get_num_targets(struct dm_table *t)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 50b693d776d6..dacc988045c9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1720,6 +1720,19 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
 	return ret;
 }
 
+static bool dm_poll_capable(struct gendisk *disk)
+{
+	int ret, srcu_idx;
+	struct mapped_device *md = disk->private_data;
+	struct dm_table *t;
+
+	t = dm_get_live_table(md, &srcu_idx);
+	ret = dm_table_supports_poll(t);
+	dm_put_live_table(md, srcu_idx);
+
+	return ret;
+}
+
 /*-----------------------------------------------------------------
  * An IDR is used to keep track of allocated minor numbers.
  *---------------------------------------------------------------*/
@@ -3132,6 +3145,7 @@ static const struct pr_ops dm_pr_ops = {
 };
 
 static const struct block_device_operations dm_blk_dops = {
+	.poll_capable = dm_poll_capable,
 	.submit_bio = dm_submit_bio,
 	.open = dm_blk_open,
 	.release = dm_blk_close,
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 7f4ac87c0b32..31bfd6f70013 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -538,6 +538,7 @@ unsigned int dm_table_get_num_targets(struct dm_table *t);
 fmode_t dm_table_get_mode(struct dm_table *t);
 struct mapped_device *dm_table_get_md(struct dm_table *t);
 const char *dm_table_device_name(struct dm_table *t);
+int dm_table_supports_poll(struct dm_table *t);
 
 /*
  * Trigger an event.
-- 
2.29.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 00/12] block: support bio based io polling
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (11 preceding siblings ...)
  2021-04-01  2:19 ` [PATCH V5 12/12] dm: support IO polling for bio-based dm device Ming Lei
@ 2021-04-09 15:39 ` Ming Lei
  2021-04-12  9:46 ` Christoph Hellwig
  13 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-09 15:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke

On Thu, Apr 01, 2021 at 10:19:15AM +0800, Ming Lei wrote:
> Hi Jens,
> 
> Add per-task io poll context for holding HIPRI blk-mq/underlying bios
> queued from bio based driver's io submission context, and reuse one bio
> padding field for storing 'cookie' returned from submit_bio() for these
> bios. Also explicitly end these bios in poll context by adding two
> new bio flags.
> 
> In this way, we needn't to poll all underlying hw queues any more,
> which is implemented in Jeffle's patches. And we can just poll hw queues
> in which there is HIPRI IO queued.
> 
> Usually io submission and io poll share same context, so the added io
> poll context data is just like one stack variable, and the cost for
> saving bios is cheap.
> 
> V5:
> 	- fix one use-after-free issue in case that polling is from another
> 	context: adds one new cookie of BLK_QC_T_NOT_READY for preventing
> 	this issue in patch 8/12
> 	- add reviewed-by & tested-by tag

Hello Guys,

Ping...


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 05/12] block: add new field into 'struct bvec_iter'
  2021-04-01  2:19 ` [PATCH V5 05/12] block: add new field into 'struct bvec_iter' Ming Lei
@ 2021-04-12  9:26   ` Christoph Hellwig
  2021-04-13  9:36     ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12  9:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

I don't like where this is going.

I think the model of storing the polling cookie in the bio is useful,
but:

 (1) I think having this in the iter is a mess.  Can you measure if
     just marking bvec_iter __packed will generate much worse code
     at all anymore?  If not we can just move this into the bio
     If it really generates much worse code I think you need to pick
     a different name as  as that i really confusing vs the bio field
     of the same name that is used entirely differenly.  Similarly
     the bio_get_private_data and bio_set_private_data helpers are
     entirely misnamed, as the names suggest they deal with the
     bi_private field in struct bio.  I actually suspect not having
     these helpers would be much preferable
 (2) once we do have the cookie in the bio we need to take advantage
     of that properly.  That is stop returning the cookie up the stack
     as we do right now but just rely on the bio, which will clean up
     tons of crap.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 06/12] block/mq: extract one helper function polling hw queue
  2021-04-01  2:19 ` [PATCH V5 06/12] block/mq: extract one helper function polling hw queue Ming Lei
@ 2021-04-12  9:29   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12  9:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

s/one/a/

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-01  2:19 ` [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling Ming Lei
@ 2021-04-12  9:38   ` Christoph Hellwig
  2021-04-14  8:38     ` JeffleXu
  2021-04-16  8:00   ` [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag Jeffle Xu
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12  9:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
> From: Jeffle Xu <jefflexu@linux.alibaba.com>
> 
> This method can be used to check if bio-based device supports IO polling
> or not. For mq devices, checking for hw queue in polling mode is
> adequate, while the sanity check shall be implementation specific for
> bio-based devices. For example, dm device needs to check if all
> underlying devices are capable of IO polling.
> 
> Though bio-based device may have done the sanity check during the
> device initialization phase, cacheing the result of this sanity check
> (such as by cacheing in the queue_flags) may not work. Because for dm
> devices, users could change the state of the underlying devices through
> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
> the cached result of the very beginning sanity check could be
> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
> is to be modified.

I really don't think thi should be a method, and I really do dislike
how we have all this "if (is_mq)" junk.  Why can't we have a flag on
the gendisk that signals if the device can support polling that
is autoamtically set for blk-mq and as-needed by bio based drivers?
And please move everything that significantly hanges things for the
mq based path to separate prep patches early in th series.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 00/12] block: support bio based io polling
  2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
                   ` (12 preceding siblings ...)
  2021-04-09 15:39 ` [PATCH V5 00/12] block: support bio based io polling Ming Lei
@ 2021-04-12  9:46 ` Christoph Hellwig
  13 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12  9:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

Hi Ming,

I really struggly to review this series.  A lot of the code and
especially code organization is really strange.  For example a
signiciant chunk of the bio based poll logic is added to blk-mq.c where
it obviously does not fit in, and a lot of the patches seems to both
change blk-mq logic and add new bio based logic.  Also the bio poll
logic is split over various patches in a way that is absolutel not
obvious to me.

Can you try to restructure this a bit:

 - move everything that cleans up or improves existing logic to the
   beginning of the series and into separate patches (where that isn't
   already the case)
 - keep the actual bio polling logic together where the functionality
   isn't otherwise useful
 - maybe add a new block/bio-poll.c file to keep all the code togeher?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-01  2:19 ` [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling Ming Lei
@ 2021-04-12  9:54   ` Christoph Hellwig
  2021-04-12 10:20     ` Ming Lei
  2021-04-12 10:16   ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12  9:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Thu, Apr 01, 2021 at 10:19:23AM +0800, Ming Lei wrote:
> Currently bio based IO polling needs to poll all hw queue blindly, this
> way is very inefficient, and one big reason is that we can't pass any
> bio submission result to blk_poll().
> 
> In IO submission context, track associated underlying bios by per-task
> submission queue and store returned 'cookie' in
> bio->bi_iter.bi_private_data, and return current->pid to caller of
> submit_bio() for any bio based driver's IO, which is submitted from FS.
> 
> In IO poll context, the passed cookie tells us the PID of submission
> context, then we can find bios from the per-task io pull context of
> submission context. Moving bios from submission queue to poll queue of
> the poll context, and keep polling until these bios are ended. Remove
> bio from poll queue if the bio is ended. Add bio flags of BIO_DONE and
> BIO_END_BY_POLL for such purpose.
> 
> In was found in Jeffle Xu's test that kfifo doesn't scale well for a
> submission queue as queue depth is increased, so a new mechanism for
> tracking bios is needed. So far bio's size is close to 2 cacheline size,
> and it may not be accepted to add new field into bio for solving the
> scalability issue by tracking bios via linked list, switch to bio group
> list for tracking bio, the idea is to reuse .bi_end_io for linking bios
> into a linked list for all sharing same .bi_end_io(call it bio group),
> which is recovered before ending bio really, since BIO_END_BY_POLL is
> added for enhancing this point. Usually .bi_end_bio is same for all
> bios in same layer, so it is enough to provide very limited groups, such
> as 16 or less for fixing the scalability issue.
> 
> Usually submission shares context with io poll. The per-task poll context
> is just like stack variable, and it is cheap to move data between the two
> per-task queues.
> 
> Also when the submission task is exiting, drain pending IOs in the context
> until all are done.
> 
> Tested-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c               |   5 +
>  block/blk-core.c          | 155 +++++++++++++++++++++++-
>  block/blk-ioc.c           |   3 +
>  block/blk-mq.c            | 244 +++++++++++++++++++++++++++++++++++++-
>  block/blk.h               |  10 ++
>  include/linux/blk_types.h |  26 +++-
>  6 files changed, 439 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 26b7f721cda8..04c043dc60fc 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1402,6 +1402,11 @@ static inline bool bio_remaining_done(struct bio *bio)
>   **/
>  void bio_endio(struct bio *bio)
>  {
> +	/* BIO_END_BY_POLL has to be set before calling submit_bio */
> +	if (bio_flagged(bio, BIO_END_BY_POLL)) {
> +		bio_set_flag(bio, BIO_DONE);
> +		return;
> +	}

Why can't driver that implements bio based polling call a separate
bio_endio_polled instead?

> +static inline void *bio_grp_data(struct bio *bio)
> +{
> +	return bio->bi_poll;
> +}

What is the purpose of this helper?  And why does it have to lose the
type information?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-01  2:19 ` [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling Ming Lei
  2021-04-12  9:54   ` Christoph Hellwig
@ 2021-04-12 10:16   ` Christoph Hellwig
  2021-04-12 10:37     ` Ming Lei
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12 10:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

> +static int blk_bio_poll_io(struct io_context *submit_ioc,
> +		struct io_context *poll_ioc)

Givem that poll_ioc is always current->io_context there is no
need to pass it.

> +	struct blk_bio_poll_ctx *poll_ctx = poll_ioc ?
> +		poll_ioc->data : NULL;

and it really should not be NULL here, should it?

> +static int __blk_bio_poll(blk_qc_t cookie)
> +{
> +	struct io_context *poll_ioc = current->io_context;
> +	pid_t pid;
> +	struct task_struct *submit_task;
> +	int ret;
> +
> +	pid = (pid_t)cookie;
> +
> +	/* io poll often share io submission context */
> +	if (likely(current->pid == pid && blk_bio_ioc_valid(current)))
> +		return blk_bio_poll_io(poll_ioc, poll_ioc);
> +
> +	submit_task = find_get_task_by_vpid(pid);
> +	if (likely(blk_bio_ioc_valid(submit_task)))
> +		ret = blk_bio_poll_io(submit_task->io_context, poll_ioc);
> +	else
> +		ret = 0;
> +	if (likely(submit_task))
> +		put_task_struct(submit_task);

Wouldn't it make more sense to just store the submitting context
in the bio, even if that uses more space?  Having to call
find_get_task_by_vpid in the poll context seems rather problematic.

Note that this requires doing the refacoring to get rid of the separate
blk_qc_t passed up the stack I asked for earlier, but hiding all these
details seems like a really useful change anyway.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 07/12] block: prepare for supporting bio_list via other link
  2021-04-01  2:19 ` [PATCH V5 07/12] block: prepare for supporting bio_list via other link Ming Lei
@ 2021-04-12 10:18   ` Christoph Hellwig
  2021-04-12 11:37     ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12 10:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

I'd so something s/prepare for supporting/support using/ for the title.
Can't say I like spreading the bio_list even more.

Btw, what uses bi_next for the bios that are being polled?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 03/12] block: create io poll context for submission and poll task
  2021-04-01  2:19 ` [PATCH V5 03/12] block: create io poll context for submission and poll task Ming Lei
@ 2021-04-12 10:19   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12 10:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

>  /**
>   * blk_poll - poll for IO completions

> @@ -3875,6 +3886,9 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>  	if (current->plug)
>  		blk_flush_plug_list(current->plug, false);
>  
> +	if (!queue_is_mq(q))
> +		return blk_bio_poll(q, cookie, spin);
> +
>  	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];

Before doing this, blk_poll needs to be split into the public blk_poll
that should go into blk_poll, and a blk_mq_poll that should remain in
blk-mq.c.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-12  9:54   ` Christoph Hellwig
@ 2021-04-12 10:20     ` Ming Lei
  2021-04-12 10:29       ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-12 10:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Mon, Apr 12, 2021 at 10:54:27AM +0100, Christoph Hellwig wrote:
> On Thu, Apr 01, 2021 at 10:19:23AM +0800, Ming Lei wrote:
> > Currently bio based IO polling needs to poll all hw queue blindly, this
> > way is very inefficient, and one big reason is that we can't pass any
> > bio submission result to blk_poll().
> > 
> > In IO submission context, track associated underlying bios by per-task
> > submission queue and store returned 'cookie' in
> > bio->bi_iter.bi_private_data, and return current->pid to caller of
> > submit_bio() for any bio based driver's IO, which is submitted from FS.
> > 
> > In IO poll context, the passed cookie tells us the PID of submission
> > context, then we can find bios from the per-task io pull context of
> > submission context. Moving bios from submission queue to poll queue of
> > the poll context, and keep polling until these bios are ended. Remove
> > bio from poll queue if the bio is ended. Add bio flags of BIO_DONE and
> > BIO_END_BY_POLL for such purpose.
> > 
> > In was found in Jeffle Xu's test that kfifo doesn't scale well for a
> > submission queue as queue depth is increased, so a new mechanism for
> > tracking bios is needed. So far bio's size is close to 2 cacheline size,
> > and it may not be accepted to add new field into bio for solving the
> > scalability issue by tracking bios via linked list, switch to bio group
> > list for tracking bio, the idea is to reuse .bi_end_io for linking bios
> > into a linked list for all sharing same .bi_end_io(call it bio group),
> > which is recovered before ending bio really, since BIO_END_BY_POLL is
> > added for enhancing this point. Usually .bi_end_bio is same for all
> > bios in same layer, so it is enough to provide very limited groups, such
> > as 16 or less for fixing the scalability issue.
> > 
> > Usually submission shares context with io poll. The per-task poll context
> > is just like stack variable, and it is cheap to move data between the two
> > per-task queues.
> > 
> > Also when the submission task is exiting, drain pending IOs in the context
> > until all are done.
> > 
> > Tested-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/bio.c               |   5 +
> >  block/blk-core.c          | 155 +++++++++++++++++++++++-
> >  block/blk-ioc.c           |   3 +
> >  block/blk-mq.c            | 244 +++++++++++++++++++++++++++++++++++++-
> >  block/blk.h               |  10 ++
> >  include/linux/blk_types.h |  26 +++-
> >  6 files changed, 439 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 26b7f721cda8..04c043dc60fc 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1402,6 +1402,11 @@ static inline bool bio_remaining_done(struct bio *bio)
> >   **/
> >  void bio_endio(struct bio *bio)
> >  {
> > +	/* BIO_END_BY_POLL has to be set before calling submit_bio */
> > +	if (bio_flagged(bio, BIO_END_BY_POLL)) {
> > +		bio_set_flag(bio, BIO_DONE);
> > +		return;
> > +	}
> 
> Why can't driver that implements bio based polling call a separate
> bio_endio_polled instead?

This bio is blk-mq IO which is underlying bio of DM or bio based driver, so
they doesn't belong to DM or bio based driver. Actually the bio_endio()
is called by blk_update_request().

The patch just tracks underlying bio-mq bios in current context, then
poll them until all are done.

> 
> > +static inline void *bio_grp_data(struct bio *bio)
> > +{
> > +	return bio->bi_poll;
> > +}
> 
> What is the purpose of this helper?  And why does it have to lose the
> type information?

This patch stores bio->bi_end_io(shared with ->bi_poll) into one per-task
data structure, and links all bios sharing same .bi_end_io into one list
via ->bi_end_io. And their ->bi_end_io is recovered before calling
bio_endio().

The helper is used for checking if one bio can be added to bio group,
and storing the data. The helper just serves for document purpose.

And the type info doesn't matter.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-12 10:20     ` Ming Lei
@ 2021-04-12 10:29       ` Christoph Hellwig
  2021-04-12 11:42         ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2021-04-12 10:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Jeffle Xu,
	Mike Snitzer, dm-devel, Hannes Reinecke

On Mon, Apr 12, 2021 at 06:20:55PM +0800, Ming Lei wrote:
> > > +static inline void *bio_grp_data(struct bio *bio)
> > > +{
> > > +	return bio->bi_poll;
> > > +}
> > 
> > What is the purpose of this helper?  And why does it have to lose the
> > type information?
> 
> This patch stores bio->bi_end_io(shared with ->bi_poll) into one per-task
> data structure, and links all bios sharing same .bi_end_io into one list
> via ->bi_end_io. And their ->bi_end_io is recovered before calling
> bio_endio().
> 
> The helper is used for checking if one bio can be added to bio group,
> and storing the data. The helper just serves for document purpose.
> 
> And the type info doesn't matter.

So why is bi_poll typed to start with then just to need a accessor
to remove the typer information?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-12 10:16   ` Christoph Hellwig
@ 2021-04-12 10:37     ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-12 10:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Mon, Apr 12, 2021 at 11:16:59AM +0100, Christoph Hellwig wrote:
> > +static int blk_bio_poll_io(struct io_context *submit_ioc,
> > +		struct io_context *poll_ioc)
> 
> Givem that poll_ioc is always current->io_context there is no
> need to pass it.

Yeah, it is true.

> 
> > +	struct blk_bio_poll_ctx *poll_ctx = poll_ioc ?
> > +		poll_ioc->data : NULL;
> 
> and it really should not be NULL here, should it?

Yeah, it is NULL just in case of memory allocation failure.

> 
> > +static int __blk_bio_poll(blk_qc_t cookie)
> > +{
> > +	struct io_context *poll_ioc = current->io_context;
> > +	pid_t pid;
> > +	struct task_struct *submit_task;
> > +	int ret;
> > +
> > +	pid = (pid_t)cookie;
> > +
> > +	/* io poll often share io submission context */
> > +	if (likely(current->pid == pid && blk_bio_ioc_valid(current)))
> > +		return blk_bio_poll_io(poll_ioc, poll_ioc);
> > +
> > +	submit_task = find_get_task_by_vpid(pid);
> > +	if (likely(blk_bio_ioc_valid(submit_task)))
> > +		ret = blk_bio_poll_io(submit_task->io_context, poll_ioc);
> > +	else
> > +		ret = 0;
> > +	if (likely(submit_task))
> > +		put_task_struct(submit_task);
> 
> Wouldn't it make more sense to just store the submitting context
> in the bio, even if that uses more space?  Having to call

But where to store the submitting context in bio? pid is 32bit and we can
pass it from submit_bio() perfectly, then avoid to add anything to bio.

If we save the submitting context in bio, we still have to handle task
exit related race, not see any benefit.

So far bio is ~128byte with typical setting, and people usually hate to
add more stuff into bio.

> find_get_task_by_vpid in the poll context seems rather problematic.

Why? We already handle submit context early exit.

> 
> Note that this requires doing the refacoring to get rid of the separate
> blk_qc_t passed up the stack I asked for earlier, but hiding all these
> details seems like a really useful change anyway.

That is hard to do because of race between submission and completion:

1) HIPRI could be cleared because of bio splitting, so we can't do that
for this kind of bio

2) we have to make sure that the bio won't be completed when storing
cookie into this bio. BIO_END_BY_POLL is added in this patch for bio
based polling. You mean we may cover all normal blk-mq polling via
BIO_END_BY_POLL?



Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 07/12] block: prepare for supporting bio_list via other link
  2021-04-12 10:18   ` Christoph Hellwig
@ 2021-04-12 11:37     ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-12 11:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Mon, Apr 12, 2021 at 11:18:05AM +0100, Christoph Hellwig wrote:
> I'd so something s/prepare for supporting/support using/ for the title.
> Can't say I like spreading the bio_list even more.
> 
> Btw, what uses bi_next for the bios that are being polled?

.bi_next is used for io merge after the bio is submitted to request
queue, so we can't reuse it.

-- 
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling
  2021-04-12 10:29       ` Christoph Hellwig
@ 2021-04-12 11:42         ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-12 11:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Mon, Apr 12, 2021 at 11:29:47AM +0100, Christoph Hellwig wrote:
> On Mon, Apr 12, 2021 at 06:20:55PM +0800, Ming Lei wrote:
> > > > +static inline void *bio_grp_data(struct bio *bio)
> > > > +{
> > > > +	return bio->bi_poll;
> > > > +}
> > > 
> > > What is the purpose of this helper?  And why does it have to lose the
> > > type information?
> > 
> > This patch stores bio->bi_end_io(shared with ->bi_poll) into one per-task
> > data structure, and links all bios sharing same .bi_end_io into one list
> > via ->bi_end_io. And their ->bi_end_io is recovered before calling
> > bio_endio().
> > 
> > The helper is used for checking if one bio can be added to bio group,
> > and storing the data. The helper just serves for document purpose.
> > 
> > And the type info doesn't matter.
> 
> So why is bi_poll typed to start with then just to need a accessor
> to remove the typer information?

It should be a bug from the beginning, either .bi_poll can be dropped
or it should be 'void *'. Just find it, thanks for pointing it out.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue
  2021-04-01  2:19 ` [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue Ming Lei
@ 2021-04-12 12:52   ` Jens Axboe
  0 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2021-04-12 12:52 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Jeffle Xu, Mike Snitzer, dm-devel, Hannes Reinecke

On 3/31/21 8:19 PM, Ming Lei wrote:
> From: Jeffle Xu <jefflexu@linux.alibaba.com>
> 
> Sometimes we need to get the corresponding gendisk from request_queue.
> 
> It is preferred that block drivers store private data in
> gendisk->private_data rather than request_queue->queuedata, e.g. see:
> commit c4a59c4e5db3 ("dm: stop using ->queuedata").
> 
> So if only request_queue is given, we need to get its corresponding
> gendisk to get the private data stored in that gendisk.

Applied this one as a separate cleanup/helper.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 05/12] block: add new field into 'struct bvec_iter'
  2021-04-12  9:26   ` Christoph Hellwig
@ 2021-04-13  9:36     ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-13  9:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jeffle Xu, Mike Snitzer, dm-devel,
	Hannes Reinecke

On Mon, Apr 12, 2021 at 10:26:53AM +0100, Christoph Hellwig wrote:
> I don't like where this is going.
> 
> I think the model of storing the polling cookie in the bio is useful,
> but:
> 
>  (1) I think having this in the iter is a mess.  Can you measure if
>      just marking bvec_iter __packed will generate much worse code
>      at all anymore?  If not we can just move this into the bio

Just test with packed 'struct bvec_iter' by running io_uring/libaio over
nvme/null_blk with different bs size, not see obvious difference
compared with unpacked bvec_iter.

So will switch to packed bvec_iter in next version.

>      If it really generates much worse code I think you need to pick
>      a different name as  as that i really confusing vs the bio field
>      of the same name that is used entirely differenly.  Similarly
>      the bio_get_private_data and bio_set_private_data helpers are
>      entirely misnamed, as the names suggest they deal with the
>      bi_private field in struct bio.  I actually suspect not having
>      these helpers would be much preferable

OK, how about naming it as .bi_poll_data?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-12  9:38   ` Christoph Hellwig
@ 2021-04-14  8:38     ` JeffleXu
  2021-04-14 11:24       ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: JeffleXu @ 2021-04-14  8:38 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: Jens Axboe, linux-block, Mike Snitzer, dm-devel, Hannes Reinecke



On 4/12/21 5:38 PM, Christoph Hellwig wrote:
> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
>>
>> This method can be used to check if bio-based device supports IO polling
>> or not. For mq devices, checking for hw queue in polling mode is
>> adequate, while the sanity check shall be implementation specific for
>> bio-based devices. For example, dm device needs to check if all
>> underlying devices are capable of IO polling.
>>
>> Though bio-based device may have done the sanity check during the
>> device initialization phase, cacheing the result of this sanity check
>> (such as by cacheing in the queue_flags) may not work. Because for dm
>> devices, users could change the state of the underlying devices through
>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
>> the cached result of the very beginning sanity check could be
>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
>> is to be modified.
> 
> I really don't think thi should be a method, and I really do dislike
> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
> the gendisk that signals if the device can support polling that
> is autoamtically set for blk-mq and as-needed by bio based drivers?

That would consume one more bit of queue->queue_flags.

Besides, DM/MD is somehow special here that when one of the underlying
devices is disabled polling through '/sys/block/<dev>/io_poll',
currently there's no mechanism notifying the above MD/DM to clear the
previously set queue_flags. Thus the outdated queue_flags still
indicates this DM/MD is capable of polling, while in fact one of the
underlying device has been disabled for polling.

Mike had ever suggested that we can trust the queue_flag, and clear the
outdated queue_flags when later the IO submission or polling routine
finally finds that the device is not capable of polling. Currently
submit_bio_checks() will silently clear the REQ_HIPRI flag and still
submit the bio when the device is actually not capable of polling. To
fix the issue, could we break the submission and return an error code in
submit_bio_checks() if the device is not capable of polling when
submitting HIPRI bio?


> And please move everything that significantly hanges things for the
> mq based path to separate prep patches early in th series.
> 



-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-14  8:38     ` JeffleXu
@ 2021-04-14 11:24       ` Ming Lei
  2021-04-15  1:34         ` JeffleXu
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-14 11:24 UTC (permalink / raw)
  To: JeffleXu
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke

On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
> 
> 
> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
> > On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
> >> From: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>
> >> This method can be used to check if bio-based device supports IO polling
> >> or not. For mq devices, checking for hw queue in polling mode is
> >> adequate, while the sanity check shall be implementation specific for
> >> bio-based devices. For example, dm device needs to check if all
> >> underlying devices are capable of IO polling.
> >>
> >> Though bio-based device may have done the sanity check during the
> >> device initialization phase, cacheing the result of this sanity check
> >> (such as by cacheing in the queue_flags) may not work. Because for dm
> >> devices, users could change the state of the underlying devices through
> >> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
> >> the cached result of the very beginning sanity check could be
> >> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
> >> is to be modified.
> > 
> > I really don't think thi should be a method, and I really do dislike
> > how we have all this "if (is_mq)" junk.  Why can't we have a flag on
> > the gendisk that signals if the device can support polling that
> > is autoamtically set for blk-mq and as-needed by bio based drivers?
> 
> That would consume one more bit of queue->queue_flags.
> 
> Besides, DM/MD is somehow special here that when one of the underlying
> devices is disabled polling through '/sys/block/<dev>/io_poll',
> currently there's no mechanism notifying the above MD/DM to clear the
> previously set queue_flags. Thus the outdated queue_flags still
> indicates this DM/MD is capable of polling, while in fact one of the
> underlying device has been disabled for polling.

Right, just like there isn't queue limit progagation.

Another blocker could be that bio based queue doesn't support queue
freezing.

> 
> Mike had ever suggested that we can trust the queue_flag, and clear the
> outdated queue_flags when later the IO submission or polling routine
> finally finds that the device is not capable of polling. Currently
> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
> submit the bio when the device is actually not capable of polling. To
> fix the issue, could we break the submission and return an error code in
> submit_bio_checks() if the device is not capable of polling when
> submitting HIPRI bio?

I think we may just leave it alone, if underlying queue becomes not pollable,
the bio still can be submitted & completed via IRQ, just not efficient enough.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-14 11:24       ` Ming Lei
@ 2021-04-15  1:34         ` JeffleXu
  2021-04-15  7:43           ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: JeffleXu @ 2021-04-15  1:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke



On 4/14/21 7:24 PM, Ming Lei wrote:
> On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
>>
>>
>> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
>>> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
>>>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>
>>>> This method can be used to check if bio-based device supports IO polling
>>>> or not. For mq devices, checking for hw queue in polling mode is
>>>> adequate, while the sanity check shall be implementation specific for
>>>> bio-based devices. For example, dm device needs to check if all
>>>> underlying devices are capable of IO polling.
>>>>
>>>> Though bio-based device may have done the sanity check during the
>>>> device initialization phase, cacheing the result of this sanity check
>>>> (such as by cacheing in the queue_flags) may not work. Because for dm
>>>> devices, users could change the state of the underlying devices through
>>>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
>>>> the cached result of the very beginning sanity check could be
>>>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
>>>> is to be modified.
>>>
>>> I really don't think thi should be a method, and I really do dislike
>>> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
>>> the gendisk that signals if the device can support polling that
>>> is autoamtically set for blk-mq and as-needed by bio based drivers?
>>
>> That would consume one more bit of queue->queue_flags.
>>
>> Besides, DM/MD is somehow special here that when one of the underlying
>> devices is disabled polling through '/sys/block/<dev>/io_poll',
>> currently there's no mechanism notifying the above MD/DM to clear the
>> previously set queue_flags. Thus the outdated queue_flags still
>> indicates this DM/MD is capable of polling, while in fact one of the
>> underlying device has been disabled for polling.
> 
> Right, just like there isn't queue limit progagation.
> 
> Another blocker could be that bio based queue doesn't support queue
> freezing.

Do you mean the queue freezing is called in the following code snippet?

```
static ssize_t queue_poll_store(struct request_queue *q, const char
*page, size_t count)
{
	...
	if (poll_on) {
		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
	} else {
		blk_mq_freeze_queue(q);
		blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
		blk_mq_unfreeze_queue(q);
	}
```

And I can't understand how bio-based queue doesn't support queue freezing.

```
submit_bio_noacct
	__submit_bio_noacct
		bio_queue_enter
```

Every time submitting a bio, bio_queue_enter() will be called, and once
the queue has been frozen, bio_queue_enter() will wait there until the
queue is unfrozen.

> 
>>
>> Mike had ever suggested that we can trust the queue_flag, and clear the
>> outdated queue_flags when later the IO submission or polling routine
>> finally finds that the device is not capable of polling. Currently
>> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
>> submit the bio when the device is actually not capable of polling. To
>> fix the issue, could we break the submission and return an error code in
>> submit_bio_checks() if the device is not capable of polling when
>> submitting HIPRI bio?
> 
> I think we may just leave it alone, if underlying queue becomes not pollable,
> the bio still can be submitted & completed via IRQ, just not efficient enough.

Yes it still works. I agree if there's no better solution...

And what about the issue Christoph originally concerned? Do we use one
more flag bit indicating if the queue capable of polling, or the
poll_capable() method way?

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-15  1:34         ` JeffleXu
@ 2021-04-15  7:43           ` Ming Lei
  2021-04-15  9:21             ` JeffleXu
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-15  7:43 UTC (permalink / raw)
  To: JeffleXu
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke

On Thu, Apr 15, 2021 at 09:34:36AM +0800, JeffleXu wrote:
> 
> 
> On 4/14/21 7:24 PM, Ming Lei wrote:
> > On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
> >>> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
> >>>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>>
> >>>> This method can be used to check if bio-based device supports IO polling
> >>>> or not. For mq devices, checking for hw queue in polling mode is
> >>>> adequate, while the sanity check shall be implementation specific for
> >>>> bio-based devices. For example, dm device needs to check if all
> >>>> underlying devices are capable of IO polling.
> >>>>
> >>>> Though bio-based device may have done the sanity check during the
> >>>> device initialization phase, cacheing the result of this sanity check
> >>>> (such as by cacheing in the queue_flags) may not work. Because for dm
> >>>> devices, users could change the state of the underlying devices through
> >>>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
> >>>> the cached result of the very beginning sanity check could be
> >>>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
> >>>> is to be modified.
> >>>
> >>> I really don't think thi should be a method, and I really do dislike
> >>> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
> >>> the gendisk that signals if the device can support polling that
> >>> is autoamtically set for blk-mq and as-needed by bio based drivers?
> >>
> >> That would consume one more bit of queue->queue_flags.
> >>
> >> Besides, DM/MD is somehow special here that when one of the underlying
> >> devices is disabled polling through '/sys/block/<dev>/io_poll',
> >> currently there's no mechanism notifying the above MD/DM to clear the
> >> previously set queue_flags. Thus the outdated queue_flags still
> >> indicates this DM/MD is capable of polling, while in fact one of the
> >> underlying device has been disabled for polling.
> > 
> > Right, just like there isn't queue limit progagation.
> > 
> > Another blocker could be that bio based queue doesn't support queue
> > freezing.
> 
> Do you mean the queue freezing is called in the following code snippet?
> 
> ```
> static ssize_t queue_poll_store(struct request_queue *q, const char
> *page, size_t count)
> {
> 	...
> 	if (poll_on) {
> 		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> 	} else {
> 		blk_mq_freeze_queue(q);
> 		blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> 		blk_mq_unfreeze_queue(q);
> 	}
> ```

Yes, if it is a bio based queue. Or bio queued queue(DM, MD or others) may
use freeze_queue to do similar thing.

> 
> And I can't understand how bio-based queue doesn't support queue freezing.
> 
> ```
> submit_bio_noacct
> 	__submit_bio_noacct
> 		bio_queue_enter
> ```
> 
> Every time submitting a bio, bio_queue_enter() will be called, and once
> the queue has been frozen, bio_queue_enter() will wait there until the
> queue is unfrozen.

Not like blk-mq, the refcount is just grabbed during submission for bio based
queue. I will research a bit and see if we can extend freeze queue for
covering bio based queue. One trouble is that bio is ended before
freeing request.

> 
> > 
> >>
> >> Mike had ever suggested that we can trust the queue_flag, and clear the
> >> outdated queue_flags when later the IO submission or polling routine
> >> finally finds that the device is not capable of polling. Currently
> >> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
> >> submit the bio when the device is actually not capable of polling. To
> >> fix the issue, could we break the submission and return an error code in
> >> submit_bio_checks() if the device is not capable of polling when
> >> submitting HIPRI bio?
> > 
> > I think we may just leave it alone, if underlying queue becomes not pollable,
> > the bio still can be submitted & completed via IRQ, just not efficient enough.
> 
> Yes it still works. I agree if there's no better solution...
> 
> And what about the issue Christoph originally concerned? Do we use one
> more flag bit indicating if the queue capable of polling, or the
> poll_capable() method way?

Just wondering why we can't use QUEUE_FLAG_POLL simply? If user wants to
enable it, let's do it for them. And bio driver can start with default poll
state by checking underlying queues.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-15  7:43           ` Ming Lei
@ 2021-04-15  9:21             ` JeffleXu
  2021-04-15 10:06               ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: JeffleXu @ 2021-04-15  9:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke



On 4/15/21 3:43 PM, Ming Lei wrote:
> On Thu, Apr 15, 2021 at 09:34:36AM +0800, JeffleXu wrote:
>>
>>
>> On 4/14/21 7:24 PM, Ming Lei wrote:
>>> On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
>>>>> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
>>>>>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>>>
>>>>>> This method can be used to check if bio-based device supports IO polling
>>>>>> or not. For mq devices, checking for hw queue in polling mode is
>>>>>> adequate, while the sanity check shall be implementation specific for
>>>>>> bio-based devices. For example, dm device needs to check if all
>>>>>> underlying devices are capable of IO polling.
>>>>>>
>>>>>> Though bio-based device may have done the sanity check during the
>>>>>> device initialization phase, cacheing the result of this sanity check
>>>>>> (such as by cacheing in the queue_flags) may not work. Because for dm
>>>>>> devices, users could change the state of the underlying devices through
>>>>>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
>>>>>> the cached result of the very beginning sanity check could be
>>>>>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
>>>>>> is to be modified.
>>>>>
>>>>> I really don't think thi should be a method, and I really do dislike
>>>>> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
>>>>> the gendisk that signals if the device can support polling that
>>>>> is autoamtically set for blk-mq and as-needed by bio based drivers?
>>>>
>>>> That would consume one more bit of queue->queue_flags.
>>>>
>>>> Besides, DM/MD is somehow special here that when one of the underlying
>>>> devices is disabled polling through '/sys/block/<dev>/io_poll',
>>>> currently there's no mechanism notifying the above MD/DM to clear the
>>>> previously set queue_flags. Thus the outdated queue_flags still
>>>> indicates this DM/MD is capable of polling, while in fact one of the
>>>> underlying device has been disabled for polling.
>>>
>>> Right, just like there isn't queue limit progagation.
>>>
>>> Another blocker could be that bio based queue doesn't support queue
>>> freezing.
>>
>> Do you mean the queue freezing is called in the following code snippet?
>>
>> ```
>> static ssize_t queue_poll_store(struct request_queue *q, const char
>> *page, size_t count)
>> {
>> 	...
>> 	if (poll_on) {
>> 		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
>> 	} else {
>> 		blk_mq_freeze_queue(q);
>> 		blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
>> 		blk_mq_unfreeze_queue(q);
>> 	}
>> ```
> 
> Yes, if it is a bio based queue. Or bio queued queue(DM, MD or others) may
> use freeze_queue to do similar thing.
> 
>>
>> And I can't understand how bio-based queue doesn't support queue freezing.
>>
>> ```
>> submit_bio_noacct
>> 	__submit_bio_noacct
>> 		bio_queue_enter
>> ```
>>
>> Every time submitting a bio, bio_queue_enter() will be called, and once
>> the queue has been frozen, bio_queue_enter() will wait there until the
>> queue is unfrozen.
> 
> Not like blk-mq, the refcount is just grabbed during submission for bio based
> queue. 

Could you please explain it more detailed ....


I will research a bit and see if we can extend freeze queue for
> covering bio based queue. One trouble is that bio is ended before
> freeing request.
> 
>>
>>>
>>>>
>>>> Mike had ever suggested that we can trust the queue_flag, and clear the
>>>> outdated queue_flags when later the IO submission or polling routine
>>>> finally finds that the device is not capable of polling. Currently
>>>> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
>>>> submit the bio when the device is actually not capable of polling. To
>>>> fix the issue, could we break the submission and return an error code in
>>>> submit_bio_checks() if the device is not capable of polling when
>>>> submitting HIPRI bio?
>>>
>>> I think we may just leave it alone, if underlying queue becomes not pollable,
>>> the bio still can be submitted & completed via IRQ, just not efficient enough.
>>
>> Yes it still works. I agree if there's no better solution...
>>
>> And what about the issue Christoph originally concerned? Do we use one
>> more flag bit indicating if the queue capable of polling, or the
>> poll_capable() method way?
> 
> Just wondering why we can't use QUEUE_FLAG_POLL simply? If user wants to
> enable it, let's do it for them. And bio driver can start with default poll
> state by checking underlying queues.
> 

Consider the following scenario: QUEUE_FLAG_POLL is set after
initialization, indicating the device capable of polling; then polling
is turned off by '/sys/block/<dev>/io_poll', thus QUEUE_FLAG_POLL is
cleared.

So we have to implement two semantics:
1. a flag indicating whether the device is **capable** of polling or
not; (adding another flag bit, or calling poll_capable() every time
polling is to be turned on)
2. a flag indicating whether polling is currently **turned on** or not
(i.e., QUEUE_FLAG_POLL)



-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-15  9:21             ` JeffleXu
@ 2021-04-15 10:06               ` Ming Lei
  2021-04-15 11:21                 ` JeffleXu
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-15 10:06 UTC (permalink / raw)
  To: JeffleXu
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke

On Thu, Apr 15, 2021 at 05:21:56PM +0800, JeffleXu wrote:
> 
> 
> On 4/15/21 3:43 PM, Ming Lei wrote:
> > On Thu, Apr 15, 2021 at 09:34:36AM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/14/21 7:24 PM, Ming Lei wrote:
> >>> On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
> >>>>
> >>>>
> >>>> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
> >>>>> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
> >>>>>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>>>>
> >>>>>> This method can be used to check if bio-based device supports IO polling
> >>>>>> or not. For mq devices, checking for hw queue in polling mode is
> >>>>>> adequate, while the sanity check shall be implementation specific for
> >>>>>> bio-based devices. For example, dm device needs to check if all
> >>>>>> underlying devices are capable of IO polling.
> >>>>>>
> >>>>>> Though bio-based device may have done the sanity check during the
> >>>>>> device initialization phase, cacheing the result of this sanity check
> >>>>>> (such as by cacheing in the queue_flags) may not work. Because for dm
> >>>>>> devices, users could change the state of the underlying devices through
> >>>>>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
> >>>>>> the cached result of the very beginning sanity check could be
> >>>>>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
> >>>>>> is to be modified.
> >>>>>
> >>>>> I really don't think thi should be a method, and I really do dislike
> >>>>> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
> >>>>> the gendisk that signals if the device can support polling that
> >>>>> is autoamtically set for blk-mq and as-needed by bio based drivers?
> >>>>
> >>>> That would consume one more bit of queue->queue_flags.
> >>>>
> >>>> Besides, DM/MD is somehow special here that when one of the underlying
> >>>> devices is disabled polling through '/sys/block/<dev>/io_poll',
> >>>> currently there's no mechanism notifying the above MD/DM to clear the
> >>>> previously set queue_flags. Thus the outdated queue_flags still
> >>>> indicates this DM/MD is capable of polling, while in fact one of the
> >>>> underlying device has been disabled for polling.
> >>>
> >>> Right, just like there isn't queue limit progagation.
> >>>
> >>> Another blocker could be that bio based queue doesn't support queue
> >>> freezing.
> >>
> >> Do you mean the queue freezing is called in the following code snippet?
> >>
> >> ```
> >> static ssize_t queue_poll_store(struct request_queue *q, const char
> >> *page, size_t count)
> >> {
> >> 	...
> >> 	if (poll_on) {
> >> 		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> >> 	} else {
> >> 		blk_mq_freeze_queue(q);
> >> 		blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> >> 		blk_mq_unfreeze_queue(q);
> >> 	}
> >> ```
> > 
> > Yes, if it is a bio based queue. Or bio queued queue(DM, MD or others) may
> > use freeze_queue to do similar thing.
> > 
> >>
> >> And I can't understand how bio-based queue doesn't support queue freezing.
> >>
> >> ```
> >> submit_bio_noacct
> >> 	__submit_bio_noacct
> >> 		bio_queue_enter
> >> ```
> >>
> >> Every time submitting a bio, bio_queue_enter() will be called, and once
> >> the queue has been frozen, bio_queue_enter() will wait there until the
> >> queue is unfrozen.
> > 
> > Not like blk-mq, the refcount is just grabbed during submission for bio based
> > queue. 
> 
> Could you please explain it more detailed ....

Please see __submit_bio(), in which the queue ref is dropped.

> 
> 
> I will research a bit and see if we can extend freeze queue for
> > covering bio based queue. One trouble is that bio is ended before
> > freeing request.
> > 
> >>
> >>>
> >>>>
> >>>> Mike had ever suggested that we can trust the queue_flag, and clear the
> >>>> outdated queue_flags when later the IO submission or polling routine
> >>>> finally finds that the device is not capable of polling. Currently
> >>>> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
> >>>> submit the bio when the device is actually not capable of polling. To
> >>>> fix the issue, could we break the submission and return an error code in
> >>>> submit_bio_checks() if the device is not capable of polling when
> >>>> submitting HIPRI bio?
> >>>
> >>> I think we may just leave it alone, if underlying queue becomes not pollable,
> >>> the bio still can be submitted & completed via IRQ, just not efficient enough.
> >>
> >> Yes it still works. I agree if there's no better solution...
> >>
> >> And what about the issue Christoph originally concerned? Do we use one
> >> more flag bit indicating if the queue capable of polling, or the
> >> poll_capable() method way?
> > 
> > Just wondering why we can't use QUEUE_FLAG_POLL simply? If user wants to
> > enable it, let's do it for them. And bio driver can start with default poll
> > state by checking underlying queues.
> > 
> 
> Consider the following scenario: QUEUE_FLAG_POLL is set after
> initialization, indicating the device capable of polling; then polling
> is turned off by '/sys/block/<dev>/io_poll', thus QUEUE_FLAG_POLL is
> cleared.

If the flag is cleared, the bio will be submitted to irq queue, what is
the problem?


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-15 10:06               ` Ming Lei
@ 2021-04-15 11:21                 ` JeffleXu
  2021-04-15 13:08                   ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: JeffleXu @ 2021-04-15 11:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke



On 4/15/21 6:06 PM, Ming Lei wrote:
> On Thu, Apr 15, 2021 at 05:21:56PM +0800, JeffleXu wrote:
>>
>>
>> On 4/15/21 3:43 PM, Ming Lei wrote:
>>> On Thu, Apr 15, 2021 at 09:34:36AM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/14/21 7:24 PM, Ming Lei wrote:
>>>>> On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
>>>>>>
>>>>>>
>>>>>> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
>>>>>>> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
>>>>>>>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>>>>>
>>>>>>>> This method can be used to check if bio-based device supports IO polling
>>>>>>>> or not. For mq devices, checking for hw queue in polling mode is
>>>>>>>> adequate, while the sanity check shall be implementation specific for
>>>>>>>> bio-based devices. For example, dm device needs to check if all
>>>>>>>> underlying devices are capable of IO polling.
>>>>>>>>
>>>>>>>> Though bio-based device may have done the sanity check during the
>>>>>>>> device initialization phase, cacheing the result of this sanity check
>>>>>>>> (such as by cacheing in the queue_flags) may not work. Because for dm
>>>>>>>> devices, users could change the state of the underlying devices through
>>>>>>>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
>>>>>>>> the cached result of the very beginning sanity check could be
>>>>>>>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
>>>>>>>> is to be modified.
>>>>>>>
>>>>>>> I really don't think thi should be a method, and I really do dislike
>>>>>>> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
>>>>>>> the gendisk that signals if the device can support polling that
>>>>>>> is autoamtically set for blk-mq and as-needed by bio based drivers?
>>>>>>
>>>>>> That would consume one more bit of queue->queue_flags.
>>>>>>
>>>>>> Besides, DM/MD is somehow special here that when one of the underlying
>>>>>> devices is disabled polling through '/sys/block/<dev>/io_poll',
>>>>>> currently there's no mechanism notifying the above MD/DM to clear the
>>>>>> previously set queue_flags. Thus the outdated queue_flags still
>>>>>> indicates this DM/MD is capable of polling, while in fact one of the
>>>>>> underlying device has been disabled for polling.
>>>>>
>>>>> Right, just like there isn't queue limit progagation.
>>>>>
>>>>> Another blocker could be that bio based queue doesn't support queue
>>>>> freezing.
>>>>
>>>> Do you mean the queue freezing is called in the following code snippet?
>>>>
>>>> ```
>>>> static ssize_t queue_poll_store(struct request_queue *q, const char
>>>> *page, size_t count)
>>>> {
>>>> 	...
>>>> 	if (poll_on) {
>>>> 		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
>>>> 	} else {
>>>> 		blk_mq_freeze_queue(q);
>>>> 		blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
>>>> 		blk_mq_unfreeze_queue(q);
>>>> 	}
>>>> ```
>>>
>>> Yes, if it is a bio based queue. Or bio queued queue(DM, MD or others) may
>>> use freeze_queue to do similar thing.
>>>
>>>>
>>>> And I can't understand how bio-based queue doesn't support queue freezing.
>>>>
>>>> ```
>>>> submit_bio_noacct
>>>> 	__submit_bio_noacct
>>>> 		bio_queue_enter
>>>> ```
>>>>
>>>> Every time submitting a bio, bio_queue_enter() will be called, and once
>>>> the queue has been frozen, bio_queue_enter() will wait there until the
>>>> queue is unfrozen.
>>>
>>> Not like blk-mq, the refcount is just grabbed during submission for bio based
>>> queue. 
>>
>> Could you please explain it more detailed ....
> 
> Please see __submit_bio(), in which the queue ref is dropped.
> 
>>
>>
>> I will research a bit and see if we can extend freeze queue for
>>> covering bio based queue. One trouble is that bio is ended before
>>> freeing request.
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Mike had ever suggested that we can trust the queue_flag, and clear the
>>>>>> outdated queue_flags when later the IO submission or polling routine
>>>>>> finally finds that the device is not capable of polling. Currently
>>>>>> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
>>>>>> submit the bio when the device is actually not capable of polling. To
>>>>>> fix the issue, could we break the submission and return an error code in
>>>>>> submit_bio_checks() if the device is not capable of polling when
>>>>>> submitting HIPRI bio?
>>>>>
>>>>> I think we may just leave it alone, if underlying queue becomes not pollable,
>>>>> the bio still can be submitted & completed via IRQ, just not efficient enough.
>>>>
>>>> Yes it still works. I agree if there's no better solution...
>>>>
>>>> And what about the issue Christoph originally concerned? Do we use one
>>>> more flag bit indicating if the queue capable of polling, or the
>>>> poll_capable() method way?
>>>
>>> Just wondering why we can't use QUEUE_FLAG_POLL simply? If user wants to
>>> enable it, let's do it for them. And bio driver can start with default poll
>>> state by checking underlying queues.
>>>
>>
>> Consider the following scenario: QUEUE_FLAG_POLL is set after
>> initialization, indicating the device capable of polling; then polling
>> is turned off by '/sys/block/<dev>/io_poll', thus QUEUE_FLAG_POLL is
>> cleared.
> 
> If the flag is cleared, the bio will be submitted to irq queue, what is
> the problem?
> 

The IO path has no problem. It is the control path. If you want to turn
on polling then, you have to check if the device capable of polling,
while QUEUE_FLAG_POLL has been cleared in this case. IOW you can't rely
on QUEUE_FLAG_POLL to see if the device has the **ability** of polling.
QUEUE_FLAG_POLL flag only indicates if polling is turned on or off
currently.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling
  2021-04-15 11:21                 ` JeffleXu
@ 2021-04-15 13:08                   ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-15 13:08 UTC (permalink / raw)
  To: JeffleXu
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Mike Snitzer,
	dm-devel, Hannes Reinecke

On Thu, Apr 15, 2021 at 07:21:52PM +0800, JeffleXu wrote:
> 
> 
> On 4/15/21 6:06 PM, Ming Lei wrote:
> > On Thu, Apr 15, 2021 at 05:21:56PM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/15/21 3:43 PM, Ming Lei wrote:
> >>> On Thu, Apr 15, 2021 at 09:34:36AM +0800, JeffleXu wrote:
> >>>>
> >>>>
> >>>> On 4/14/21 7:24 PM, Ming Lei wrote:
> >>>>> On Wed, Apr 14, 2021 at 04:38:25PM +0800, JeffleXu wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 4/12/21 5:38 PM, Christoph Hellwig wrote:
> >>>>>>> On Thu, Apr 01, 2021 at 10:19:26AM +0800, Ming Lei wrote:
> >>>>>>>> From: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>>>>>>
> >>>>>>>> This method can be used to check if bio-based device supports IO polling
> >>>>>>>> or not. For mq devices, checking for hw queue in polling mode is
> >>>>>>>> adequate, while the sanity check shall be implementation specific for
> >>>>>>>> bio-based devices. For example, dm device needs to check if all
> >>>>>>>> underlying devices are capable of IO polling.
> >>>>>>>>
> >>>>>>>> Though bio-based device may have done the sanity check during the
> >>>>>>>> device initialization phase, cacheing the result of this sanity check
> >>>>>>>> (such as by cacheing in the queue_flags) may not work. Because for dm
> >>>>>>>> devices, users could change the state of the underlying devices through
> >>>>>>>> '/sys/block/<dev>/io_poll', bypassing the dm device above. In this case,
> >>>>>>>> the cached result of the very beginning sanity check could be
> >>>>>>>> out-of-date. Thus the sanity check needs to be done every time 'io_poll'
> >>>>>>>> is to be modified.
> >>>>>>>
> >>>>>>> I really don't think thi should be a method, and I really do dislike
> >>>>>>> how we have all this "if (is_mq)" junk.  Why can't we have a flag on
> >>>>>>> the gendisk that signals if the device can support polling that
> >>>>>>> is autoamtically set for blk-mq and as-needed by bio based drivers?
> >>>>>>
> >>>>>> That would consume one more bit of queue->queue_flags.
> >>>>>>
> >>>>>> Besides, DM/MD is somehow special here that when one of the underlying
> >>>>>> devices is disabled polling through '/sys/block/<dev>/io_poll',
> >>>>>> currently there's no mechanism notifying the above MD/DM to clear the
> >>>>>> previously set queue_flags. Thus the outdated queue_flags still
> >>>>>> indicates this DM/MD is capable of polling, while in fact one of the
> >>>>>> underlying device has been disabled for polling.
> >>>>>
> >>>>> Right, just like there isn't queue limit progagation.
> >>>>>
> >>>>> Another blocker could be that bio based queue doesn't support queue
> >>>>> freezing.
> >>>>
> >>>> Do you mean the queue freezing is called in the following code snippet?
> >>>>
> >>>> ```
> >>>> static ssize_t queue_poll_store(struct request_queue *q, const char
> >>>> *page, size_t count)
> >>>> {
> >>>> 	...
> >>>> 	if (poll_on) {
> >>>> 		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> >>>> 	} else {
> >>>> 		blk_mq_freeze_queue(q);
> >>>> 		blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> >>>> 		blk_mq_unfreeze_queue(q);
> >>>> 	}
> >>>> ```
> >>>
> >>> Yes, if it is a bio based queue. Or bio queued queue(DM, MD or others) may
> >>> use freeze_queue to do similar thing.
> >>>
> >>>>
> >>>> And I can't understand how bio-based queue doesn't support queue freezing.
> >>>>
> >>>> ```
> >>>> submit_bio_noacct
> >>>> 	__submit_bio_noacct
> >>>> 		bio_queue_enter
> >>>> ```
> >>>>
> >>>> Every time submitting a bio, bio_queue_enter() will be called, and once
> >>>> the queue has been frozen, bio_queue_enter() will wait there until the
> >>>> queue is unfrozen.
> >>>
> >>> Not like blk-mq, the refcount is just grabbed during submission for bio based
> >>> queue. 
> >>
> >> Could you please explain it more detailed ....
> > 
> > Please see __submit_bio(), in which the queue ref is dropped.
> > 
> >>
> >>
> >> I will research a bit and see if we can extend freeze queue for
> >>> covering bio based queue. One trouble is that bio is ended before
> >>> freeing request.
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Mike had ever suggested that we can trust the queue_flag, and clear the
> >>>>>> outdated queue_flags when later the IO submission or polling routine
> >>>>>> finally finds that the device is not capable of polling. Currently
> >>>>>> submit_bio_checks() will silently clear the REQ_HIPRI flag and still
> >>>>>> submit the bio when the device is actually not capable of polling. To
> >>>>>> fix the issue, could we break the submission and return an error code in
> >>>>>> submit_bio_checks() if the device is not capable of polling when
> >>>>>> submitting HIPRI bio?
> >>>>>
> >>>>> I think we may just leave it alone, if underlying queue becomes not pollable,
> >>>>> the bio still can be submitted & completed via IRQ, just not efficient enough.
> >>>>
> >>>> Yes it still works. I agree if there's no better solution...
> >>>>
> >>>> And what about the issue Christoph originally concerned? Do we use one
> >>>> more flag bit indicating if the queue capable of polling, or the
> >>>> poll_capable() method way?
> >>>
> >>> Just wondering why we can't use QUEUE_FLAG_POLL simply? If user wants to
> >>> enable it, let's do it for them. And bio driver can start with default poll
> >>> state by checking underlying queues.
> >>>
> >>
> >> Consider the following scenario: QUEUE_FLAG_POLL is set after
> >> initialization, indicating the device capable of polling; then polling
> >> is turned off by '/sys/block/<dev>/io_poll', thus QUEUE_FLAG_POLL is
> >> cleared.
> > 
> > If the flag is cleared, the bio will be submitted to irq queue, what is
> > the problem?
> > 
> 
> The IO path has no problem. It is the control path. If you want to turn

Can you explain a bit what the control path is?

> on polling then, you have to check if the device capable of polling,
> while QUEUE_FLAG_POLL has been cleared in this case. IOW you can't rely
> on QUEUE_FLAG_POLL to see if the device has the **ability** of polling.
> QUEUE_FLAG_POLL flag only indicates if polling is turned on or off
> currently.

For bio based driver, I'd suggest to start with do polling simply if
QUEUE_FLAG_POLL is set in bio request queue flag. The flag can be
enabled/disabled during initialization, or via sysfs. That said we
can start with always thinking the bio queue is capable of io polling.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-01  2:19 ` [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling Ming Lei
  2021-04-12  9:38   ` Christoph Hellwig
@ 2021-04-16  8:00   ` Jeffle Xu
  2021-04-16  8:42     ` JeffleXu
  2021-04-16  9:07     ` Ming Lei
  1 sibling, 2 replies; 46+ messages in thread
From: Jeffle Xu @ 2021-04-16  8:00 UTC (permalink / raw)
  To: ming.lei, snitzer, axboe; +Cc: linux-block, dm-devel

Hi,
How about this patch to remove the extra poll_capable() method?

And the following 'dm: support IO polling for bio-based dm device' needs
following change.

```
+       /*
+        * Check for request-based device is remained to
+        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
+        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
+        * devices supporting polling.
+        */
+       if (__table_type_bio_based(t->type)) {
+               if (dm_table_supports_poll(t)) {
+                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
+                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
+               }
+               else {
+                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
+                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
+               }
+       }
```


Introduce QUEUE_FLAG_POLL_CAP flag, indicating if the device supports IO
polling or not. Thus both blk-mq and bio-based device could set this
flag at the initialization phase, and then only this flag needs to be
checked instead of rechecking if the device has the ability of IO
polling when enabling IO polling via sysfs.

For NVMe, the ability of IO polling may change after RESET, since
nvme.poll_queues module parameter may change. Thus the ability of IO
polling need to be rechecked after RESET.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 block/blk-mq.c           |  5 +++--
 block/blk-sysfs.c        |  3 +--
 drivers/nvme/host/core.c |  2 ++
 include/linux/blk-mq.h   | 12 ++++++++++++
 include/linux/blkdev.h   |  2 ++
 5 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 414f5d99d9de..55ef6b975169 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3227,9 +3227,10 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->tag_set = set;
 
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
-	if (set->nr_maps > HCTX_TYPE_POLL &&
-	    set->map[HCTX_TYPE_POLL].nr_queues)
+	if (blk_mq_poll_capable(set)) {
+		blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
 		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
+	}
 
 	q->sg_reserved_size = INT_MAX;
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index db3268d41274..64f0ab84b606 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -426,8 +426,7 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
 	unsigned long poll_on;
 	ssize_t ret;
 
-	if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
-	    !q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
+	if(!blk_queue_poll_cap(q))
 		return -EINVAL;
 
 	ret = queue_var_store(&poll_on, page, count);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bb7da34dd967..5344cc877b05 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2210,6 +2210,8 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
 	ns->lba_shift = id->lbaf[lbaf].ds;
 	nvme_set_queue_limits(ns->ctrl, ns->queue);
 
+	blk_mq_check_poll(ns->disk->queue, ns->disk->queue->tag_set);
+
 	ret = nvme_configure_metadata(ns, id);
 	if (ret)
 		goto out_unfreeze;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2c473c9b8990..ee4c89c8bebc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -618,4 +618,16 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio);
 void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
 		struct lock_class_key *key);
 
+static inline bool blk_mq_poll_capable(struct blk_mq_tag_set *set)
+{
+	return set->nr_maps > HCTX_TYPE_POLL &&
+	       set->map[HCTX_TYPE_POLL].nr_queues;
+}
+
+static inline void blk_mq_check_poll(struct request_queue *q,
+				     struct blk_mq_tag_set *set)
+{
+	if (blk_mq_poll_capable(set))
+		blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
+}
 #endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1e88116dc070..d192a106bf40 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -621,6 +621,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_POLL_CAP     30	/* device supports IO polling */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -668,6 +669,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
 #define blk_queue_poll(q)	test_bit(QUEUE_FLAG_POLL, &(q)->queue_flags)
+#define blk_queue_poll_cap(q)	test_bit(QUEUE_FLAG_POLL_CAP, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-16  8:00   ` [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag Jeffle Xu
@ 2021-04-16  8:42     ` JeffleXu
  2021-04-16  9:07     ` Ming Lei
  1 sibling, 0 replies; 46+ messages in thread
From: JeffleXu @ 2021-04-16  8:42 UTC (permalink / raw)
  To: ming.lei, snitzer, axboe; +Cc: linux-block, dm-devel



On 4/16/21 4:00 PM, Jeffle Xu wrote:
> Hi,
> How about this patch to remove the extra poll_capable() method?
> 
> And the following 'dm: support IO polling for bio-based dm device' needs
> following change.
> 
> ```
> +       /*
> +        * Check for request-based device is remained to
> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
> +        * devices supporting polling.
> +        */
> +       if (__table_type_bio_based(t->type)) {
> +               if (dm_table_supports_poll(t)) {
> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> +               }
> +               else {
> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
> +               }
> +       }
> ```
> 
> 
> Introduce QUEUE_FLAG_POLL_CAP flag, indicating if the device supports IO
> polling or not. Thus both blk-mq and bio-based device could set this
> flag at the initialization phase, and then only this flag needs to be
> checked instead of rechecking if the device has the ability of IO
> polling when enabling IO polling via sysfs.
> 
> For NVMe, the ability of IO polling may change after RESET, since
> nvme.poll_queues module parameter may change. Thus the ability of IO
> polling need to be rechecked after RESET.
> 
The defect of this approach is that all device drivers that may change
tag_set after initialization (e.g., NVMe RESET) need to update
QUEUE_FLAG_POLL_CAP. Previous this patch, tag_set is checked directly in
queue_poll_store, and thus device drivers don't need to care the
queue_flags.


> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  block/blk-mq.c           |  5 +++--
>  block/blk-sysfs.c        |  3 +--
>  drivers/nvme/host/core.c |  2 ++
>  include/linux/blk-mq.h   | 12 ++++++++++++
>  include/linux/blkdev.h   |  2 ++
>  5 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 414f5d99d9de..55ef6b975169 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3227,9 +3227,10 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  	q->tag_set = set;
>  
>  	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
> -	if (set->nr_maps > HCTX_TYPE_POLL &&
> -	    set->map[HCTX_TYPE_POLL].nr_queues)
> +	if (blk_mq_poll_capable(set)) {
> +		blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
>  		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> +	}
>  
>  	q->sg_reserved_size = INT_MAX;
>  
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index db3268d41274..64f0ab84b606 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -426,8 +426,7 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
>  	unsigned long poll_on;
>  	ssize_t ret;
>  
> -	if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
> -	    !q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
> +	if(!blk_queue_poll_cap(q))
>  		return -EINVAL;
>  
>  	ret = queue_var_store(&poll_on, page, count);
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bb7da34dd967..5344cc877b05 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2210,6 +2210,8 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
>  	ns->lba_shift = id->lbaf[lbaf].ds;
>  	nvme_set_queue_limits(ns->ctrl, ns->queue);
>  
> +	blk_mq_check_poll(ns->disk->queue, ns->disk->queue->tag_set);
> +
>  	ret = nvme_configure_metadata(ns, id);
>  	if (ret)
>  		goto out_unfreeze;
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 2c473c9b8990..ee4c89c8bebc 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -618,4 +618,16 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio);
>  void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
>  		struct lock_class_key *key);
>  
> +static inline bool blk_mq_poll_capable(struct blk_mq_tag_set *set)
> +{
> +	return set->nr_maps > HCTX_TYPE_POLL &&
> +	       set->map[HCTX_TYPE_POLL].nr_queues;
> +}
> +


> +static inline void blk_mq_check_poll(struct request_queue *q,
> +				     struct blk_mq_tag_set *set)
> +{
> +	if (blk_mq_poll_capable(set))
> +		blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
> +}


Sorry it should be

> +static inline void blk_mq_check_poll(struct request_queue *q,
> +				     struct blk_mq_tag_set *set)
> +{
> +	if (blk_mq_poll_capable(set))
> +		blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
> +}
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
> +}


>  #endif
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1e88116dc070..d192a106bf40 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -621,6 +621,7 @@ struct request_queue {
>  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
>  #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
>  #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
> +#define QUEUE_FLAG_POLL_CAP     30	/* device supports IO polling */
>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_SAME_COMP) |		\
> @@ -668,6 +669,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
>  #define blk_queue_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
>  #define blk_queue_poll(q)	test_bit(QUEUE_FLAG_POLL, &(q)->queue_flags)
> +#define blk_queue_poll_cap(q)	test_bit(QUEUE_FLAG_POLL_CAP, &(q)->queue_flags)
>  
>  extern void blk_set_pm_only(struct request_queue *q);
>  extern void blk_clear_pm_only(struct request_queue *q);
> 

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-16  8:00   ` [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag Jeffle Xu
  2021-04-16  8:42     ` JeffleXu
@ 2021-04-16  9:07     ` Ming Lei
  2021-04-16 10:20       ` JeffleXu
  2021-04-17 14:06       ` JeffleXu
  1 sibling, 2 replies; 46+ messages in thread
From: Ming Lei @ 2021-04-16  9:07 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: snitzer, axboe, linux-block, dm-devel

On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
> Hi,
> How about this patch to remove the extra poll_capable() method?
> 
> And the following 'dm: support IO polling for bio-based dm device' needs
> following change.
> 
> ```
> +       /*
> +        * Check for request-based device is remained to
> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
> +        * devices supporting polling.
> +        */
> +       if (__table_type_bio_based(t->type)) {
> +               if (dm_table_supports_poll(t)) {
> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> +               }
> +               else {
> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
> +               }
> +       }
> ```

Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
DM, and the result is basically subset of treating DM as always being capable
of polling.

Also underlying queue change(either limits or flag) won't be propagated
to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
queues are capable of supporting polling at the exact time of 'write sysfs/poll',
cause any of them may change in future.

So why not start with the simplest approach(always capable of polling)
which does meet normal bio based polling requirement?



Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-16  9:07     ` Ming Lei
@ 2021-04-16 10:20       ` JeffleXu
  2021-04-17 14:06       ` JeffleXu
  1 sibling, 0 replies; 46+ messages in thread
From: JeffleXu @ 2021-04-16 10:20 UTC (permalink / raw)
  To: Ming Lei; +Cc: snitzer, axboe, linux-block, dm-devel



On 4/16/21 5:07 PM, Ming Lei wrote:
> On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
>> Hi,
>> How about this patch to remove the extra poll_capable() method?
>>
>> And the following 'dm: support IO polling for bio-based dm device' needs
>> following change.
>>
>> ```
>> +       /*
>> +        * Check for request-based device is remained to
>> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
>> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
>> +        * devices supporting polling.
>> +        */
>> +       if (__table_type_bio_based(t->type)) {
>> +               if (dm_table_supports_poll(t)) {
>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
>> +               }
>> +               else {
>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
>> +               }
>> +       }
>> ```
> 
> Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
> DM, and the result is basically subset of treating DM as always being capable
> of polling.
> 
> Also underlying queue change(either limits or flag) won't be propagated
> to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
> queues are capable of supporting polling at the exact time of 'write sysfs/poll',
> cause any of them may change in future.

Yes it is.

> 
> So why not start with the simplest approach(always capable of polling)
> which does meet normal bio based polling requirement?
> 

I agree if we have no better way. Though the handling of "sysfs/io_poll"
is somehow inconsistent between blk-mq and bio-based device then.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-16  9:07     ` Ming Lei
  2021-04-16 10:20       ` JeffleXu
@ 2021-04-17 14:06       ` JeffleXu
  2021-04-19  2:21         ` Ming Lei
  1 sibling, 1 reply; 46+ messages in thread
From: JeffleXu @ 2021-04-17 14:06 UTC (permalink / raw)
  To: Ming Lei; +Cc: snitzer, axboe, linux-block, dm-devel



On 4/16/21 5:07 PM, Ming Lei wrote:
> On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
>> Hi,
>> How about this patch to remove the extra poll_capable() method?
>>
>> And the following 'dm: support IO polling for bio-based dm device' needs
>> following change.
>>
>> ```
>> +       /*
>> +        * Check for request-based device is remained to
>> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
>> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
>> +        * devices supporting polling.
>> +        */
>> +       if (__table_type_bio_based(t->type)) {
>> +               if (dm_table_supports_poll(t)) {
>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
>> +               }
>> +               else {
>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
>> +               }
>> +       }
>> ```
> 
> Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
> DM, and the result is basically subset of treating DM as always being capable
> of polling.
> 
> Also underlying queue change(either limits or flag) won't be propagated
> to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
> queues are capable of supporting polling at the exact time of 'write sysfs/poll',
> cause any of them may change in future.
> 
> So why not start with the simplest approach(always capable of polling)
> which does meet normal bio based polling requirement?
> 

I find one scenario where this issue may matter. Consider the scenario
where HIPRI bios are submitted to DM device though **all** underlying
devices has been disabled for polling. In this case, a **valid** cookie
(pid of current submitting process) is still returned. Then if @spin of
the following blk_poll() is true, blk_poll() will get stuck in dead loop
because blk_mq_poll() always returns 0, since previously submitted bios
are all enqueued into IRQ hw queue.

Maybe you need to re-remove the bio from the poll context if the
returned cookie is BLK_QC_T_NONE?


Something like:

-static blk_qc_t __submit_bio_noacct(struct bio *bio)
+static blk_qc_t __submit_bio_noacct_ctx(struct bio *bio, struct
io_context *ioc)
 {
 	struct bio_list bio_list_on_stack[2];
 	blk_qc_t ret = BLK_QC_T_NONE;
@@ -1047,7 +1163,15 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 		bio_list_on_stack[1] = bio_list_on_stack[0];
 		bio_list_init(&bio_list_on_stack[0]);

		if (ioc && queue_is_mq(q) && (bio->bi_opf & REQ_HIPRI)) {
			bool queued = blk_bio_poll_prep_submit(ioc, bio);

			ret = __submit_bio(bio);
+			if (queued && !blk_qc_t_valid(ret))
				/* TODO:remove bio from poll_context */
				
				bio_set_private_data(bio, ret);
		} else {
			ret = __submit_bio(bio);
		}


Then if you'd like fix this in this way, the returned value of
.submit_bio() of DM/MD also needs to return BLK_QC_T_NONE now. Currently
.submit_bio() of DM actually returns the cookie of the last split bio
(to underlying mq deivice).

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-17 14:06       ` JeffleXu
@ 2021-04-19  2:21         ` Ming Lei
  2021-04-19  5:40           ` JeffleXu
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-19  2:21 UTC (permalink / raw)
  To: JeffleXu; +Cc: snitzer, axboe, linux-block, dm-devel

On Sat, Apr 17, 2021 at 10:06:53PM +0800, JeffleXu wrote:
> 
> 
> On 4/16/21 5:07 PM, Ming Lei wrote:
> > On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
> >> Hi,
> >> How about this patch to remove the extra poll_capable() method?
> >>
> >> And the following 'dm: support IO polling for bio-based dm device' needs
> >> following change.
> >>
> >> ```
> >> +       /*
> >> +        * Check for request-based device is remained to
> >> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
> >> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
> >> +        * devices supporting polling.
> >> +        */
> >> +       if (__table_type_bio_based(t->type)) {
> >> +               if (dm_table_supports_poll(t)) {
> >> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
> >> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> >> +               }
> >> +               else {
> >> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> >> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
> >> +               }
> >> +       }
> >> ```
> > 
> > Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
> > DM, and the result is basically subset of treating DM as always being capable
> > of polling.
> > 
> > Also underlying queue change(either limits or flag) won't be propagated
> > to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
> > queues are capable of supporting polling at the exact time of 'write sysfs/poll',
> > cause any of them may change in future.
> > 
> > So why not start with the simplest approach(always capable of polling)
> > which does meet normal bio based polling requirement?
> > 
> 
> I find one scenario where this issue may matter. Consider the scenario
> where HIPRI bios are submitted to DM device though **all** underlying
> devices has been disabled for polling. In this case, a **valid** cookie
> (pid of current submitting process) is still returned. Then if @spin of
> the following blk_poll() is true, blk_poll() will get stuck in dead loop
> because blk_mq_poll() always returns 0, since previously submitted bios
> are all enqueued into IRQ hw queue.
> 
> Maybe you need to re-remove the bio from the poll context if the
> returned cookie is BLK_QC_T_NONE?

It won't be one issue, see blk_bio_poll_preprocess() which is called
from submit_bio_checks(), so any bio's HIPRI will be cleared if the
queue doesn't support POLL, that code does cover underlying bios.

> 
> 
> Something like:
> 
> -static blk_qc_t __submit_bio_noacct(struct bio *bio)
> +static blk_qc_t __submit_bio_noacct_ctx(struct bio *bio, struct
> io_context *ioc)
>  {
>  	struct bio_list bio_list_on_stack[2];
>  	blk_qc_t ret = BLK_QC_T_NONE;
> @@ -1047,7 +1163,15 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
>  		bio_list_on_stack[1] = bio_list_on_stack[0];
>  		bio_list_init(&bio_list_on_stack[0]);
> 
> 		if (ioc && queue_is_mq(q) && (bio->bi_opf & REQ_HIPRI)) {

REQ_HIPRI won't be set for underlying bios which queue doesn't support
poll, so this branch won't be reached. And the submission queue will
be empty, and blk_poll() for DM/MD(bio based queue) checks nothing, but
the polling won't be stopped until the iocb is completed. And this
handling is actually same with current polling on IRQ based queue.

> 			bool queued = blk_bio_poll_prep_submit(ioc, bio);
> 
> 			ret = __submit_bio(bio);
> +			if (queued && !blk_qc_t_valid(ret))
> 				/* TODO:remove bio from poll_context */
> 				
> 				bio_set_private_data(bio, ret);
> 		} else {
> 			ret = __submit_bio(bio);
> 		}
> 
> 
> Then if you'd like fix this in this way, the returned value of
> .submit_bio() of DM/MD also needs to return BLK_QC_T_NONE now. Currently
> .submit_bio() of DM actually returns the cookie of the last split bio
> (to underlying mq deivice).

I am a bit confused, this patch requires .submit_bio() of DM/MD(bio
based queue) to return either 0 or pid of the submission task.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-19  2:21         ` Ming Lei
@ 2021-04-19  5:40           ` JeffleXu
  2021-04-19 13:36             ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: JeffleXu @ 2021-04-19  5:40 UTC (permalink / raw)
  To: Ming Lei; +Cc: snitzer, axboe, linux-block, dm-devel



On 4/19/21 10:21 AM, Ming Lei wrote:
> On Sat, Apr 17, 2021 at 10:06:53PM +0800, JeffleXu wrote:
>>
>>
>> On 4/16/21 5:07 PM, Ming Lei wrote:
>>> On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
>>>> Hi,
>>>> How about this patch to remove the extra poll_capable() method?
>>>>
>>>> And the following 'dm: support IO polling for bio-based dm device' needs
>>>> following change.
>>>>
>>>> ```
>>>> +       /*
>>>> +        * Check for request-based device is remained to
>>>> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
>>>> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
>>>> +        * devices supporting polling.
>>>> +        */
>>>> +       if (__table_type_bio_based(t->type)) {
>>>> +               if (dm_table_supports_poll(t)) {
>>>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
>>>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
>>>> +               }
>>>> +               else {
>>>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
>>>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
>>>> +               }
>>>> +       }
>>>> ```
>>>
>>> Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
>>> DM, and the result is basically subset of treating DM as always being capable
>>> of polling.
>>>
>>> Also underlying queue change(either limits or flag) won't be propagated
>>> to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
>>> queues are capable of supporting polling at the exact time of 'write sysfs/poll',
>>> cause any of them may change in future.
>>>
>>> So why not start with the simplest approach(always capable of polling)
>>> which does meet normal bio based polling requirement?
>>>
>>
>> I find one scenario where this issue may matter. Consider the scenario
>> where HIPRI bios are submitted to DM device though **all** underlying
>> devices has been disabled for polling. In this case, a **valid** cookie
>> (pid of current submitting process) is still returned. Then if @spin of
>> the following blk_poll() is true, blk_poll() will get stuck in dead loop
>> because blk_mq_poll() always returns 0, since previously submitted bios
>> are all enqueued into IRQ hw queue.
>>
>> Maybe you need to re-remove the bio from the poll context if the
>> returned cookie is BLK_QC_T_NONE?
> 
> It won't be one issue, see blk_bio_poll_preprocess() which is called
> from submit_bio_checks(), so any bio's HIPRI will be cleared if the
> queue doesn't support POLL, that code does cover underlying bios.

Sorry there may be some confusion in my description. Let's discuss in
the following scenario: MD/DM advertise QUEUE_FLAG_POLL, though **all**
underlying devices are without QUEUE_FLAG_POLL. This scenario is
possible, if you want to enable MD/DM's polling without checking the
capability of underlying devices.

In this case, it seems that REQ_HIPRI is kept for both MD/DM and
underlying blk-mq devices. I used to think that REQ_HIPRI will be
cleared for underlying blk-mq deivces, but now it seems that REQ_HIPRI
of bios submitted to underlying blk-mq deivces won't be cleared, since
submit_bio_checks() is only called in the entry of submit_bio(), not in
the while() loop of __submit_bio_noacct_ctx(). Though these underlying
blk-mq devices don't support IO polling at all, or they all have been
disabled for polling, REQ_HIPRI bios are finally submitted down.

Or do I miss something?


> 
>>
>>
>> Something like:
>>
>> -static blk_qc_t __submit_bio_noacct(struct bio *bio)
>> +static blk_qc_t __submit_bio_noacct_ctx(struct bio *bio, struct
>> io_context *ioc)
>>  {
>>  	struct bio_list bio_list_on_stack[2];
>>  	blk_qc_t ret = BLK_QC_T_NONE;
>> @@ -1047,7 +1163,15 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
>>  		bio_list_on_stack[1] = bio_list_on_stack[0];
>>  		bio_list_init(&bio_list_on_stack[0]);
>>
>> 		if (ioc && queue_is_mq(q) && (bio->bi_opf & REQ_HIPRI)) {
> 
> REQ_HIPRI won't be set for underlying bios which queue doesn't support
> poll, so this branch won't be reached. 

Sorry I missed the '(bio->bi_opf & REQ_HIPRI)' condition here. Indeed
bio without REQ_HIPRI won't be enqueued into the poll_context.

> And the submission queue will
> be empty, and blk_poll() for DM/MD(bio based queue) checks nothing, but
> the polling won't be stopped until the iocb is completed. And this
> handling is actually same with current polling on IRQ based queue.
> 
>> 			bool queued = blk_bio_poll_prep_submit(ioc, bio);
>>
>> 			ret = __submit_bio(bio);
>> +			if (queued && !blk_qc_t_valid(ret))
>> 				/* TODO:remove bio from poll_context */
>> 				
>> 				bio_set_private_data(bio, ret);
>> 		} else {
>> 			ret = __submit_bio(bio);
>> 		}
>>
>>
>> Then if you'd like fix this in this way, the returned value of
>> .submit_bio() of DM/MD also needs to return BLK_QC_T_NONE now. Currently
>> .submit_bio() of DM actually returns the cookie of the last split bio
>> (to underlying mq deivice).
> 
> I am a bit confused, this patch requires .submit_bio() of DM/MD(bio
> based queue) to return either 0 or pid of the submission task.
> 
> 
> Thanks,
> Ming
> 

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-19  5:40           ` JeffleXu
@ 2021-04-19 13:36             ` Ming Lei
  2021-04-20  7:25               ` JeffleXu
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2021-04-19 13:36 UTC (permalink / raw)
  To: JeffleXu; +Cc: snitzer, axboe, linux-block, dm-devel

On Mon, Apr 19, 2021 at 01:40:21PM +0800, JeffleXu wrote:
> 
> 
> On 4/19/21 10:21 AM, Ming Lei wrote:
> > On Sat, Apr 17, 2021 at 10:06:53PM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/16/21 5:07 PM, Ming Lei wrote:
> >>> On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
> >>>> Hi,
> >>>> How about this patch to remove the extra poll_capable() method?
> >>>>
> >>>> And the following 'dm: support IO polling for bio-based dm device' needs
> >>>> following change.
> >>>>
> >>>> ```
> >>>> +       /*
> >>>> +        * Check for request-based device is remained to
> >>>> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
> >>>> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
> >>>> +        * devices supporting polling.
> >>>> +        */
> >>>> +       if (__table_type_bio_based(t->type)) {
> >>>> +               if (dm_table_supports_poll(t)) {
> >>>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
> >>>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> >>>> +               }
> >>>> +               else {
> >>>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> >>>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
> >>>> +               }
> >>>> +       }
> >>>> ```
> >>>
> >>> Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
> >>> DM, and the result is basically subset of treating DM as always being capable
> >>> of polling.
> >>>
> >>> Also underlying queue change(either limits or flag) won't be propagated
> >>> to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
> >>> queues are capable of supporting polling at the exact time of 'write sysfs/poll',
> >>> cause any of them may change in future.
> >>>
> >>> So why not start with the simplest approach(always capable of polling)
> >>> which does meet normal bio based polling requirement?
> >>>
> >>
> >> I find one scenario where this issue may matter. Consider the scenario
> >> where HIPRI bios are submitted to DM device though **all** underlying
> >> devices has been disabled for polling. In this case, a **valid** cookie
> >> (pid of current submitting process) is still returned. Then if @spin of
> >> the following blk_poll() is true, blk_poll() will get stuck in dead loop
> >> because blk_mq_poll() always returns 0, since previously submitted bios
> >> are all enqueued into IRQ hw queue.
> >>
> >> Maybe you need to re-remove the bio from the poll context if the
> >> returned cookie is BLK_QC_T_NONE?
> > 
> > It won't be one issue, see blk_bio_poll_preprocess() which is called
> > from submit_bio_checks(), so any bio's HIPRI will be cleared if the
> > queue doesn't support POLL, that code does cover underlying bios.
> 
> Sorry there may be some confusion in my description. Let's discuss in
> the following scenario: MD/DM advertise QUEUE_FLAG_POLL, though **all**
> underlying devices are without QUEUE_FLAG_POLL. This scenario is
> possible, if you want to enable MD/DM's polling without checking the
> capability of underlying devices.
> 
> In this case, it seems that REQ_HIPRI is kept for both MD/DM and
> underlying blk-mq devices. I used to think that REQ_HIPRI will be
> cleared for underlying blk-mq deivces, but now it seems that REQ_HIPRI
> of bios submitted to underlying blk-mq deivces won't be cleared, since
> submit_bio_checks() is only called in the entry of submit_bio(), not in
> the while() loop of __submit_bio_noacct_ctx(). Though these underlying
> blk-mq devices don't support IO polling at all, or they all have been
> disabled for polling, REQ_HIPRI bios are finally submitted down.
> 
> Or do I miss something?

No matter the loop, the bios are actually submitted to the
current->bio_list via submit_bio_noacct() or submit_bio().
'grep -r submit_bio drivers/md' will show you the point.

Also it is a bug if one underlying bio is submitted without being checked.

You can observe it by the following bpftrace when you run io_uring on dm
disk:

#include <linux/blkdev.h>

kprobe:blk_mq_submit_bio
/strncmp(((struct bio *)arg0)->bi_bdev->bd_disk->disk_name, "nvme", 4) == 0/
{
	$b = (struct bio *)arg0;
	$hipri = $b->bi_opf & (1 << __REQ_HIPRI);

	printf("%s %d: %s %lu %lu high prio %d\n", comm, tid, $b->bi_bdev->bd_disk->disk_name,
		$b->bi_iter.bi_sector, $b->bi_iter.bi_size, $hipri);
}


> 
> 
> > 
> >>
> >>
> >> Something like:
> >>
> >> -static blk_qc_t __submit_bio_noacct(struct bio *bio)
> >> +static blk_qc_t __submit_bio_noacct_ctx(struct bio *bio, struct
> >> io_context *ioc)
> >>  {
> >>  	struct bio_list bio_list_on_stack[2];
> >>  	blk_qc_t ret = BLK_QC_T_NONE;
> >> @@ -1047,7 +1163,15 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
> >>  		bio_list_on_stack[1] = bio_list_on_stack[0];
> >>  		bio_list_init(&bio_list_on_stack[0]);
> >>
> >> 		if (ioc && queue_is_mq(q) && (bio->bi_opf & REQ_HIPRI)) {
> > 
> > REQ_HIPRI won't be set for underlying bios which queue doesn't support
> > poll, so this branch won't be reached. 
> 
> Sorry I missed the '(bio->bi_opf & REQ_HIPRI)' condition here. Indeed
> bio without REQ_HIPRI won't be enqueued into the poll_context.

Even though these bios are queued, blk_poll() still can handle them
easily by just ignoring queues which aren't POLL_TYPE. However, I still
think their HIPRI will be cleared.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag
  2021-04-19 13:36             ` Ming Lei
@ 2021-04-20  7:25               ` JeffleXu
  0 siblings, 0 replies; 46+ messages in thread
From: JeffleXu @ 2021-04-20  7:25 UTC (permalink / raw)
  To: Ming Lei; +Cc: snitzer, axboe, linux-block, dm-devel



On 4/19/21 9:36 PM, Ming Lei wrote:
> On Mon, Apr 19, 2021 at 01:40:21PM +0800, JeffleXu wrote:
>>
>>
>> On 4/19/21 10:21 AM, Ming Lei wrote:
>>> On Sat, Apr 17, 2021 at 10:06:53PM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/16/21 5:07 PM, Ming Lei wrote:
>>>>> On Fri, Apr 16, 2021 at 04:00:37PM +0800, Jeffle Xu wrote:
>>>>>> Hi,
>>>>>> How about this patch to remove the extra poll_capable() method?
>>>>>>
>>>>>> And the following 'dm: support IO polling for bio-based dm device' needs
>>>>>> following change.
>>>>>>
>>>>>> ```
>>>>>> +       /*
>>>>>> +        * Check for request-based device is remained to
>>>>>> +        * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
>>>>>> +        * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
>>>>>> +        * devices supporting polling.
>>>>>> +        */
>>>>>> +       if (__table_type_bio_based(t->type)) {
>>>>>> +               if (dm_table_supports_poll(t)) {
>>>>>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL_CAP, q);
>>>>>> +                       blk_queue_flag_set(QUEUE_FLAG_POLL, q);
>>>>>> +               }
>>>>>> +               else {
>>>>>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
>>>>>> +                       blk_queue_flag_clear(QUEUE_FLAG_POLL_CAP, q);
>>>>>> +               }
>>>>>> +       }
>>>>>> ```
>>>>>
>>>>> Frankly speaking, I don't see any value of using QUEUE_FLAG_POLL_CAP for
>>>>> DM, and the result is basically subset of treating DM as always being capable
>>>>> of polling.
>>>>>
>>>>> Also underlying queue change(either limits or flag) won't be propagated
>>>>> to DM/MD automatically. Strictly speaking it doesn't matter if all underlying
>>>>> queues are capable of supporting polling at the exact time of 'write sysfs/poll',
>>>>> cause any of them may change in future.
>>>>>
>>>>> So why not start with the simplest approach(always capable of polling)
>>>>> which does meet normal bio based polling requirement?
>>>>>
>>>>
>>>> I find one scenario where this issue may matter. Consider the scenario
>>>> where HIPRI bios are submitted to DM device though **all** underlying
>>>> devices has been disabled for polling. In this case, a **valid** cookie
>>>> (pid of current submitting process) is still returned. Then if @spin of
>>>> the following blk_poll() is true, blk_poll() will get stuck in dead loop
>>>> because blk_mq_poll() always returns 0, since previously submitted bios
>>>> are all enqueued into IRQ hw queue.
>>>>
>>>> Maybe you need to re-remove the bio from the poll context if the
>>>> returned cookie is BLK_QC_T_NONE?
>>>
>>> It won't be one issue, see blk_bio_poll_preprocess() which is called
>>> from submit_bio_checks(), so any bio's HIPRI will be cleared if the
>>> queue doesn't support POLL, that code does cover underlying bios.
>>
>> Sorry there may be some confusion in my description. Let's discuss in
>> the following scenario: MD/DM advertise QUEUE_FLAG_POLL, though **all**
>> underlying devices are without QUEUE_FLAG_POLL. This scenario is
>> possible, if you want to enable MD/DM's polling without checking the
>> capability of underlying devices.
>>
>> In this case, it seems that REQ_HIPRI is kept for both MD/DM and
>> underlying blk-mq devices. I used to think that REQ_HIPRI will be
>> cleared for underlying blk-mq deivces, but now it seems that REQ_HIPRI
>> of bios submitted to underlying blk-mq deivces won't be cleared, since
>> submit_bio_checks() is only called in the entry of submit_bio(), not in
>> the while() loop of __submit_bio_noacct_ctx(). Though these underlying
>> blk-mq devices don't support IO polling at all, or they all have been
>> disabled for polling, REQ_HIPRI bios are finally submitted down.
>>
>> Or do I miss something?
> 
> No matter the loop, the bios are actually submitted to the
> current->bio_list via submit_bio_noacct() or submit_bio().
> 'grep -r submit_bio drivers/md' will show you the point.

Oops. I forgot that. Thanks and sorry for the noise.

So if that's the case, it seems that patch 11/12 are not needed anymore.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, back to index

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01  2:19 [PATCH V5 00/12] block: support bio based io polling Ming Lei
2021-04-01  2:19 ` [PATCH V5 01/12] block: add helper of blk_queue_poll Ming Lei
2021-04-01  2:19 ` [PATCH V5 02/12] block: add one helper to free io_context Ming Lei
2021-04-01  2:19 ` [PATCH V5 03/12] block: create io poll context for submission and poll task Ming Lei
2021-04-12 10:19   ` Christoph Hellwig
2021-04-01  2:19 ` [PATCH V5 04/12] block: add req flag of REQ_POLL_CTX Ming Lei
2021-04-01  2:19 ` [PATCH V5 05/12] block: add new field into 'struct bvec_iter' Ming Lei
2021-04-12  9:26   ` Christoph Hellwig
2021-04-13  9:36     ` Ming Lei
2021-04-01  2:19 ` [PATCH V5 06/12] block/mq: extract one helper function polling hw queue Ming Lei
2021-04-12  9:29   ` Christoph Hellwig
2021-04-01  2:19 ` [PATCH V5 07/12] block: prepare for supporting bio_list via other link Ming Lei
2021-04-12 10:18   ` Christoph Hellwig
2021-04-12 11:37     ` Ming Lei
2021-04-01  2:19 ` [PATCH V5 08/12] block: use per-task poll context to implement bio based io polling Ming Lei
2021-04-12  9:54   ` Christoph Hellwig
2021-04-12 10:20     ` Ming Lei
2021-04-12 10:29       ` Christoph Hellwig
2021-04-12 11:42         ` Ming Lei
2021-04-12 10:16   ` Christoph Hellwig
2021-04-12 10:37     ` Ming Lei
2021-04-01  2:19 ` [PATCH V5 09/12] blk-mq: limit hw queues to be polled in each blk_poll() Ming Lei
2021-04-01  2:19 ` [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue Ming Lei
2021-04-12 12:52   ` Jens Axboe
2021-04-01  2:19 ` [PATCH V5 11/12] block: add poll_capable method to support bio-based IO polling Ming Lei
2021-04-12  9:38   ` Christoph Hellwig
2021-04-14  8:38     ` JeffleXu
2021-04-14 11:24       ` Ming Lei
2021-04-15  1:34         ` JeffleXu
2021-04-15  7:43           ` Ming Lei
2021-04-15  9:21             ` JeffleXu
2021-04-15 10:06               ` Ming Lei
2021-04-15 11:21                 ` JeffleXu
2021-04-15 13:08                   ` Ming Lei
2021-04-16  8:00   ` [PATCH] block: introduce QUEUE_FLAG_POLL_CAP flag Jeffle Xu
2021-04-16  8:42     ` JeffleXu
2021-04-16  9:07     ` Ming Lei
2021-04-16 10:20       ` JeffleXu
2021-04-17 14:06       ` JeffleXu
2021-04-19  2:21         ` Ming Lei
2021-04-19  5:40           ` JeffleXu
2021-04-19 13:36             ` Ming Lei
2021-04-20  7:25               ` JeffleXu
2021-04-01  2:19 ` [PATCH V5 12/12] dm: support IO polling for bio-based dm device Ming Lei
2021-04-09 15:39 ` [PATCH V5 00/12] block: support bio based io polling Ming Lei
2021-04-12  9:46 ` Christoph Hellwig

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git