linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] block/dm: support bio polling
@ 2021-06-16 13:05 Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 1/4] block: add helper of blk_queue_poll Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Ming Lei @ 2021-06-16 13:05 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer
  Cc: linux-block, Jeffle Xu, dm-devel, Hannes Reinecke,
	Christoph Hellwig, Ming Lei

Hello Guys,

Based on Christoph's bio based polling model[1], implement DM bio polling
with one very simple approach.

Patch 1 adds helper of blk_queue_poll().

Patch 2 reuses .bi_next to bio driver for storing driver data.

Patch 3 adds .bio_poll() callback to block_device_operations, so bio
driver can implement its own logic for io polling.

Patch 4 implements bio polling for device mapper.

Any comments are welcome.

[1] switch block layer polling to a bio based model v4
https://lore.kernel.org/linux-block/20210615160619.GA32435@lst.de/T/#t


Ming Lei (4):
  block: add helper of blk_queue_poll
  block: add field of .bi_bio_drv_data to bio
  block: add ->poll_bio to block_device_operations
  dm: support bio polling

 block/blk-core.c          | 21 +++++++++-----
 block/blk-sysfs.c         |  4 +--
 block/genhd.c             |  3 ++
 drivers/md/dm-table.c     | 24 ++++++++++++++++
 drivers/md/dm.c           | 59 ++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/core.c  |  2 +-
 include/linux/blk_types.h | 11 +++++++-
 include/linux/blkdev.h    |  3 ++
 8 files changed, 112 insertions(+), 15 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 1/4] block: add helper of blk_queue_poll
  2021-06-16 13:05 [RFC PATCH 0/4] block/dm: support bio polling Ming Lei
@ 2021-06-16 13:05 ` Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 2/4] block: add field of .bi_bio_drv_data to bio Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-06-16 13:05 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer
  Cc: linux-block, Jeffle Xu, dm-devel, Hannes Reinecke,
	Christoph Hellwig, Ming Lei, Chaitanya Kulkarni

There has been 3 users, and will be more, so add one such helper.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c         | 5 ++---
 block/blk-sysfs.c        | 4 ++--
 drivers/nvme/host/core.c | 2 +-
 include/linux/blkdev.h   | 1 +
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 531176578221..1e24c71c6738 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -835,7 +835,7 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		}
 	}
 
-	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (!blk_queue_poll(q))
 		bio->bi_opf &= ~REQ_POLLED;
 
 	switch (bio_op(bio)) {
@@ -1117,8 +1117,7 @@ int bio_poll(struct bio *bio, unsigned int flags)
 	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
 	int ret;
 
-	if (cookie == BLK_QC_T_NONE ||
-	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (cookie == BLK_QC_T_NONE || !blk_queue_poll(q))
 		return 0;
 
 	if (current->plug)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f78e73ca6091..93dcf2dfaafd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -422,13 +422,13 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
 
 static ssize_t queue_poll_show(struct request_queue *q, char *page)
 {
-	return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page);
+	return queue_var_show(blk_queue_poll(q), page);
 }
 
 static ssize_t queue_poll_store(struct request_queue *q, const char *page,
 				size_t count)
 {
-	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (!blk_queue_poll(q))
 		return -EINVAL;
 	pr_info_ratelimited("writes to the poll attribute are ignored.\n");
 	pr_info_ratelimited("please use driver specific parameters instead.\n");
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fe0b8da3de7f..e31c7704ef4d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1025,7 +1025,7 @@ static void nvme_execute_rq_polled(struct request_queue *q,
 {
 	DECLARE_COMPLETION_ONSTACK(wait);
 
-	WARN_ON_ONCE(!test_bit(QUEUE_FLAG_POLL, &q->queue_flags));
+	WARN_ON_ONCE(!blk_queue_poll(q));
 
 	rq->cmd_flags |= REQ_POLLED;
 	rq->end_io_data = &wait;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 561b04117bd4..fc0ba0b80776 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -677,6 +677,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
+#define blk_queue_poll(q)	test_bit(QUEUE_FLAG_POLL, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 2/4] block: add field of .bi_bio_drv_data to bio
  2021-06-16 13:05 [RFC PATCH 0/4] block/dm: support bio polling Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 1/4] block: add helper of blk_queue_poll Ming Lei
@ 2021-06-16 13:05 ` Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 3/4] block: add ->poll_bio to block_device_operations Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 4/4] dm: support bio polling Ming Lei
  3 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-06-16 13:05 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer
  Cc: linux-block, Jeffle Xu, dm-devel, Hannes Reinecke,
	Christoph Hellwig, Ming Lei

After bio is submitted, bio->bi_next is used for IO merge for request
based queue only.

Reuse the filed for bio based driver for storing driver data, and name
it as .bi_bio_drv_data.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/blk_types.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6e6c2af48d74..acb45213338c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -220,7 +220,16 @@ typedef unsigned int blk_qc_t;
  * stacking drivers)
  */
 struct bio {
-	struct bio		*bi_next;	/* request queue link */
+	union {
+		/* request queue link */
+		struct bio		*bi_next;
+
+		/*
+		 * once bio is submitted to bio based queue, driver can use
+		 * this field to store its own data
+		 */
+		void			*bi_bio_drv_data;
+	};
 	struct block_device	*bi_bdev;
 	unsigned int		bi_opf;		/* bottom bits req flags,
 						 * top bits REQ_OP. Use
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 3/4] block: add ->poll_bio to block_device_operations
  2021-06-16 13:05 [RFC PATCH 0/4] block/dm: support bio polling Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 1/4] block: add helper of blk_queue_poll Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 2/4] block: add field of .bi_bio_drv_data to bio Ming Lei
@ 2021-06-16 13:05 ` Ming Lei
  2021-06-16 13:05 ` [RFC PATCH 4/4] dm: support bio polling Ming Lei
  3 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-06-16 13:05 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer
  Cc: linux-block, Jeffle Xu, dm-devel, Hannes Reinecke,
	Christoph Hellwig, Ming Lei

Prepare for supporting IO polling for bio based driver.

Add ->poll_bio callback so that bio driver can provide their own logic
for polling bio.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 18 +++++++++++++-----
 block/genhd.c          |  3 +++
 include/linux/blkdev.h |  2 ++
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1e24c71c6738..a1552ec8d608 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1113,11 +1113,13 @@ EXPORT_SYMBOL(submit_bio);
  */
 int bio_poll(struct bio *bio, unsigned int flags)
 {
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct gendisk *disk = bio->bi_bdev->bd_disk;
+	struct request_queue *q = disk->queue;
 	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
 	int ret;
 
-	if (cookie == BLK_QC_T_NONE || !blk_queue_poll(q))
+	if ((queue_is_mq(q) && cookie == BLK_QC_T_NONE) ||
+			!blk_queue_poll(q))
 		return 0;
 
 	if (current->plug)
@@ -1125,10 +1127,16 @@ int bio_poll(struct bio *bio, unsigned int flags)
 
 	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT))
 		return 0;
-	if (WARN_ON_ONCE(!queue_is_mq(q)))
-		ret = 0;	/* not yet implemented, should not happen */
-	else
+	if (!queue_is_mq(q)) {
+		if (disk->fops->poll_bio) {
+			ret = disk->fops->poll_bio(bio, flags);
+		} else {
+			WARN_ON_ONCE(1);
+			ret = 0;
+		}
+	} else {
 		ret = blk_mq_poll(q, cookie, flags);
+	}
 	blk_queue_exit(q);
 	return ret;
 }
diff --git a/block/genhd.c b/block/genhd.c
index 5f5628216295..042dfa5b3f79 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -471,6 +471,9 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 {
 	int ret;
 
+	/* ->poll_bio is only for bio based driver */
+	WARN_ON_ONCE(queue_is_mq(disk->queue) && disk->fops->poll_bio);
+
 	/*
 	 * The disk queue should now be all set with enough information about
 	 * the device for the elevator code to pick an adequate default
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fc0ba0b80776..6da6fb120148 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1858,6 +1858,8 @@ static inline void blk_ksm_unregister(struct request_queue *q) { }
 
 struct block_device_operations {
 	void (*submit_bio)(struct bio *bio);
+	/* ->poll_bio is for bio driver only */
+	int (*poll_bio)(struct bio *bio, unsigned int flags);
 	int (*open) (struct block_device *, fmode_t);
 	void (*release) (struct gendisk *, fmode_t);
 	int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 4/4] dm: support bio polling
  2021-06-16 13:05 [RFC PATCH 0/4] block/dm: support bio polling Ming Lei
                   ` (2 preceding siblings ...)
  2021-06-16 13:05 ` [RFC PATCH 3/4] block: add ->poll_bio to block_device_operations Ming Lei
@ 2021-06-16 13:05 ` Ming Lei
  2021-06-16 16:05   ` Mike Snitzer
  3 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2021-06-16 13:05 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer
  Cc: linux-block, Jeffle Xu, dm-devel, Hannes Reinecke,
	Christoph Hellwig, Ming Lei

Support bio(REQ_POLLED) polling in the following approach:

1) setup one list in instance of 'struct dm_io', adds every 'struct
dm_target_io' instance cloned for current dm bio into this list;
store the list in 1) into bio->bi_bio_drv_data

2) hold one refcnt on io->io_count after submitting this dm bio with
REQ_POLLED

4) implement .poll_bio() callback, and iterate over the list in 1) and
polled on each ->clone of 'dm_target_io' instance; call dec_pending()
if all target ios are done in .poll_bio().

4) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
which is based on Jeffle's previous patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-table.c | 24 ++++++++++++++++++
 drivers/md/dm.c       | 59 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ee47a332b462..b14b379442d2 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1491,6 +1491,12 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector)
 	return &t->targets[(KEYS_PER_NODE * n) + k];
 }
 
+static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev,
+				   sector_t start, sector_t len, void *data)
+{
+	return !blk_queue_poll(bdev_get_queue(dev->bdev));
+}
+
 /*
  * type->iterate_devices() should be called when the sanity check needs to
  * iterate and check all underlying data devices. iterate_devices() will
@@ -1541,6 +1547,11 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev,
 	return 0;
 }
 
+static int dm_table_supports_poll(struct dm_table *t)
+{
+	return !dm_table_any_dev_attr(t, device_not_poll_capable, NULL);
+}
+
 /*
  * Check whether a table has no data devices attached using each
  * target's iterate_devices method.
@@ -2078,6 +2089,19 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 
 	dm_update_keyslot_manager(q, t);
 	blk_queue_update_readahead(q);
+
+	/*
+	 * Check for request-based device is remained to
+	 * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
+	 * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
+	 * devices supporting polling.
+	 */
+	if (__table_type_bio_based(t->type)) {
+		if (dm_table_supports_poll(t))
+			blk_queue_flag_set(QUEUE_FLAG_POLL, q);
+		else
+			blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
+	}
 }
 
 unsigned int dm_table_get_num_targets(struct dm_table *t)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 363f12a285ce..0a0e4a38f435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -84,6 +84,7 @@ struct dm_target_io {
 	struct dm_target *ti;
 	unsigned target_bio_nr;
 	unsigned *len_ptr;
+	struct list_head list;
 	bool inside_dm_io;
 	struct bio clone;
 };
@@ -99,6 +100,7 @@ struct dm_io {
 	blk_status_t status;
 	atomic_t io_count;
 	struct bio *orig_bio;
+	struct list_head poll_head;
 	unsigned long start_time;
 	spinlock_t endio_lock;
 	struct dm_stats_aux stats_aux;
@@ -655,6 +657,11 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	io->md = md;
 	spin_lock_init(&io->endio_lock);
 
+	if (bio->bi_opf & REQ_POLLED) {
+		bio->bi_bio_drv_data = io;
+		INIT_LIST_HEAD(&io->poll_head);
+	}
+
 	start_io_acct(io);
 
 	return io;
@@ -692,6 +699,8 @@ static struct dm_target_io *alloc_tio(struct clone_info *ci, struct dm_target *t
 
 static void free_tio(struct dm_target_io *tio)
 {
+	list_del_init(&tio->list);
+
 	if (tio->inside_dm_io)
 		return;
 	bio_put(&tio->clone);
@@ -936,10 +945,15 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
 		io_error = io->status;
 		bio = io->orig_bio;
 		end_io_acct(io);
+
 		free_io(md, io);
 
-		if (io_error == BLK_STS_DM_REQUEUE)
+		if (io_error == BLK_STS_DM_REQUEUE) {
+			/* not poll any more in case of requeue */
+			if (bio->bi_opf & REQ_POLLED)
+				bio->bi_opf &= ~REQ_POLLED;
 			return;
+		}
 
 		if ((bio->bi_opf & REQ_PREFLUSH) && bio->bi_iter.bi_size) {
 			/*
@@ -1043,7 +1057,9 @@ static void clone_endio(struct bio *bio)
 		up(&md->swap_bios_semaphore);
 	}
 
-	free_tio(tio);
+	/* Any cloned bio submitted as POLLED, free them all after dm_io is done */
+	if (list_empty(&tio->list))
+		free_tio(tio);
 	dec_pending(io, error);
 }
 
@@ -1300,6 +1316,11 @@ static void __map_bio(struct dm_target_io *tio)
 	struct dm_io *io = tio->io;
 	struct dm_target *ti = tio->ti;
 
+	if (clone->bi_opf & REQ_POLLED)
+		list_add_tail(&tio->list, &io->poll_head);
+	else
+		INIT_LIST_HEAD(&tio->list);
+
 	clone->bi_end_io = clone_endio;
 
 	/*
@@ -1666,8 +1687,9 @@ static void __split_and_process_bio(struct mapped_device *md,
 		}
 	}
 
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
+	/* drop the extra reference count for non-POLLED bio */
+	if (!(bio->bi_opf & REQ_POLLED))
+		dec_pending(ci.io, errno_to_blk_status(error));
 }
 
 static void dm_submit_bio(struct bio *bio)
@@ -1707,6 +1729,34 @@ static void dm_submit_bio(struct bio *bio)
 	dm_put_live_table(md, srcu_idx);
 }
 
+static int dm_poll_bio(struct bio *bio, unsigned int flags)
+{
+	struct dm_io *io = bio->bi_bio_drv_data;
+	struct dm_target_io *tio;
+
+	if (!(bio->bi_opf & REQ_POLLED) || !io)
+		return 0;
+
+	list_for_each_entry(tio, &io->poll_head, list)
+		bio_poll(&tio->clone, flags);
+
+	/* bio_poll holds the last reference */
+	if (atomic_read(&io->io_count) == 1) {
+		/* free all target IOs submitted as POLLED */
+		while (!list_empty(&io->poll_head)) {
+			struct dm_target_io *tio =
+				list_entry(io->poll_head.next,
+					struct dm_target_io, list);
+			free_tio(tio);
+		}
+		bio->bi_bio_drv_data = NULL;
+		dec_pending(io, 0);
+		return 1;
+	}
+
+	return 0;
+}
+
 /*-----------------------------------------------------------------
  * An IDR is used to keep track of allocated minor numbers.
  *---------------------------------------------------------------*/
@@ -3121,6 +3171,7 @@ static const struct pr_ops dm_pr_ops = {
 
 static const struct block_device_operations dm_blk_dops = {
 	.submit_bio = dm_submit_bio,
+	.poll_bio = dm_poll_bio,
 	.open = dm_blk_open,
 	.release = dm_blk_close,
 	.ioctl = dm_blk_ioctl,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 4/4] dm: support bio polling
  2021-06-16 13:05 ` [RFC PATCH 4/4] dm: support bio polling Ming Lei
@ 2021-06-16 16:05   ` Mike Snitzer
  2021-06-17  2:14     ` Ming Lei
  2021-06-18 14:33     ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Mike Snitzer @ 2021-06-16 16:05 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jeffle Xu, dm-devel, Hannes Reinecke

On Wed, Jun 16 2021 at  9:05P -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> Support bio(REQ_POLLED) polling in the following approach:
> 
> 1) setup one list in instance of 'struct dm_io', adds every 'struct
> dm_target_io' instance cloned for current dm bio into this list;
> store the list in 1) into bio->bi_bio_drv_data
> 
> 2) hold one refcnt on io->io_count after submitting this dm bio with
> REQ_POLLED
> 
> 4) implement .poll_bio() callback, and iterate over the list in 1) and
> polled on each ->clone of 'dm_target_io' instance; call dec_pending()
> if all target ios are done in .poll_bio().
> 
> 4) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
> which is based on Jeffle's previous patch.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Thanks for refreshing this DM bio polling support Ming.

In general I'm really happy to see polling switch over to using bios,
nice job Christoph! Are you hoping for all this to land in time for
5.14 merge?

Once Ming responds to my review inlined below, and I Acked-by his
set, would you be willing to fold it at the end of your patchset so
that I don't need to rebase on block to get these changes in, etc?

Mike

> ---
>  drivers/md/dm-table.c | 24 ++++++++++++++++++
>  drivers/md/dm.c       | 59 ++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 79 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index ee47a332b462..b14b379442d2 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1491,6 +1491,12 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector)
>  	return &t->targets[(KEYS_PER_NODE * n) + k];
>  }
>  
> +static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev,
> +				   sector_t start, sector_t len, void *data)
> +{
> +	return !blk_queue_poll(bdev_get_queue(dev->bdev));
> +}
> +
>  /*
>   * type->iterate_devices() should be called when the sanity check needs to
>   * iterate and check all underlying data devices. iterate_devices() will
> @@ -1541,6 +1547,11 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev,
>  	return 0;
>  }
>  
> +static int dm_table_supports_poll(struct dm_table *t)
> +{
> +	return !dm_table_any_dev_attr(t, device_not_poll_capable, NULL);
> +}
> +
>  /*
>   * Check whether a table has no data devices attached using each
>   * target's iterate_devices method.
> @@ -2078,6 +2089,19 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
>  
>  	dm_update_keyslot_manager(q, t);
>  	blk_queue_update_readahead(q);
> +
> +	/*
> +	 * Check for request-based device is remained to
> +	 * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
> +	 * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
> +	 * devices supporting polling.
> +	 */
> +	if (__table_type_bio_based(t->type)) {
> +		if (dm_table_supports_poll(t))
> +			blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> +		else
> +			blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> +	}
>  }
>  
>  unsigned int dm_table_get_num_targets(struct dm_table *t)
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 363f12a285ce..0a0e4a38f435 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -84,6 +84,7 @@ struct dm_target_io {
>  	struct dm_target *ti;
>  	unsigned target_bio_nr;
>  	unsigned *len_ptr;
> +	struct list_head list;
>  	bool inside_dm_io;
>  	struct bio clone;
>  };
> @@ -99,6 +100,7 @@ struct dm_io {
>  	blk_status_t status;
>  	atomic_t io_count;
>  	struct bio *orig_bio;
> +	struct list_head poll_head;
>  	unsigned long start_time;
>  	spinlock_t endio_lock;
>  	struct dm_stats_aux stats_aux;
> @@ -655,6 +657,11 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
>  	io->md = md;
>  	spin_lock_init(&io->endio_lock);
>  
> +	if (bio->bi_opf & REQ_POLLED) {
> +		bio->bi_bio_drv_data = io;
> +		INIT_LIST_HEAD(&io->poll_head);
> +	}
> +
>  	start_io_acct(io);
>  
>  	return io;
> @@ -692,6 +699,8 @@ static struct dm_target_io *alloc_tio(struct clone_info *ci, struct dm_target *t
>  
>  static void free_tio(struct dm_target_io *tio)
>  {
> +	list_del_init(&tio->list);
> +
>  	if (tio->inside_dm_io)
>  		return;
>  	bio_put(&tio->clone);
> @@ -936,10 +945,15 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
>  		io_error = io->status;
>  		bio = io->orig_bio;
>  		end_io_acct(io);
> +
>  		free_io(md, io);
>  
> -		if (io_error == BLK_STS_DM_REQUEUE)
> +		if (io_error == BLK_STS_DM_REQUEUE) {
> +			/* not poll any more in case of requeue */
> +			if (bio->bi_opf & REQ_POLLED)
> +				bio->bi_opf &= ~REQ_POLLED;
>  			return;
> +		}
>  
>  		if ((bio->bi_opf & REQ_PREFLUSH) && bio->bi_iter.bi_size) {
>  			/*
> @@ -1043,7 +1057,9 @@ static void clone_endio(struct bio *bio)
>  		up(&md->swap_bios_semaphore);
>  	}
>  
> -	free_tio(tio);
> +	/* Any cloned bio submitted as POLLED, free them all after dm_io is done */
> +	if (list_empty(&tio->list))
> +		free_tio(tio);
>  	dec_pending(io, error);
>  }
>  
> @@ -1300,6 +1316,11 @@ static void __map_bio(struct dm_target_io *tio)
>  	struct dm_io *io = tio->io;
>  	struct dm_target *ti = tio->ti;
>  
> +	if (clone->bi_opf & REQ_POLLED)
> +		list_add_tail(&tio->list, &io->poll_head);
> +	else
> +		INIT_LIST_HEAD(&tio->list);
> +

Why not INIT_LIST_HEAD() at end of alloc_tio()? Shouldn't that be done
even if you have this else claue here because you can clear REQ_POLLED
on BLK_STS_DM_REQUEUE? (otherwise you're calling list_add_tail on list
that wasn't ever INIT_LIST_HEAD).

>  	clone->bi_end_io = clone_endio;
>  
>  	/*
> @@ -1666,8 +1687,9 @@ static void __split_and_process_bio(struct mapped_device *md,
>  		}
>  	}
>  
> -	/* drop the extra reference count */
> -	dec_pending(ci.io, errno_to_blk_status(error));
> +	/* drop the extra reference count for non-POLLED bio */
> +	if (!(bio->bi_opf & REQ_POLLED))
> +		dec_pending(ci.io, errno_to_blk_status(error));
>  }
>  
>  static void dm_submit_bio(struct bio *bio)
> @@ -1707,6 +1729,34 @@ static void dm_submit_bio(struct bio *bio)
>  	dm_put_live_table(md, srcu_idx);
>  }
>  
> +static int dm_poll_bio(struct bio *bio, unsigned int flags)
> +{
> +	struct dm_io *io = bio->bi_bio_drv_data;
> +	struct dm_target_io *tio;
> +
> +	if (!(bio->bi_opf & REQ_POLLED) || !io)
> +		return 0;

Should this be a WARN_ON()? Cannot see why this would ever happen
other than a bug?  Or is there some race that makes it more likely?

> +	list_for_each_entry(tio, &io->poll_head, list)
> +		bio_poll(&tio->clone, flags);
> +
> +	/* bio_poll holds the last reference */
> +	if (atomic_read(&io->io_count) == 1) {
> +		/* free all target IOs submitted as POLLED */
> +		while (!list_empty(&io->poll_head)) {
> +			struct dm_target_io *tio =
> +				list_entry(io->poll_head.next,
> +					struct dm_target_io, list);
> +			free_tio(tio);
> +		}
> +		bio->bi_bio_drv_data = NULL;
> +		dec_pending(io, 0);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
>  /*-----------------------------------------------------------------
>   * An IDR is used to keep track of allocated minor numbers.
>   *---------------------------------------------------------------*/
> @@ -3121,6 +3171,7 @@ static const struct pr_ops dm_pr_ops = {
>  
>  static const struct block_device_operations dm_blk_dops = {
>  	.submit_bio = dm_submit_bio,
> +	.poll_bio = dm_poll_bio,
>  	.open = dm_blk_open,
>  	.release = dm_blk_close,
>  	.ioctl = dm_blk_ioctl,
> -- 
> 2.31.1
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 4/4] dm: support bio polling
  2021-06-16 16:05   ` Mike Snitzer
@ 2021-06-17  2:14     ` Ming Lei
  2021-06-18 14:33     ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-06-17  2:14 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Jeffle Xu, dm-devel,
	Hannes Reinecke

On Wed, Jun 16, 2021 at 12:05:13PM -0400, Mike Snitzer wrote:
> On Wed, Jun 16 2021 at  9:05P -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > Support bio(REQ_POLLED) polling in the following approach:
> > 
> > 1) setup one list in instance of 'struct dm_io', adds every 'struct
> > dm_target_io' instance cloned for current dm bio into this list;
> > store the list in 1) into bio->bi_bio_drv_data
> > 
> > 2) hold one refcnt on io->io_count after submitting this dm bio with
> > REQ_POLLED
> > 
> > 4) implement .poll_bio() callback, and iterate over the list in 1) and
> > polled on each ->clone of 'dm_target_io' instance; call dec_pending()
> > if all target ios are done in .poll_bio().
> > 
> > 4) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
> > which is based on Jeffle's previous patch.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> Thanks for refreshing this DM bio polling support Ming.
> 
> In general I'm really happy to see polling switch over to using bios,
> nice job Christoph! Are you hoping for all this to land in time for
> 5.14 merge?
> 
> Once Ming responds to my review inlined below, and I Acked-by his
> set, would you be willing to fold it at the end of your patchset so
> that I don't need to rebase on block to get these changes in, etc?
> 
> Mike
> 
> > ---
> >  drivers/md/dm-table.c | 24 ++++++++++++++++++
> >  drivers/md/dm.c       | 59 ++++++++++++++++++++++++++++++++++++++++---
> >  2 files changed, 79 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index ee47a332b462..b14b379442d2 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -1491,6 +1491,12 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector)
> >  	return &t->targets[(KEYS_PER_NODE * n) + k];
> >  }
> >  
> > +static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev,
> > +				   sector_t start, sector_t len, void *data)
> > +{
> > +	return !blk_queue_poll(bdev_get_queue(dev->bdev));
> > +}
> > +
> >  /*
> >   * type->iterate_devices() should be called when the sanity check needs to
> >   * iterate and check all underlying data devices. iterate_devices() will
> > @@ -1541,6 +1547,11 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev,
> >  	return 0;
> >  }
> >  
> > +static int dm_table_supports_poll(struct dm_table *t)
> > +{
> > +	return !dm_table_any_dev_attr(t, device_not_poll_capable, NULL);
> > +}
> > +
> >  /*
> >   * Check whether a table has no data devices attached using each
> >   * target's iterate_devices method.
> > @@ -2078,6 +2089,19 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> >  
> >  	dm_update_keyslot_manager(q, t);
> >  	blk_queue_update_readahead(q);
> > +
> > +	/*
> > +	 * Check for request-based device is remained to
> > +	 * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
> > +	 * For bio-based device, only set QUEUE_FLAG_POLL when all underlying
> > +	 * devices supporting polling.
> > +	 */
> > +	if (__table_type_bio_based(t->type)) {
> > +		if (dm_table_supports_poll(t))
> > +			blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> > +		else
> > +			blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> > +	}
> >  }
> >  
> >  unsigned int dm_table_get_num_targets(struct dm_table *t)
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 363f12a285ce..0a0e4a38f435 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -84,6 +84,7 @@ struct dm_target_io {
> >  	struct dm_target *ti;
> >  	unsigned target_bio_nr;
> >  	unsigned *len_ptr;
> > +	struct list_head list;
> >  	bool inside_dm_io;
> >  	struct bio clone;
> >  };
> > @@ -99,6 +100,7 @@ struct dm_io {
> >  	blk_status_t status;
> >  	atomic_t io_count;
> >  	struct bio *orig_bio;
> > +	struct list_head poll_head;
> >  	unsigned long start_time;
> >  	spinlock_t endio_lock;
> >  	struct dm_stats_aux stats_aux;
> > @@ -655,6 +657,11 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
> >  	io->md = md;
> >  	spin_lock_init(&io->endio_lock);
> >  
> > +	if (bio->bi_opf & REQ_POLLED) {
> > +		bio->bi_bio_drv_data = io;
> > +		INIT_LIST_HEAD(&io->poll_head);
> > +	}
> > +
> >  	start_io_acct(io);
> >  
> >  	return io;
> > @@ -692,6 +699,8 @@ static struct dm_target_io *alloc_tio(struct clone_info *ci, struct dm_target *t
> >  
> >  static void free_tio(struct dm_target_io *tio)
> >  {
> > +	list_del_init(&tio->list);
> > +
> >  	if (tio->inside_dm_io)
> >  		return;
> >  	bio_put(&tio->clone);
> > @@ -936,10 +945,15 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
> >  		io_error = io->status;
> >  		bio = io->orig_bio;
> >  		end_io_acct(io);
> > +
> >  		free_io(md, io);
> >  
> > -		if (io_error == BLK_STS_DM_REQUEUE)
> > +		if (io_error == BLK_STS_DM_REQUEUE) {
> > +			/* not poll any more in case of requeue */
> > +			if (bio->bi_opf & REQ_POLLED)
> > +				bio->bi_opf &= ~REQ_POLLED;
> >  			return;
> > +		}
> >  
> >  		if ((bio->bi_opf & REQ_PREFLUSH) && bio->bi_iter.bi_size) {
> >  			/*
> > @@ -1043,7 +1057,9 @@ static void clone_endio(struct bio *bio)
> >  		up(&md->swap_bios_semaphore);
> >  	}
> >  
> > -	free_tio(tio);
> > +	/* Any cloned bio submitted as POLLED, free them all after dm_io is done */
> > +	if (list_empty(&tio->list))
> > +		free_tio(tio);
> >  	dec_pending(io, error);
> >  }
> >  
> > @@ -1300,6 +1316,11 @@ static void __map_bio(struct dm_target_io *tio)
> >  	struct dm_io *io = tio->io;
> >  	struct dm_target *ti = tio->ti;
> >  
> > +	if (clone->bi_opf & REQ_POLLED)
> > +		list_add_tail(&tio->list, &io->poll_head);
> > +	else
> > +		INIT_LIST_HEAD(&tio->list);
> > +
> 
> Why not INIT_LIST_HEAD() at end of alloc_tio()? Shouldn't that be done
> even if you have this else claue here because you can clear REQ_POLLED
> on BLK_STS_DM_REQUEUE? (otherwise you're calling list_add_tail on list
> that wasn't ever INIT_LIST_HEAD).

It is fine to add one un-initialized list node via list_add_tail(), just
the list itself has to be initialized well.

> 
> >  	clone->bi_end_io = clone_endio;
> >  
> >  	/*
> > @@ -1666,8 +1687,9 @@ static void __split_and_process_bio(struct mapped_device *md,
> >  		}
> >  	}
> >  
> > -	/* drop the extra reference count */
> > -	dec_pending(ci.io, errno_to_blk_status(error));
> > +	/* drop the extra reference count for non-POLLED bio */
> > +	if (!(bio->bi_opf & REQ_POLLED))
> > +		dec_pending(ci.io, errno_to_blk_status(error));
> >  }
> >  
> >  static void dm_submit_bio(struct bio *bio)
> > @@ -1707,6 +1729,34 @@ static void dm_submit_bio(struct bio *bio)
> >  	dm_put_live_table(md, srcu_idx);
> >  }
> >  
> > +static int dm_poll_bio(struct bio *bio, unsigned int flags)
> > +{
> > +	struct dm_io *io = bio->bi_bio_drv_data;
> > +	struct dm_target_io *tio;
> > +
> > +	if (!(bio->bi_opf & REQ_POLLED) || !io)
> > +		return 0;
> 
> Should this be a WARN_ON()? Cannot see why this would ever happen
> other than a bug?  Or is there some race that makes it more likely?

REQ_POLLED can be cleared in case of requeue or blk_queue_split(), however
the upper layer still keeps polling. And we need to return simply for
bios which will be completed via IRQ.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 4/4] dm: support bio polling
  2021-06-16 16:05   ` Mike Snitzer
  2021-06-17  2:14     ` Ming Lei
@ 2021-06-18 14:33     ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:33 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, linux-block, Jeffle Xu,
	dm-devel, Hannes Reinecke

On Wed, Jun 16, 2021 at 12:05:13PM -0400, Mike Snitzer wrote:
> In general I'm really happy to see polling switch over to using bios,
> nice job Christoph! Are you hoping for all this to land in time for
> 5.14 merge?

Yes, although time is running out.

> Once Ming responds to my review inlined below, and I Acked-by his
> set, would you be willing to fold it at the end of your patchset so
> that I don't need to rebase on block to get these changes in, etc?

In principle yes, but I need to take a look first.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-06-18 14:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 13:05 [RFC PATCH 0/4] block/dm: support bio polling Ming Lei
2021-06-16 13:05 ` [RFC PATCH 1/4] block: add helper of blk_queue_poll Ming Lei
2021-06-16 13:05 ` [RFC PATCH 2/4] block: add field of .bi_bio_drv_data to bio Ming Lei
2021-06-16 13:05 ` [RFC PATCH 3/4] block: add ->poll_bio to block_device_operations Ming Lei
2021-06-16 13:05 ` [RFC PATCH 4/4] dm: support bio polling Ming Lei
2021-06-16 16:05   ` Mike Snitzer
2021-06-17  2:14     ` Ming Lei
2021-06-18 14:33     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).