Linux-Block Archive on lore.kernel.org
* [PATCH v2 0/2] Enable polling on stackable devices
@ 2020-02-11 19:17 Andrzej Jakowski
  2020-02-11 19:17 ` [PATCH v2 1/2] block: reintroduce polling on bio level Andrzej Jakowski
  2020-02-11 19:17 ` [PATCH v2 2/2] md: enable io polling Andrzej Jakowski
  0 siblings, 2 replies; 8+ messages in thread
From: Andrzej Jakowski @ 2020-02-11 19:17 UTC (permalink / raw)
  To: axboe, song; +Cc: linux-block, linux-raid, Andrzej Jakowski

Changes since v1:
 - reintroduced the original blk_poll() function that was removed some time
   ago (Jens)

 - added fastpath calls to blk_mq_poll() in blk_poll() (Christoph, Jens)

 - incorporated code style fixes into md patch (Christoph)
 
[1]: https://lore.kernel.org/linux-block/20200126044138.5066-1-andrzej.jakowski@linux.intel.com/T/#t

---

IO polling is currently available only on blk-mq devices. It is not possible
to perform IO polling on stackable devices such as MD.

In this patch series we propose to reintroduce the blk_poll() function. When
called on a stackable block device that supports polling, blk_poll() invokes
the device's polling handler. Otherwise it calls blk_mq_poll() directly on the
fast path.

This patch set also includes an example implementation of polling on an MD
RAID-0 volume.

---

TODO:
 - introduce REQ_NOWAIT support for stackable devices in a separate patchset 
   (Christoph)

Andrzej Jakowski (1):
  block: reintroduce polling on bio level

Artur Paszkiewicz (1):
  md: enable io polling

 block/blk-core.c       | 28 ++++++++++++++++++++++++++++
 block/blk-mq.c         | 23 ++---------------------
 block/blk-mq.h         |  2 ++
 drivers/md/md.c        | 40 ++++++++++++++++++++++++++++++++++++----
 include/linux/blkdev.h |  2 ++
 5 files changed, 70 insertions(+), 25 deletions(-)

-- 
2.20.1



* [PATCH v2 1/2] block: reintroduce polling on bio level
  2020-02-11 19:17 [PATCH v2 0/2] Enable polling on stackable devices Andrzej Jakowski
@ 2020-02-11 19:17 ` Andrzej Jakowski
  2020-02-11 19:17 ` [PATCH v2 2/2] md: enable io polling Andrzej Jakowski
  1 sibling, 0 replies; 8+ messages in thread
From: Andrzej Jakowski @ 2020-02-11 19:17 UTC (permalink / raw)
  To: axboe, song; +Cc: linux-block, linux-raid, Andrzej Jakowski, Artur Paszkiewicz

In the current implementation it is possible to perform IO polling if the
underlying device is a blk-mq device. It is not possible, however, to do
polled IO on stackable block devices such as MD.

This patch does the following:
 - reintroduces polling on the bio level, which was removed some time ago.
   The implementation has been pulled from commit
   529262d56dbe ("block: remove ->poll_fn").

 - modifies blk_poll() to introduce a fastpath to blk_mq_poll().
   In other words, blk_poll() calls poll_fn() for stackable block
   devices that support polling, and otherwise invokes blk_mq_poll()
   directly.

We collected performance data on a RAID-0 volume built on top of two P4800X
devices with polling on and off. Here are the results for pvsync2:

Polling      QD       Latency         IOPS
----------------------------------------------
off           1       12.20us         78400
on            1        8.80us        111000

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
---
 block/blk-core.c       | 28 ++++++++++++++++++++++++++++
 block/blk-mq.c         | 23 ++---------------------
 block/blk-mq.h         |  2 ++
 include/linux/blkdev.h |  2 ++
 4 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 089e890ab208..48001538de92 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1201,6 +1201,34 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
+/**
+ * blk_poll - poll for IO completions
+ * @q:  the queue
+ * @cookie: cookie passed back at IO submission time
+ * @spin: whether to spin for completions
+ *
+ * Description:
+ *    Poll for completions on the passed in queue. Returns number of
+ *    completed entries found. If @spin is true, then blk_poll will continue
+ *    looping until at least one completion is found, unless the task is
+ *    otherwise marked running (or we need to reschedule).
+ */
+int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+{
+	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags) ||
+	    !blk_qc_t_valid(cookie))
+		return 0;
+
+	if (current->plug)
+		blk_flush_plug_list(current->plug, false);
+
+	if (q->poll_fn)
+		return q->poll_fn(q, cookie, spin);
+
+	return blk_mq_poll(q, cookie, spin);
+}
+EXPORT_SYMBOL_GPL(blk_poll);
+
 /**
  * blk_cloned_rq_check_limits - Helper function to check a cloned request
  *                              for new the queue limits
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a12b1763508d..868aab100692 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3509,30 +3509,11 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
-/**
- * blk_poll - poll for IO completions
- * @q:  the queue
- * @cookie: cookie passed back at IO submission time
- * @spin: whether to spin for completions
- *
- * Description:
- *    Poll for completions on the passed in queue. Returns number of
- *    completed entries found. If @spin is true, then blk_poll will continue
- *    looping until at least one completion is found, unless the task is
- *    otherwise marked running (or we need to reschedule).
- */
-int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
 	long state;
 
-	if (!blk_qc_t_valid(cookie) ||
-	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
-		return 0;
-
-	if (current->plug)
-		blk_flush_plug_list(current->plug, false);
-
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
 
 	/*
@@ -3573,7 +3554,7 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	__set_current_state(TASK_RUNNING);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(blk_poll);
+EXPORT_SYMBOL_GPL(blk_mq_poll);
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
 {
diff --git a/block/blk-mq.h b/block/blk-mq.h
index eaaca8fc1c28..def2f85926b2 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -75,6 +75,8 @@ blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last);
 void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 				    struct list_head *list);
 
+int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin);
+
 /*
  * CPU -> queue mappings
  */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 053ea4b51988..ab703d655be0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -288,6 +288,7 @@ static inline unsigned short req_get_ioprio(struct request *req)
 struct blk_queue_ctx;
 
 typedef blk_qc_t (make_request_fn) (struct request_queue *q, struct bio *bio);
+typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t, bool spin);
 
 struct bio_vec;
 typedef int (dma_drain_needed_fn)(struct request *);
@@ -399,6 +400,7 @@ struct request_queue {
 	struct rq_qos		*rq_qos;
 
 	make_request_fn		*make_request_fn;
+	poll_q_fn		*poll_fn;
 	dma_drain_needed_fn	*dma_drain_needed;
 
 	const struct blk_mq_ops	*mq_ops;
-- 
2.20.1



* [PATCH v2 2/2] md: enable io polling
  2020-02-11 19:17 [PATCH v2 0/2] Enable polling on stackable devices Andrzej Jakowski
  2020-02-11 19:17 ` [PATCH v2 1/2] block: reintroduce polling on bio level Andrzej Jakowski
@ 2020-02-11 19:17 ` Andrzej Jakowski
  2020-02-11 21:13   ` Keith Busch
  1 sibling, 1 reply; 8+ messages in thread
From: Andrzej Jakowski @ 2020-02-11 19:17 UTC (permalink / raw)
  To: axboe, song; +Cc: linux-block, linux-raid, Artur Paszkiewicz, Andrzej Jakowski

From: Artur Paszkiewicz <artur.paszkiewicz@intel.com>

Provide a callback for polling the mddev which in turn polls the active
member devices in a non-spinning manner. Polling is enabled only if all
member devices support it.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
---
 drivers/md/md.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 469f551863be..849d22a2108f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5564,6 +5564,28 @@ int mddev_init_writes_pending(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(mddev_init_writes_pending);
 
+static int md_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+{
+	struct mddev *mddev = q->queuedata;
+	struct md_rdev *rdev;
+	int ret = 0;
+	int rv;
+
+	rdev_for_each(rdev, mddev) {
+		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+			continue;
+
+		rv = blk_poll(bdev_get_queue(rdev->bdev), cookie, false);
+		if (rv < 0) {
+			ret = rv;
+			break;
+		}
+		ret += rv;
+	}
+
+	return ret;
+}
+
 static int md_alloc(dev_t dev, char *name)
 {
 	/*
@@ -5628,6 +5650,7 @@ static int md_alloc(dev_t dev, char *name)
 
 	blk_queue_make_request(mddev->queue, md_make_request);
 	blk_set_stacking_limits(&mddev->queue->limits);
+	mddev->queue->poll_fn = md_poll;
 
 	disk = alloc_disk(1 << shift);
 	if (!disk) {
@@ -5932,12 +5955,17 @@ int md_run(struct mddev *mddev)
 
 	if (mddev->queue) {
 		bool nonrot = true;
+		bool poll = true;
 
 		rdev_for_each(rdev, mddev) {
-			if (rdev->raid_disk >= 0 &&
-			    !blk_queue_nonrot(bdev_get_queue(rdev->bdev))) {
-				nonrot = false;
-				break;
+			if (rdev->raid_disk >= 0) {
+				struct request_queue *q;
+
+				q = bdev_get_queue(rdev->bdev);
+				if (!blk_queue_nonrot(q))
+					nonrot = false;
+				if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+					poll = false;
 			}
 		}
 		if (mddev->degraded)
@@ -5946,6 +5974,10 @@ int md_run(struct mddev *mddev)
 			blk_queue_flag_set(QUEUE_FLAG_NONROT, mddev->queue);
 		else
 			blk_queue_flag_clear(QUEUE_FLAG_NONROT, mddev->queue);
+		if (poll)
+			blk_queue_flag_set(QUEUE_FLAG_POLL, mddev->queue);
+		else
+			blk_queue_flag_clear(QUEUE_FLAG_POLL, mddev->queue);
 		mddev->queue->backing_dev_info->congested_data = mddev;
 		mddev->queue->backing_dev_info->congested_fn = md_congested;
 	}
-- 
2.20.1



* Re: [PATCH v2 2/2] md: enable io polling
  2020-02-11 19:17 ` [PATCH v2 2/2] md: enable io polling Andrzej Jakowski
@ 2020-02-11 21:13   ` Keith Busch
  2020-02-12 21:00     ` Andrzej Jakowski
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2020-02-11 21:13 UTC (permalink / raw)
  To: Andrzej Jakowski; +Cc: axboe, song, linux-block, linux-raid, Artur Paszkiewicz

On Tue, Feb 11, 2020 at 12:17:29PM -0700, Andrzej Jakowski wrote:
> +static int md_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> +{
> +	struct mddev *mddev = q->queuedata;
> +	struct md_rdev *rdev;
> +	int ret = 0;
> +	int rv;
> +
> +	rdev_for_each(rdev, mddev) {
> +		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
> +			continue;
> +
> +		rv = blk_poll(bdev_get_queue(rdev->bdev), cookie, false);
> +		if (rv < 0) {
> +			ret = rv;
> +			break;
> +		}
> +		ret += rv;
> +	}
> +
> +	return ret;
> +}

I must be missing something: md's make_request_fn always returns
BLK_QC_T_NONE for the cookie, and that couldn't get past blk_poll's
blk_qc_t_valid(cookie) check. How does the initial blk_poll() caller get
a valid cookie for an md backing device's request_queue? And how is the
same cookie valid for more than one request_queue?

Since this is using the same cookie with different request_queues, I'm
really concerned about what happens if you managed to successfully poll
an nvme queue that wasn't designated for polling: blk_qc_t_to_queue_num()
may return a valid poll hctx for one nvme request_queue, but the same
queue_num for a different nvme device may be an IRQ-driven context. That
could lead to double completions and corrupted queues.


* Re: [PATCH v2 2/2] md: enable io polling
  2020-02-11 21:13   ` Keith Busch
@ 2020-02-12 21:00     ` Andrzej Jakowski
  2020-02-12 21:42       ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Andrzej Jakowski @ 2020-02-12 21:00 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, song, linux-block, linux-raid, Artur Paszkiewicz

On 2/11/20 2:13 PM, Keith Busch wrote:
> I must be missing something: md's make_request_fn always returns
> BLK_QC_T_NONE for the cookie, and that couldn't get past blk_poll's
> blk_qc_t_valid(cookie) check. How does the initial blk_poll() caller get
> a valid cookie for an md backing device's request_queue? And how is the
> same cookie valid for more than one request_queue?

That's true, md_make_request() always returns BLK_QC_T_NONE. md_make_request()
recursively calls generic_make_request() for the underlying device (e.g. nvme).
The block IO request directed to a member disk is added to a bio_list and is
processed later by the top-level generic_make_request(), which returns the
cookie produced by blk_mq_make_request(). That cookie is later used to poll
for completion.


* Re: [PATCH v2 2/2] md: enable io polling
  2020-02-12 21:00     ` Andrzej Jakowski
@ 2020-02-12 21:42       ` Keith Busch
  2020-02-13 20:19         ` Andrzej Jakowski
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2020-02-12 21:42 UTC (permalink / raw)
  To: Andrzej Jakowski; +Cc: axboe, song, linux-block, linux-raid, Artur Paszkiewicz

On Wed, Feb 12, 2020 at 02:00:10PM -0700, Andrzej Jakowski wrote:
> On 2/11/20 2:13 PM, Keith Busch wrote:
> > I must be missing something: md's make_request_fn always returns
> > BLK_QC_T_NONE for the cookie, and that couldn't get past blk_poll's
> > blk_qc_t_valid(cookie) check. How does the initial blk_poll() caller get
> > a valid cookie for an md backing device's request_queue? And how is the
> > same cookie valid for more than one request_queue?
> 
> That's true, md_make_request() always returns BLK_QC_T_NONE. md_make_request()
> recursively calls generic_make_request() for the underlying device (e.g. nvme).
> The block IO request directed to a member disk is added to a bio_list and is
> processed later by the top-level generic_make_request(), which returns the
> cookie produced by blk_mq_make_request(). That cookie is later used to poll
> for completion.

Okay, that's a nice subtlety. But it means the original caller gets the
cookie from the last submission in the chain. If md receives a single
request that has to be split among more than one member disk, the cookie
you're using to control the polling is valid only for one of the
request_queues and may break the others.


* Re: [PATCH v2 2/2] md: enable io polling
  2020-02-12 21:42       ` Keith Busch
@ 2020-02-13 20:19         ` Andrzej Jakowski
  2020-02-14 19:25           ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Andrzej Jakowski @ 2020-02-13 20:19 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, song, linux-block, linux-raid, Artur Paszkiewicz

On 2/12/20 2:42 PM, Keith Busch wrote:
> On Wed, Feb 12, 2020 at 02:00:10PM -0700, Andrzej Jakowski wrote:
>> On 2/11/20 2:13 PM, Keith Busch wrote:
>>> I must be missing something: md's make_request_fn always returns
>>> BLK_QC_T_NONE for the cookie, and that couldn't get past blk_poll's
>>> blk_qc_t_valid(cookie) check. How does the initial blk_poll() caller get
>>> a valid cookie for an md backing device's request_queue? And how is the
>>> same cookie valid for more than one request_queue?
>> That's true, md_make_request() always returns BLK_QC_T_NONE. md_make_request()
>> recursively calls generic_make_request() for the underlying device (e.g. nvme).
>> The block IO request directed to a member disk is added to a bio_list and is
>> processed later by the top-level generic_make_request(), which returns the
>> cookie produced by blk_mq_make_request(). That cookie is later used to poll
>> for completion.
> Okay, that's a nice subtlety. But it means the original caller gets the
> cookie from the last submission in the chain. If md receives a single
> request that has to be split among more than one member disk, the cookie
> you're using to control the polling is valid only for one of the
> request_queues and may break the others.

Correct, I agree that it is an issue. I can see at least two ways to solve it:
 1. Provide a mechanism in md that accounts for outstanding IOs and stores
    their cookie information. md_poll() would then use the valid cookies.
 2. Provide a similar mechanism abstracted for stackable block devices, so
    that the block layer could handle completions for subordinate bios in an
    abstract way in the blk_poll() routine.
How do you see this going?


* Re: [PATCH v2 2/2] md: enable io polling
  2020-02-13 20:19         ` Andrzej Jakowski
@ 2020-02-14 19:25           ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2020-02-14 19:25 UTC (permalink / raw)
  To: Andrzej Jakowski; +Cc: axboe, song, linux-block, linux-raid, Artur Paszkiewicz

On Thu, Feb 13, 2020 at 01:19:42PM -0700, Andrzej Jakowski wrote:
> On 2/12/20 2:42 PM, Keith Busch wrote:
> > Okay, that's a nice subtlety. But it means the original caller gets the
> > cookie from the last submission in the chain. If md recieves a single
> > request that has to be split among more than one member disk, the cookie
> > you're using to control the polling is valid only for one of the
> > request_queue's and may break others.
> 
> Correct, I agree that it is an issue. I can see at least two ways to solve it:
>  1. Provide a mechanism in md that accounts for outstanding IOs and stores
>     their cookie information. md_poll() would then use the valid cookies.
>  2. Provide a similar mechanism abstracted for stackable block devices, so
>     that the block layer could handle completions for subordinate bios in an
>     abstract way in the blk_poll() routine.
> How do you see this going?

Honestly, I don't see how this can be successful without a more
significant change than you may be anticipating. I'd be happy to hear if
there's a better solution, but here's what I'm thinking:

You'd need each stacking layer to return a cookie that its poll function
can turn into a list of { request_queue, blk_qc_t } tuples, one for each
bio the stacking layer created, so that it can chain the poll request to
the next layers.

The problem is that the stacking layers don't get a cookie for the
bios they submit from within the same md_make_request() context. Even if
you could get the cookies associated with those bios, you either allocate
more memory to track these things, or need polling bio list link fields
added to 'struct bio', neither of which seems very appealing.

Do you have a better way in mind?


