* [PATCHSET v2 0/11] Various block optimizations
@ 2018-11-13 15:42 Jens Axboe
  2018-11-13 15:42   ` Jens Axboe
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block

Some of these are optimizations; the latter part is prep work
for supporting polling with aio.

Patches against my for-4.21/block branch. These patches can also
be found in my mq-perf branch, though there are other patches
sitting on top of this series (notably aio polling, as mentioned).

Changes since v1:

- Improve nvme irq disabling for polled IO
- Fix barriers in the ordered wakeup for polled O_DIRECT
- Add patch to allow polling to find any command that is done
- Add patch to control whether polling spins or not
- Have async O_DIRECT mark a bio as pollable
- Don't plug for polling


 block/blk-cgroup.c                |   8 +--
 block/blk-core.c                  |  20 ++++----
 block/blk-flush.c                 |   3 +-
 block/blk-mq-debugfs.c            |   2 +-
 block/blk-mq.c                    | 105 +++++++++++++++++++++-----------------
 block/blk-mq.h                    |  12 ++---
 block/blk-rq-qos.c                |  90 +++++++++-----------------------
 block/blk-rq-qos.h                |  35 ++++++++++---
 block/blk-softirq.c               |   4 +-
 block/blk-sysfs.c                 |  18 +++----
 block/blk-wbt.c                   |   2 +-
 block/elevator.c                  |   9 ++--
 block/genhd.c                     |   8 +--
 drivers/md/dm-table.c             |   2 +-
 drivers/nvme/host/multipath.c     |   6 +--
 drivers/nvme/host/pci.c           |  45 +++++++++-------
 drivers/nvme/target/io-cmd-bdev.c |   2 +-
 drivers/scsi/scsi_lib.c           |   2 +-
 fs/block_dev.c                    |  32 +++++++++---
 fs/direct-io.c                    |   2 +-
 fs/iomap.c                        |   9 ++--
 include/linux/blk-mq-ops.h        | 100 ++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h            |  94 +---------------------------------
 include/linux/blkdev.h            |  37 +++++++++++---
 mm/page_io.c                      |   2 +-
 25 files changed, 347 insertions(+), 302 deletions(-)

-- 
Jens Axboe




* [PATCH 01/11] nvme: don't disable local ints for polled queue
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
@ 2018-11-13 15:42   ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 02/11] block: add queue_is_mq() helper Jens Axboe
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Keith Busch, linux-nvme

A polled queue doesn't trigger interrupts, so it's always safe
to grab the queue lock without disabling interrupts.

Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/pci.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6aa86dfcb32c..bb22ae567208 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1067,9 +1067,18 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 	if (!nvme_cqe_pending(nvmeq))
 		return 0;
 
-	spin_lock_irq(&nvmeq->cq_lock);
+	/*
+	 * Polled queue doesn't have an IRQ, no need to disable ints
+	 */
+	if (!nvmeq->polled)
+		local_irq_disable();
+
+	spin_lock(&nvmeq->cq_lock);
 	found = nvme_process_cq(nvmeq, &start, &end, tag);
-	spin_unlock_irq(&nvmeq->cq_lock);
+	spin_unlock(&nvmeq->cq_lock);
+
+	if (!nvmeq->polled)
+		local_irq_enable();
 
 	nvme_complete_cqes(nvmeq, start, end);
 	return found;
-- 
2.17.1



* [PATCH 02/11] block: add queue_is_mq() helper
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
  2018-11-13 15:42   ` Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-14 15:23   ` Christoph Hellwig
  2018-11-13 15:42 ` [PATCH 03/11] blk-mq: embed blk_mq_ops directly in the request queue Jens Axboe
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Various spots check for q->mq_ops being non-NULL; provide
a helper to do this instead.

Where the ->mq_ops != NULL check is redundant, remove it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-cgroup.c     |  8 ++++----
 block/blk-core.c       | 10 +++++-----
 block/blk-flush.c      |  3 +--
 block/blk-mq.c         |  2 +-
 block/blk-sysfs.c      | 14 +++++++-------
 block/blk-wbt.c        |  2 +-
 block/elevator.c       |  9 ++++-----
 block/genhd.c          |  8 ++++----
 drivers/md/dm-table.c  |  2 +-
 include/linux/blkdev.h | 10 +++++++---
 10 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6c65791bc3fe..8da8d3773ecf 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1349,7 +1349,7 @@ int blkcg_activate_policy(struct request_queue *q,
 	if (blkcg_policy_enabled(q, pol))
 		return 0;
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_freeze_queue(q);
 pd_prealloc:
 	if (!pd_prealloc) {
@@ -1388,7 +1388,7 @@ int blkcg_activate_policy(struct request_queue *q,
 
 	spin_unlock_irq(q->queue_lock);
 out_bypass_end:
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q);
 	if (pd_prealloc)
 		pol->pd_free_fn(pd_prealloc);
@@ -1412,7 +1412,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
 	if (!blkcg_policy_enabled(q, pol))
 		return;
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_freeze_queue(q);
 
 	spin_lock_irq(q->queue_lock);
@@ -1430,7 +1430,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
 
 	spin_unlock_irq(q->queue_lock);
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
diff --git a/block/blk-core.c b/block/blk-core.c
index fdc0ad2686c4..ab6675fd3568 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -268,7 +268,7 @@ void blk_sync_queue(struct request_queue *q)
 	del_timer_sync(&q->timeout);
 	cancel_work_sync(&q->timeout_work);
 
-	if (q->mq_ops) {
+	if (queue_is_mq(q)) {
 		struct blk_mq_hw_ctx *hctx;
 		int i;
 
@@ -317,7 +317,7 @@ void blk_set_queue_dying(struct request_queue *q)
 	 */
 	blk_freeze_queue_start(q);
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_wake_waiters(q);
 
 	/* Make blk_queue_enter() reexamine the DYING flag. */
@@ -410,7 +410,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	 * blk_freeze_queue() should be enough for cases of passthrough
 	 * request.
 	 */
-	if (q->mq_ops && blk_queue_init_done(q))
+	if (queue_is_mq(q) && blk_queue_init_done(q))
 		blk_mq_quiesce_queue(q);
 
 	/* for synchronous bio-based driver finish in-flight integrity i/o */
@@ -428,7 +428,7 @@ void blk_cleanup_queue(struct request_queue *q)
 
 	blk_exit_queue(q);
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_free_queue(q);
 
 	percpu_ref_exit(&q->q_usage_counter);
@@ -1736,7 +1736,7 @@ EXPORT_SYMBOL_GPL(rq_flush_dcache_pages);
  */
 int blk_lld_busy(struct request_queue *q)
 {
-	if (q->mq_ops && q->mq_ops->busy)
+	if (queue_is_mq(q) && q->mq_ops->busy)
 		return q->mq_ops->busy(q);
 
 	return 0;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index c53197dcdd70..3b79bea03462 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -273,8 +273,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	 * assigned to empty flushes, and we deadlock if we are expecting
 	 * other requests to make progress. Don't defer for that case.
 	 */
-	if (!list_empty(&fq->flush_data_in_flight) &&
-	    !(q->mq_ops && q->elevator) &&
+	if (!list_empty(&fq->flush_data_in_flight) && q->elevator &&
 	    time_before(jiffies,
 			fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
 		return;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 411be60d0cb6..eb9b9596d3de 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -150,7 +150,7 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		if (q->mq_ops)
+		if (queue_is_mq(q))
 			blk_mq_run_hw_queues(q, false);
 	}
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index d4b1b84ba8ca..93635a693314 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -68,7 +68,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 	unsigned long nr;
 	int ret, err;
 
-	if (!q->mq_ops)
+	if (!queue_is_mq(q))
 		return -EINVAL;
 
 	ret = queue_var_store(&nr, page, count);
@@ -839,12 +839,12 @@ static void __blk_release_queue(struct work_struct *work)
 
 	blk_queue_free_zone_bitmaps(q);
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_release(q);
 
 	blk_trace_shutdown(q);
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_debugfs_unregister(q);
 
 	bioset_exit(&q->bio_split);
@@ -918,7 +918,7 @@ int blk_register_queue(struct gendisk *disk)
 		goto unlock;
 	}
 
-	if (q->mq_ops) {
+	if (queue_is_mq(q)) {
 		__blk_mq_register_dev(dev, q);
 		blk_mq_debugfs_register(q);
 	}
@@ -929,7 +929,7 @@ int blk_register_queue(struct gendisk *disk)
 
 	blk_throtl_register_queue(q);
 
-	if ((q->mq_ops && q->elevator)) {
+	if (q->elevator) {
 		ret = elv_register_queue(q);
 		if (ret) {
 			mutex_unlock(&q->sysfs_lock);
@@ -978,7 +978,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	 * Remove the sysfs attributes before unregistering the queue data
 	 * structures that can be modified through sysfs.
 	 */
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		blk_mq_unregister_dev(disk_to_dev(disk), q);
 	mutex_unlock(&q->sysfs_lock);
 
@@ -987,7 +987,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	blk_trace_remove_sysfs(disk_to_dev(disk));
 
 	mutex_lock(&q->sysfs_lock);
-	if (q->mq_ops && q->elevator)
+	if (q->elevator)
 		elv_unregister_queue(q);
 	mutex_unlock(&q->sysfs_lock);
 
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 0fc222d4194b..b580763e8b1e 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -709,7 +709,7 @@ void wbt_enable_default(struct request_queue *q)
 	if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
 		return;
 
-	if (q->mq_ops && IS_ENABLED(CONFIG_BLK_WBT_MQ))
+	if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
 		wbt_init(q);
 }
 EXPORT_SYMBOL_GPL(wbt_enable_default);
diff --git a/block/elevator.c b/block/elevator.c
index 19351ffa56b1..9df9ae6b038e 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -674,7 +674,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
 	/*
 	 * Special case for mq, turn off scheduling
 	 */
-	if (q->mq_ops && !strncmp(name, "none", 4))
+	if (!strncmp(name, "none", 4))
 		return elevator_switch(q, NULL);
 
 	strlcpy(elevator_name, name, sizeof(elevator_name));
@@ -692,8 +692,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
 
 static inline bool elv_support_iosched(struct request_queue *q)
 {
-	if (q->mq_ops && q->tag_set && (q->tag_set->flags &
-				BLK_MQ_F_NO_SCHED))
+	if (q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED))
 		return false;
 	return true;
 }
@@ -703,7 +702,7 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name,
 {
 	int ret;
 
-	if (!q->mq_ops || !elv_support_iosched(q))
+	if (!queue_is_mq(q) || !elv_support_iosched(q))
 		return count;
 
 	ret = __elevator_change(q, name);
@@ -739,7 +738,7 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
 	}
 	spin_unlock(&elv_list_lock);
 
-	if (q->mq_ops && q->elevator)
+	if (q->elevator)
 		len += sprintf(name+len, "none");
 
 	len += sprintf(len+name, "\n");
diff --git a/block/genhd.c b/block/genhd.c
index cff6bdf27226..0145bcb0cc76 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -47,7 +47,7 @@ static void disk_release_events(struct gendisk *disk);
 
 void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
 {
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		return;
 
 	atomic_inc(&part->in_flight[rw]);
@@ -57,7 +57,7 @@ void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
 
 void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
 {
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		return;
 
 	atomic_dec(&part->in_flight[rw]);
@@ -68,7 +68,7 @@ void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
 void part_in_flight(struct request_queue *q, struct hd_struct *part,
 		    unsigned int inflight[2])
 {
-	if (q->mq_ops) {
+	if (queue_is_mq(q)) {
 		blk_mq_in_flight(q, part, inflight);
 		return;
 	}
@@ -85,7 +85,7 @@ void part_in_flight(struct request_queue *q, struct hd_struct *part,
 void part_in_flight_rw(struct request_queue *q, struct hd_struct *part,
 		       unsigned int inflight[2])
 {
-	if (q->mq_ops) {
+	if (queue_is_mq(q)) {
 		blk_mq_in_flight_rw(q, part, inflight);
 		return;
 	}
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 9038c302d5c2..e42739177107 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -919,7 +919,7 @@ static int device_is_rq_based(struct dm_target *ti, struct dm_dev *dev,
 	struct request_queue *q = bdev_get_queue(dev->bdev);
 	struct verify_rq_based_data *v = data;
 
-	if (q->mq_ops)
+	if (queue_is_mq(q))
 		v->mq_count++;
 	else
 		v->sq_count++;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e67ad2dd025e..3712d1fe48d4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -671,13 +671,17 @@ static inline bool blk_account_rq(struct request *rq)
 
 #define rq_data_dir(rq)		(op_is_write(req_op(rq)) ? WRITE : READ)
 
+static inline bool queue_is_mq(struct request_queue *q)
+{
+	return q->mq_ops;
+}
+
 /*
- * Driver can handle struct request, if it either has an old style
- * request_fn defined, or is blk-mq based.
+ * Only blk-mq based drivers are rq based
  */
 static inline bool queue_is_rq_based(struct request_queue *q)
 {
-	return q->mq_ops;
+	return queue_is_mq(q);
 }
 
 static inline unsigned int blk_queue_cluster(struct request_queue *q)
-- 
2.17.1



* [PATCH 03/11] blk-mq: embed blk_mq_ops directly in the request queue
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
  2018-11-13 15:42   ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 02/11] block: add queue_is_mq() helper Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 04/11] blk-rq-qos: inline check for q->rq_qos functions Jens Axboe
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

This saves a pointer indirection every time we have to
call one of the strategy functions. We keep it const, and just
hack around that a bit in blk_mq_init_allocated_queue(), which
is where we copy the ops in.
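
For illustration, here is a minimal user-space sketch of the same trick, with
invented names (struct queue/struct ops here are not the kernel types): the ops
are embedded by value and kept const, and the single init path casts the const
away for the memcpy(), mirroring what blk_mq_init_allocated_queue() does below.

#include <stdio.h>
#include <string.h>

struct ops {
	int (*issue)(int arg);
};

struct queue {
	const struct ops ops;	/* embedded by value, const for readers */
};

static int my_issue(int arg)
{
	return arg * 2;
}

static void queue_init(struct queue *q, const struct ops *src)
{
	/* the one-time "hack": cast away const to copy the ops in */
	memcpy((void *)&q->ops, src, sizeof(q->ops));
}

int main(void)
{
	static const struct ops driver_ops = { .issue = my_issue };
	struct queue q;

	queue_init(&q, &driver_ops);

	/* no separate ops pointer to load before the indirect call */
	printf("%d\n", q.ops.issue(21));
	return 0;
}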

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c           |   8 +--
 block/blk-mq-debugfs.c     |   2 +-
 block/blk-mq.c             |  22 ++++----
 block/blk-mq.h             |  12 ++---
 block/blk-softirq.c        |   4 +-
 block/blk-sysfs.c          |   4 +-
 drivers/scsi/scsi_lib.c    |   2 +-
 include/linux/blk-mq-ops.h | 100 +++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h     |  94 +---------------------------------
 include/linux/blkdev.h     |   6 ++-
 10 files changed, 132 insertions(+), 122 deletions(-)
 create mode 100644 include/linux/blk-mq-ops.h

diff --git a/block/blk-core.c b/block/blk-core.c
index ab6675fd3568..88400ab166ac 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -656,8 +656,8 @@ struct request *blk_get_request(struct request_queue *q, unsigned int op,
 	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
 
 	req = blk_mq_alloc_request(q, op, flags);
-	if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
-		q->mq_ops->initialize_rq_fn(req);
+	if (!IS_ERR(req) && q->mq_ops.initialize_rq_fn)
+		q->mq_ops.initialize_rq_fn(req);
 
 	return req;
 }
@@ -1736,8 +1736,8 @@ EXPORT_SYMBOL_GPL(rq_flush_dcache_pages);
  */
 int blk_lld_busy(struct request_queue *q)
 {
-	if (queue_is_mq(q) && q->mq_ops->busy)
-		return q->mq_ops->busy(q);
+	if (q->mq_ops.busy)
+		return q->mq_ops.busy(q);
 
 	return 0;
 }
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f021f4817b80..efdfb6258e03 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -354,7 +354,7 @@ static const char *blk_mq_rq_state_name(enum mq_rq_state rq_state)
 
 int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq)
 {
-	const struct blk_mq_ops *const mq_ops = rq->q->mq_ops;
+	const struct blk_mq_ops *const mq_ops = &rq->q->mq_ops;
 	const unsigned int op = rq->cmd_flags & REQ_OP_MASK;
 
 	seq_printf(m, "%p {.op=", rq);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index eb9b9596d3de..6e0cb6adfc90 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -558,7 +558,7 @@ static void __blk_mq_complete_request_remote(void *data)
 	struct request *rq = data;
 	struct request_queue *q = rq->q;
 
-	q->mq_ops->complete(rq);
+	q->mq_ops.complete(rq);
 }
 
 static void __blk_mq_complete_request(struct request *rq)
@@ -586,7 +586,7 @@ static void __blk_mq_complete_request(struct request *rq)
 	}
 
 	if (!test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags)) {
-		q->mq_ops->complete(rq);
+		q->mq_ops.complete(rq);
 		return;
 	}
 
@@ -600,7 +600,7 @@ static void __blk_mq_complete_request(struct request *rq)
 		rq->csd.flags = 0;
 		smp_call_function_single_async(ctx->cpu, &rq->csd);
 	} else {
-		q->mq_ops->complete(rq);
+		q->mq_ops.complete(rq);
 	}
 	put_cpu();
 }
@@ -818,10 +818,10 @@ EXPORT_SYMBOL_GPL(blk_mq_queue_busy);
 static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 {
 	req->rq_flags |= RQF_TIMED_OUT;
-	if (req->q->mq_ops->timeout) {
+	if (req->q->mq_ops.timeout) {
 		enum blk_eh_timer_return ret;
 
-		ret = req->q->mq_ops->timeout(req, reserved);
+		ret = req->q->mq_ops.timeout(req, reserved);
 		if (ret == BLK_EH_DONE)
 			return;
 		WARN_ON_ONCE(ret != BLK_EH_RESET_TIMER);
@@ -1221,7 +1221,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			bd.last = !blk_mq_get_driver_tag(nxt);
 		}
 
-		ret = q->mq_ops->queue_rq(hctx, &bd);
+		ret = q->mq_ops.queue_rq(hctx, &bd);
 		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			/*
 			 * If an I/O scheduler has been configured and we got a
@@ -1746,7 +1746,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 	 * Any other error (busy), just add it to our list as we
 	 * previously would have done.
 	 */
-	ret = q->mq_ops->queue_rq(hctx, &bd);
+	ret = q->mq_ops.queue_rq(hctx, &bd);
 	switch (ret) {
 	case BLK_STS_OK:
 		blk_mq_update_dispatch_busy(hctx, false);
@@ -2723,7 +2723,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 						  struct request_queue *q)
 {
 	/* mark the queue as mq asap */
-	q->mq_ops = set->ops;
+	memcpy((void *) &q->mq_ops, set->ops, sizeof(q->mq_ops));
 
 	q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
 					     blk_mq_poll_stats_bkt,
@@ -2765,7 +2765,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	spin_lock_init(&q->requeue_lock);
 
 	blk_queue_make_request(q, blk_mq_make_request);
-	if (q->mq_ops->poll)
+	if (q->mq_ops.poll)
 		q->poll_fn = blk_mq_poll;
 
 	/*
@@ -2797,7 +2797,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 err_percpu:
 	free_percpu(q->queue_ctx);
 err_exit:
-	q->mq_ops = NULL;
+	memset((void *) &q->mq_ops, 0, sizeof(q->mq_ops));
 	return ERR_PTR(-ENOMEM);
 }
 EXPORT_SYMBOL(blk_mq_init_allocated_queue);
@@ -3328,7 +3328,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 
 		hctx->poll_invoked++;
 
-		ret = q->mq_ops->poll(hctx, rq->tag);
+		ret = q->mq_ops.poll(hctx, rq->tag);
 		if (ret > 0) {
 			hctx->poll_success++;
 			set_current_state(TASK_RUNNING);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index facb6e9ddce4..1eb6a3e8af58 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -99,8 +99,8 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
 {
 	int hctx_type = 0;
 
-	if (q->mq_ops->rq_flags_to_type)
-		hctx_type = q->mq_ops->rq_flags_to_type(q, flags);
+	if (q->mq_ops.rq_flags_to_type)
+		hctx_type = q->mq_ops.rq_flags_to_type(q, flags);
 
 	return blk_mq_map_queue_type(q, hctx_type, cpu);
 }
@@ -187,16 +187,16 @@ static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 
-	if (q->mq_ops->put_budget)
-		q->mq_ops->put_budget(hctx);
+	if (q->mq_ops.put_budget)
+		q->mq_ops.put_budget(hctx);
 }
 
 static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 
-	if (q->mq_ops->get_budget)
-		return q->mq_ops->get_budget(hctx);
+	if (q->mq_ops.get_budget)
+		return q->mq_ops.get_budget(hctx);
 	return true;
 }
 
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 1534066e306e..2f4176668470 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -34,7 +34,7 @@ static __latent_entropy void blk_done_softirq(struct softirq_action *h)
 
 		rq = list_entry(local_list.next, struct request, ipi_list);
 		list_del_init(&rq->ipi_list);
-		rq->q->mq_ops->complete(rq);
+		rq->q->mq_ops.complete(rq);
 	}
 }
 
@@ -102,7 +102,7 @@ void __blk_complete_request(struct request *req)
 	unsigned long flags;
 	bool shared = false;
 
-	BUG_ON(!q->mq_ops->complete);
+	BUG_ON(!q->mq_ops.complete);
 
 	local_irq_save(flags);
 	cpu = smp_processor_id();
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 93635a693314..9661ef5b390f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -380,7 +380,7 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
 {
 	int err, val;
 
-	if (!q->mq_ops || !q->mq_ops->poll)
+	if (!q->mq_ops.poll)
 		return -EINVAL;
 
 	err = kstrtoint(page, 10, &val);
@@ -406,7 +406,7 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
 	unsigned long poll_on;
 	ssize_t ret;
 
-	if (!q->mq_ops || !q->mq_ops->poll)
+	if (!q->mq_ops.poll)
 		return -EINVAL;
 
 	ret = queue_var_store(&poll_on, page, count);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5d83a162d03b..61babcb269ab 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1907,7 +1907,7 @@ struct scsi_device *scsi_device_from_queue(struct request_queue *q)
 {
 	struct scsi_device *sdev = NULL;
 
-	if (q->mq_ops == &scsi_mq_ops)
+	if (q->mq_ops.queue_rq == scsi_mq_ops.queue_rq)
 		sdev = q->queuedata;
 	if (!sdev || !get_device(&sdev->sdev_gendev))
 		sdev = NULL;
diff --git a/include/linux/blk-mq-ops.h b/include/linux/blk-mq-ops.h
new file mode 100644
index 000000000000..0940c26875ca
--- /dev/null
+++ b/include/linux/blk-mq-ops.h
@@ -0,0 +1,100 @@
+#ifndef BLK_MQ_OPS_H
+#define BLK_MQ_OPS_H
+
+struct blk_mq_queue_data;
+struct blk_mq_hw_ctx;
+struct blk_mq_tag_set;
+
+typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
+		const struct blk_mq_queue_data *);
+/* takes rq->cmd_flags as input, returns a hardware type index */
+typedef int (rq_flags_to_type_fn)(struct request_queue *, unsigned int);
+typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
+typedef void (put_budget_fn)(struct blk_mq_hw_ctx *);
+typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
+typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int);
+typedef void (exit_hctx_fn)(struct blk_mq_hw_ctx *, unsigned int);
+typedef int (init_request_fn)(struct blk_mq_tag_set *set, struct request *,
+		unsigned int, unsigned int);
+typedef void (exit_request_fn)(struct blk_mq_tag_set *set, struct request *,
+		unsigned int);
+
+typedef bool (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
+		bool);
+typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);
+typedef int (poll_fn)(struct blk_mq_hw_ctx *, unsigned int);
+typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
+typedef bool (busy_fn)(struct request_queue *);
+typedef void (complete_fn)(struct request *);
+
+struct blk_mq_ops {
+	/*
+	 * Queue request
+	 */
+	queue_rq_fn		*queue_rq;
+
+	/*
+	 * Return a queue map type for the given request/bio flags
+	 */
+	rq_flags_to_type_fn	*rq_flags_to_type;
+
+	/*
+	 * Reserve budget before queue request, once .queue_rq is
+	 * run, it is driver's responsibility to release the
+	 * reserved budget. Also we have to handle failure case
+	 * of .get_budget for avoiding I/O deadlock.
+	 */
+	get_budget_fn		*get_budget;
+	put_budget_fn		*put_budget;
+
+	/*
+	 * Called on request timeout
+	 */
+	timeout_fn		*timeout;
+
+	/*
+	 * Called to poll for completion of a specific tag.
+	 */
+	poll_fn			*poll;
+
+	complete_fn		*complete;
+
+	/*
+	 * Called when the block layer side of a hardware queue has been
+	 * set up, allowing the driver to allocate/init matching structures.
+	 * Ditto for exit/teardown.
+	 */
+	init_hctx_fn		*init_hctx;
+	exit_hctx_fn		*exit_hctx;
+
+	/*
+	 * Called for every command allocated by the block layer to allow
+	 * the driver to set up driver specific data.
+	 *
+	 * Tag greater than or equal to queue_depth is for setting up
+	 * flush request.
+	 *
+	 * Ditto for exit/teardown.
+	 */
+	init_request_fn		*init_request;
+	exit_request_fn		*exit_request;
+	/* Called from inside blk_get_request() */
+	void (*initialize_rq_fn)(struct request *rq);
+
+	/*
+	 * If set, returns whether or not this queue currently is busy
+	 */
+	busy_fn			*busy;
+
+	map_queues_fn		*map_queues;
+
+#ifdef CONFIG_BLK_DEBUG_FS
+	/*
+	 * Used by the debugfs implementation to show driver-specific
+	 * information about a request.
+	 */
+	void (*show_rq)(struct seq_file *m, struct request *rq);
+#endif
+};
+
+#endif
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 929e8abc5535..e32e9293e5a0 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,7 @@
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
 #include <linux/srcu.h>
+#include <linux/blk-mq-ops.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -115,99 +116,6 @@ struct blk_mq_queue_data {
 	bool last;
 };
 
-typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
-		const struct blk_mq_queue_data *);
-/* takes rq->cmd_flags as input, returns a hardware type index */
-typedef int (rq_flags_to_type_fn)(struct request_queue *, unsigned int);
-typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
-typedef void (put_budget_fn)(struct blk_mq_hw_ctx *);
-typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
-typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int);
-typedef void (exit_hctx_fn)(struct blk_mq_hw_ctx *, unsigned int);
-typedef int (init_request_fn)(struct blk_mq_tag_set *set, struct request *,
-		unsigned int, unsigned int);
-typedef void (exit_request_fn)(struct blk_mq_tag_set *set, struct request *,
-		unsigned int);
-
-typedef bool (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
-		bool);
-typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);
-typedef int (poll_fn)(struct blk_mq_hw_ctx *, unsigned int);
-typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
-typedef bool (busy_fn)(struct request_queue *);
-typedef void (complete_fn)(struct request *);
-
-
-struct blk_mq_ops {
-	/*
-	 * Queue request
-	 */
-	queue_rq_fn		*queue_rq;
-
-	/*
-	 * Return a queue map type for the given request/bio flags
-	 */
-	rq_flags_to_type_fn	*rq_flags_to_type;
-
-	/*
-	 * Reserve budget before queue request, once .queue_rq is
-	 * run, it is driver's responsibility to release the
-	 * reserved budget. Also we have to handle failure case
-	 * of .get_budget for avoiding I/O deadlock.
-	 */
-	get_budget_fn		*get_budget;
-	put_budget_fn		*put_budget;
-
-	/*
-	 * Called on request timeout
-	 */
-	timeout_fn		*timeout;
-
-	/*
-	 * Called to poll for completion of a specific tag.
-	 */
-	poll_fn			*poll;
-
-	complete_fn		*complete;
-
-	/*
-	 * Called when the block layer side of a hardware queue has been
-	 * set up, allowing the driver to allocate/init matching structures.
-	 * Ditto for exit/teardown.
-	 */
-	init_hctx_fn		*init_hctx;
-	exit_hctx_fn		*exit_hctx;
-
-	/*
-	 * Called for every command allocated by the block layer to allow
-	 * the driver to set up driver specific data.
-	 *
-	 * Tag greater than or equal to queue_depth is for setting up
-	 * flush request.
-	 *
-	 * Ditto for exit/teardown.
-	 */
-	init_request_fn		*init_request;
-	exit_request_fn		*exit_request;
-	/* Called from inside blk_get_request() */
-	void (*initialize_rq_fn)(struct request *rq);
-
-	/*
-	 * If set, returns whether or not this queue currently is busy
-	 */
-	busy_fn			*busy;
-
-	map_queues_fn		*map_queues;
-
-#ifdef CONFIG_BLK_DEBUG_FS
-	/*
-	 * Used by the debugfs implementation to show driver-specific
-	 * information about a request.
-	 */
-	void (*show_rq)(struct seq_file *m, struct request *rq);
-#endif
-};
-
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3712d1fe48d4..ad8474ec8c58 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -28,6 +28,8 @@
 #include <linux/scatterlist.h>
 #include <linux/blkzoned.h>
 
+#include <linux/blk-mq-ops.h>
+
 struct module;
 struct scsi_ioctl_command;
 
@@ -406,7 +408,7 @@ struct request_queue {
 	poll_q_fn		*poll_fn;
 	dma_drain_needed_fn	*dma_drain_needed;
 
-	const struct blk_mq_ops	*mq_ops;
+	const struct blk_mq_ops	mq_ops;
 
 	/* sw queues */
 	struct blk_mq_ctx __percpu	*queue_ctx;
@@ -673,7 +675,7 @@ static inline bool blk_account_rq(struct request *rq)
 
 static inline bool queue_is_mq(struct request_queue *q)
 {
-	return q->mq_ops;
+	return q->mq_ops.queue_rq != NULL;
 }
 
 /*
-- 
2.17.1



* [PATCH 04/11] blk-rq-qos: inline check for q->rq_qos functions
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (2 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 03/11] blk-mq: embed blk_mq_ops directly in the request queue Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 05/11] block: avoid ordered task state change for polled IO Jens Axboe
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Josef Bacik

Put the short check inline in the fast path, for the common case
where we don't have any functions attached to the queue. This
minimizes the impact on the hot path in the core code.

Clean up duplicated code by having a macro set up both the inline
check and the actual functions.
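
As a rough stand-alone illustration of the macro pattern used below (all names
here are invented for the example): one macro emits the out-of-line walker that
loops over the attached hooks, and a second emits a static inline wrapper whose
only job is the cheap NULL check, so the common "nothing attached" case costs a
single branch at the call site.

#include <stdio.h>

struct hook {
	void (*done)(struct hook *h, int arg);
	struct hook *next;
};

struct queue {
	struct hook *hooks;		/* NULL in the common case */
};

/* out-of-line slow path: walk the list, call the op where implemented */
#define DEFINE_HOOK_ONE(op)						\
void __hook_##op(struct hook *h, int arg)				\
{									\
	do {								\
		if (h->op)						\
			h->op(h, arg);					\
		h = h->next;						\
	} while (h);							\
}

/* inline fast path: a single NULL check at the call site */
#define DECLARE_HOOK_ONE(op)						\
void __hook_##op(struct hook *h, int arg);				\
static inline void hook_##op(struct queue *q, int arg)			\
{									\
	if (q->hooks)							\
		__hook_##op(q->hooks, arg);				\
}

DECLARE_HOOK_ONE(done)
DEFINE_HOOK_ONE(done)

static void print_done(struct hook *h, int arg)
{
	(void)h;
	printf("done %d\n", arg);
}

int main(void)
{
	struct hook h = { .done = print_done, .next = NULL };
	struct queue q = { .hooks = &h };

	hook_done(&q, 42);
	return 0;
}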

Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-rq-qos.c | 90 +++++++++++++---------------------------------
 block/blk-rq-qos.h | 35 ++++++++++++++----
 2 files changed, 52 insertions(+), 73 deletions(-)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 0005dfd568dd..266c9e111475 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -27,76 +27,34 @@ bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit)
 	return atomic_inc_below(&rq_wait->inflight, limit);
 }
 
-void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
-{
-	struct rq_qos *rqos;
-
-	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->cleanup)
-			rqos->ops->cleanup(rqos, bio);
-	}
-}
-
-void rq_qos_done(struct request_queue *q, struct request *rq)
-{
-	struct rq_qos *rqos;
-
-	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->done)
-			rqos->ops->done(rqos, rq);
-	}
-}
-
-void rq_qos_issue(struct request_queue *q, struct request *rq)
-{
-	struct rq_qos *rqos;
-
-	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->issue)
-			rqos->ops->issue(rqos, rq);
-	}
-}
-
-void rq_qos_requeue(struct request_queue *q, struct request *rq)
-{
-	struct rq_qos *rqos;
-
-	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->requeue)
-			rqos->ops->requeue(rqos, rq);
-	}
+#define __RQ_QOS_FUNC_ONE(__OP, type)					\
+void __rq_qos_##__OP(struct rq_qos *rqos, type arg)			\
+{									\
+	do {								\
+		if ((rqos)->ops->__OP)					\
+			(rqos)->ops->__OP((rqos), arg);			\
+		(rqos) = (rqos)->next;					\
+	} while (rqos);							\
 }
 
-void rq_qos_throttle(struct request_queue *q, struct bio *bio,
-		     spinlock_t *lock)
-{
-	struct rq_qos *rqos;
-
-	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->throttle)
-			rqos->ops->throttle(rqos, bio, lock);
-	}
+__RQ_QOS_FUNC_ONE(cleanup, struct bio *);
+__RQ_QOS_FUNC_ONE(done, struct request *);
+__RQ_QOS_FUNC_ONE(issue, struct request *);
+__RQ_QOS_FUNC_ONE(requeue, struct request *);
+__RQ_QOS_FUNC_ONE(done_bio, struct bio *);
+
+#define __RQ_QOS_FUNC_TWO(__OP, type1, type2)				\
+void __rq_qos_##__OP(struct rq_qos *rqos, type1 arg1, type2 arg2)	\
+{									\
+	do {								\
+		if ((rqos)->ops->__OP)					\
+			(rqos)->ops->__OP((rqos), arg1, arg2);		\
+		(rqos) = (rqos)->next;					\
+	} while (rqos);							\
 }
 
-void rq_qos_track(struct request_queue *q, struct request *rq, struct bio *bio)
-{
-	struct rq_qos *rqos;
-
-	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->track)
-			rqos->ops->track(rqos, rq, bio);
-	}
-}
-
-void rq_qos_done_bio(struct request_queue *q, struct bio *bio)
-{
-	struct rq_qos *rqos;
-
-	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
-		if (rqos->ops->done_bio)
-			rqos->ops->done_bio(rqos, bio);
-	}
-}
+__RQ_QOS_FUNC_TWO(throttle, struct bio *, spinlock_t *);
+__RQ_QOS_FUNC_TWO(track, struct request *, struct bio *);
 
 /*
  * Return true, if we can't increase the depth further by scaling
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 32b02efbfa66..50558a6ea248 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -98,12 +98,33 @@ void rq_depth_scale_up(struct rq_depth *rqd);
 void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle);
 bool rq_depth_calc_max_depth(struct rq_depth *rqd);
 
-void rq_qos_cleanup(struct request_queue *, struct bio *);
-void rq_qos_done(struct request_queue *, struct request *);
-void rq_qos_issue(struct request_queue *, struct request *);
-void rq_qos_requeue(struct request_queue *, struct request *);
-void rq_qos_done_bio(struct request_queue *q, struct bio *bio);
-void rq_qos_throttle(struct request_queue *, struct bio *, spinlock_t *);
-void rq_qos_track(struct request_queue *q, struct request *, struct bio *);
+#define RQ_QOS_FUNC_ONE(__OP, type)					\
+void __rq_qos_##__OP(struct rq_qos *rqos, type arg);			\
+static inline void rq_qos_##__OP(struct request_queue *q, type arg)	\
+{									\
+	if ((q)->rq_qos)						\
+		__rq_qos_##__OP((q)->rq_qos, arg);			\
+}
+
+#define RQ_QOS_FUNC_TWO(__OP, type1, type2)				\
+void __rq_qos_##__OP(struct rq_qos *rqos, type1 arg1, type2 arg2);	\
+static inline void rq_qos_##__OP(struct request_queue *q, type1 arg1,	\
+				 type2 arg2)				\
+{									\
+	if ((q)->rq_qos)						\
+		__rq_qos_##__OP((q)->rq_qos, arg1, arg2);		\
+}
+
+RQ_QOS_FUNC_ONE(cleanup, struct bio *);
+RQ_QOS_FUNC_ONE(done, struct request *);
+RQ_QOS_FUNC_ONE(issue, struct request *);
+RQ_QOS_FUNC_ONE(requeue, struct request *);
+RQ_QOS_FUNC_ONE(done_bio, struct bio *);
+RQ_QOS_FUNC_TWO(throttle, struct bio *, spinlock_t *);
+RQ_QOS_FUNC_TWO(track, struct request *, struct bio *);
+#undef RQ_QOS_FUNC_ONE
+#undef RQ_QOS_FUNC_TWO
+
 void rq_qos_exit(struct request_queue *);
+
 #endif
-- 
2.17.1



* [PATCH 05/11] block: avoid ordered task state change for polled IO
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (3 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 04/11] blk-rq-qos: inline check for q->rq_qos functions Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-14  2:29   ` jianchao.wang
  2018-11-13 15:42 ` [PATCH 06/11] block: add polled wakeup task helper Jens Axboe
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Ensure that writes to the dio/bio waiter field are ordered
correctly. With the smp_rmb() before the READ_ONCE() check,
we should be able to use a more relaxed ordering for the
task state setting.
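
Condensed from the block_dev.c hunks below, the two sides of the handshake
look roughly like this (the polling/io_schedule() details are elided):

	/* completion side: blkdev_bio_end_io_simple(), possibly in IRQ context */
	struct task_struct *waiter = bio->bi_private;

	WRITE_ONCE(bio->bi_private, NULL);	/* signal completion ...         */
	smp_wmb();				/* ... ordered before the wakeup */
	wake_up_process(waiter);

	/* submission side: __blkdev_direct_IO_simple(), waiting for the bio */
	for (;;) {
		__set_current_state(TASK_UNINTERRUPTIBLE);	/* no implied barrier,
								 * unlike set_current_state() */
		smp_rmb();			/* pairs with the smp_wmb() above */
		if (!READ_ONCE(bio.bi_private))
			break;
		/* poll or io_schedule() here */
	}
	__set_current_state(TASK_RUNNING);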

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c | 11 +++++++++--
 fs/iomap.c     |  6 +++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c039abfb2052..2f920c03996e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -181,6 +181,7 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
 	struct task_struct *waiter = bio->bi_private;
 
 	WRITE_ONCE(bio->bi_private, NULL);
+	smp_wmb();
 	wake_up_process(waiter);
 }
 
@@ -237,9 +238,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 
 	qc = submit_bio(&bio);
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+
+		smp_rmb();
 		if (!READ_ONCE(bio.bi_private))
 			break;
+
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
 		    !blk_poll(bdev_get_queue(bdev), qc))
 			io_schedule();
@@ -305,6 +309,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->waiter;
 
 			WRITE_ONCE(dio->waiter, NULL);
+			smp_wmb();
 			wake_up_process(waiter);
 		}
 	}
@@ -403,7 +408,9 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		return -EIOCBQUEUED;
 
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+
+		smp_rmb();
 		if (!READ_ONCE(dio->waiter))
 			break;
 
diff --git a/fs/iomap.c b/fs/iomap.c
index f61d13dfdf09..7898927e758e 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1524,7 +1524,9 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 	if (atomic_dec_and_test(&dio->ref)) {
 		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
+
 			WRITE_ONCE(dio->submit.waiter, NULL);
+			smp_wmb();
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
@@ -1888,7 +1890,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			return -EIOCBQUEUED;
 
 		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+
+			smp_rmb();
 			if (!READ_ONCE(dio->submit.waiter))
 				break;
 
-- 
2.17.1



* [PATCH 06/11] block: add polled wakeup task helper
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (4 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 05/11] block: avoid ordered task state change for polled IO Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:52   ` Keith Busch
  2018-11-13 15:42 ` [PATCH 07/11] block: have ->poll_fn() return number of entries polled Jens Axboe
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

If we're polling for IO on a device that doesn't use interrupts, then
the IO completion loop (and task wakeup) is done by the submitting task
itself. In that case, we don't need to enter wake_up_process(); we can
simply mark ourselves as TASK_RUNNING.
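
For context, the waiter can be the current task because, with polling, the
completion runs from the poll loop of the submitter itself. Roughly (several
intermediate helpers skipped, not an exact call chain):

	__blkdev_direct_IO_simple()          <- submitting task
	  blk_poll()
	    nvme_poll() -> nvme_complete_cqes()
	      ... -> bio_endio()
	        blkdev_bio_end_io_simple()
	          blk_wake_io_task(waiter)   <- waiter == current here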

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c         |  6 ++----
 fs/iomap.c             |  3 +--
 include/linux/blkdev.h | 19 +++++++++++++++++++
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2f920c03996e..0ed9be8906a8 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -181,8 +181,7 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
 	struct task_struct *waiter = bio->bi_private;
 
 	WRITE_ONCE(bio->bi_private, NULL);
-	smp_wmb();
-	wake_up_process(waiter);
+	blk_wake_io_task(waiter);
 }
 
 static ssize_t
@@ -309,8 +308,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->waiter;
 
 			WRITE_ONCE(dio->waiter, NULL);
-			smp_wmb();
-			wake_up_process(waiter);
+			blk_wake_io_task(waiter);
 		}
 	}
 
diff --git a/fs/iomap.c b/fs/iomap.c
index 7898927e758e..a182699e28db 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1526,8 +1526,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->submit.waiter;
 
 			WRITE_ONCE(dio->submit.waiter, NULL);
-			smp_wmb();
-			wake_up_process(waiter);
+			blk_wake_io_task(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ad8474ec8c58..d1ef8cbbea04 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1798,4 +1798,23 @@ static inline int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 
 #endif /* CONFIG_BLOCK */
 
+static inline void blk_wake_io_task(struct task_struct *waiter)
+{
+	/*
+	 * If we're polling, the task itself is doing the completions. For
+	 * that case, we don't need to signal a wakeup, it's enough to just
+	 * mark us as RUNNING.
+	 */
+	if (waiter == current)
+		__set_current_state(TASK_RUNNING);
+	else {
+		/*
+		 * Ensure the callers waiter store is ordered and seen
+		 * by the ->bi_end_io() function.
+		 */
+		smp_wmb();
+		wake_up_process(waiter);
+	}
+}
+
 #endif
-- 
2.17.1



* [PATCH 07/11] block: have ->poll_fn() return number of entries polled
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (5 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 06/11] block: add polled wakeup task helper Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 08/11] blk-mq: when polling for IO, look for any completion Jens Axboe
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

We currently only really support sync poll, i.e. poll with one
IO in flight. This prepares us for supporting async poll.

Note that the returned value isn't necessarily 100% accurate.
If poll races with IRQ completion, we assume that the fact
that the task is now runnable means we found at least one
entry. In reality it could be more than one, or even none.
This is fine; the caller will just need to take this into
account.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c                | 18 +++++++++---------
 drivers/nvme/host/multipath.c |  4 ++--
 include/linux/blkdev.h        |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6e0cb6adfc90..f8c2e6544903 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -38,7 +38,7 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -3305,7 +3305,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	return true;
 }
 
-static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
+static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
 	struct request_queue *q = hctx->queue;
 	long state;
@@ -3318,7 +3318,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	 * straight to the busy poll loop.
 	 */
 	if (blk_mq_poll_hybrid_sleep(q, hctx, rq))
-		return true;
+		return 1;
 
 	hctx->poll_considered++;
 
@@ -3332,30 +3332,30 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 		if (ret > 0) {
 			hctx->poll_success++;
 			set_current_state(TASK_RUNNING);
-			return true;
+			return ret;
 		}
 
 		if (signal_pending_state(state, current))
 			set_current_state(TASK_RUNNING);
 
 		if (current->state == TASK_RUNNING)
-			return true;
+			return 1;
 		if (ret < 0)
 			break;
 		cpu_relax();
 	}
 
 	__set_current_state(TASK_RUNNING);
-	return false;
+	return 0;
 }
 
-static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_mq_hw_ctx *hctx;
 	struct request *rq;
 
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
-		return false;
+		return 0;
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
 	if (!blk_qc_t_is_internal(cookie))
@@ -3369,7 +3369,7 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 		 * so we should be safe with just the NULL check.
 		 */
 		if (!rq)
-			return false;
+			return 0;
 	}
 
 	return __blk_mq_poll(hctx, rq);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 5e3cc8c59a39..0484c1f9c8ce 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,11 +220,11 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 	return ret;
 }
 
-static bool nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
+static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
 {
 	struct nvme_ns_head *head = q->queuedata;
 	struct nvme_ns *ns;
-	bool found = false;
+	int found = 0;
 	int srcu_idx;
 
 	srcu_idx = srcu_read_lock(&head->srcu);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d1ef8cbbea04..b2af6c68b78d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -287,7 +287,7 @@ static inline unsigned short req_get_ioprio(struct request *req)
 struct blk_queue_ctx;
 
 typedef blk_qc_t (make_request_fn) (struct request_queue *q, struct bio *bio);
-typedef bool (poll_q_fn) (struct request_queue *q, blk_qc_t);
+typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t);
 
 struct bio_vec;
 typedef int (dma_drain_needed_fn)(struct request *);
-- 
2.17.1



* [PATCH 08/11] blk-mq: when polling for IO, look for any completion
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (6 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 07/11] block: have ->poll_fn() return number of entries polled Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 09/11] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

If we want to support async IO polling, then we have to allow
finding completions that aren't just the one we are
looking for. Always pass in -1 to the mq_ops->poll() helper,
and have that return how many events were found in this poll
loop.
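
A toy stand-alone sketch of the semantic change on the driver side (the ring
structure and names are invented, not the nvme CQ code): the old poll stopped
once the requested tag was consumed, while the new one drains whatever is
pending and returns the count.

#include <stdbool.h>
#include <stdio.h>

#define RING_SIZE 8

struct cq_ring {
	unsigned int tags[RING_SIZE];
	unsigned int head;	/* next entry to consume */
	unsigned int tail;	/* one past the last completed entry */
};

/* before: stop as soon as the tag we are waiting for shows up */
static bool poll_for_tag(struct cq_ring *cq, unsigned int tag)
{
	bool found = false;

	while (!found && cq->head != cq->tail) {
		if (cq->tags[cq->head] == tag)
			found = true;
		cq->head = (cq->head + 1) % RING_SIZE;
	}
	return found;
}

/* after: consume everything that is ready, report how much was found */
static int poll_any(struct cq_ring *cq)
{
	int nr = 0;

	while (cq->head != cq->tail) {
		cq->head = (cq->head + 1) % RING_SIZE;
		nr++;
	}
	return nr;
}

int main(void)
{
	struct cq_ring a = { .tags = { 7, 3, 9 }, .tail = 3 };
	struct cq_ring b = { .tags = { 7, 3, 9 }, .tail = 3 };

	printf("%d\n", poll_for_tag(&a, 3));	/* 1: stops after tags 7 and 3, leaves 9 */
	printf("%d\n", poll_any(&b));		/* 3: drains everything */
	return 0;
}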

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c          | 69 +++++++++++++++++++++++------------------
 drivers/nvme/host/pci.c | 32 +++++++++----------
 2 files changed, 54 insertions(+), 47 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f8c2e6544903..03b1af0151ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3266,9 +3266,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	 *  0:	use half of prev avg
 	 * >0:	use this specific value
 	 */
-	if (q->poll_nsec == -1)
-		return false;
-	else if (q->poll_nsec > 0)
+	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
 		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
@@ -3305,21 +3303,36 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	return true;
 }
 
-static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
+static bool blk_mq_poll_hybrid(struct request_queue *q,
+			       struct blk_mq_hw_ctx *hctx, blk_qc_t cookie)
+{
+	struct request *rq;
+
+	if (q->poll_nsec == -1)
+		return false;
+
+	if (!blk_qc_t_is_internal(cookie))
+		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
+	else {
+		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
+		/*
+		 * With scheduling, if the request has completed, we'll
+		 * get a NULL return here, as we clear the sched tag when
+		 * that happens. The request still remains valid, like always,
+		 * so we should be safe with just the NULL check.
+		 */
+		if (!rq)
+			return false;
+	}
+
+	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+}
+
+static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	long state;
 
-	/*
-	 * If we sleep, have the caller restart the poll loop to reset
-	 * the state. Like for the other success return cases, the
-	 * caller is responsible for checking if the IO completed. If
-	 * the IO isn't complete, we'll get called again and will go
-	 * straight to the busy poll loop.
-	 */
-	if (blk_mq_poll_hybrid_sleep(q, hctx, rq))
-		return 1;
-
 	hctx->poll_considered++;
 
 	state = current->state;
@@ -3328,7 +3341,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 
 		hctx->poll_invoked++;
 
-		ret = q->mq_ops.poll(hctx, rq->tag);
+		ret = q->mq_ops.poll(hctx, -1U);
 		if (ret > 0) {
 			hctx->poll_success++;
 			set_current_state(TASK_RUNNING);
@@ -3352,27 +3365,23 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct request *rq;
 
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		return 0;
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
-	if (!blk_qc_t_is_internal(cookie))
-		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
-	else {
-		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
-		/*
-		 * With scheduling, if the request has completed, we'll
-		 * get a NULL return here, as we clear the sched tag when
-		 * that happens. The request still remains valid, like always,
-		 * so we should be safe with just the NULL check.
-		 */
-		if (!rq)
-			return 0;
-	}
 
-	return __blk_mq_poll(hctx, rq);
+	/*
+	 * If we sleep, have the caller restart the poll loop to reset
+	 * the state. Like for the other success return cases, the
+	 * caller is responsible for checking if the IO completed. If
+	 * the IO isn't complete, we'll get called again and will go
+	 * straight to the busy poll loop.
+	 */
+	if (blk_mq_poll_hybrid(q, hctx, cookie))
+		return 1;
+
+	return __blk_mq_poll(hctx);
 }
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bb22ae567208..adeb8f516bf9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -995,13 +995,18 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 	nvme_end_request(req, cqe->status, cqe->result);
 }
 
-static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
+static int nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
 {
+	int nr = 0;
+
 	while (start != end) {
+		nr++;
 		nvme_handle_cqe(nvmeq, start);
 		if (++start == nvmeq->q_depth)
 			start = 0;
 	}
+
+	return nr;
 }
 
 static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
@@ -1012,22 +1017,17 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
 	}
 }
 
-static inline bool nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
-		u16 *end, int tag)
+static inline void nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
+				   u16 *end)
 {
-	bool found = false;
-
 	*start = nvmeq->cq_head;
-	while (!found && nvme_cqe_pending(nvmeq)) {
-		if (nvmeq->cqes[nvmeq->cq_head].command_id == tag)
-			found = true;
+	while (nvme_cqe_pending(nvmeq))
 		nvme_update_cq_head(nvmeq);
-	}
+
 	*end = nvmeq->cq_head;
 
 	if (*start != *end)
 		nvme_ring_cq_doorbell(nvmeq);
-	return found;
 }
 
 static irqreturn_t nvme_irq(int irq, void *data)
@@ -1039,7 +1039,7 @@ static irqreturn_t nvme_irq(int irq, void *data)
 	spin_lock(&nvmeq->cq_lock);
 	if (nvmeq->cq_head != nvmeq->last_cq_head)
 		ret = IRQ_HANDLED;
-	nvme_process_cq(nvmeq, &start, &end, -1);
+	nvme_process_cq(nvmeq, &start, &end);
 	nvmeq->last_cq_head = nvmeq->cq_head;
 	spin_unlock(&nvmeq->cq_lock);
 
@@ -1062,7 +1062,6 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 {
 	u16 start, end;
-	bool found;
 
 	if (!nvme_cqe_pending(nvmeq))
 		return 0;
@@ -1074,14 +1073,13 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 		local_irq_disable();
 
 	spin_lock(&nvmeq->cq_lock);
-	found = nvme_process_cq(nvmeq, &start, &end, tag);
+	nvme_process_cq(nvmeq, &start, &end);
 	spin_unlock(&nvmeq->cq_lock);
 
 	if (!nvmeq->polled)
 		local_irq_enable();
 
-	nvme_complete_cqes(nvmeq, start, end);
-	return found;
+	return nvme_complete_cqes(nvmeq, start, end);
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
@@ -1414,7 +1412,7 @@ static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 		nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
 
 	spin_lock_irq(&nvmeq->cq_lock);
-	nvme_process_cq(nvmeq, &start, &end, -1);
+	nvme_process_cq(nvmeq, &start, &end);
 	spin_unlock_irq(&nvmeq->cq_lock);
 
 	nvme_complete_cqes(nvmeq, start, end);
@@ -2209,7 +2207,7 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
 		unsigned long flags;
 
 		spin_lock_irqsave(&nvmeq->cq_lock, flags);
-		nvme_process_cq(nvmeq, &start, &end, -1);
+		nvme_process_cq(nvmeq, &start, &end);
 		spin_unlock_irqrestore(&nvmeq->cq_lock, flags);
 
 		nvme_complete_cqes(nvmeq, start, end);
-- 
2.17.1



* [PATCH 09/11] block: make blk_poll() take a parameter on whether to spin or not
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (7 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 08/11] blk-mq: when polling for IO, look for any completion Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to Jens Axboe
  2018-11-13 15:42 ` [PATCH 11/11] block: don't plug for aio/O_DIRECT HIPRI IO Jens Axboe
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

blk_poll() has always kept spinning until it found an IO. This is
fine for SYNC polling, since we need to find the one request we have
pending, but in preparation for ASYNC polling it can be beneficial
to just check if we have any entries available or not.

Existing callers are converted to pass in 'spin == true', to retain
the old behavior.
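
As a rough illustration (not part of this patch), a caller that only
wants to reap whatever has already completed could do something like
the sketch below; the helper name is made up:

static bool blk_poll_once(struct request_queue *q, blk_qc_t cookie)
{
	/* spin == false: one pass over the completions, then return */
	return blk_poll(q, cookie, false);
}

Sync waiters keep passing spin == true and retain the old
spin-until-found behavior.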

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c                  |  4 ++--
 block/blk-mq.c                    | 10 +++++-----
 drivers/nvme/host/multipath.c     |  4 ++--
 drivers/nvme/target/io-cmd-bdev.c |  2 +-
 fs/block_dev.c                    |  4 ++--
 fs/direct-io.c                    |  2 +-
 fs/iomap.c                        |  2 +-
 include/linux/blkdev.h            |  4 ++--
 mm/page_io.c                      |  2 +-
 9 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 88400ab166ac..0500c693fdae 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1363,14 +1363,14 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
-bool blk_poll(struct request_queue *q, blk_qc_t cookie)
+bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	if (!q->poll_fn || !blk_qc_t_valid(cookie))
 		return false;
 
 	if (current->plug)
 		blk_flush_plug_list(current->plug, false);
-	return q->poll_fn(q, cookie);
+	return q->poll_fn(q, cookie, spin);
 }
 EXPORT_SYMBOL_GPL(blk_poll);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 03b1af0151ca..a4043b9a27f5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -38,7 +38,7 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -3328,7 +3328,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
-static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
+static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, bool spin)
 {
 	struct request_queue *q = hctx->queue;
 	long state;
@@ -3353,7 +3353,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 
 		if (current->state == TASK_RUNNING)
 			return 1;
-		if (ret < 0)
+		if (ret < 0 || !spin)
 			break;
 		cpu_relax();
 	}
@@ -3362,7 +3362,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 	return 0;
 }
 
-static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
 
@@ -3381,7 +3381,7 @@ static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	if (blk_mq_poll_hybrid(q, hctx, cookie))
 		return 1;
 
-	return __blk_mq_poll(hctx);
+	return __blk_mq_poll(hctx, spin);
 }
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 0484c1f9c8ce..243da973416f 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,7 +220,7 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 	return ret;
 }
 
-static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
+static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc, bool spin)
 {
 	struct nvme_ns_head *head = q->queuedata;
 	struct nvme_ns *ns;
@@ -230,7 +230,7 @@ static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
 	srcu_idx = srcu_read_lock(&head->srcu);
 	ns = srcu_dereference(head->current_path[numa_node_id()], &head->srcu);
 	if (likely(ns && nvme_path_is_optimized(ns)))
-		found = ns->queue->poll_fn(q, qc);
+		found = ns->queue->poll_fn(q, qc, spin);
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return found;
 }
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c1ec3475a140..f6971b45bc54 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -116,7 +116,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 
 	cookie = submit_bio(bio);
 
-	blk_poll(bdev_get_queue(req->ns->bdev), cookie);
+	blk_poll(bdev_get_queue(req->ns->bdev), cookie, true);
 }
 
 static void nvmet_bdev_execute_flush(struct nvmet_req *req)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0ed9be8906a8..7810f5b588ea 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -244,7 +244,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 			break;
 
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		    !blk_poll(bdev_get_queue(bdev), qc, true))
 			io_schedule();
 	}
 	__set_current_state(TASK_RUNNING);
@@ -413,7 +413,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 			break;
 
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		    !blk_poll(bdev_get_queue(bdev), qc, true))
 			io_schedule();
 	}
 	__set_current_state(TASK_RUNNING);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index ea07d5a34317..a5a4e5a1423e 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -518,7 +518,7 @@ static struct bio *dio_await_one(struct dio *dio)
 		dio->waiter = current;
 		spin_unlock_irqrestore(&dio->bio_lock, flags);
 		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(dio->bio_disk->queue, dio->bio_cookie))
+		    !blk_poll(dio->bio_disk->queue, dio->bio_cookie, true))
 			io_schedule();
 		/* wake up sets us TASK_RUNNING */
 		spin_lock_irqsave(&dio->bio_lock, flags);
diff --git a/fs/iomap.c b/fs/iomap.c
index a182699e28db..0cd680ab753b 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1898,7 +1898,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			if (!(iocb->ki_flags & IOCB_HIPRI) ||
 			    !dio->submit.last_queue ||
 			    !blk_poll(dio->submit.last_queue,
-					 dio->submit.cookie))
+					 dio->submit.cookie, true))
 				io_schedule();
 		}
 		__set_current_state(TASK_RUNNING);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b2af6c68b78d..0bcb51dbde10 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -287,7 +287,7 @@ static inline unsigned short req_get_ioprio(struct request *req)
 struct blk_queue_ctx;
 
 typedef blk_qc_t (make_request_fn) (struct request_queue *q, struct bio *bio);
-typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t);
+typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t, bool spin);
 
 struct bio_vec;
 typedef int (dma_drain_needed_fn)(struct request *);
@@ -893,7 +893,7 @@ extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
 
-bool blk_poll(struct request_queue *q, blk_qc_t cookie);
+bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin);
 
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
diff --git a/mm/page_io.c b/mm/page_io.c
index d4d1c89bcddd..64ddac7b7bc0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -409,7 +409,7 @@ int swap_readpage(struct page *page, bool synchronous)
 		if (!READ_ONCE(bio->bi_private))
 			break;
 
-		if (!blk_poll(disk->queue, qc))
+		if (!blk_poll(disk->queue, qc, true))
 			break;
 	}
 	__set_current_state(TASK_RUNNING);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (8 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 09/11] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  2018-11-13 15:42 ` [PATCH 11/11] block: don't plug for aio/O_DIRECT HIPRI IO Jens Axboe
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
those kinds of requests.
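
For context, polled IO is requested from userspace with RWF_HIPRI; the
sync read path already ends up with IOCB_HIPRI set that way. A minimal
userspace sketch, assuming a glibc that exposes preadv2() and RWF_HIPRI
(the device path is just a placeholder):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Polled O_DIRECT read: RWF_HIPRI sets IOCB_HIPRI on the kiocb */
static ssize_t polled_read(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, off, RWF_HIPRI);
}

int main(void)
{
	void *buf;
	int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;
	return polled_read(fd, buf, 4096, 0) == 4096 ? 0 : 1;
}

This patch makes the async path behave the same way when the iocb has
IOCB_HIPRI set, by tagging the resulting bio with REQ_HIPRI.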

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7810f5b588ea..c124982b810d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -386,6 +386,9 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
+			if (iocb->ki_flags & IOCB_HIPRI)
+				bio->bi_opf |= REQ_HIPRI;
+
 			qc = submit_bio(bio);
 			break;
 		}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 11/11] block: don't plug for aio/O_DIRECT HIPRI IO
  2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
                   ` (9 preceding siblings ...)
  2018-11-13 15:42 ` [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to Jens Axboe
@ 2018-11-13 15:42 ` Jens Axboe
  10 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:42 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Those will go straight to issue inside blk-mq, so don't bother
setting up a block plug for them.
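
For readers less used to plugging, the pattern being skipped here looks
roughly like the sketch below (the helper is made up for illustration);
the plug batches submissions and flushes them to the driver in one go
at blk_finish_plug() time, which buys nothing when each polled bio is
issued directly:

/* Usual submission pattern: batch bios behind a plug, flush at the end */
static void submit_batched(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++)
		submit_bio(bios[i]);
	blk_finish_plug(&plug);
}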

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c124982b810d..9dc695a3af4e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -356,7 +356,13 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 	dio->multi_bio = false;
 	dio->should_dirty = is_read && iter_is_iovec(iter);
 
-	blk_start_plug(&plug);
+	/*
+	 * Don't plug for HIPRI/polled IO, as those should go straight
+	 * to issue
+	 */
+	if (!(iocb->ki_flags & IOCB_HIPRI))
+		blk_start_plug(&plug);
+
 	for (;;) {
 		bio_set_dev(bio, bdev);
 		bio->bi_iter.bi_sector = pos >> 9;
@@ -403,7 +409,9 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		submit_bio(bio);
 		bio = bio_alloc(GFP_KERNEL, nr_pages);
 	}
-	blk_finish_plug(&plug);
+
+	if (!(iocb->ki_flags & IOCB_HIPRI))
+		blk_finish_plug(&plug);
 
 	if (!is_sync)
 		return -EIOCBQUEUED;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/11] block: add polled wakeup task helper
  2018-11-13 15:42 ` [PATCH 06/11] block: add polled wakeup task helper Jens Axboe
@ 2018-11-13 15:52   ` Keith Busch
  2018-11-13 15:58     ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2018-11-13 15:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Tue, Nov 13, 2018 at 08:42:28AM -0700, Jens Axboe wrote:
> If we're polling for IO on a device that doesn't use interrupts, then
> the IO completion loop (and the task wakeup) is done by the submitting
> task itself. In that case, we don't need to enter the wake_up_process()
> function; we can simply mark ourselves as TASK_RUNNING.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/block_dev.c         |  6 ++----
>  fs/iomap.c             |  3 +--
>  include/linux/blkdev.h | 19 +++++++++++++++++++
>  3 files changed, 22 insertions(+), 6 deletions(-)

One more for swap read:

---
diff --git a/mm/page_io.c b/mm/page_io.c
index d4d1c89bcddd..57572ff46016 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -140,7 +140,7 @@ static void end_swap_bio_read(struct bio *bio)
 	unlock_page(page);
 	WRITE_ONCE(bio->bi_private, NULL);
 	bio_put(bio);
-	wake_up_process(waiter);
+	blk_wake_io_task(waiter);
 	put_task_struct(waiter);
 }
 
--
 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 2f920c03996e..0ed9be8906a8 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -181,8 +181,7 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>  	struct task_struct *waiter = bio->bi_private;
>  
>  	WRITE_ONCE(bio->bi_private, NULL);
> -	smp_wmb();
> -	wake_up_process(waiter);
> +	blk_wake_io_task(waiter);
>  }
>  
>  static ssize_t
> @@ -309,8 +308,7 @@ static void blkdev_bio_end_io(struct bio *bio)
>  			struct task_struct *waiter = dio->waiter;
>  
>  			WRITE_ONCE(dio->waiter, NULL);
> -			smp_wmb();
> -			wake_up_process(waiter);
> +			blk_wake_io_task(waiter);
>  		}
>  	}
>  
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 7898927e758e..a182699e28db 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1526,8 +1526,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
>  			struct task_struct *waiter = dio->submit.waiter;
>  
>  			WRITE_ONCE(dio->submit.waiter, NULL);
> -			smp_wmb();
> -			wake_up_process(waiter);
> +			blk_wake_io_task(waiter);
>  		} else if (dio->flags & IOMAP_DIO_WRITE) {
>  			struct inode *inode = file_inode(dio->iocb->ki_filp);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index ad8474ec8c58..d1ef8cbbea04 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1798,4 +1798,23 @@ static inline int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
>  
>  #endif /* CONFIG_BLOCK */
>  
> +static inline void blk_wake_io_task(struct task_struct *waiter)
> +{
> +	/*
> +	 * If we're polling, the task itself is doing the completions. For
> +	 * that case, we don't need to signal a wakeup, it's enough to just
> +	 * mark us as RUNNING.
> +	 */
> +	if (waiter == current)
> +		__set_current_state(TASK_RUNNING);
> +	else {
> +		/*
> +		 * Ensure the callers waiter store is ordered and seen
> +		 * by the ->bi_end_io() function.
> +		 */
> +		smp_wmb();
> +		wake_up_process(waiter);
> +	}
> +}
> +
>  #endif
> -- 
> 2.17.1
> 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/11] block: add polled wakeup task helper
  2018-11-13 15:52   ` Keith Busch
@ 2018-11-13 15:58     ` Jens Axboe
  2018-11-13 16:59       ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 15:58 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block

On 11/13/18 8:52 AM, Keith Busch wrote:
> On Tue, Nov 13, 2018 at 08:42:28AM -0700, Jens Axboe wrote:
>> If we're polling for IO on a device that doesn't use interrupts, then
>> the IO completion loop (and the task wakeup) is done by the submitting
>> task itself. In that case, we don't need to enter the wake_up_process()
>> function; we can simply mark ourselves as TASK_RUNNING.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  fs/block_dev.c         |  6 ++----
>>  fs/iomap.c             |  3 +--
>>  include/linux/blkdev.h | 19 +++++++++++++++++++
>>  3 files changed, 22 insertions(+), 6 deletions(-)
> 
> One more for swap read:
> 
> ---
> diff --git a/mm/page_io.c b/mm/page_io.c
> index d4d1c89bcddd..57572ff46016 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -140,7 +140,7 @@ static void end_swap_bio_read(struct bio *bio)
>  	unlock_page(page);
>  	WRITE_ONCE(bio->bi_private, NULL);
>  	bio_put(bio);
> -	wake_up_process(waiter);
> +	blk_wake_io_task(waiter);
>  	put_task_struct(waiter);
>  }
>  

Indeed, thanks Keith! I'll fold it in.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/11] block: add polled wakeup task helper
  2018-11-13 15:58     ` Jens Axboe
@ 2018-11-13 16:59       ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-13 16:59 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block

On 11/13/18 8:58 AM, Jens Axboe wrote:
> On 11/13/18 8:52 AM, Keith Busch wrote:
>> On Tue, Nov 13, 2018 at 08:42:28AM -0700, Jens Axboe wrote:
>>> If we're polling for IO on a device that doesn't use interrupts, then
>>> the IO completion loop (and the task wakeup) is done by the submitting
>>> task itself. In that case, we don't need to enter the wake_up_process()
>>> function; we can simply mark ourselves as TASK_RUNNING.
>>>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>> ---
>>>  fs/block_dev.c         |  6 ++----
>>>  fs/iomap.c             |  3 +--
>>>  include/linux/blkdev.h | 19 +++++++++++++++++++
>>>  3 files changed, 22 insertions(+), 6 deletions(-)
>>
>> One more for swap read:
>>
>> ---
>> diff --git a/mm/page_io.c b/mm/page_io.c
>> index d4d1c89bcddd..57572ff46016 100644
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -140,7 +140,7 @@ static void end_swap_bio_read(struct bio *bio)
>>  	unlock_page(page);
>>  	WRITE_ONCE(bio->bi_private, NULL);
>>  	bio_put(bio);
>> -	wake_up_process(waiter);
>> +	blk_wake_io_task(waiter);
>>  	put_task_struct(waiter);
>>  }
>>  
> 
> Indeed, thanks Keith! I'll fold it in.

Ditto for the patch 5 changes, jfyi.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 05/11] block: avoid ordered task state change for polled IO
  2018-11-13 15:42 ` [PATCH 05/11] block: avoid ordered task state change for polled IO Jens Axboe
@ 2018-11-14  2:29   ` jianchao.wang
  2018-11-14  2:35     ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: jianchao.wang @ 2018-11-14  2:29 UTC (permalink / raw)
  To: Jens Axboe, linux-block

Hi Jens

On 11/13/18 11:42 PM, Jens Axboe wrote:
> @@ -181,6 +181,7 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>  	struct task_struct *waiter = bio->bi_private;
>  
>  	WRITE_ONCE(bio->bi_private, NULL);
> +	smp_wmb();
>  	wake_up_process(waiter);
>  }


The wakeup path already contains a full memory barrier, via the
raw_spin_lock and the following smp_mb__after_spinlock().

Please refer to:

wake_up_process
  -> try_to_wake_up

	raw_spin_lock_irqsave(&p->pi_lock, flags);
	smp_mb__after_spinlock();

So the smp_wmb here is not necessary.

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 05/11] block: avoid ordered task state change for polled IO
  2018-11-14  2:29   ` jianchao.wang
@ 2018-11-14  2:35     ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-14  2:35 UTC (permalink / raw)
  To: jianchao.wang, linux-block

On 11/13/18 7:29 PM, jianchao.wang wrote:
> Hi Jens
> 
> On 11/13/18 11:42 PM, Jens Axboe wrote:
>> @@ -181,6 +181,7 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>>  	struct task_struct *waiter = bio->bi_private;
>>  
>>  	WRITE_ONCE(bio->bi_private, NULL);
>> +	smp_wmb();
>>  	wake_up_process(waiter);
>>  }
> 
> 
> The wake up path has contained a full memory barrier with the raw_spin_lock
> and following smp_mb__after_spinlock
> 
> Please refer to:
> 
> wake_up_process
>   -> try_to_wake_up
> 
> 	raw_spin_lock_irqsave(&p->pi_lock, flags);
> 	smp_mb__after_spinlock();
> 
> So the smp_wmb here is not necessary.

And I guess we don't need it for the next patch either, since the path
that doesn't take wake_up_process() is local.

I'll update them (again), thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/11] block: add queue_is_mq() helper
  2018-11-13 15:42 ` [PATCH 02/11] block: add queue_is_mq() helper Jens Axboe
@ 2018-11-14 15:23   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-11-14 15:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Tue, Nov 13, 2018 at 08:42:24AM -0700, Jens Axboe wrote:
> Various spots check for q->mq_ops being non-NULL; provide
> a helper to do this instead.
> 
> Where the ->mq_ops != NULL check is redundant, remove it.

With the legacy request path gone this isn't really any different from
queue_is_rq_based.  So either switch everyone to queue_is_rq_based,
or switch all queue_is_rq_based users to queue_is_mq as well and
delete it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to
  2018-11-16  8:48     ` Christoph Hellwig
@ 2018-11-16 15:19       ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-11-16 15:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 11/16/18 1:48 AM, Christoph Hellwig wrote:
> On Fri, Nov 16, 2018 at 12:47:39AM -0800, Christoph Hellwig wrote:
>> On Thu, Nov 15, 2018 at 12:51:34PM -0700, Jens Axboe wrote:
>>> Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
>>> those kinds of requests.
>>
>> Looks fine,
> 
> Actually, who is going to poll for them?  With the separate poll
> queue these requests now sit on a queue with no interrupts, but
> aio won't end up calling blk_poll().

It's a prep patch; I have patches that enable polling for libaio,
so aio will call blk_poll().

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to
  2018-11-16  8:47   ` Christoph Hellwig
@ 2018-11-16  8:48     ` Christoph Hellwig
  2018-11-16 15:19       ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2018-11-16  8:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Fri, Nov 16, 2018 at 12:47:39AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 15, 2018 at 12:51:34PM -0700, Jens Axboe wrote:
> > Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
> > those kinds of requests.
> 
> Looks fine,

Actually, who is going to poll for them?  With the separate poll
queue these requests now sit on a queue with no interrupts, but
aio won't end up calling blk_poll().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to
  2018-11-15 19:51 ` [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to Jens Axboe
@ 2018-11-16  8:47   ` Christoph Hellwig
  2018-11-16  8:48     ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2018-11-16  8:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Thu, Nov 15, 2018 at 12:51:34PM -0700, Jens Axboe wrote:
> Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
> those kinds of requests.

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to
  2018-11-15 19:51 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
@ 2018-11-15 19:51 ` Jens Axboe
  2018-11-16  8:47   ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-11-15 19:51 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
those kinds of requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7810f5b588ea..c124982b810d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -386,6 +386,9 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
+			if (iocb->ki_flags & IOCB_HIPRI)
+				bio->bi_opf |= REQ_HIPRI;
+
 			qc = submit_bio(bio);
 			break;
 		}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2018-11-13 15:42 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
2018-11-13 15:42 ` [PATCH 01/11] nvme: don't disable local ints for polled queue Jens Axboe
2018-11-13 15:42   ` Jens Axboe
2018-11-13 15:42 ` [PATCH 02/11] block: add queue_is_mq() helper Jens Axboe
2018-11-14 15:23   ` Christoph Hellwig
2018-11-13 15:42 ` [PATCH 03/11] blk-mq: embed blk_mq_ops directly in the request queue Jens Axboe
2018-11-13 15:42 ` [PATCH 04/11] blk-rq-qos: inline check for q->rq_qos functions Jens Axboe
2018-11-13 15:42 ` [PATCH 05/11] block: avoid ordered task state change for polled IO Jens Axboe
2018-11-14  2:29   ` jianchao.wang
2018-11-14  2:35     ` Jens Axboe
2018-11-13 15:42 ` [PATCH 06/11] block: add polled wakeup task helper Jens Axboe
2018-11-13 15:52   ` Keith Busch
2018-11-13 15:58     ` Jens Axboe
2018-11-13 16:59       ` Jens Axboe
2018-11-13 15:42 ` [PATCH 07/11] block: have ->poll_fn() return number of entries polled Jens Axboe
2018-11-13 15:42 ` [PATCH 08/11] blk-mq: when polling for IO, look for any completion Jens Axboe
2018-11-13 15:42 ` [PATCH 09/11] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
2018-11-13 15:42 ` [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to Jens Axboe
2018-11-13 15:42 ` [PATCH 11/11] block: don't plug for aio/O_DIRECT HIPRI IO Jens Axboe
2018-11-15 19:51 [PATCHSET v2 0/11] Various block optimizations Jens Axboe
2018-11-15 19:51 ` [PATCH 10/11] block: for async O_DIRECT, mark us as polling if asked to Jens Axboe
2018-11-16  8:47   ` Christoph Hellwig
2018-11-16  8:48     ` Christoph Hellwig
2018-11-16 15:19       ` Jens Axboe
