* [PATCH v5 0/5] block: loop: improve loop with AIO
@ 2015-06-09 13:49 Ming Lei
  2015-06-09 13:49 ` [PATCH v5 1/5] fs: direct-io: don't dirty pages for ITER_BVEC/ITER_KVEC direct read Ming Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-09 13:49 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner

Hi Guys,

There are three main advantages to using direct I/O and AIO for
reading from and writing to loop's backing file:

1) double caching is avoided, so memory usage decreases a lot

2) unlike user-space direct I/O, there is no cost for pinning pages

3) context switches are avoided while good throughput is kept
- with buffered file reads, peak random I/O throughput is often reached
only when requests are submitted concurrently from lots of tasks; but
sequential I/O mostly hits the page cache, so concurrent submission
often introduces unnecessary context switches without improving
throughput much. There was a discussion[1] about using non-blocking
I/O to address this problem for applications.
- with direct I/O and AIO, concurrent submission can be avoided while
random read throughput is not affected

So this patchset tries to improve loop via AIO: memory usage can be
decreased by about 45% (see the detailed data in the commit log of
patch 5), and I/O throughput is not affected either.

V5:
	- don't introduce IOCB_DONT_DIRTY_PAGE; instead bypass dirtying
	for ITER_KVEC and ITER_BVEC direct I/O (read), as requested by
	Christoph

V4:
	- add a detailed commit log for 'use kthread_work'
	- allow userspace (sysfs, losetup) to decide whether dio/aio
	is used, as suggested by Christoph and Dave Chinner
	- only use dio if the backing block device's minimum I/O size
	is 512, as pointed out by Dave Chinner & Christoph
V3:
	- based on Al's iov_iter work and Christoph's kiocb changes
	- use kthread_work
	- introduce IOCB_DONT_DIRTY_PAGE flag
	- set QUEUE_FLAG_NOMERGES for loop's request queue
V2:
	- remove 'extra' parameter to aio_kernel_alloc()
	- try to avoid memory allocation inside the queue req callback
	- introduce 'use_mq' sysfs file for enabling or disabling kernel aio
V1:
	- link:
		http://marc.info/?t=140803157700004&r=1&w=2
	- improve failure path in aio_kernel_submit()

 drivers/block/loop.c | 229 ++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 drivers/block/loop.h |  13 ++--
 fs/direct-io.c       |   9 ++-
 3 files changed, 186 insertions(+), 65 deletions(-)

Thanks,
Ming



* [PATCH v5 1/5] fs: direct-io: don't dirty pages for ITER_BVEC/ITER_KVEC direct read
  2015-06-09 13:49 [PATCH v5 0/5] block: loop: improve loop with AIO Ming Lei
@ 2015-06-09 13:49 ` Ming Lei
  2015-06-10  7:35   ` Christoph Hellwig
  2015-06-09 13:49 ` [PATCH v5 2/5] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop Ming Lei
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2015-06-09 13:49 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner, Ming Lei

When direct read I/O is submitted from the kernel, it is often
unnecessary to dirty pages; in the loop case, for example, dirtying
has already been handled on the upper filesystem (over loop) side,
so the pages don't need to be dirtied again.

So this patch stops dirtying pages for ITER_BVEC/ITER_KVEC direct
reads, and loop should be the first user of ITER_BVEC/ITER_KVEC for
direct read I/O.

The patch is based on Dave's previous patch.

Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 fs/direct-io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 745d234..07e4b28 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -120,6 +120,7 @@ struct dio {
 	int page_errors;		/* errno from get_user_pages() */
 	int is_async;			/* is IO async ? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
+	bool should_dirty;		/* if pages should be dirtied */
 	int io_error;			/* IO error in completion path */
 	unsigned long refcount;		/* direct_io_worker() and bios */
 	struct bio *bio_list;		/* singly linked via bi_private */
@@ -393,7 +394,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	dio->refcount++;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
-	if (dio->is_async && dio->rw == READ)
+	if (dio->is_async && dio->rw == READ && dio->should_dirty)
 		bio_set_pages_dirty(bio);
 
 	if (sdio->submit_io)
@@ -464,13 +465,14 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (!uptodate)
 		dio->io_error = -EIO;
 
-	if (dio->is_async && dio->rw == READ) {
+	if (dio->is_async && dio->rw == READ && dio->should_dirty) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
 		bio_for_each_segment_all(bvec, bio, i) {
 			struct page *page = bvec->bv_page;
 
-			if (dio->rw == READ && !PageCompound(page))
+			if (dio->rw == READ && !PageCompound(page) &&
+					dio->should_dirty)
 				set_page_dirty_lock(page);
 			page_cache_release(page);
 		}
@@ -1217,6 +1219,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;
 
+	dio->should_dirty = (iter->type == ITER_IOVEC);
 	sdio.iter = iter;
 	sdio.final_block_in_request =
 		(offset + iov_iter_count(iter)) >> blkbits;
-- 
1.9.1



* [PATCH v5 2/5] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop
  2015-06-09 13:49 [PATCH v5 0/5] block: loop: improve loop with AIO Ming Lei
  2015-06-09 13:49 ` [PATCH v5 1/5] fs: direct-io: don't dirty pages for ITER_BVEC/ITER_KVEC direct read Ming Lei
@ 2015-06-09 13:49 ` Ming Lei
  2015-06-10  7:36   ` Christoph Hellwig
  2015-06-09 13:49 ` [PATCH v5 3/5] block: loop: use kthread_work Ming Lei
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2015-06-09 13:49 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner, Ming Lei

It doesn't make sense to enable merging because the I/O
submitted to the backing file is handled page by page.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 40580dc..10cc583 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1595,6 +1595,12 @@ static int loop_add(struct loop_device **l, int i)
 	}
 	lo->lo_queue->queuedata = lo;
 
+	/*
+	 * It doesn't make sense to enable merging because the I/O
+	 * submitted to the backing file is handled page by page.
+	 */
+	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
+
 	INIT_LIST_HEAD(&lo->write_cmd_head);
 	INIT_WORK(&lo->write_work, loop_queue_write_work);
 
-- 
1.9.1



* [PATCH v5 3/5] block: loop: use kthread_work
  2015-06-09 13:49 [PATCH v5 0/5] block: loop: improve loop with AIO Ming Lei
  2015-06-09 13:49 ` [PATCH v5 1/5] fs: direct-io: don't dirty pages for ITER_BVEC/ITER_KVEC direct read Ming Lei
  2015-06-09 13:49 ` [PATCH v5 2/5] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop Ming Lei
@ 2015-06-09 13:49 ` Ming Lei
  2015-06-09 13:49 ` [PATCH v5 4/5] block: loop: prepare for supporting direct IO Ming Lei
  2015-06-09 13:49 ` [PATCH v5 5/5] block: loop: support DIO & AIO Ming Lei
  4 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-09 13:49 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner, Ming Lei

The following patches will use dio/aio to submit I/O to the backing
file, so there is no need to schedule I/O concurrently from work
items; use kthread_work instead, which cuts the context switch cost
a lot.

For the non-AIO case, a single thread had been used for a very long
time; it was only converted to a workqueue in v4.0, which has already
caused a performance regression for Fedora live booting. In the
discussion[1], even though submitting I/O concurrently via work can
improve random read throughput, it may hurt sequential read
performance at the same time, so it is better to restore the
single-thread behaviour.

For the coming AIO support, if loop ever has such high performance
requirements, multiple hw queues with a per-hwq kthread would suit
better than the current work approach.

[1] http://marc.info/?t=143082678400002&r=1&w=2

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 79 ++++++++++++++++------------------------------------
 drivers/block/loop.h | 10 +++----
 2 files changed, 28 insertions(+), 61 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 10cc583..555b895 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -688,6 +688,23 @@ static void loop_config_discard(struct loop_device *lo)
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
+static void loop_unprepare_queue(struct loop_device *lo)
+{
+	flush_kthread_worker(&lo->worker);
+	kthread_stop(lo->worker_task);
+}
+
+static int loop_prepare_queue(struct loop_device *lo)
+{
+	init_kthread_worker(&lo->worker);
+	lo->worker_task = kthread_run(kthread_worker_fn,
+			&lo->worker, "loop%d", lo->lo_number);
+	if (IS_ERR(lo->worker_task))
+		return -ENOMEM;
+	set_user_nice(lo->worker_task, MIN_NICE);
+	return 0;
+}
+
 static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 		       struct block_device *bdev, unsigned int arg)
 {
@@ -745,11 +762,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	size = get_loop_size(lo, file);
 	if ((loff_t)(sector_t)size != size)
 		goto out_putf;
-	error = -ENOMEM;
-	lo->wq = alloc_workqueue("kloopd%d",
-			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16,
-			lo->lo_number);
-	if (!lo->wq)
+	error = loop_prepare_queue(lo);
+	if (error)
 		goto out_putf;
 
 	error = 0;
@@ -903,8 +917,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_flags = 0;
 	if (!part_shift)
 		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
-	destroy_workqueue(lo->wq);
-	lo->wq = NULL;
+	loop_unprepare_queue(lo);
 	mutex_unlock(&lo->lo_ctl_mutex);
 	/*
 	 * Need not hold lo_ctl_mutex to fput backing file.
@@ -1461,23 +1474,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (lo->lo_state != Lo_bound)
 		return -EIO;
 
-	if (cmd->rq->cmd_flags & REQ_WRITE) {
-		struct loop_device *lo = cmd->rq->q->queuedata;
-		bool need_sched = true;
-
-		spin_lock_irq(&lo->lo_lock);
-		if (lo->write_started)
-			need_sched = false;
-		else
-			lo->write_started = true;
-		list_add_tail(&cmd->list, &lo->write_cmd_head);
-		spin_unlock_irq(&lo->lo_lock);
-
-		if (need_sched)
-			queue_work(lo->wq, &lo->write_work);
-	} else {
-		queue_work(lo->wq, &cmd->read_work);
-	}
+	queue_kthread_work(&lo->worker, &cmd->work);
 
 	return BLK_MQ_RQ_QUEUE_OK;
 }
@@ -1499,35 +1496,10 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	blk_mq_complete_request(cmd->rq);
 }
 
-static void loop_queue_write_work(struct work_struct *work)
-{
-	struct loop_device *lo =
-		container_of(work, struct loop_device, write_work);
-	LIST_HEAD(cmd_list);
-
-	spin_lock_irq(&lo->lo_lock);
- repeat:
-	list_splice_init(&lo->write_cmd_head, &cmd_list);
-	spin_unlock_irq(&lo->lo_lock);
-
-	while (!list_empty(&cmd_list)) {
-		struct loop_cmd *cmd = list_first_entry(&cmd_list,
-				struct loop_cmd, list);
-		list_del_init(&cmd->list);
-		loop_handle_cmd(cmd);
-	}
-
-	spin_lock_irq(&lo->lo_lock);
-	if (!list_empty(&lo->write_cmd_head))
-		goto repeat;
-	lo->write_started = false;
-	spin_unlock_irq(&lo->lo_lock);
-}
-
-static void loop_queue_read_work(struct work_struct *work)
+static void loop_queue_work(struct kthread_work *work)
 {
 	struct loop_cmd *cmd =
-		container_of(work, struct loop_cmd, read_work);
+		container_of(work, struct loop_cmd, work);
 
 	loop_handle_cmd(cmd);
 }
@@ -1539,7 +1511,7 @@ static int loop_init_request(void *data, struct request *rq,
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
 	cmd->rq = rq;
-	INIT_WORK(&cmd->read_work, loop_queue_read_work);
+	init_kthread_work(&cmd->work, loop_queue_work);
 
 	return 0;
 }
@@ -1601,9 +1573,6 @@ static int loop_add(struct loop_device **l, int i)
 	 */
 	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
 
-	INIT_LIST_HEAD(&lo->write_cmd_head);
-	INIT_WORK(&lo->write_work, loop_queue_write_work);
-
 	disk = lo->lo_disk = alloc_disk(1 << part_shift);
 	if (!disk)
 		goto out_free_queue;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 25e8997..b6c7d21 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -14,7 +14,7 @@
 #include <linux/blk-mq.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-#include <linux/workqueue.h>
+#include <linux/kthread.h>
 #include <uapi/linux/loop.h>
 
 /* Possible states of device */
@@ -54,12 +54,10 @@ struct loop_device {
 	gfp_t		old_gfp_mask;
 
 	spinlock_t		lo_lock;
-	struct workqueue_struct *wq;
-	struct list_head	write_cmd_head;
-	struct work_struct	write_work;
-	bool			write_started;
 	int			lo_state;
 	struct mutex		lo_ctl_mutex;
+	struct kthread_worker	worker;
+	struct task_struct	*worker_task;
 
 	struct request_queue	*lo_queue;
 	struct blk_mq_tag_set	tag_set;
@@ -67,7 +65,7 @@ struct loop_device {
 };
 
 struct loop_cmd {
-	struct work_struct read_work;
+	struct kthread_work work;
 	struct request *rq;
 	struct list_head list;
 };
-- 
1.9.1



* [PATCH v5 4/5] block: loop: prepare for supporting direct IO
  2015-06-09 13:49 [PATCH v5 0/5] block: loop: improve loop with AIO Ming Lei
                   ` (2 preceding siblings ...)
  2015-06-09 13:49 ` [PATCH v5 3/5] block: loop: use kthread_work Ming Lei
@ 2015-06-09 13:49 ` Ming Lei
  2015-06-10  7:40   ` Christoph Hellwig
  2015-06-09 13:49 ` [PATCH v5 5/5] block: loop: support DIO & AIO Ming Lei
  4 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2015-06-09 13:49 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner, Ming Lei

This patch provides two ways to enable direct I/O
from user space:

	- userspace (such as losetup) can pass a 'file' that has been
	opened with, or switched via fcntl() to, O_DIRECT
	- a sysfs file is provided so dio tests can be run easily

Also, __loop_update_dio() is introduced to check whether direct I/O
can be used with the current loop settings.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/block/loop.h |  1 +
 2 files changed, 77 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 555b895..0b1ee2b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -421,6 +421,44 @@ struct switch_request {
 	struct completion wait;
 };
 
+static void __loop_update_dio(struct loop_device *lo, bool dio)
+{
+	struct file *file = lo->lo_backing_file;
+	struct inode *inode = file->f_mapping->host;
+	bool use_dio;
+
+	/*
+	 * The loop device's logical block size is 512; for now
+	 * we support direct I/O only if the backing
+	 * block device's minimum I/O size is 512.
+	 */
+	if (dio) {
+		if (inode->i_sb->s_bdev &&
+			bdev_io_min(inode->i_sb->s_bdev) == 512)
+			use_dio = true;
+		else
+			use_dio = false;
+	} else {
+		use_dio = false;
+	}
+
+	if (lo->use_dio == use_dio)
+		return;
+
+	/* flush dirty pages to avoid stale data from following dio */
+	if (use_dio)
+		vfs_fsync(file, 0);
+
+	blk_mq_freeze_queue(lo->lo_queue);
+	lo->use_dio = use_dio;
+	blk_mq_unfreeze_queue(lo->lo_queue);
+}
+
+static inline void loop_update_dio(struct loop_device *lo)
+{
+	__loop_update_dio(lo, io_is_direct(lo->lo_backing_file));
+}
+
 /*
  * Do the actual switch; called from the BIO completion routine
  */
@@ -441,6 +479,7 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
 		mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
 	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
+	loop_update_dio(lo);
 }
 
 /*
@@ -627,6 +666,41 @@ static ssize_t loop_attr_partscan_show(struct loop_device *lo, char *buf)
 	return sprintf(buf, "%s\n", partscan ? "1" : "0");
 }
 
+static ssize_t loop_attr_do_show_use_dio(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct loop_device *lo = disk->private_data;
+
+	return sprintf(buf, "%s\n", lo->use_dio ? "1" : "0");
+}
+
+ssize_t loop_attr_do_store_use_dio(struct device *dev,
+		struct device_attribute *attr, const char *buf,
+		size_t count)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct loop_device *lo = disk->private_data;
+	int err;
+	unsigned long v;
+
+	if (!lo->lo_backing_file)
+		return -ENODEV;
+
+	err = kstrtoul(buf, 10, &v);
+	if (err < 0)
+		return err;
+
+	__loop_update_dio(lo, !!v);
+	if  (lo->use_dio != !!v)
+		return -EINVAL;
+	return count;
+}
+
+static struct device_attribute loop_attr_use_dio =
+	__ATTR(use_dio, S_IRUGO | S_IWUSR, loop_attr_do_show_use_dio,
+			loop_attr_do_store_use_dio);
+
 LOOP_ATTR_RO(backing_file);
 LOOP_ATTR_RO(offset);
 LOOP_ATTR_RO(sizelimit);
@@ -639,6 +713,7 @@ static struct attribute *loop_attrs[] = {
 	&loop_attr_sizelimit.attr,
 	&loop_attr_autoclear.attr,
 	&loop_attr_partscan.attr,
+	&loop_attr_use_dio.attr,
 	NULL,
 };
 
@@ -783,6 +858,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
 		blk_queue_flush(lo->lo_queue, REQ_FLUSH);
 
+	loop_update_dio(lo);
 	set_capacity(lo->lo_disk, size);
 	bd_set_size(bdev, size << 9);
 	loop_sysfs_init(lo);
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index b6c7d21..d1de221 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -58,6 +58,7 @@ struct loop_device {
 	struct mutex		lo_ctl_mutex;
 	struct kthread_worker	worker;
 	struct task_struct	*worker_task;
+	bool			use_dio;
 
 	struct request_queue	*lo_queue;
 	struct blk_mq_tag_set	tag_set;
-- 
1.9.1



* [PATCH v5 5/5] block: loop: support DIO & AIO
  2015-06-09 13:49 [PATCH v5 0/5] block: loop: improve loop with AIO Ming Lei
                   ` (3 preceding siblings ...)
  2015-06-09 13:49 ` [PATCH v5 4/5] block: loop: prepare for supporting direct IO Ming Lei
@ 2015-06-09 13:49 ` Ming Lei
  2015-06-10  7:46   ` Christoph Hellwig
  4 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2015-06-09 13:49 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner, Ming Lei

There are three main advantages to using direct I/O and AIO for
reading from and writing to loop's backing file:

1) double caching is avoided, so memory usage decreases a lot

2) unlike user-space direct I/O, there is no cost for pinning pages

3) context switches are avoided while good throughput is kept
- with buffered file reads, peak random I/O throughput is often reached
only when requests are submitted concurrently from lots of tasks; but
sequential I/O mostly hits the page cache, so concurrent submission
often introduces unnecessary context switches without improving
throughput much. There was a discussion[1] about using non-blocking
I/O to address this problem for applications.
- with direct I/O and AIO, concurrent submission can be avoided while
random read throughput is not affected

My fio test results follow:

1. 16-job fio test inside an ext4 file system over the loop block device
1) How to run
	- linux kernel: 4.1.0-rc2-next-20150506 with the patchset
	- the loop block device is backed by one image on a HDD
	- fio psync engine, 16 jobs, size 400M, ext4 over the loop block device
	- test result: IOPS from fio output

2) Throughput result (IOPS):
        -------------------------------------------------------------
        test cases          |randread   |read   |randwrite  |write  |
        -------------------------------------------------------------
        base                |240        |8705   |3763       |20914  |
        -------------------------------------------------------------
        base+loop aio       |242        |9258   |4577       |21451  |
        -------------------------------------------------------------

3) context switches
        - context switches decreased by ~16% with loop aio for randread,
	and by ~33% for read

4) memory usage
	- after these four tests with loop aio: ~10% of memory is in use
	- after these four tests without loop aio: more than 55% of
	memory is in use

2. single-job fio test inside an ext4 file system over the loop block device (for Maxim Patlasov)
1) How to run
	- linux kernel: 4.1.0-rc2-next-20150506 with the patchset
	- the loop block device is backed by one image on a HDD
	- fio psync engine, 1 job, size 4000M, ext4 over the loop block device
	- test result: IOPS from fio output

2) Throughput result (IOPS):
        -------------------------------------------------------------
        test cases          |randread   |read   |randwrite  |write  |
        -------------------------------------------------------------
        base                |109        |21180  |4192       |22782  |
        -------------------------------------------------------------
        base+loop aio       |114        |21018  |5404       |22670  |
        -------------------------------------------------------------

3) context switches
        - context switches decreased by ~10% with loop aio for randread,
	and by ~50% for read

4) memory usage
	- after these four tests with loop aio: ~10% of memory is in use
	- after these four tests without loop aio: more than 55% of
	memory is in use

Both the 'context switch' and 'memory usage' figures were collected
with sar.

[1] https://lwn.net/Articles/612483/
[2] sar graph when running fio over loop without the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-nonaio.pdf

[3] sar graph when running fio over loop with the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-aio.pdf

[4] sar graph when running fio (single job) over loop without the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-nonaio-1job.pdf

[5] sar graph when running fio (single job) over loop with the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-aio-1job.pdf

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/block/loop.h |  2 ++
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0b1ee2b..9c557d3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -388,6 +388,65 @@ static int lo_req_flush(struct loop_device *lo, struct request *rq)
 
 	return ret;
 }
+static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
+	struct request *rq = cmd->rq;
+
+	if (ret > 0)
+		ret = 0;
+	else if (ret < 0)
+		ret = -EIO;
+
+	rq->errors = ret;
+	blk_mq_complete_request(rq);
+}
+
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
+		     loff_t pos, bool rw)
+{
+	struct iov_iter iter;
+	struct bio_vec *bvec;
+	struct bio *bio = cmd->rq->bio;
+	struct file *file = lo->lo_backing_file;
+	int ret;
+
+	/* nomerge for loop request queue */
+	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
+
+	bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+	iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
+		      bio_segments(bio), blk_rq_bytes(cmd->rq));
+
+	cmd->iocb.ki_pos = pos;
+	cmd->iocb.ki_filp = file;
+	cmd->iocb.ki_complete = lo_rw_aio_complete;
+	cmd->iocb.ki_flags = IOCB_DIRECT;
+
+	if (rw == WRITE)
+		ret = file->f_op->write_iter(&cmd->iocb, &iter);
+	else
+		ret = file->f_op->read_iter(&cmd->iocb, &iter);
+
+	if (ret != -EIOCBQUEUED)
+		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
+	return 0;
+}
+
+
+static inline int lo_rw_simple(struct loop_device *lo,
+		struct request *rq, loff_t pos, bool rw)
+{
+	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
+
+	if (cmd->use_aio)
+		return lo_rw_aio(lo, cmd, pos, rw);
+
+	if (rw == WRITE)
+		return lo_write_simple(lo, rq, pos);
+	else
+		return lo_read_simple(lo, rq, pos);
+}
 
 static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 {
@@ -404,13 +463,13 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 		else if (lo->transfer)
 			ret = lo_write_transfer(lo, rq, pos);
 		else
-			ret = lo_write_simple(lo, rq, pos);
+			ret = lo_rw_simple(lo, rq, pos, WRITE);
 
 	} else {
 		if (lo->transfer)
 			ret = lo_read_transfer(lo, rq, pos);
 		else
-			ret = lo_read_simple(lo, rq, pos);
+			ret = lo_rw_simple(lo, rq, pos, READ);
 	}
 
 	return ret;
@@ -1550,6 +1609,12 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (lo->lo_state != Lo_bound)
 		return -EIO;
 
+	if (lo->use_dio && !lo->transfer &&
+			!(cmd->rq->cmd_flags & (REQ_FLUSH | REQ_DISCARD)))
+		cmd->use_aio = true;
+	else
+		cmd->use_aio = false;
+
 	queue_kthread_work(&lo->worker, &cmd->work);
 
 	return BLK_MQ_RQ_QUEUE_OK;
@@ -1569,7 +1634,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
  failed:
 	if (ret)
 		cmd->rq->errors = -EIO;
-	blk_mq_complete_request(cmd->rq);
+	if (!cmd->use_aio || ret)
+		blk_mq_complete_request(cmd->rq);
 }
 
 static void loop_queue_work(struct kthread_work *work)
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index d1de221..fb2237c 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -69,6 +69,8 @@ struct loop_cmd {
 	struct kthread_work work;
 	struct request *rq;
 	struct list_head list;
+	bool use_aio;           /* use AIO interface to handle I/O */
+	struct kiocb iocb;
 };
 
 /* Support for loadable transfer modules */
-- 
1.9.1



* Re: [PATCH v5 1/5] fs: direct-io: don't dirty pages for ITER_BVEC/ITER_KVEC direct read
  2015-06-09 13:49 ` [PATCH v5 1/5] fs: direct-io: don't dirty pages for ITER_BVEC/ITER_KVEC direct read Ming Lei
@ 2015-06-10  7:35   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2015-06-10  7:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo, Dave Chinner

On Tue, Jun 09, 2015 at 09:49:22PM +0800, Ming Lei wrote:
> When direct read I/O is submitted from the kernel, it is often
> unnecessary to dirty pages; in the loop case, for example, dirtying
> has already been handled on the upper filesystem (over loop) side,
> so the pages don't need to be dirtied again.
> 
> So this patch stops dirtying pages for ITER_BVEC/ITER_KVEC direct
> reads, and loop should be the first user of ITER_BVEC/ITER_KVEC for
> direct read I/O.
> 
> The patch is based on Dave's previous patch.

Looks good, thanks!

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v5 2/5] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop
  2015-06-09 13:49 ` [PATCH v5 2/5] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop Ming Lei
@ 2015-06-10  7:36   ` Christoph Hellwig
  2015-06-22  6:32     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-06-10  7:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo, Dave Chinner

On Tue, Jun 09, 2015 at 09:49:23PM +0800, Ming Lei wrote:
> It doesn't make sense to enable merging because the I/O
> submitted to the backing file is handled page by page.

Looks fine, but does it make any difference?



* Re: [PATCH v5 4/5] block: loop: prepare for supporting direct IO
  2015-06-09 13:49 ` [PATCH v5 4/5] block: loop: prepare for supporting direct IO Ming Lei
@ 2015-06-10  7:40   ` Christoph Hellwig
  2015-06-22 12:19     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-06-10  7:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo, Dave Chinner

On Tue, Jun 09, 2015 at 09:49:25PM +0800, Ming Lei wrote:
> This patch provides two ways to enable direct I/O
> from user space:
> 
> 	- userspace (such as losetup) can pass a 'file' that has been
> 	opened with, or switched via fcntl() to, O_DIRECT
> 	- a sysfs file is provided so dio tests can be run easily

I really don't think two interfaces are a good idea here.  The interface
for loop has always been that userspace passes an FD and we use it,
so let's keep it that way.

And while we're at it, please also send a patch for losetup so that
we can actually use the feature.
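
For context, the userspace side only needs to open the backing file
with O_DIRECT before attaching it. A minimal sketch (illustrative
only, not the actual util-linux patch discussed later in this thread;
error-path cleanup is omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

/*
 * Sketch only: attach 'path' to an existing loop device with O_DIRECT,
 * so the driver's io_is_direct() check from patch 4 enables dio/aio.
 */
static int loop_attach_direct(const char *loopdev, const char *path)
{
	int lfd = open(loopdev, O_RDWR);
	int ffd = open(path, O_RDWR | O_DIRECT);

	if (lfd < 0 || ffd < 0)
		return -1;
	if (ioctl(lfd, LOOP_SET_FD, ffd) < 0)
		return -1;
	close(ffd);	/* the loop driver holds its own reference */
	return lfd;
}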



* Re: [PATCH v5 5/5] block: loop: support DIO & AIO
  2015-06-09 13:49 ` [PATCH v5 5/5] block: loop: support DIO & AIO Ming Lei
@ 2015-06-10  7:46   ` Christoph Hellwig
  2015-06-22 12:09     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-06-10  7:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo, Dave Chinner

> +	int ret;
> +
> +	/* nomerge for loop request queue */
> +	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
> +
> +	bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> +	iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
> +		      bio_segments(bio), blk_rq_bytes(cmd->rq));
> +
> +	cmd->iocb.ki_pos = pos;
> +	cmd->iocb.ki_filp = file;
> +	cmd->iocb.ki_complete = lo_rw_aio_complete;
> +	cmd->iocb.ki_flags = IOCB_DIRECT;
> +
> +	if (rw == WRITE)
> +		ret = file->f_op->write_iter(&cmd->iocb, &iter);
> +	else
> +		ret = file->f_op->read_iter(&cmd->iocb, &iter);

I think we really need a vfs_ wrapper here similar to what I did a while
ago, e.g. vfs_iter_read/write_async.

> +static inline int lo_rw_simple(struct loop_device *lo,
> +		struct request *rq, loff_t pos, bool rw)
> +{
> +	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
> +
> +	if (cmd->use_aio)
> +		return lo_rw_aio(lo, cmd, pos, rw);
> +
> +	if (rw == WRITE)
> +		return lo_write_simple(lo, rq, pos);
> +	else
> +		return lo_read_simple(lo, rq, pos);
> +}

And the io_submit style read/write also works for buffered I/O, so no
need to keep lo_write_simple/lo_read_simple around.
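
A rough sketch of that idea (an assumption about the suggestion, not a
patch from this thread): buffered ->read_iter/->write_iter complete
synchronously, so lo_rw_aio()'s existing "ret != -EIOCBQUEUED, call
ki_complete()" fallback already covers them, and only the iocb flags
need to become conditional:

	/* sketch: one submission path for both cases */
	cmd->iocb.ki_flags = lo->use_dio ? IOCB_DIRECT : 0;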

> @@ -1569,7 +1634,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
>   failed:
>  	if (ret)
>  		cmd->rq->errors = -EIO;
> -	blk_mq_complete_request(cmd->rq);
> +	if (!cmd->use_aio || ret)
> +		blk_mq_complete_request(cmd->rq);

If you don't complete the request here setting req->error doesn't
make sense.  I'd suggest to move the blk_mq_complete_request for
everything but the trivial error case into the actual I/O handlers
to clean this up a bit, too.
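
A rough sketch of the suggested structure (hypothetical; the names
follow the existing driver, but this is not a patch from this thread):

/*
 * Sketch only: the I/O handlers complete the request themselves on
 * success (lo_rw_aio() via ->ki_complete, the sync paths inline), so
 * only the trivial error case is completed here.
 */
static void loop_handle_cmd(struct loop_cmd *cmd)
{
	struct loop_device *lo = cmd->rq->q->queuedata;
	int ret;

	if ((cmd->rq->cmd_flags & REQ_WRITE) &&
	    (lo->lo_flags & LO_FLAGS_READ_ONLY)) {
		ret = -EIO;
		goto failed;
	}

	ret = do_req_filebacked(lo, cmd->rq);
	if (!ret)
		return;		/* the handler owns completion now */
 failed:
	cmd->rq->errors = -EIO;
	blk_mq_complete_request(cmd->rq);
}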



* Re: [PATCH v5 2/5] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop
  2015-06-10  7:36   ` Christoph Hellwig
@ 2015-06-22  6:32     ` Ming Lei
  0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-22  6:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
	Dave Chinner

On Wed, Jun 10, 2015 at 3:36 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Jun 09, 2015 at 09:49:23PM +0800, Ming Lei wrote:
>> It doesn't make sense to enable merging because the I/O
>> submitted to the backing file is handled page by page.
>
> Looks fine, but does it make any difference?

Apparently not; please see the following results:

- IOPS tested by fio: loop over an image on HDD

                patched    non-patched
    randread        158            161
    read           8716           8784
    randwrite      6076           5893
    write         21896          22066

Thanks,
Ming


* Re: [PATCH v5 5/5] block: loop: support DIO & AIO
  2015-06-10  7:46   ` Christoph Hellwig
@ 2015-06-22 12:09     ` Ming Lei
  2015-06-22 16:00       ` Christoph Hellwig
  2015-06-23 12:43       ` Ming Lei
  0 siblings, 2 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-22 12:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
	Dave Chinner

On Wed, Jun 10, 2015 at 3:46 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> +     int ret;
>> +
>> +     /* nomerge for loop request queue */
>> +     WARN_ON(cmd->rq->bio != cmd->rq->biotail);
>> +
>> +     bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>> +     iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
>> +                   bio_segments(bio), blk_rq_bytes(cmd->rq));
>> +
>> +     cmd->iocb.ki_pos = pos;
>> +     cmd->iocb.ki_filp = file;
>> +     cmd->iocb.ki_complete = lo_rw_aio_complete;
>> +     cmd->iocb.ki_flags = IOCB_DIRECT;
>> +
>> +     if (rw == WRITE)
>> +             ret = file->f_op->write_iter(&cmd->iocb, &iter);
>> +     else
>> +             ret = file->f_op->read_iter(&cmd->iocb, &iter);
>
> I think we really need a vfs_ wrapper here similar to what I did a while
> ago, e.g. vfs_iter_read/write_async.

For a general async interface, it is a bit more complicated than the
sync interfaces:

- the iocb needs to be one parameter, because it often depends on the
caller; loop, for example, can preallocate it
- direct I/O needs to be another parameter (in loop we can use the
same helper to handle sync requests)
- the bvec and the segment count are another two parameters
- not to mention the common parameters (file, offset, pos, completion...)

This kind of interface appeared in V1/V2, but the AIO folks didn't
seem to care, so I moved the helper into loop, where it has become
quite simple now. Converting it to vfs_iter_read/write_async() would
introduce more source code, I think.

So how about reconsidering this if other users show up in the future?
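
To make the comparison concrete, here is a minimal sketch of a
write-side helper carrying the parameter list enumerated above. The
name and signature are assumptions (nothing like this was posted in
the thread); the body simply mirrors what lo_rw_aio() does in patch 5:

/*
 * Sketch only: a "many parameters" async write helper.  The iocb is
 * caller-preallocated, as loop does with cmd->iocb.
 */
ssize_t vfs_iter_write_async(struct file *file, struct kiocb *iocb,
			     struct bio_vec *bvec, unsigned int nr_segs,
			     loff_t pos, size_t count, bool dio,
			     void (*complete)(struct kiocb *, long, long))
{
	struct iov_iter iter;

	iov_iter_bvec(&iter, ITER_BVEC | WRITE, bvec, nr_segs, count);

	iocb->ki_pos = pos;
	iocb->ki_filp = file;
	iocb->ki_complete = complete;
	iocb->ki_flags = dio ? IOCB_DIRECT : 0;

	return file->f_op->write_iter(iocb, &iter);
}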

>
>> +static inline int lo_rw_simple(struct loop_device *lo,
>> +             struct request *rq, loff_t pos, bool rw)
>> +{
>> +     struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
>> +
>> +     if (cmd->use_aio)
>> +             return lo_rw_aio(lo, cmd, pos, rw);
>> +
>> +     if (rw == WRITE)
>> +             return lo_write_simple(lo, rq, pos);
>> +     else
>> +             return lo_read_simple(lo, rq, pos);
>> +}
>
> And the io_submit style read/write also works for buffered I/O, so no
> need to keep lo_write_simple/lo_read_simple around.

That is really a good idea.

>
>> @@ -1569,7 +1634,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
>>   failed:
>>       if (ret)
>>               cmd->rq->errors = -EIO;
>> -     blk_mq_complete_request(cmd->rq);
>> +     if (!cmd->use_aio || ret)
>> +             blk_mq_complete_request(cmd->rq);
>
> If you don't complete the request here setting req->error doesn't
> make sense.  I'd suggest to move the blk_mq_complete_request for

The request with ->errors set really is completed here, and the
current rule is very simple:

      - complete sync requests, and requests whose submission failed,
        in loop_handle_cmd()
      - complete successfully submitted async requests in their
        .ki_complete callback

> everything but the trivial error case into the actual I/O handlers
> to clean this up a bit, too.

That would require copying the error-handling code into the other
handlers.

Thanks,
Ming


* Re: [PATCH v5 4/5] block: loop: prepare for supporting direct IO
  2015-06-10  7:40   ` Christoph Hellwig
@ 2015-06-22 12:19     ` Ming Lei
  0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-22 12:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
	Dave Chinner

On Wed, Jun 10, 2015 at 3:40 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Jun 09, 2015 at 09:49:25PM +0800, Ming Lei wrote:
>> This patch provides two ways to enable direct I/O
>> from user space:
>>
>>       - userspace (such as losetup) can pass a 'file' that has been
>>       opened with, or switched via fcntl() to, O_DIRECT
>>       - a sysfs file is provided so dio tests can be run easily
>
> I really don't think two interfaces are a good idea here.  The interface
> for loop has always been that userspace passes an FD and we use it,
> so let's keep it that way.

That makes sense.

>
> And while we're at it please also send a patch for losetup so that we
> can actually use the feature.

I have written two patches to enable direct I/O for the backing file
in util-linux; they can be found in the tree below:

http://kernel.ubuntu.com/git/ming/util-linux.git/log/?h=losetup-dio

I will post them to the util-linux mailing list later.

thanks
Ming


* Re: [PATCH v5 5/5] block: loop: support DIO & AIO
  2015-06-22 12:09     ` Ming Lei
@ 2015-06-22 16:00       ` Christoph Hellwig
  2015-06-23  2:59         ` Ming Lei
  2015-06-23 12:43       ` Ming Lei
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-06-22 16:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo, Dave Chinner

On Mon, Jun 22, 2015 at 08:09:55PM +0800, Ming Lei wrote:
> For a general async interface, it is a bit more complicated than the
> sync interfaces:
> 
> - the iocb needs to be one parameter, because it often depends on the
> caller; loop, for example, can preallocate it
> - direct I/O needs to be another parameter (in loop we can use the
> same helper to handle sync requests)
> - the bvec and the segment count are another two parameters
> - not to mention the common parameters (file, offset, pos, completion...)

We only really need iocb + iov_iter, they carry everything we need.



* Re: [PATCH v5 5/5] block: loop: support DIO & AIO
  2015-06-22 16:00       ` Christoph Hellwig
@ 2015-06-23  2:59         ` Ming Lei
  0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-23  2:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
	Dave Chinner

On Tue, Jun 23, 2015 at 12:00 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Jun 22, 2015 at 08:09:55PM +0800, Ming Lei wrote:
>> For a general async interface, it is a bit more complicated than the
>> sync interfaces:
>>
>> - the iocb needs to be one parameter, because it often depends on the
>> caller; loop, for example, can preallocate it
>> - direct I/O needs to be another parameter (in loop we can use the
>> same helper to handle sync requests)
>> - the bvec and the segment count are another two parameters
>> - not to mention the common parameters (file, offset, pos, completion...)
>
> We only really need iocb + iov_iter, they carry everything we need.

Then the helper becomes the following:

ssize_t vfs_iter_async_write(struct kiocb *kiocb, struct iov_iter *iter)
{
	struct file *file = kiocb->ki_filp;	/* the iocb carries the file */
	ssize_t ret;

	iter->type |= WRITE;
	ret = file->f_op->write_iter(kiocb, iter);

	return ret;
}

I am wondering about its value, since we can do that easily at the
call site; it also isn't easy to name, because sometimes we may need
the helper to handle sync requests as well, as in loop's case. (One
possible combined shape is sketched below.)
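
A minimal sketch of such a combined helper, assuming sync callers
simply leave ->ki_complete unset (illustrative only, not something
proposed in this thread):

/*
 * Sketch only: one helper for both cases.  Async callers set
 * kiocb->ki_complete and treat -EIOCBQUEUED as "in flight"; sync
 * callers leave ki_complete NULL and consume the return value.
 */
ssize_t vfs_iter_async_write(struct kiocb *kiocb, struct iov_iter *iter)
{
	struct file *file = kiocb->ki_filp;
	ssize_t ret;

	iter->type |= WRITE;
	ret = file->f_op->write_iter(kiocb, iter);

	/* completed synchronously (or failed to submit): call back here */
	if (kiocb->ki_complete && ret != -EIOCBQUEUED)
		kiocb->ki_complete(kiocb, ret, 0);
	return ret;
}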

Thanks,
Ming


* Re: [PATCH v5 5/5] block: loop: support DIO & AIO
  2015-06-22 12:09     ` Ming Lei
  2015-06-22 16:00       ` Christoph Hellwig
@ 2015-06-23 12:43       ` Ming Lei
  1 sibling, 0 replies; 16+ messages in thread
From: Ming Lei @ 2015-06-23 12:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
	Dave Chinner

On Mon, Jun 22, 2015 at 8:09 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Wed, Jun 10, 2015 at 3:46 PM, Christoph Hellwig <hch@infradead.org> wrote:
>>> +     int ret;
>>> +
>>> +     /* nomerge for loop request queue */
>>> +     WARN_ON(cmd->rq->bio != cmd->rq->biotail);
>>> +
>>> +     bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>> +     iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
>>> +                   bio_segments(bio), blk_rq_bytes(cmd->rq));
>>> +
>>> +     cmd->iocb.ki_pos = pos;
>>> +     cmd->iocb.ki_filp = file;
>>> +     cmd->iocb.ki_complete = lo_rw_aio_complete;
>>> +     cmd->iocb.ki_flags = IOCB_DIRECT;
>>> +
>>> +     if (rw == WRITE)
>>> +             ret = file->f_op->write_iter(&cmd->iocb, &iter);
>>> +     else
>>> +             ret = file->f_op->read_iter(&cmd->iocb, &iter);
>>
>> I think we really need a vfs_ wrapper here similar to what I did a while
>> ago, e.g. vfs_iter_read/write_async.
>
> For a general async interface, it is a bit more complicated than the
> sync interfaces:
>
> - the iocb needs to be one parameter, because it often depends on the
> caller; loop, for example, can preallocate it
> - direct I/O needs to be another parameter (in loop we can use the
> same helper to handle sync requests)
> - the bvec and the segment count are another two parameters
> - not to mention the common parameters (file, offset, pos, completion...)
>
> This kind of interface appeared in V1/V2, but the AIO folks didn't
> seem to care, so I moved the helper into loop, where it has become
> quite simple now. Converting it to vfs_iter_read/write_async() would
> introduce more source code, I think.
>
> So how about reconsidering this if other users show up in the future?
>
>>
>>> +static inline int lo_rw_simple(struct loop_device *lo,
>>> +             struct request *rq, loff_t pos, bool rw)
>>> +{
>>> +     struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
>>> +
>>> +     if (cmd->use_aio)
>>> +             return lo_rw_aio(lo, cmd, pos, rw);
>>> +
>>> +     if (rw == WRITE)
>>> +             return lo_write_simple(lo, rq, pos);
>>> +     else
>>> +             return lo_read_simple(lo, rq, pos);
>>> +}
>>
>> And the io_submit style read/write also works for buffered I/O, so no
>> need to keep lo_write_simple/lo_read_simple around.
>
> That is really a good idea.

There is still one issue with converting lo_write/read_simple to the
io_submit style: flush_dcache_page() should be called just after the
kernel writes each page, and it isn't good to batch that up until
after the request has completed.

But flush_dcache_page() isn't needed in the dio/aio case, which is
another benefit of using dio/aio for loop.
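
For reference, the buffered read path being discussed looks roughly
like this (paraphrased from the 4.1-era lo_read_simple(); a sketch,
not an exact quote):

static int lo_read_simple(struct loop_device *lo, struct request *rq,
			  loff_t pos)
{
	struct bio_vec bvec;
	struct req_iterator iter;
	struct iov_iter i;
	ssize_t len;

	rq_for_each_segment(bvec, rq, iter) {
		iov_iter_bvec(&i, ITER_BVEC, &bvec, 1, bvec.bv_len);
		len = vfs_iter_read(lo->lo_backing_file, &i, &pos);
		if (len < 0)
			return len;

		/* must run right after the kernel fills the page; it
		 * cannot be batched at request completion time */
		flush_dcache_page(bvec.bv_page);

		if (len != bvec.bv_len) {
			struct bio *bio;

			__rq_for_each_bio(bio, rq)
				zero_fill_bio(bio);
			break;
		}
	}
	return 0;
}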

>
>>
>>> @@ -1569,7 +1634,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
>>>   failed:
>>>       if (ret)
>>>               cmd->rq->errors = -EIO;
>>> -     blk_mq_complete_request(cmd->rq);
>>> +     if (!cmd->use_aio || ret)
>>> +             blk_mq_complete_request(cmd->rq);
>>
>> If you don't complete the request here setting req->error doesn't
>> make sense.  I'd suggest to move the blk_mq_complete_request for
>
> The request with ->errors set really is completed here, and the
> current rule is very simple:
>
>       - complete sync requests, and requests whose submission failed,
>         in loop_handle_cmd()
>       - complete successfully submitted async requests in their
>         .ki_complete callback
>
>> everything but the trivial error case into the actual I/O handlers
>> to clean this up a bit, too.
>
> That would require copying the error-handling code into the other
> handlers.
>
> Thanks,
> Ming

