* [PATCH v3 0/4] block: loop: improve loop with AIO
@ 2015-05-06 17:08 Ming Lei
  2015-05-06 17:08 ` [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Ming Lei @ 2015-05-06 17:08 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo

Hi Guys,

There are three main advantages to using direct I/O and AIO for
reading from and writing to loop's backing file:

1) double caching is avoided, so memory usage decreases a lot

2) unlike user-space direct I/O, there is no page-pinning cost

3) context switches are avoided while still obtaining good throughput
- with buffered file reads, good random I/O throughput is often
obtained only if requests are submitted concurrently from lots of
tasks; for sequential I/O, however, most requests hit the page cache,
so concurrent submission introduces unnecessary context switches
without improving throughput much. There was a discussion[1] about
using non-blocking I/O to address this problem for applications.
- with direct I/O and AIO, concurrent submission can be avoided
while random read throughput is not affected

So this patchset improves loop via AIO: memory usage decreases by
about 45% (see the detailed data in the commit log of patch 4), and
I/O throughput is not affected either.

V3:
	- based on Al's iov_iter work and Christoph's kiocb changes
	- use kthread_work
	- introduce IOCB_DONT_DIRTY_PAGE flag
	- set QUEUE_FLAG_NOMERGES for loop's request queue
V2:
	- remove 'extra' parameter to aio_kernel_alloc()
	- try to avoid memory allocation inside the queue req callback
	- introduce 'use_mq' sysfs file for enabling or disabling kernel AIO
V1:
	- link:
		http://marc.info/?t=140803157700004&r=1&w=2
	- improve failure path in aio_kernel_submit()

 drivers/block/loop.c | 166 ++++++++++++++++++++++++++++++++++++++++++++----------------------
 drivers/block/loop.h |  14 +++---
 fs/direct-io.c       |  10 ++--
 include/linux/fs.h   |   1 +
 4 files changed, 126 insertions(+), 65 deletions(-)

Thanks,
Ming Lei



* [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-06 17:08 [PATCH v3 0/4] block: loop: improve loop with AIO Ming Lei
@ 2015-05-06 17:08 ` Ming Lei
  2015-05-07  7:16   ` Christoph Hellwig
  2015-05-06 17:08 ` [PATCH v3 2/4] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-06 17:08 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Ming Lei

When direct I/O is submitted from the kernel, it is often unnecessary
to dirty the pages. In the loop case, for example, dirtying has
already been handled on the upper-filesystem (over loop) side, so the
pages don't need to be dirtied again.

So this patch introduces the IOCB_DONT_DIRTY_PAGE flag for direct I/O;
other ITER_BVEC and ITER_KVEC cases might benefit from it too.

The patch is based on Dave's previous patch.
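
For illustration, an in-kernel submitter opts out of page dirtying
roughly as follows (a minimal sketch mirroring how patch 4 uses the
flag; the completion callback and the iov_iter setup are elided):

	/* caller-owned bvec pages: the dio completion path must not
	 * re-dirty them, the upper layer already handles dirtying */
	iocb->ki_pos = pos;
	iocb->ki_filp = file;
	iocb->ki_flags = IOCB_DIRECT | IOCB_DONT_DIRTY_PAGE;
	ret = file->f_op->read_iter(iocb, &iter);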

Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 fs/direct-io.c     | 10 +++++++---
 include/linux/fs.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 745d234..bac0354 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -120,6 +120,7 @@ struct dio {
 	int page_errors;		/* errno from get_user_pages() */
 	int is_async;			/* is IO async ? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
+	bool should_dirty;		/* if pages should be dirtied */
 	int io_error;			/* IO error in completion path */
 	unsigned long refcount;		/* direct_io_worker() and bios */
 	struct bio *bio_list;		/* singly linked via bi_private */
@@ -393,7 +394,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	dio->refcount++;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
-	if (dio->is_async && dio->rw == READ)
+	if (dio->is_async && dio->rw == READ && dio->should_dirty)
 		bio_set_pages_dirty(bio);
 
 	if (sdio->submit_io)
@@ -464,13 +465,14 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (!uptodate)
 		dio->io_error = -EIO;
 
-	if (dio->is_async && dio->rw == READ) {
+	if (dio->is_async && dio->rw == READ && dio->should_dirty) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
 		bio_for_each_segment_all(bvec, bio, i) {
 			struct page *page = bvec->bv_page;
 
-			if (dio->rw == READ && !PageCompound(page))
+			if (dio->rw == READ && !PageCompound(page) &&
+					dio->should_dirty)
 				set_page_dirty_lock(page);
 			page_cache_release(page);
 		}
@@ -1216,6 +1218,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;
+	dio->should_dirty =
+		iocb->ki_flags & IOCB_DONT_DIRTY_PAGE ? false : true;
 
 	sdio.iter = iter;
 	sdio.final_block_in_request =
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 35ec87e..e2a8baa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -317,6 +317,7 @@ struct writeback_control;
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
+#define IOCB_DONT_DIRTY_PAGE	(1 << 3)
 
 struct kiocb {
 	struct file		*ki_filp;
-- 
1.9.1



* [PATCH v3 2/4] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop
  2015-05-06 17:08 [PATCH v3 0/4] block: loop: improve loop with AIO Ming Lei
  2015-05-06 17:08 ` [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO Ming Lei
@ 2015-05-06 17:08 ` Ming Lei
  2015-05-06 17:08 ` [PATCH v3 3/4] block: loop: use kthread_work Ming Lei
  2015-05-06 17:08 ` [PATCH v3 4/4] block: loop: support DIO & AIO Ming Lei
  3 siblings, 0 replies; 22+ messages in thread
From: Ming Lei @ 2015-05-06 17:08 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Ming Lei

It doesn't make sense to enable merging because the I/O
submitted to the backing file is handled page by page.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 1bee523..e9f563d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1572,6 +1572,12 @@ static int loop_add(struct loop_device **l, int i)
 	}
 	lo->lo_queue->queuedata = lo;
 
+	/*
+	 * It doesn't make sense to enable merging because the I/O
+	 * submitted to the backing file is handled page by page.
+	 */
+	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
+
 	INIT_LIST_HEAD(&lo->write_cmd_head);
 	INIT_WORK(&lo->write_work, loop_queue_write_work);
 
-- 
1.9.1



* [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-06 17:08 [PATCH v3 0/4] block: loop: improve loop with AIO Ming Lei
  2015-05-06 17:08 ` [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO Ming Lei
  2015-05-06 17:08 ` [PATCH v3 2/4] block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop Ming Lei
@ 2015-05-06 17:08 ` Ming Lei
  2015-05-07  7:17   ` Christoph Hellwig
  2015-05-06 17:08 ` [PATCH v3 4/4] block: loop: support DIO & AIO Ming Lei
  3 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-06 17:08 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Ming Lei

The following patch will use dio/aio to submit I/O to the backing
file, and in that case it isn't good to schedule I/O concurrently
from work items, so switch to kthread_work, which runs all commands
serially on a single worker thread.
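
For reference, the kthread_work pattern adopted below boils down to
this (a condensed sketch of what loop_prepare_queue() and
loop_queue_rq() do; my_work_fn is a placeholder for the per-command
handler):

	struct kthread_worker worker;
	struct task_struct *task;
	struct kthread_work work;

	init_kthread_worker(&worker);
	task = kthread_run(kthread_worker_fn, &worker, "loop%d", 0);

	init_kthread_work(&work, my_work_fn);
	/* all queued work runs serially on the single worker thread */
	queue_kthread_work(&worker, &work);

	/* teardown */
	flush_kthread_worker(&worker);
	kthread_stop(task);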

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 79 ++++++++++++++++------------------------------------
 drivers/block/loop.h | 10 +++----
 2 files changed, 28 insertions(+), 61 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index e9f563d..94c9eec 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -666,6 +666,23 @@ static void loop_config_discard(struct loop_device *lo)
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
+static void loop_unprepare_queue(struct loop_device *lo)
+{
+	flush_kthread_worker(&lo->worker);
+	kthread_stop(lo->worker_task);
+}
+
+static int loop_prepare_queue(struct loop_device *lo)
+{
+	init_kthread_worker(&lo->worker);
+	lo->worker_task = kthread_run(kthread_worker_fn,
+			&lo->worker, "loop%d", lo->lo_number);
+	if (IS_ERR(lo->worker_task))
+		return -ENOMEM;
+	set_user_nice(lo->worker_task, MIN_NICE);
+	return 0;
+}
+
 static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 		       struct block_device *bdev, unsigned int arg)
 {
@@ -723,11 +740,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	size = get_loop_size(lo, file);
 	if ((loff_t)(sector_t)size != size)
 		goto out_putf;
-	error = -ENOMEM;
-	lo->wq = alloc_workqueue("kloopd%d",
-			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16,
-			lo->lo_number);
-	if (!lo->wq)
+	error = loop_prepare_queue(lo);
+	if (error)
 		goto out_putf;
 
 	error = 0;
@@ -876,8 +890,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_flags = 0;
 	if (!part_shift)
 		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
-	destroy_workqueue(lo->wq);
-	lo->wq = NULL;
+	loop_unprepare_queue(lo);
 	mutex_unlock(&lo->lo_ctl_mutex);
 	/*
 	 * Need not hold lo_ctl_mutex to fput backing file.
@@ -1438,23 +1451,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (lo->lo_state != Lo_bound)
 		return -EIO;
 
-	if (cmd->rq->cmd_flags & REQ_WRITE) {
-		struct loop_device *lo = cmd->rq->q->queuedata;
-		bool need_sched = true;
-
-		spin_lock_irq(&lo->lo_lock);
-		if (lo->write_started)
-			need_sched = false;
-		else
-			lo->write_started = true;
-		list_add_tail(&cmd->list, &lo->write_cmd_head);
-		spin_unlock_irq(&lo->lo_lock);
-
-		if (need_sched)
-			queue_work(lo->wq, &lo->write_work);
-	} else {
-		queue_work(lo->wq, &cmd->read_work);
-	}
+	queue_kthread_work(&lo->worker, &cmd->work);
 
 	return BLK_MQ_RQ_QUEUE_OK;
 }
@@ -1476,35 +1473,10 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	blk_mq_complete_request(cmd->rq);
 }
 
-static void loop_queue_write_work(struct work_struct *work)
-{
-	struct loop_device *lo =
-		container_of(work, struct loop_device, write_work);
-	LIST_HEAD(cmd_list);
-
-	spin_lock_irq(&lo->lo_lock);
- repeat:
-	list_splice_init(&lo->write_cmd_head, &cmd_list);
-	spin_unlock_irq(&lo->lo_lock);
-
-	while (!list_empty(&cmd_list)) {
-		struct loop_cmd *cmd = list_first_entry(&cmd_list,
-				struct loop_cmd, list);
-		list_del_init(&cmd->list);
-		loop_handle_cmd(cmd);
-	}
-
-	spin_lock_irq(&lo->lo_lock);
-	if (!list_empty(&lo->write_cmd_head))
-		goto repeat;
-	lo->write_started = false;
-	spin_unlock_irq(&lo->lo_lock);
-}
-
-static void loop_queue_read_work(struct work_struct *work)
+static void loop_queue_work(struct kthread_work *work)
 {
 	struct loop_cmd *cmd =
-		container_of(work, struct loop_cmd, read_work);
+		container_of(work, struct loop_cmd, work);
 
 	loop_handle_cmd(cmd);
 }
@@ -1516,7 +1488,7 @@ static int loop_init_request(void *data, struct request *rq,
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
 	cmd->rq = rq;
-	INIT_WORK(&cmd->read_work, loop_queue_read_work);
+	init_kthread_work(&cmd->work, loop_queue_work);
 
 	return 0;
 }
@@ -1578,9 +1550,6 @@ static int loop_add(struct loop_device **l, int i)
 	 */
 	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
 
-	INIT_LIST_HEAD(&lo->write_cmd_head);
-	INIT_WORK(&lo->write_work, loop_queue_write_work);
-
 	disk = lo->lo_disk = alloc_disk(1 << part_shift);
 	if (!disk)
 		goto out_free_queue;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 49564ed..54c6aa5 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -14,7 +14,7 @@
 #include <linux/blk-mq.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-#include <linux/workqueue.h>
+#include <linux/kthread.h>
 #include <uapi/linux/loop.h>
 
 /* Possible states of device */
@@ -54,12 +54,10 @@ struct loop_device {
 	gfp_t		old_gfp_mask;
 
 	spinlock_t		lo_lock;
-	struct workqueue_struct *wq;
-	struct list_head	write_cmd_head;
-	struct work_struct	write_work;
-	bool			write_started;
 	int			lo_state;
 	struct mutex		lo_ctl_mutex;
+	struct kthread_worker	worker;
+	struct task_struct	*worker_task;
 
 	struct request_queue	*lo_queue;
 	struct blk_mq_tag_set	tag_set;
@@ -67,7 +65,7 @@ struct loop_device {
 };
 
 struct loop_cmd {
-	struct work_struct read_work;
+	struct kthread_work work;
 	struct request *rq;
 	struct list_head list;
 };
-- 
1.9.1



* [PATCH v3 4/4] block: loop: support DIO & AIO
  2015-05-06 17:08 [PATCH v3 0/4] block: loop: improve loop with AIO Ming Lei
                   ` (2 preceding siblings ...)
  2015-05-06 17:08 ` [PATCH v3 3/4] block: loop: use kthread_work Ming Lei
@ 2015-05-06 17:08 ` Ming Lei
  2015-05-07  7:24   ` Christoph Hellwig
  3 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-06 17:08 UTC (permalink / raw)
  To: linux-kernel, Dave Kleikamp
  Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov,
	Andrew Morton, Alexander Viro, Tejun Heo, Ming Lei

There are three main advantages to using direct I/O and AIO for
reading from and writing to loop's backing file:

1) double caching is avoided, so memory usage decreases a lot

2) unlike user-space direct I/O, there is no page-pinning cost

3) context switches are avoided while still obtaining good throughput
- with buffered file reads, good random I/O throughput is often
obtained only if requests are submitted concurrently from lots of
tasks; for sequential I/O, however, most requests hit the page cache,
so concurrent submission introduces unnecessary context switches
without improving throughput much. There was a discussion[1] about
using non-blocking I/O to address this problem for applications.
- with direct I/O and AIO, concurrent submission can be avoided
while random read throughput is not affected

My fio test results follow:

1. 16-job fio test inside an ext4 file system over the loop block device
1) How to run
	- linux kernel: 4.1.0-rc2-next-20150506 with the patchset
	- the loop block device is backed by one image file on a HDD
	- fio psync engine, 16 jobs, size 400M, ext4 over the loop block device
	- test result: IOPS from fio output

2) Throughput result (IOPS):
        -------------------------------------------------------------
        test cases          |randread   |read   |randwrite  |write  |
        -------------------------------------------------------------
        base                |240        |8705   |3763       |20914  |
        -------------------------------------------------------------
        base+loop aio       |242        |9258   |4577       |21451  |
        -------------------------------------------------------------

3) context switch
	- context switches decreased by ~16% with loop AIO for randread,
	and by ~33% for read

4) memory usage
	- after these four tests with loop AIO: ~10% of memory in use
	- after these four tests without loop AIO: more than 55% of memory
	in use

2. single-job fio test inside an ext4 file system over the loop block device (for Maxim Patlasov)
1) How to run
	- linux kernel: 4.1.0-rc2-next-20150506 with the patchset
	- the loop block device is backed by one image file on a HDD
	- fio psync engine, 1 job, size 4000M, ext4 over the loop block device
	- test result: IOPS from fio output

2) Throughput result (IOPS):
        -------------------------------------------------------------
        test cases          |randread   |read   |randwrite  |write  |
        -------------------------------------------------------------
        base                |109        |21180  |4192       |22782  |
        -------------------------------------------------------------
        base+loop aio       |114        |21018  |5404       |22670  |
        -------------------------------------------------------------

3) context switch
	- context switches decreased by ~10% with loop AIO for randread,
	and by ~50% for read

4) memory usage
	- after these four tests with loop AIO: ~10% of memory in use
	- after these four tests without loop AIO: more than 55% of memory
	in use

Both the context-switch and memory-usage data were collected with sar.

[1] https://lwn.net/Articles/612483/
[2] sar graph when running fio (16 jobs) over loop without the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-nonaio.pdf

[3] sar graph when running fio (16 jobs) over loop with the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-aio.pdf

[4] sar graph when running fio (single job) over loop without the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-nonaio-1job.pdf

[5] sar graph when running fio (single job) over loop with the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-aio-1job.pdf

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/block/loop.h |  4 +++
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 94c9eec..3652581 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -388,6 +388,65 @@ static int lo_req_flush(struct loop_device *lo, struct request *rq)
 
 	return ret;
 }
+static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
+	struct request *rq = cmd->rq;
+
+	if (ret > 0)
+		ret = 0;
+	else if (ret < 0)
+		ret = -EIO;
+
+	rq->errors = ret;
+	blk_mq_complete_request(rq);
+}
+
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
+		     loff_t pos, bool rw)
+{
+	struct iov_iter iter;
+	struct bio_vec *bvec;
+	struct bio *bio = cmd->rq->bio;
+	struct file *file = lo->lo_backing_file;
+	int ret;
+
+	/* nomerge for loop request queue */
+	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
+
+	bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+	iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
+		      bio_segments(bio), blk_rq_bytes(cmd->rq));
+
+	cmd->iocb.ki_pos = pos;
+	cmd->iocb.ki_filp = file;
+	cmd->iocb.ki_complete = lo_rw_aio_complete;
+	cmd->iocb.ki_flags = IOCB_DONT_DIRTY_PAGE | IOCB_DIRECT;
+
+	if (rw == WRITE)
+		ret = file->f_op->write_iter(&cmd->iocb, &iter);
+	else
+		ret = file->f_op->read_iter(&cmd->iocb, &iter);
+
+	if (ret != -EIOCBQUEUED)
+		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
+	return 0;
+}
+
+
+static inline int lo_rw_simple(struct loop_device *lo,
+		struct request *rq, loff_t pos, bool rw)
+{
+	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
+
+	if (cmd->use_aio)
+		return lo_rw_aio(lo, cmd, pos, rw);
+
+	if (rw == WRITE)
+		return lo_write_simple(lo, rq, pos);
+	else
+		return lo_read_simple(lo, rq, pos);
+}
 
 static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 {
@@ -404,13 +463,13 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 		else if (lo->transfer)
 			ret = lo_write_transfer(lo, rq, pos);
 		else
-			ret = lo_write_simple(lo, rq, pos);
+			ret = lo_rw_simple(lo, rq, pos, WRITE);
 
 	} else {
 		if (lo->transfer)
 			ret = lo_read_transfer(lo, rq, pos);
 		else
-			ret = lo_read_simple(lo, rq, pos);
+			ret = lo_rw_simple(lo, rq, pos, READ);
 	}
 
 	return ret;
@@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
 		mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
 	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
+
+	lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
+	if (lo->support_dio)
+		lo->use_aio = true;
+	else
+		lo->use_aio = false;
 }
 
 /*
@@ -761,6 +826,13 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
 		blk_queue_flush(lo->lo_queue, REQ_FLUSH);
 
+	/* use aio if it is possible */
+	lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
+	if (lo->support_dio)
+		lo->use_aio = true;
+	else
+		lo->use_aio = false;
+
 	set_capacity(lo->lo_disk, size);
 	bd_set_size(bdev, size << 9);
 	loop_sysfs_init(lo);
@@ -1451,6 +1523,12 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (lo->lo_state != Lo_bound)
 		return -EIO;
 
+	if (lo->use_aio && !lo->transfer &&
+			!(cmd->rq->cmd_flags & (REQ_FLUSH | REQ_DISCARD)))
+		cmd->use_aio = true;
+	else
+		cmd->use_aio = false;
+
 	queue_kthread_work(&lo->worker, &cmd->work);
 
 	return BLK_MQ_RQ_QUEUE_OK;
@@ -1470,7 +1548,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
  failed:
 	if (ret)
 		cmd->rq->errors = -EIO;
-	blk_mq_complete_request(cmd->rq);
+	if (!cmd->use_aio || ret)
+		blk_mq_complete_request(cmd->rq);
 }
 
 static void loop_queue_work(struct kthread_work *work)
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 54c6aa5..0af40a0 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -58,6 +58,8 @@ struct loop_device {
 	struct mutex		lo_ctl_mutex;
 	struct kthread_worker	worker;
 	struct task_struct	*worker_task;
+	bool			support_dio;
+	bool			use_aio;
 
 	struct request_queue	*lo_queue;
 	struct blk_mq_tag_set	tag_set;
@@ -68,6 +70,8 @@ struct loop_cmd {
 	struct kthread_work work;
 	struct request *rq;
 	struct list_head list;
+	bool use_aio;           /* use AIO interface to handle I/O */
+	struct kiocb iocb;
 };
 
 /* Support for loadable transfer modules */
-- 
1.9.1



* Re: [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-06 17:08 ` [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO Ming Lei
@ 2015-05-07  7:16   ` Christoph Hellwig
  2015-05-07 12:12     ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-07  7:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo

On Thu, May 07, 2015 at 01:08:23AM +0800, Ming Lei wrote:
> When direct IO is submitted from kernel, it is often unnecessary
> to dirty pages, for example of loop, dirtying pages have been
> considered in the upper filesystem(over loop) side already, and
> they don't need to be dirtied again.
> 
> So this patch introduces IOCB_DONT_DIRTY_PAGE flag for direct IO,
> and other ITER_BVEC and ITER_KVEC cases might benefit from it too.
> 
> The patch is based on previous Dave's patch.

in-kernel I/O should never dirty the pages written to or read from,
so just checking the iov_iter type should be enough.
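
A sketch of that alternative (an assumption about how the check could
look with this kernel's iov_iter type flags, not merged code):

	/* in do_blockdev_direct_IO(): dirty pages only for
	 * user-backed iterators, never for BVEC/KVEC ones */
	dio->should_dirty = !(iter->type & (ITER_BVEC | ITER_KVEC));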


* Re: [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-06 17:08 ` [PATCH v3 3/4] block: loop: use kthread_work Ming Lei
@ 2015-05-07  7:17   ` Christoph Hellwig
  2015-05-07 10:32     ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-07  7:17 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo

On Thu, May 07, 2015 at 01:08:25AM +0800, Ming Lei wrote:
> The following patch will use dio/aio to submit IO to backing file,
> then it isn't good to schedule IO concurrently from work, so
> use kthread_work.

I can't really parse this, what's the specific advantage here?


* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
  2015-05-06 17:08 ` [PATCH v3 4/4] block: loop: support DIO & AIO Ming Lei
@ 2015-05-07  7:24   ` Christoph Hellwig
  2015-05-07 12:32     ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-07  7:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
	Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
	Tejun Heo, util-linux

> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
>  		mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
>  	lo->old_gfp_mask = mapping_gfp_mask(mapping);
>  	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> +
> +	lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> +	if (lo->support_dio)
> +		lo->use_aio = true;
> +	else
> +		lo->use_aio = false;

We need an explicit userspace opt-in for this.  For one, direct I/O
can't handle sub-sector-size access, and people use the loop device as
a workaround for that.  Second, this doesn't give anyone seeing
negative results from AIO a way to disable it easily.

I think the best way is to require a setup-time flag, but turn it on
by default in losetup versions that know about it.
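
One possible shape for that opt-in (a sketch only; the flag name and
value are hypothetical, nothing in this series defines them):

	/* hypothetical new flag in <uapi/linux/loop.h> */
	#define LO_FLAGS_DIRECT_IO	16

	/* in loop_set_fd(): honour the flag only if the backing
	 * file actually supports direct I/O */
	lo->use_aio = (lo_flags & LO_FLAGS_DIRECT_IO) &&
		      mapping->a_ops && mapping->a_ops->direct_IO;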

> @@ -761,6 +826,13 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>  	if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
>  		blk_queue_flush(lo->lo_queue, REQ_FLUSH);
>  
> +	/* use aio if it is possible */
> +	lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> +	if (lo->support_dio)
> +		lo->use_aio = true;
> +	else
> +		lo->use_aio = false;

Please always factor out checks like this instead of duplicating them.
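
For example (a sketch of such a helper; the name is hypothetical):

	static void loop_update_dio(struct loop_device *lo,
				    struct address_space *mapping)
	{
		lo->support_dio = mapping->a_ops &&
				  mapping->a_ops->direct_IO;
		lo->use_aio = lo->support_dio;
	}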



* Re: [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-07  7:17   ` Christoph Hellwig
@ 2015-05-07 10:32     ` Ming Lei
  2015-05-11  7:20       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-07 10:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo

On Thu, May 7, 2015 at 3:17 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, May 07, 2015 at 01:08:25AM +0800, Ming Lei wrote:
>> The following patch will use dio/aio to submit IO to backing file,
>> then it isn't good to schedule IO concurrently from work, so
>> use kthread_work.
>
> I can't really parse this, what's the specific advantage here?

Patch 4's commit log provides the test data.

From the data, one thread is enough to get throughput similar to the
previous approach, which submits I/O concurrently from work items.

A single thread decreases context switches a lot; also, a single
thread is what is commonly used to submit AIO in practice.

Thanks,
Ming Lei


* Re: [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-07  7:16   ` Christoph Hellwig
@ 2015-05-07 12:12     ` Ming Lei
  2015-05-11  7:19       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-07 12:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo

On Thu, May 7, 2015 at 3:16 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, May 07, 2015 at 01:08:23AM +0800, Ming Lei wrote:
>> When direct IO is submitted from kernel, it is often unnecessary
>> to dirty pages, for example of loop, dirtying pages have been
>> considered in the upper filesystem(over loop) side already, and
>> they don't need to be dirtied again.
>>
>> So this patch introduces IOCB_DONT_DIRTY_PAGE flag for direct IO,
>> and other ITER_BVEC and ITER_KVEC cases might benefit from it too.
>>
>> The patch is based on previous Dave's patch.
>
> in-kernel I/O should never dirty the pages written to or read from,
> so just checking the iov_iter type should be enough.

The loop case is quite specific: READ pages need not be dirtied in
direct I/O because the filesystem over loop has already taken care of
dirtying these pages.

For other ITER_BVEC or ITER_KVEC cases, if the page is anonymous or
mapped, dirtying is still needed; otherwise the newly written data may
be dropped, just as with direct I/O from user space.

So I suggest using the IOCB_DONT_DIRTY_PAGE flag for this purpose.

Thanks,
Ming Lei


* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
  2015-05-07  7:24   ` Christoph Hellwig
@ 2015-05-07 12:32     ` Ming Lei
  2015-05-07 22:20       ` Dave Chinner
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-07 12:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
	util-linux

On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
>>               mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
>>       lo->old_gfp_mask = mapping_gfp_mask(mapping);
>>       mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
>> +
>> +     lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
>> +     if (lo->support_dio)
>> +             lo->use_aio = true;
>> +     else
>> +             lo->use_aio = false;
>
> We need an explicit userspace opt-in for this.  For one direct I/O can't

Actually this patch is a simplified version: my old version exported
two sysfs files (use_aio, use_dio) that controlled whether direct I/O
or AIO is used, with AIO only enabled when DIO is set. In the end I
decided that wasn't necessary, because dio/aio works well in the
tests, and userspace shouldn't care whether AIO is used as long as the
performance is good.

> handle sub-sector size access and people use the loop device as a
> workaround for that.

Yes, users can do that; could you explain a bit what the problem is?

> Second this doesn't give anyone seeing negative
> results from aio a way to disable it easily.

If there is such a requirement, we can easily export a control
interface via sysfs files, but I suggest not doing that from the
start, for the sake of simplicity.

>
> It think the best way is to require a setup time flag, but enable it to
> on in losetup versions that know about it.

Could you explain a bit why losetup needs to be involved?

>
>> @@ -761,6 +826,13 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>>       if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
>>               blk_queue_flush(lo->lo_queue, REQ_FLUSH);
>>
>> +     /* use aio if it is possible */
>> +     lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
>> +     if (lo->support_dio)
>> +             lo->use_aio = true;
>> +     else
>> +             lo->use_aio = false;
>
> Please always factor out checks like this insted of duplicating them.

OK, will do that.


Thanks,
Ming


* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
  2015-05-07 12:32     ` Ming Lei
@ 2015-05-07 22:20       ` Dave Chinner
  2015-05-08  4:09         ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2015-05-07 22:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo, util-linux

On Thu, May 07, 2015 at 08:32:39PM +0800, Ming Lei wrote:
> On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> >> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
> >>               mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
> >>       lo->old_gfp_mask = mapping_gfp_mask(mapping);
> >>       mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> >> +
> >> +     lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> >> +     if (lo->support_dio)
> >> +             lo->use_aio = true;
> >> +     else
> >> +             lo->use_aio = false;
> >
> > We need an explicit userspace opt-in for this.  For one direct I/O can't
> 
> Actually this patch is one simplified version, and my old version
> has exported two sysfs files(use_aio, use_dio) which can control
> if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
> I think it isn't necessary because dio/aio works well from the tests,
> and userspace shouldn't care if it is AIO or not if the performance
> is good.

Performance won't always be good.

It looks to me that this has an unbound queue depth for AIO.  What
throttles the amount of IO userspace can throw at an aio-enabled
loop device? If it's unbound, then userspace can throw gigabytes of
random write at the loop device and, rather than be throttled at 128
outstanding IOs, the queue will just keep growing. That will have
adverse effects on dirty memory throttling, memory reclaim
algorithms, read and write latency, etc.

I suspect that if we are going to make the loop device use AIO, it
will need a proper queue depth limit (i.e.
/sys/block/loop0/queue/nr_requests) enforced to avoid this sort of
problem...

> > handle sub-sector size access and people use the loop device as a
> > workaround for that.
> 
> Yes, user can do that, could you explain a bit what the problem is?

I have a 4k sector backing device and a 512 byte sector filesystem
image. I can't do 512 byte direct IO to the filesystem image, so I
can't run tools that handle fs images in files using direct I/O on
that file. Create a loop device with the filesystem image, and now I
can do 512 byte direct IO to the filesystem image, because all that
direct IO to the filesystem image is now buffered by the loop
device.

If the loop device does direct io in this situation, the backing
filesystem rejects direct IO from the loop device because it is not
sector (4k) sized/aligned. User now swears, shouts and curses you
from afar.

DIO and AIO behaviour needs to be configurable through losetup, and
most definitely not the default behaviour.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
  2015-05-07 22:20       ` Dave Chinner
@ 2015-05-08  4:09         ` Ming Lei
  2015-05-11  1:28           ` Dave Chinner
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-08  4:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo, util-linux

Hi Dave,

Thanks for your comment!

On Fri, May 8, 2015 at 6:20 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 07, 2015 at 08:32:39PM +0800, Ming Lei wrote:
>> On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> >> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
>> >>               mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
>> >>       lo->old_gfp_mask = mapping_gfp_mask(mapping);
>> >>       mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
>> >> +
>> >> +     lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
>> >> +     if (lo->support_dio)
>> >> +             lo->use_aio = true;
>> >> +     else
>> >> +             lo->use_aio = false;
>> >
>> > We need an explicit userspace opt-in for this.  For one direct I/O can't
>>
>> Actually this patch is one simplified version, and my old version
>> has exported two sysfs files(use_aio, use_dio) which can control
>> if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
>> I think it isn't necessary because dio/aio works well from the tests,
>> and userspace shouldn't care if it is AIO or not if the performance
>> is good.
>
> Performance won't always be good.
>
> It looks to me that this has an unbound queue depth for AIO.  What
> throttles the amount of IO userspace can throw at an aio-enabled
> loop device? If it's unbound, then userspace can throw gigabytes of
> random write at the loop device and, rather than be throttled at 128
> outstanding IOs, the queue will just keep growing. That will have
> adverse effects on dirty memory throttling, memory reclaim
> algorithms, read and write latency, etc.
>
> I suspect that if we are going to make the loop device use AIO, it
> will needs a proper queue depth limit (i.e.
> /sys/block/loop0/queue/nr_requests) enforced to avoid this sort of
> problem...

Loop has been converted to blk-mq, and the current queue depth is
128, so there isn't the problem you're worried about, is there?

>> > handle sub-sector size access and people use the loop device as a
>> > workaround for that.
>>
>> Yes, user can do that, could you explain a bit what the problem is?
>
> I have a 4k sector backing device and a 512 byte sector filesystem
> image. I can't do 512 byte direct IO to the filesystem image, so I
> can't run tools that handle fs images in files using direct I/O on
> that file. Create a loop device with the filesystem image, and now I
> can do 512 byte direct IO to the filesystem image, because all that
> direct IO to the filesystem image is now buffered by the loop
> device.
>
> If the loop device does direct io in this situation, the backing
> filesystem rejects direct IO from the loop device because it is not
> sector (4k) sized/aligned. User now swears, shouts and curses you
> from afar.

Yes, that is a problem, but it looks like it can be addressed by
adding the following in loop_set_fd():

	if (inode->i_sb->s_bdev)
		blk_queue_logical_block_size(lo->lo_queue,
					     bdev_io_min(inode->i_sb->s_bdev));

> DIO and AIO behaviour needs to be configurable through losetup, and
> most definitely not the default behaviour.

Could you share whether there are other reasons for making it
configurable via losetup, supposing the above issues can be fixed?

Thanks,
Ming Lei

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
  2015-05-08  4:09         ` Ming Lei
@ 2015-05-11  1:28           ` Dave Chinner
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2015-05-11  1:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo, util-linux

On Fri, May 08, 2015 at 12:09:04PM +0800, Ming Lei wrote:
> On Fri, May 8, 2015 at 6:20 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, May 07, 2015 at 08:32:39PM +0800, Ming Lei wrote:
> >> On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> >> >> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
> >> >>               mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
> >> >>       lo->old_gfp_mask = mapping_gfp_mask(mapping);
> >> >>       mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> >> >> +
> >> >> +     lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> >> >> +     if (lo->support_dio)
> >> >> +             lo->use_aio = true;
> >> >> +     else
> >> >> +             lo->use_aio = false;
> >> >
> >> > We need an explicit userspace opt-in for this.  For one direct I/O can't
> >>
> >> Actually this patch is one simplified version, and my old version
> >> has exported two sysfs files(use_aio, use_dio) which can control
> >> if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
> >> I think it isn't necessary because dio/aio works well from the tests,
> >> and userspace shouldn't care if it is AIO or not if the performance
> >> is good.
> >
> > Performance won't always be good.
> >
> > It looks to me that this has an unbound queue depth for AIO.  What
...
> Loop has been converted to blk-mq, and the current queue depth is
> 128, so there isn't the problem you're worried about, is there?

Oh, OK, that's new. I didn't realise this conversion had already
been done.

> >> > handle sub-sector size access and people use the loop device as a
> >> > workaround for that.
> >>
> >> Yes, user can do that, could you explain a bit what the problem is?
> >
> > I have a 4k sector backing device and a 512 byte sector filesystem
> > image. I can't do 512 byte direct IO to the filesystem image, so I
> > can't run tools that handle fs images in files using direct I/O on
> > that file. Create a loop device with the filesystem image, and now I
> > can do 512 byte direct IO to the filesystem image, because all that
> > direct IO to the filesystem image is now buffered by the loop
> > device.
> >
> > If the loop device does direct io in this situation, the backing
> > filesystem rejects direct IO from the loop device because it is not
> > sector (4k) sized/aligned. User now swears, shouts and curses you
> > from afar.
> 
> Yes, it is one problem, but looks it can be addressed by adding the
> following in loop_set_fd():
> 
>      if (inode->i_sb->s_bdev)
>             blk_queue_logical_block_size(lo->lo_queue,
>                                            bdev_io_min(inode->i_sb->s_bdev));

How does that address the problem of not being able to do 512 byte
IOs to a loop device that does direct IO to a file hosted on a 4k
sector filesystem?

AFAICT, in that case bdev_io_min(inode->i_sb->s_bdev) will return 4k
because it comes from the backing filesystem, and so the minimum IO
size is still going to be 4k for such configurations...

> > DIO and AIO behaviour needs to be configurable through losetup, and
> > most definitely not the default behaviour.
> 
> Could you share if there are other reasons for making it configurable via
> losetup suppose the above issues can be fixed?

It's not about "fixing issues" - losetup is the tool we use for
configuring loop device behaviour. If a user wants to create a loop
device with a specific configuration, then that is the tool they
use. If you add direct I/O to the loop device, that needs to be
configurable, because switching to direct I/O will significantly
change performance for many workloads loop devices are used for.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-07 12:12     ` Ming Lei
@ 2015-05-11  7:19       ` Christoph Hellwig
  2015-05-11 13:02         ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-11  7:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo

On Thu, May 07, 2015 at 08:12:31PM +0800, Ming Lei wrote:
> In case of loop, it is quite specific about dirtying READ pages in
> direct IO because fs over loop has considered dirtying these pages
> already.
> 
> For other cases of ITER_BVEC or ITER_KVEC, if the page is
> anonymous or mapped, dirtying is still needed, otherwise
> the new written data may be dropped like direct-io from user space.

But those would be user pages.  Anyone who passes a kernel page doesn't
expect it to be dirtied for normal kernel interfaces.


* Re: [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-07 10:32     ` Ming Lei
@ 2015-05-11  7:20       ` Christoph Hellwig
  2015-05-11 13:12         ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-11  7:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo

On Thu, May 07, 2015 at 06:32:03PM +0800, Ming Lei wrote:
> > I can't really parse this, what's the specific advantage here?
> 
> Patch 4's commit log provides the test data.
> 
> From the data, it is observed that one thread is enough to get
> similar throughput with previous one which submits IO from
> work concurrently.
> 
> Single thread can decrease context switch a lots, also one thread is
> often used to submit AIO in reality.

But we still need to support the non-AIO case.  For one, due to
bisectability, and second, even with AIO support we'll still have
people using it.


* Re: [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-11  7:19       ` Christoph Hellwig
@ 2015-05-11 13:02         ` Ming Lei
  2015-05-12  7:24           ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-11 13:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo

On Mon, May 11, 2015 at 3:19 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, May 07, 2015 at 08:12:31PM +0800, Ming Lei wrote:
>> In case of loop, it is quite specific about dirtying READ pages in
>> direct IO because fs over loop has considered dirtying these pages
>> already.
>>
>> For other cases of ITER_BVEC or ITER_KVEC, if the page is
>> anonymous or mapped, dirtying is still needed, otherwise
>> the new written data may be dropped like direct-io from user space.
>
> But those would be user pages.  Anyone who passes a kernel page doesn't

Inside the kernel it is hard to say whether a page in the page cache
is a kernel page or a user page. :-)

> expect it to be dirtied for normal kernel interfaces.

Currently the direct I/O being introduced for loop should be the first
direct read inside the kernel, so the case of passing kernel pages
doesn't arise, and obviously ITER_BVEC/ITER_KVEC doesn't mean the page
is a 'kernel' page.

Another direct I/O user inside the kernel is __swap_writepage(), which
is still ITER_BVEC with user pages. Maybe in the future
swap_readpage() can use direct I/O with BVEC too.

It depends on whether all kernel direct read I/O to 'user' pages
(anonymous, mapped, page cache) via BVEC/KVEC can skip dirtying the
page. At least for loop, I understand the reason is specific, which is
why I suggest using the flag now; at minimum it has documentation
value, because the reason isn't obvious.

Thanks,
Ming Lei


* Re: [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-11  7:20       ` Christoph Hellwig
@ 2015-05-11 13:12         ` Ming Lei
  2015-05-12  7:25           ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2015-05-11 13:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo

On Mon, May 11, 2015 at 3:20 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, May 07, 2015 at 06:32:03PM +0800, Ming Lei wrote:
>> > I can't really parse this, what's the specific advantage here?
>>
>> Patch 4's commit log provides the test data.
>>
>> From the data, it is observed that one thread is enough to get
>> similar throughput with previous one which submits IO from
>> work concurrently.
>>
>> Single thread can decrease context switch a lots, also one thread is
>> often used to submit AIO in reality.
>
> But we still need to support the non-AIO case.  For one due to
> bisectablity, and second even with AIO support we'll still have people
> using it.

For the non-AIO case, a single thread had been used for a long, long
time; it was only converted to workqueue work in v4.0, which has
already caused a performance regression for Fedora live booting. In
the discussion[1], even though submitting I/O concurrently via work
items can improve random I/O throughput, it may hurt sequential I/O
performance at the same time, so it may be better to restore the
single-thread behaviour.

For AIO support, if there is such a high performance requirement for
loop, I think using multiple hw queues with a per-hwq kthread is
better than the current work approach.

[1] http://marc.info/?t=143082678400002&r=1&w=2

Thanks,
Ming Lei


* Re: [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-11 13:02         ` Ming Lei
@ 2015-05-12  7:24           ` Christoph Hellwig
  2015-05-12  8:54             ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-12  7:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo

On Mon, May 11, 2015 at 09:02:54PM +0800, Ming Lei wrote:
> > But those would be user pages.  Anyone who passes a kernel page doesn't
> 
> Inside kernel it is hard to say one page in page cache is kernel or
> user page, :-)

Inside the kernel it's easy to say that it's the caller's business to
make sure we don't need to carry through hacks like this.

> > expect it to be dirtied for normal kernel interfaces.
> 
> Currently the coming direct IO introduced for loop should be the 1st direct
> read inside kernel, so there isn't the case of passing kernel pages, and
> obviously ITER_BVEC/ITER_KVEC doesn't mean the page is 'kernel' page.
> 
> Another direct IO inside kernel is __swap_writepage(), and it is still
> ITER_BVEC and user page. Maybe in the future swap_readpage()
> can use direct IO with BVEC too.


Swap will have very different rules to deal with page dirtying as it's
part of the VM subsystem.  Again, the caller will know better.


* Re: [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-11 13:12         ` Ming Lei
@ 2015-05-12  7:25           ` Christoph Hellwig
  2015-05-12  8:55             ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-05-12  7:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
	Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
	Alexander Viro, Tejun Heo

On Mon, May 11, 2015 at 09:12:56PM +0800, Ming Lei wrote:
> For non-AIO case, single thread has been used for long long time,
> and it was just converted to work in v4.0, which has caused performance
> regression for fedora live booting already. In discussion[1], even though
> submitting I/O via work concurrently can improve random IO throughput,
> meantime it may hurt sequential IO performance, so maybe better to restore
> to single thread behaviour.

Then justify the change based on the non-aio use case and document that
in the changelog.


* Re: [PATCH v3 1/4] fs: kiocb: introduce IOCB_DONT_DIRTY_PAGE flag for direct IO
  2015-05-12  7:24           ` Christoph Hellwig
@ 2015-05-12  8:54             ` Ming Lei
  0 siblings, 0 replies; 22+ messages in thread
From: Ming Lei @ 2015-05-12  8:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo

On Tue, May 12, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, May 11, 2015 at 09:02:54PM +0800, Ming Lei wrote:
>> > But those would be user pages.  Anyone who passes a kernel page doesn't
>>
>> Inside kernel it is hard to say one page in page cache is kernel or
>> user page, :-)
>
> Inside the kernel it's easy to say that it's the caller's business to
> make sure we don't need to carry through hacks like this.
>
>> > expect it to be dirtied for normal kernel interfaces.
>>
>> Currently the coming direct IO introduced for loop should be the 1st direct
>> read inside kernel, so there isn't the case of passing kernel pages, and
>> obviously ITER_BVEC/ITER_KVEC doesn't mean the page is 'kernel' page.
>>
>> Another direct IO inside kernel is __swap_writepage(), and it is still
>> ITER_BVEC and user page. Maybe in the future swap_readpage()
>> can use direct IO with BVEC too.
>
>
> Swap will have very different rules to deal with page dirtying as it's
> part of the VM subsystem.  Again, the caller will know better.

Yes, the caller knows better; that is why I introduced the flag. The
caller can set the flag if it needs direct I/O to bypass the dirtying,
and otherwise just leave it clear.  Or did I misunderstand you?


Thanks,
Ming Lei


* Re: [PATCH v3 3/4] block: loop: use kthread_work
  2015-05-12  7:25           ` Christoph Hellwig
@ 2015-05-12  8:55             ` Ming Lei
  0 siblings, 0 replies; 22+ messages in thread
From: Ming Lei @ 2015-05-12  8:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
	Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo

On Tue, May 12, 2015 at 3:25 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, May 11, 2015 at 09:12:56PM +0800, Ming Lei wrote:
>> For non-AIO case, single thread has been used for long long time,
>> and it was just converted to work in v4.0, which has caused performance
>> regression for fedora live booting already. In discussion[1], even though
>> submitting I/O via work concurrently can improve random IO throughput,
>> meantime it may hurt sequential IO performance, so maybe better to restore
>> to single thread behaviour.
>
> Then justify the change based on the non-aio use case and document that
> in the changelog.

OK, will add that in v4.

Thanks,
Ming
