All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Non-blocking AIO
@ 2017-02-14  2:45 Goldwyn Rodrigues
  2017-02-14  2:45 ` [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING Goldwyn Rodrigues
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack

This series adds nonblocking feature to asynchronous I/O writes.
io_submit() can be delayed because of a number of reason:
 - Block allocation for files
 - Data writebacks for direct I/O
 - Sleeping because of waiting to acquire i_rwsem
 - Congested block device

The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
any of these conditions are met. This way userspace can push most
of the write()s to the kernel to the best of its ability to complete
and if it returns -EAGAIN, can defer it to another thread.

In order to enable this, IOCB_FLAG_NONBLOCKING is introduced in 
uapi/linux/aio_abi.h which translates to IOCB_BLOCKING for struct iocb.

This feature is provided for direct I/O of asynchronous I/O only. I have
tested it against xfs, ext4, and btrfs.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
@ 2017-02-14  2:45 ` Goldwyn Rodrigues
  2017-02-14 18:51   ` Jeff Moyer
  2017-02-20  9:23   ` Christoph Hellwig
  2017-02-14  2:45 ` [PATCH 2/7] noblocking aio: Return if cannot get hold of i_rwsem Goldwyn Rodrigues
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

This flag informs kernel to bail out if an AIO request will block
for reasons such as file allocations, or a writeback triggered
by direct I/O.

IOCB_FLAG_NONBLOCKING is translated to IOCB_NONBLOCKING for
iocb->ki_flags.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/aio.c                     | 3 +++
 include/linux/fs.h           | 1 +
 include/uapi/linux/aio_abi.h | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 428484f..0725756 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1551,6 +1551,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		req->common.ki_flags |= IOCB_EVENTFD;
 	}
 
+	if (iocb->aio_flags & IOCB_FLAG_NONBLOCKING)
+		req->common.ki_flags |= IOCB_NONBLOCKING;
+
 	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
 	if (unlikely(ret)) {
 		pr_debug("EFAULT: aio_key\n");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dc0478c..bfbaba6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -322,6 +322,7 @@ struct writeback_control;
 #define IOCB_DSYNC		(1 << 4)
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
+#define IOCB_NONBLOCKING	(1 << 7)
 
 struct kiocb {
 	struct file		*ki_filp;
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index bb2554f..03ce4c2 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -51,8 +51,11 @@ enum {
  *
  * IOCB_FLAG_RESFD - Set if the "aio_resfd" member of the "struct iocb"
  *                   is valid.
+ * IOCB_FLAG_NOBLOCK - Set if the user wants the iocb to fail if it would block
+ *			for operations such as disk allocation.
  */
 #define IOCB_FLAG_RESFD		(1 << 0)
+#define IOCB_FLAG_NONBLOCKING	(1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/7] noblocking aio: Return if cannot get hold of i_rwsem
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
  2017-02-14  2:45 ` [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING Goldwyn Rodrigues
@ 2017-02-14  2:45 ` Goldwyn Rodrigues
  2017-02-20  9:24   ` Christoph Hellwig
  2017-02-14  2:45 ` [PATCH 3/7] nonblocking aio: return if direct write will trigger writeback Goldwyn Rodrigues
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

A failure to lock i_rwsem would mean there is I/O being performed
by another thread. So, let's bail.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 mm/filemap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index d8d7df8..50cee70 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2885,7 +2885,12 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file->f_mapping->host;
 	ssize_t ret;
 
-	inode_lock(inode);
+	/* Don't sleep on inode rwsem */
+	if (iocb->ki_flags & IOCB_NONBLOCKING) {
+		if (!inode_trylock(inode))
+			return -EAGAIN;
+	} else
+		inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret > 0)
 		ret = __generic_file_write_iter(iocb, from);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/7] nonblocking aio: return if direct write will trigger writeback
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
  2017-02-14  2:45 ` [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING Goldwyn Rodrigues
  2017-02-14  2:45 ` [PATCH 2/7] noblocking aio: Return if cannot get hold of i_rwsem Goldwyn Rodrigues
@ 2017-02-14  2:45 ` Goldwyn Rodrigues
  2017-02-20  9:25   ` Christoph Hellwig
  2017-02-14  2:46 ` [PATCH 4/7] nonblocking aio: return on congested block device Goldwyn Rodrigues
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Find out if the write will trigger a wait due to writeback. If yes,
return -EAGAIN.

This introduces a new function filemap_range_has_page() which
returns true if the file's mapping has a page within the range
mentioned.

Return -EINVAL for buffered AIO: there are multiple causes of
delay such as page locks, dirty throttling logic, page loading
from disk etc. which cannot be taken care of.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index bfbaba6..7e657a2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2539,6 +2539,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+				   loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 50cee70..f8e78b3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -424,6 +424,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:		address space structure to wait for
+ * @start_byte:		offset in bytes where the range starts
+ * @end_byte:		offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+				loff_t start_byte, loff_t end_byte)
+{
+	pgoff_t index = start_byte >> PAGE_SHIFT;
+	pgoff_t end = end_byte >> PAGE_SHIFT;
+	struct pagevec pvec;
+	int ret;
+
+	if (end_byte < start_byte)
+		return 0;
+
+	if (mapping->nrpages == 0)
+		return 0;
+
+	pagevec_init(&pvec, 0);
+	ret = pagevec_lookup(&pvec, mapping, index, 1);
+	if (!ret)
+		return 0;
+	ret = (pvec.pages[0]->index <= end);
+	pagevec_release(&pvec);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
@@ -2543,6 +2576,17 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
 
 	pos = iocb->ki_pos;
 
+	if (iocb->ki_flags & IOCB_NONBLOCKING) {
+		loff_t end = pos + iov_iter_count(from);
+
+		if (iocb->ki_flags & IOCB_DIRECT) {
+			/* Will trigger a writeback */
+			if (filemap_range_has_page(inode->i_mapping, pos, end))
+				return -EAGAIN;
+		} else
+			return -EINVAL;
+	}
+
 	if (limit != RLIM_INFINITY) {
 		if (iocb->ki_pos >= limit) {
 			send_sig(SIGXFSZ, current, 0);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/7] nonblocking aio: return on congested block device
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
                   ` (2 preceding siblings ...)
  2017-02-14  2:45 ` [PATCH 3/7] nonblocking aio: return if direct write will trigger writeback Goldwyn Rodrigues
@ 2017-02-14  2:46 ` Goldwyn Rodrigues
  2017-02-14  3:55   ` Ming Lei
  2017-02-14  2:46 ` [PATCH 5/7] nonblocking aio: ext4 Goldwyn Rodrigues
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues, linux-block

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

A new flag BIO_NONBLOCKING is introduced to identify bio's
orignating from iocb with IOCB_NONBLOCKING. struct request
are requested using BLK_MQ_REQ_NOWAIT if BIO_NONBLOCKING is set.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 block/blk-core.c          | 13 +++++++++++--
 block/blk-mq.c            | 18 ++++++++++++++++--
 fs/direct-io.c            | 11 +++++++++--
 include/linux/blk_types.h |  1 +
 4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..9767573 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1257,6 +1257,11 @@ static struct request *get_request(struct request_queue *q, int op,
 	if (!IS_ERR(rq))
 		return rq;
 
+	if (bio_flagged(bio, BIO_NONBLOCKING)) {
+		blk_put_rl(rl);
+		return ERR_PTR(-EAGAIN);
+	}
+
 	if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
 		blk_put_rl(rl);
 		return rq;
@@ -2035,7 +2040,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
-		if (likely(blk_queue_enter(q, false) == 0)) {
+		if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NONBLOCKING)) == 0)) {
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
@@ -2044,7 +2049,11 @@ blk_qc_t generic_make_request(struct bio *bio)
 		} else {
 			struct bio *bio_next = bio_list_pop(current->bio_list);
 
-			bio_io_error(bio);
+			if (unlikely(bio_flagged(bio, BIO_NONBLOCKING))) {
+				bio->bi_error = -EAGAIN;
+				bio_endio(bio);
+			} else
+				bio_io_error(bio);
 			bio = bio_next;
 		}
 	} while (bio);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 81caceb..7a7c674 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1213,6 +1213,8 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 
 	trace_block_getrq(q, bio, op);
 	blk_mq_set_alloc_data(&alloc_data, q, 0, ctx, hctx);
+	if (bio_flagged(bio, BIO_NONBLOCKING))
+		alloc_data.flags |= BLK_MQ_REQ_NOWAIT;
 	rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
 
 	data->hctx = alloc_data.hctx;
@@ -1286,8 +1288,14 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		return BLK_QC_T_NONE;
 
 	rq = blk_mq_map_request(q, bio, &data);
-	if (unlikely(!rq))
+	if (unlikely(!rq)) {
+		if (bio_flagged(bio, BIO_NONBLOCKING))
+			bio->bi_error = -EAGAIN;
+		else
+			bio->bi_error = -EIO;
+		bio_endio(bio);
 		return BLK_QC_T_NONE;
+	}
 
 	cookie = blk_tag_to_qc_t(rq->tag, data.hctx->queue_num);
 
@@ -1381,8 +1389,14 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 		request_count = blk_plug_queued_count(q);
 
 	rq = blk_mq_map_request(q, bio, &data);
-	if (unlikely(!rq))
+	if (unlikely(!rq)) {
+		if (bio_flagged(bio, BIO_NONBLOCKING))
+			bio->bi_error = -EAGAIN;
+		else
+			bio->bi_error = -EIO;
+		bio_endio(bio);
 		return BLK_QC_T_NONE;
+	}
 
 	cookie = blk_tag_to_qc_t(rq->tag, data.hctx->queue_num);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index fb9aa16..9997fed 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -386,6 +386,9 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	else
 		bio->bi_end_io = dio_bio_end_io;
 
+	if (dio->iocb->ki_flags & IOCB_NONBLOCKING)
+		bio_set_flag(bio, BIO_NONBLOCKING);
+
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
@@ -480,8 +483,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	unsigned i;
 	int err;
 
-	if (bio->bi_error)
-		dio->io_error = -EIO;
+	if (bio->bi_error) {
+		if (bio_flagged(bio, BIO_NONBLOCKING))
+			dio->io_error = bio->bi_error;
+		else
+			dio->io_error = -EIO;
+	}
 
 	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
 		err = bio->bi_error;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..94855cf 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -119,6 +119,7 @@ struct bio {
 #define BIO_QUIET	6	/* Make BIO Quiet */
 #define BIO_CHAIN	7	/* chained bio, ->bi_remaining in effect */
 #define BIO_REFFED	8	/* bio has elevated ->bi_cnt */
+#define BIO_NONBLOCKING 9	/* don't block over blk device congestion */
 
 /*
  * Flags starting here get preserved by bio_reset() - this includes
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/7] nonblocking aio: ext4
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
                   ` (3 preceding siblings ...)
  2017-02-14  2:46 ` [PATCH 4/7] nonblocking aio: return on congested block device Goldwyn Rodrigues
@ 2017-02-14  2:46 ` Goldwyn Rodrigues
  2017-02-14 19:52   ` Andreas Dilger
  2017-02-14  2:46 ` [PATCH 6/7] nonblocking aio: xfs Goldwyn Rodrigues
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues, linux-ext4

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Return EAGAIN if any of the following checks fail for direct I/O:
 + i_rwsem is lockable
 + Writing beyond end of file (will trigger allocation)
 + Blocks are allocated at the write location

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/ext4/file.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 2a822d3..c8d1e41 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -93,11 +93,16 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	int o_direct = iocb->ki_flags & IOCB_DIRECT;
+	int nonblocking = iocb->ki_flags & IOCB_NONBLOCKING;
 	int unaligned_aio = 0;
 	int overwrite = 0;
 	ssize_t ret;
 
-	inode_lock(inode);
+	if (o_direct && nonblocking) {
+		if (!inode_trylock(inode))
+			return -EAGAIN;
+	} else
+		inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret <= 0)
 		goto out;
@@ -132,12 +137,18 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (o_direct) {
 		size_t length = iov_iter_count(from);
 		loff_t pos = iocb->ki_pos;
+		unsigned int blkbits = inode->i_blkbits;
+
+		if (nonblocking
+			&& (pos + length > EXT4_BLOCK_ALIGN(i_size_read(inode), blkbits))) {
+			ret = -EAGAIN;
+			goto out;
+		}
 
 		/* check whether we do a DIO overwrite or not */
-		if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
-		    pos + length <= i_size_read(inode)) {
+		if ((ext4_should_dioread_nolock(inode) && !unaligned_aio &&
+			    pos + length <= i_size_read(inode)) || nonblocking) {
 			struct ext4_map_blocks map;
-			unsigned int blkbits = inode->i_blkbits;
 			int err, len;
 
 			map.m_lblk = pos >> blkbits;
@@ -157,8 +168,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			 * non-flags are returned.  So we should check
 			 * these two conditions.
 			 */
-			if (err == len && (map.m_flags & EXT4_MAP_MAPPED))
-				overwrite = 1;
+			if (err == len) {
+			       if (map.m_flags & EXT4_MAP_MAPPED)
+				       overwrite = 1;
+			} else if (nonblocking) {
+				ret = -EAGAIN;
+				goto out;
+			}
 		}
 	}
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/7] nonblocking aio: xfs
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
                   ` (4 preceding siblings ...)
  2017-02-14  2:46 ` [PATCH 5/7] nonblocking aio: ext4 Goldwyn Rodrigues
@ 2017-02-14  2:46 ` Goldwyn Rodrigues
  2017-02-14  6:44   ` Darrick J. Wong
  2017-02-14  7:43   ` Christoph Hellwig
  2017-02-14  2:46 ` [PATCH 7/7] nonblocking aio: btrfs Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues, linux-xfs

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

If IOCB_NONBLOCKING is set:
	+ Check if writing beyond end of file, if yes return EAGAIN
	- check if writing to a hole which does not have allocated
	  file blocks.
	- Check if i_rwsem is immediately lockable in xfs_rw_ilock()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/xfs/xfs_file.c  | 36 ++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_inode.h |  2 ++
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9a5d64b..42f055f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -51,14 +51,20 @@ static const struct vm_operations_struct xfs_file_vm_ops;
  * Locking primitives for read and write IO paths to ensure we consistently use
  * and order the inode->i_mutex, ip->i_lock and ip->i_iolock.
  */
-static inline void
+static inline int
 xfs_rw_ilock(
 	struct xfs_inode	*ip,
 	int			type)
 {
-	if (type & XFS_IOLOCK_EXCL)
-		inode_lock(VFS_I(ip));
+	if (type & XFS_IOLOCK_EXCL) {
+		if ((type & XFS_IOLOCK_NONBLOCKING) &&
+				!inode_trylock(VFS_I(ip)))
+			return -EAGAIN;
+		else
+			inode_lock(VFS_I(ip));
+	}
 	xfs_ilock(ip, type);
+	return 0;
 }
 
 static inline void
@@ -418,6 +424,24 @@ xfs_file_aio_write_checks(
 	if (error <= 0)
 		return error;
 
+	if (iocb->ki_flags & IOCB_NONBLOCKING) {
+		struct xfs_bmbt_irec	imap;
+		xfs_fileoff_t           offset_fsb, end_fsb;
+		int			nimaps = 1, ret = 0;
+		end_fsb = XFS_B_TO_FSB(ip->i_mount, iocb->ki_pos + count);
+		if (XFS_B_TO_FSB(ip->i_mount, i_size_read(inode)) < end_fsb)
+			return -EAGAIN;
+		/* Check if it is an unallocated hole */
+		offset_fsb = XFS_B_TO_FSBT(ip->i_mount, iocb->ki_pos);
+
+		ret = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
+				&nimaps, 0);
+		if (ret)
+			return ret;
+		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK)
+			return -EAGAIN;
+	}
+
 	error = xfs_break_layouts(inode, iolock, true);
 	if (error)
 		return error;
@@ -555,11 +579,15 @@ xfs_file_dio_aio_write(
 	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
 		unaligned_io = 1;
 		iolock = XFS_IOLOCK_EXCL;
+		if (iocb->ki_flags & IOCB_NONBLOCKING)
+			iolock |= XFS_IOLOCK_NONBLOCKING;
 	} else {
 		iolock = XFS_IOLOCK_SHARED;
 	}
 
-	xfs_rw_ilock(ip, iolock);
+	ret = xfs_rw_ilock(ip, iolock);
+	if (ret)
+		return ret;
 
 	ret = xfs_file_aio_write_checks(iocb, from, &iolock);
 	if (ret)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 71e8a81..1a2d5eb 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -283,6 +283,7 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 #define	XFS_ILOCK_SHARED	(1<<3)
 #define	XFS_MMAPLOCK_EXCL	(1<<4)
 #define	XFS_MMAPLOCK_SHARED	(1<<5)
+#define	XFS_IOLOCK_NONBLOCKING	(1<<6)
 
 #define XFS_LOCK_MASK		(XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
 				| XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \
@@ -291,6 +292,7 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 #define XFS_LOCK_FLAGS \
 	{ XFS_IOLOCK_EXCL,	"IOLOCK_EXCL" }, \
 	{ XFS_IOLOCK_SHARED,	"IOLOCK_SHARED" }, \
+	{ XFS_IOLOCK_NONBLOCKING,	"IOLOCK_NONBLOCKING" }, \
 	{ XFS_ILOCK_EXCL,	"ILOCK_EXCL" }, \
 	{ XFS_ILOCK_SHARED,	"ILOCK_SHARED" }, \
 	{ XFS_MMAPLOCK_EXCL,	"MMAPLOCK_EXCL" }, \
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/7] nonblocking aio: btrfs
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
                   ` (5 preceding siblings ...)
  2017-02-14  2:46 ` [PATCH 6/7] nonblocking aio: xfs Goldwyn Rodrigues
@ 2017-02-14  2:46 ` Goldwyn Rodrigues
  2017-02-14  9:24   ` Nikolay Borisov
  2017-02-20  9:21 ` [PATCH 0/7] Non-blocking AIO Christoph Hellwig
  2017-03-03 10:50   ` Michael Kerrisk
  8 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-14  2:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, Goldwyn Rodrigues, linux-btrfs

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Return EAGAIN if any of the following checks fail
 + i_rwsem is lockable
 + NODATACOW or PREALLOC is set
 + Can nocow at the desired location
 + Writing beyond end of file

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c  | 25 ++++++++++++++++++++-----
 fs/btrfs/inode.c |  3 +++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 3a14c87..f4dcc7a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1804,12 +1804,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	ssize_t num_written = 0;
 	bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host);
 	ssize_t err;
-	loff_t pos;
-	size_t count;
+	loff_t pos = iocb->ki_pos;
+	size_t count = iov_iter_count(from);
 	loff_t oldsize;
 	int clean_page = 0;
 
-	inode_lock(inode);
+	if ((iocb->ki_flags & IOCB_NONBLOCKING) &&
+			(iocb->ki_flags & IOCB_DIRECT)) {
+		/* Don't sleep on inode rwsem */
+		if (!inode_trylock(inode))
+			return -EAGAIN;
+		/*
+		 * We will allocate space in case nodatacow is not set,
+		 * so bail
+		 */
+		if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					      BTRFS_INODE_PREALLOC)) ||
+		    check_can_nocow(inode, pos, &count) <= 0) {
+			inode_unlock(inode);
+			return -EAGAIN;
+		}
+	} else
+		inode_lock(inode);
+
 	err = generic_write_checks(iocb, from);
 	if (err <= 0) {
 		inode_unlock(inode);
@@ -1843,8 +1860,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	 */
 	update_time_for_write(inode);
 
-	pos = iocb->ki_pos;
-	count = iov_iter_count(from);
 	start_pos = round_down(pos, root->sectorsize);
 	oldsize = i_size_read(inode);
 	if (start_pos > oldsize) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index be4da91..9911ebd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8690,6 +8690,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		if (offset + count <= inode->i_size) {
 			inode_unlock(inode);
 			relock = true;
+		} else if (iocb->ki_flags & IOCB_NONBLOCKING) {
+			ret = -EAGAIN;
+			goto out;
 		}
 		ret = btrfs_delalloc_reserve_space(inode, offset, count);
 		if (ret)
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] nonblocking aio: return on congested block device
  2017-02-14  2:46 ` [PATCH 4/7] nonblocking aio: return on congested block device Goldwyn Rodrigues
@ 2017-02-14  3:55   ` Ming Lei
  2017-02-15 11:13     ` Goldwyn Rodrigues
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-02-14  3:55 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: Linux FS Devel, Jan Kara, Goldwyn Rodrigues, linux-block

On Tue, Feb 14, 2017 at 10:46 AM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> A new flag BIO_NONBLOCKING is introduced to identify bio's
> orignating from iocb with IOCB_NONBLOCKING. struct request
> are requested using BLK_MQ_REQ_NOWAIT if BIO_NONBLOCKING is set.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  block/blk-core.c          | 13 +++++++++++--
>  block/blk-mq.c            | 18 ++++++++++++++++--
>  fs/direct-io.c            | 11 +++++++++--
>  include/linux/blk_types.h |  1 +
>  4 files changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 14d7c07..9767573 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1257,6 +1257,11 @@ static struct request *get_request(struct request_queue *q, int op,
>         if (!IS_ERR(rq))
>                 return rq;
>
> +       if (bio_flagged(bio, BIO_NONBLOCKING)) {
> +               blk_put_rl(rl);
> +               return ERR_PTR(-EAGAIN);
> +       }
> +
>         if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
>                 blk_put_rl(rl);
>                 return rq;
> @@ -2035,7 +2040,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>         do {
>                 struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>
> -               if (likely(blk_queue_enter(q, false) == 0)) {
> +               if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NONBLOCKING)) == 0)) {
>                         ret = q->make_request_fn(q, bio);
>
>                         blk_queue_exit(q);
> @@ -2044,7 +2049,11 @@ blk_qc_t generic_make_request(struct bio *bio)
>                 } else {
>                         struct bio *bio_next = bio_list_pop(current->bio_list);
>
> -                       bio_io_error(bio);
> +                       if (unlikely(bio_flagged(bio, BIO_NONBLOCKING))) {
> +                               bio->bi_error = -EAGAIN;
> +                               bio_endio(bio);
> +                       } else
> +                               bio_io_error(bio);
>                         bio = bio_next;
>                 }
>         } while (bio);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 81caceb..7a7c674 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1213,6 +1213,8 @@ static struct request *blk_mq_map_request(struct request_queue *q,
>
>         trace_block_getrq(q, bio, op);
>         blk_mq_set_alloc_data(&alloc_data, q, 0, ctx, hctx);
> +       if (bio_flagged(bio, BIO_NONBLOCKING))
> +               alloc_data.flags |= BLK_MQ_REQ_NOWAIT;
>         rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
>
>         data->hctx = alloc_data.hctx;
> @@ -1286,8 +1288,14 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 return BLK_QC_T_NONE;
>
>         rq = blk_mq_map_request(q, bio, &data);
> -       if (unlikely(!rq))
> +       if (unlikely(!rq)) {
> +               if (bio_flagged(bio, BIO_NONBLOCKING))
> +                       bio->bi_error = -EAGAIN;
> +               else
> +                       bio->bi_error = -EIO;
> +               bio_endio(bio);
>                 return BLK_QC_T_NONE;
> +       }
>
>         cookie = blk_tag_to_qc_t(rq->tag, data.hctx->queue_num);
>
> @@ -1381,8 +1389,14 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
>                 request_count = blk_plug_queued_count(q);
>
>         rq = blk_mq_map_request(q, bio, &data);
> -       if (unlikely(!rq))
> +       if (unlikely(!rq)) {
> +               if (bio_flagged(bio, BIO_NONBLOCKING))
> +                       bio->bi_error = -EAGAIN;
> +               else
> +                       bio->bi_error = -EIO;
> +               bio_endio(bio);
>                 return BLK_QC_T_NONE;
> +       }

There are other places in which blocking may be triggered, such
as allocating for bio clone, wbt_wait(), and sleep in .make_request(),
like md, dm and bcache's.

IMO it should be hard to deal with all, so what is the expection for
flag of BIO_NONBLOCKING?

>
>         cookie = blk_tag_to_qc_t(rq->tag, data.hctx->queue_num);
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index fb9aa16..9997fed 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -386,6 +386,9 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
>         else
>                 bio->bi_end_io = dio_bio_end_io;
>
> +       if (dio->iocb->ki_flags & IOCB_NONBLOCKING)
> +               bio_set_flag(bio, BIO_NONBLOCKING);
> +
>         sdio->bio = bio;
>         sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
>  }
> @@ -480,8 +483,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
>         unsigned i;
>         int err;
>
> -       if (bio->bi_error)
> -               dio->io_error = -EIO;
> +       if (bio->bi_error) {
> +               if (bio_flagged(bio, BIO_NONBLOCKING))
> +                       dio->io_error = bio->bi_error;
> +               else
> +                       dio->io_error = -EIO;
> +       }
>
>         if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
>                 err = bio->bi_error;
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index cd395ec..94855cf 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -119,6 +119,7 @@ struct bio {
>  #define BIO_QUIET      6       /* Make BIO Quiet */
>  #define BIO_CHAIN      7       /* chained bio, ->bi_remaining in effect */
>  #define BIO_REFFED     8       /* bio has elevated ->bi_cnt */
> +#define BIO_NONBLOCKING 9      /* don't block over blk device congestion */
>
>  /*
>   * Flags starting here get preserved by bio_reset() - this includes
> --
> 2.10.2
>



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] nonblocking aio: xfs
  2017-02-14  2:46 ` [PATCH 6/7] nonblocking aio: xfs Goldwyn Rodrigues
@ 2017-02-14  6:44   ` Darrick J. Wong
  2017-02-14  7:43   ` Christoph Hellwig
  1 sibling, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2017-02-14  6:44 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues, linux-xfs

On Mon, Feb 13, 2017 at 08:46:02PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Send the whole patchset to linux-xfs, please.

> If IOCB_NONBLOCKING is set:
> 	+ Check if writing beyond end of file, if yes return EAGAIN
> 	- check if writing to a hole which does not have allocated
> 	  file blocks.
> 	- Check if i_rwsem is immediately lockable in xfs_rw_ilock()
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/xfs/xfs_file.c  | 36 ++++++++++++++++++++++++++++++++----
>  fs/xfs/xfs_inode.h |  2 ++
>  2 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 9a5d64b..42f055f 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -51,14 +51,20 @@ static const struct vm_operations_struct xfs_file_vm_ops;
>   * Locking primitives for read and write IO paths to ensure we consistently use
>   * and order the inode->i_mutex, ip->i_lock and ip->i_iolock.
>   */
> -static inline void
> +static inline int
>  xfs_rw_ilock(
>  	struct xfs_inode	*ip,
>  	int			type)
>  {
> -	if (type & XFS_IOLOCK_EXCL)
> -		inode_lock(VFS_I(ip));
> +	if (type & XFS_IOLOCK_EXCL) {
> +		if ((type & XFS_IOLOCK_NONBLOCKING) &&
> +				!inode_trylock(VFS_I(ip)))
> +			return -EAGAIN;
> +		else
> +			inode_lock(VFS_I(ip));
> +	}
>  	xfs_ilock(ip, type);
> +	return 0;

This function went away in 4.10-rc1.

>  }
>  
>  static inline void
> @@ -418,6 +424,24 @@ xfs_file_aio_write_checks(
>  	if (error <= 0)
>  		return error;
>  
> +	if (iocb->ki_flags & IOCB_NONBLOCKING) {
> +		struct xfs_bmbt_irec	imap;
> +		xfs_fileoff_t           offset_fsb, end_fsb;
> +		int			nimaps = 1, ret = 0;
> +		end_fsb = XFS_B_TO_FSB(ip->i_mount, iocb->ki_pos + count);
> +		if (XFS_B_TO_FSB(ip->i_mount, i_size_read(inode)) < end_fsb)
> +			return -EAGAIN;
> +		/* Check if it is an unallocated hole */
> +		offset_fsb = XFS_B_TO_FSBT(ip->i_mount, iocb->ki_pos);
> +
> +		ret = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> +				&nimaps, 0);

You need to hold the ilock to call _bmapi_read, and I don't think
it's held here.  Did you CONFIG_XFS_DEBUG=y?

> +		if (ret)
> +			return ret;
> +		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK)
> +			return -EAGAIN;
> +	}
> +
>  	error = xfs_break_layouts(inode, iolock, true);
>  	if (error)
>  		return error;
> @@ -555,11 +579,15 @@ xfs_file_dio_aio_write(
>  	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
>  		unaligned_io = 1;
>  		iolock = XFS_IOLOCK_EXCL;
> +		if (iocb->ki_flags & IOCB_NONBLOCKING)
> +			iolock |= XFS_IOLOCK_NONBLOCKING;
>  	} else {
>  		iolock = XFS_IOLOCK_SHARED;
>  	}
>  
> -	xfs_rw_ilock(ip, iolock);
> +	ret = xfs_rw_ilock(ip, iolock);
> +	if (ret)
> +		return ret;
>  
>  	ret = xfs_file_aio_write_checks(iocb, from, &iolock);
>  	if (ret)
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 71e8a81..1a2d5eb 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -283,6 +283,7 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
>  #define	XFS_ILOCK_SHARED	(1<<3)
>  #define	XFS_MMAPLOCK_EXCL	(1<<4)
>  #define	XFS_MMAPLOCK_SHARED	(1<<5)
> +#define	XFS_IOLOCK_NONBLOCKING	(1<<6)

What is the expected behavior if this is passed directly to xfs_ilock?

--D

>  
>  #define XFS_LOCK_MASK		(XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
>  				| XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \
> @@ -291,6 +292,7 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
>  #define XFS_LOCK_FLAGS \
>  	{ XFS_IOLOCK_EXCL,	"IOLOCK_EXCL" }, \
>  	{ XFS_IOLOCK_SHARED,	"IOLOCK_SHARED" }, \
> +	{ XFS_IOLOCK_NONBLOCKING,	"IOLOCK_NONBLOCKING" }, \
>  	{ XFS_ILOCK_EXCL,	"ILOCK_EXCL" }, \
>  	{ XFS_ILOCK_SHARED,	"ILOCK_SHARED" }, \
>  	{ XFS_MMAPLOCK_EXCL,	"MMAPLOCK_EXCL" }, \
> -- 
> 2.10.2
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] nonblocking aio: xfs
  2017-02-14  2:46 ` [PATCH 6/7] nonblocking aio: xfs Goldwyn Rodrigues
  2017-02-14  6:44   ` Darrick J. Wong
@ 2017-02-14  7:43   ` Christoph Hellwig
  2017-02-15 15:30     ` Goldwyn Rodrigues
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2017-02-14  7:43 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues, linux-xfs

On Mon, Feb 13, 2017 at 08:46:02PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Please Cc the while series to all lists, otherwiwe it's impossible
to review the thing.

> 
> If IOCB_NONBLOCKING is set:
> 	+ Check if writing beyond end of file, if yes return EAGAIN
> 	- check if writing to a hole which does not have allocated
> 	  file blocks.
> 	- Check if i_rwsem is immediately lockable in xfs_rw_ilock()

Why the + vs - above? 

> -static inline void
> +static inline int
>  xfs_rw_ilock(

This function has been removed a while ago.

>  
> +	if (iocb->ki_flags & IOCB_NONBLOCKING) {
> +		struct xfs_bmbt_irec	imap;
> +		xfs_fileoff_t           offset_fsb, end_fsb;
> +		int			nimaps = 1, ret = 0;
> +		end_fsb = XFS_B_TO_FSB(ip->i_mount, iocb->ki_pos + count);
> +		if (XFS_B_TO_FSB(ip->i_mount, i_size_read(inode)) < end_fsb)
> +			return -EAGAIN;

Bogus check, XFS supports async write beyond EOF if the space is
preallocated.

> +		/* Check if it is an unallocated hole */
> +		offset_fsb = XFS_B_TO_FSBT(ip->i_mount, iocb->ki_pos);
> +
> +		ret = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> +				&nimaps, 0);
> +		if (ret)
> +			return ret;
> +		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK)
> +			return -EAGAIN;

This would need the ilock.  But extent list lookups are expensive,
so please don't add another one here.  Just return when we hit the
first unallocated extent in xfs_bmapi_write - either a short write or
-EAGAIN if nothing has been written.

>  	error = xfs_break_layouts(inode, iolock, true);
>  	if (error)
>  		return error;

Additionally this can drop the iolock, so you might get a new
hole after it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] nonblocking aio: btrfs
  2017-02-14  2:46 ` [PATCH 7/7] nonblocking aio: btrfs Goldwyn Rodrigues
@ 2017-02-14  9:24   ` Nikolay Borisov
  0 siblings, 0 replies; 28+ messages in thread
From: Nikolay Borisov @ 2017-02-14  9:24 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel; +Cc: jack, linux-btrfs



On 14.02.2017 04:46, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Return EAGAIN if any of the following checks fail
>  + i_rwsem is lockable
>  + NODATACOW or PREALLOC is set

perhaps you wanted to say "NODATACOW or PREALLOC is _not_ set"? See below



>  + Can nocow at the desired location
>  + Writing beyond end of file
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/file.c  | 25 ++++++++++++++++++++-----
>  fs/btrfs/inode.c |  3 +++
>  2 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 3a14c87..f4dcc7a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1804,12 +1804,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>  	ssize_t num_written = 0;
>  	bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host);
>  	ssize_t err;
> -	loff_t pos;
> -	size_t count;
> +	loff_t pos = iocb->ki_pos;
> +	size_t count = iov_iter_count(from);
>  	loff_t oldsize;
>  	int clean_page = 0;
>  
> -	inode_lock(inode);
> +	if ((iocb->ki_flags & IOCB_NONBLOCKING) &&
> +			(iocb->ki_flags & IOCB_DIRECT)) {
> +		/* Don't sleep on inode rwsem */
> +		if (!inode_trylock(inode))
> +			return -EAGAIN;
> +		/*
> +		 * We will allocate space in case nodatacow is not set,
> +		 * so bail
> +		 */
> +		if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> +					      BTRFS_INODE_PREALLOC)) ||

The first part of the condition will evaluate to true only when both
btrfs_inode_nodatacow and btrfs_inodeprealloc are not set. The code
contradicts your statement in the change log.

> +		    check_can_nocow(inode, pos, &count) <= 0) {
> +			inode_unlock(inode);
> +			return -EAGAIN;
> +		}
> +	} else
> +		inode_lock(inode);
> +
>  	err = generic_write_checks(iocb, from);
>  	if (err <= 0) {
>  		inode_unlock(inode);
> @@ -1843,8 +1860,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>  	 */
>  	update_time_for_write(inode);
>  
> -	pos = iocb->ki_pos;
> -	count = iov_iter_count(from);
>  	start_pos = round_down(pos, root->sectorsize);
>  	oldsize = i_size_read(inode);
>  	if (start_pos > oldsize) {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index be4da91..9911ebd 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8690,6 +8690,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  		if (offset + count <= inode->i_size) {
>  			inode_unlock(inode);
>  			relock = true;
> +		} else if (iocb->ki_flags & IOCB_NONBLOCKING) {
> +			ret = -EAGAIN;
> +			goto out;
>  		}
>  		ret = btrfs_delalloc_reserve_space(inode, offset, count);
>  		if (ret)
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING
  2017-02-14  2:45 ` [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING Goldwyn Rodrigues
@ 2017-02-14 18:51   ` Jeff Moyer
  2017-02-15 14:51     ` Goldwyn Rodrigues
  2017-02-20  9:23   ` Christoph Hellwig
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff Moyer @ 2017-02-14 18:51 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues

Goldwyn Rodrigues <rgoldwyn@suse.de> writes:

> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> This flag informs kernel to bail out if an AIO request will block
> for reasons such as file allocations, or a writeback triggered
> by direct I/O.
>
> IOCB_FLAG_NONBLOCKING is translated to IOCB_NONBLOCKING for
> iocb->ki_flags.

This shouldn't be an option, it should be the default.  This is what
users want when using this interface to begin with.

Cheers,
Jeff

>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/aio.c                     | 3 +++
>  include/linux/fs.h           | 1 +
>  include/uapi/linux/aio_abi.h | 3 +++
>  3 files changed, 7 insertions(+)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 428484f..0725756 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1551,6 +1551,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  		req->common.ki_flags |= IOCB_EVENTFD;
>  	}
>  
> +	if (iocb->aio_flags & IOCB_FLAG_NONBLOCKING)
> +		req->common.ki_flags |= IOCB_NONBLOCKING;
> +
>  	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
>  	if (unlikely(ret)) {
>  		pr_debug("EFAULT: aio_key\n");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index dc0478c..bfbaba6 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -322,6 +322,7 @@ struct writeback_control;
>  #define IOCB_DSYNC		(1 << 4)
>  #define IOCB_SYNC		(1 << 5)
>  #define IOCB_WRITE		(1 << 6)
> +#define IOCB_NONBLOCKING	(1 << 7)
>  
>  struct kiocb {
>  	struct file		*ki_filp;
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index bb2554f..03ce4c2 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -51,8 +51,11 @@ enum {
>   *
>   * IOCB_FLAG_RESFD - Set if the "aio_resfd" member of the "struct iocb"
>   *                   is valid.
> + * IOCB_FLAG_NOBLOCK - Set if the user wants the iocb to fail if it would block
> + *			for operations such as disk allocation.
>   */
>  #define IOCB_FLAG_RESFD		(1 << 0)
> +#define IOCB_FLAG_NONBLOCKING	(1 << 1)
>  
>  /* read() from /dev/aio returns these structures. */
>  struct io_event {

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/7] nonblocking aio: ext4
  2017-02-14  2:46 ` [PATCH 5/7] nonblocking aio: ext4 Goldwyn Rodrigues
@ 2017-02-14 19:52   ` Andreas Dilger
  2017-02-15 11:20       ` Goldwyn Rodrigues
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Dilger @ 2017-02-14 19:52 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 3040 bytes --]


> On Feb 13, 2017, at 7:46 PM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> 
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Return EAGAIN if any of the following checks fail for direct I/O:
> + i_rwsem is lockable
> + Writing beyond end of file (will trigger allocation)
> + Blocks are allocated at the write location
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/ext4/file.c | 28 ++++++++++++++++++++++------
> 1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 2a822d3..c8d1e41 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -93,11 +93,16 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> 	struct inode *inode = file_inode(iocb->ki_filp);
> 	int o_direct = iocb->ki_flags & IOCB_DIRECT;
> +	int nonblocking = iocb->ki_flags & IOCB_NONBLOCKING;
> 	int unaligned_aio = 0;
> 	int overwrite = 0;
> 	ssize_t ret;
> 
> -	inode_lock(inode);
> +	if (o_direct && nonblocking) {
> +		if (!inode_trylock(inode))
> +			return -EAGAIN;

Why do these all return -EAGAIN instead of -EWOULDBLOCK?  -EAGAIN is already
used in a number of places, and -EWOULDBLOCK seems more correct in the
"nonblocking" case?

> +	} else
> +		inode_lock(inode);

(style) "else" blocks should have braces when the "if" block has braces

> 	ret = generic_write_checks(iocb, from);
> 	if (ret <= 0)
> 		goto out;
> @@ -132,12 +137,18 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> 	if (o_direct) {
> 		size_t length = iov_iter_count(from);
> 		loff_t pos = iocb->ki_pos;
> +		unsigned int blkbits = inode->i_blkbits;
> +
> +		if (nonblocking
> +			&& (pos + length > EXT4_BLOCK_ALIGN(i_size_read(inode), blkbits))) {

(style) "&&" should go at the end of the previous line
(style) continued lines should align after '(' on previous line
(style) no need for parenthesis around that comparison

> +			ret = -EAGAIN;
> +			goto out;
> +		}
> 
> 		/* check whether we do a DIO overwrite or not */
> -		if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
> -		    pos + length <= i_size_read(inode)) {
> +		if ((ext4_should_dioread_nolock(inode) && !unaligned_aio &&
> +			    pos + length <= i_size_read(inode)) || nonblocking) {

(style) continued line should align after second '(' of previous line

> 			struct ext4_map_blocks map;
> -			unsigned int blkbits = inode->i_blkbits;
> 			int err, len;
> 
> 			map.m_lblk = pos >> blkbits;
> @@ -157,8 +168,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> 			 * non-flags are returned.  So we should check
> 			 * these two conditions.
> 			 */
> -			if (err == len && (map.m_flags & EXT4_MAP_MAPPED))
> -				overwrite = 1;
> +			if (err == len) {
> +			       if (map.m_flags & EXT4_MAP_MAPPED)
> +				       overwrite = 1;
> +			} else if (nonblocking) {
> +				ret = -EAGAIN;
> +				goto out;
> +			}
> 		}
> 	}
> 
> --
> 2.10.2
> 


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] nonblocking aio: return on congested block device
  2017-02-14  3:55   ` Ming Lei
@ 2017-02-15 11:13     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-15 11:13 UTC (permalink / raw)
  To: Ming Lei, Goldwyn Rodrigues; +Cc: Linux FS Devel, Jan Kara, linux-block



On 02/13/2017 09:55 PM, Ming Lei wrote:
> On Tue, Feb 14, 2017 at 10:46 AM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> A new flag BIO_NONBLOCKING is introduced to identify bio's
>> orignating from iocb with IOCB_NONBLOCKING. struct request
>> are requested using BLK_MQ_REQ_NOWAIT if BIO_NONBLOCKING is set.
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> ---
>>  block/blk-core.c          | 13 +++++++++++--
>>  block/blk-mq.c            | 18 ++++++++++++++++--
>>  fs/direct-io.c            | 11 +++++++++--
>>  include/linux/blk_types.h |  1 +
>>  4 files changed, 37 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 14d7c07..9767573 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1257,6 +1257,11 @@ static struct request *get_request(struct request_queue *q, int op,
>>         if (!IS_ERR(rq))
>>                 return rq;
>>
>> +       if (bio_flagged(bio, BIO_NONBLOCKING)) {
>> +               blk_put_rl(rl);
>> +               return ERR_PTR(-EAGAIN);
>> +       }
>> +
>>         if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
>>                 blk_put_rl(rl);
>>                 return rq;
>> @@ -2035,7 +2040,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>>         do {
>>                 struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>>
>> -               if (likely(blk_queue_enter(q, false) == 0)) {
>> +               if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NONBLOCKING)) == 0)) {
>>                         ret = q->make_request_fn(q, bio);
>>
>>                         blk_queue_exit(q);
>> @@ -2044,7 +2049,11 @@ blk_qc_t generic_make_request(struct bio *bio)
>>                 } else {
>>                         struct bio *bio_next = bio_list_pop(current->bio_list);
>>
>> -                       bio_io_error(bio);
>> +                       if (unlikely(bio_flagged(bio, BIO_NONBLOCKING))) {
>> +                               bio->bi_error = -EAGAIN;
>> +                               bio_endio(bio);
>> +                       } else
>> +                               bio_io_error(bio);
>>                         bio = bio_next;
>>                 }
>>         } while (bio);
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 81caceb..7a7c674 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1213,6 +1213,8 @@ static struct request *blk_mq_map_request(struct request_queue *q,
>>
>>         trace_block_getrq(q, bio, op);
>>         blk_mq_set_alloc_data(&alloc_data, q, 0, ctx, hctx);
>> +       if (bio_flagged(bio, BIO_NONBLOCKING))
>> +               alloc_data.flags |= BLK_MQ_REQ_NOWAIT;
>>         rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
>>
>>         data->hctx = alloc_data.hctx;
>> @@ -1286,8 +1288,14 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>                 return BLK_QC_T_NONE;
>>
>>         rq = blk_mq_map_request(q, bio, &data);
>> -       if (unlikely(!rq))
>> +       if (unlikely(!rq)) {
>> +               if (bio_flagged(bio, BIO_NONBLOCKING))
>> +                       bio->bi_error = -EAGAIN;
>> +               else
>> +                       bio->bi_error = -EIO;
>> +               bio_endio(bio);
>>                 return BLK_QC_T_NONE;
>> +       }
>>
>>         cookie = blk_tag_to_qc_t(rq->tag, data.hctx->queue_num);
>>
>> @@ -1381,8 +1389,14 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
>>                 request_count = blk_plug_queued_count(q);
>>
>>         rq = blk_mq_map_request(q, bio, &data);
>> -       if (unlikely(!rq))
>> +       if (unlikely(!rq)) {
>> +               if (bio_flagged(bio, BIO_NONBLOCKING))
>> +                       bio->bi_error = -EAGAIN;
>> +               else
>> +                       bio->bi_error = -EIO;
>> +               bio_endio(bio);
>>                 return BLK_QC_T_NONE;
>> +       }
> 
> There are other places in which blocking may be triggered, such
> as allocating for bio clone, wbt_wait(), and sleep in .make_request(),
> like md, dm and bcache's.
> 
> IMO it should be hard to deal with all, so what is the expection for
> flag of BIO_NONBLOCKING?
> 


This is a part of the effort of making direct AIO nonblocking. I should
have posted all patches to all lists.

BIO_NONBLOCKING is a bio generated from a non-blocking AIO. Ideally it
should bail whenever it sees that it would block and sleep if the
blockqueue is congested (only). While earlier we had thought of limiting
it to get_request(), I expanded it to blk-mq (which now I think may not
be as useful).

>>
>>         cookie = blk_tag_to_qc_t(rq->tag, data.hctx->queue_num);
>>
>> diff --git a/fs/direct-io.c b/fs/direct-io.c
>> index fb9aa16..9997fed 100644
>> --- a/fs/direct-io.c
>> +++ b/fs/direct-io.c
>> @@ -386,6 +386,9 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
>>         else
>>                 bio->bi_end_io = dio_bio_end_io;
>>
>> +       if (dio->iocb->ki_flags & IOCB_NONBLOCKING)
>> +               bio_set_flag(bio, BIO_NONBLOCKING);
>> +
>>         sdio->bio = bio;
>>         sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
>>  }
>> @@ -480,8 +483,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
>>         unsigned i;
>>         int err;
>>
>> -       if (bio->bi_error)
>> -               dio->io_error = -EIO;
>> +       if (bio->bi_error) {
>> +               if (bio_flagged(bio, BIO_NONBLOCKING))
>> +                       dio->io_error = bio->bi_error;
>> +               else
>> +                       dio->io_error = -EIO;
>> +       }
>>
>>         if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
>>                 err = bio->bi_error;
>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>> index cd395ec..94855cf 100644
>> --- a/include/linux/blk_types.h
>> +++ b/include/linux/blk_types.h
>> @@ -119,6 +119,7 @@ struct bio {
>>  #define BIO_QUIET      6       /* Make BIO Quiet */
>>  #define BIO_CHAIN      7       /* chained bio, ->bi_remaining in effect */
>>  #define BIO_REFFED     8       /* bio has elevated ->bi_cnt */
>> +#define BIO_NONBLOCKING 9      /* don't block over blk device congestion */
>>
>>  /*
>>   * Flags starting here get preserved by bio_reset() - this includes
>> --
>> 2.10.2
>>
> 
> 
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/7] nonblocking aio: ext4
  2017-02-14 19:52   ` Andreas Dilger
@ 2017-02-15 11:20       ` Goldwyn Rodrigues
  0 siblings, 0 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-15 11:20 UTC (permalink / raw)
  To: Andreas Dilger, Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, linux-ext4



On 02/14/2017 01:52 PM, Andreas Dilger wrote:
> 
>> On Feb 13, 2017, at 7:46 PM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>>
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> Return EAGAIN if any of the following checks fail for direct I/O:
>> + i_rwsem is lockable
>> + Writing beyond end of file (will trigger allocation)
>> + Blocks are allocated at the write location
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> ---
>> fs/ext4/file.c | 28 ++++++++++++++++++++++------
>> 1 file changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> index 2a822d3..c8d1e41 100644
>> --- a/fs/ext4/file.c
>> +++ b/fs/ext4/file.c
>> @@ -93,11 +93,16 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> {
>> 	struct inode *inode = file_inode(iocb->ki_filp);
>> 	int o_direct = iocb->ki_flags & IOCB_DIRECT;
>> +	int nonblocking = iocb->ki_flags & IOCB_NONBLOCKING;
>> 	int unaligned_aio = 0;
>> 	int overwrite = 0;
>> 	ssize_t ret;
>>
>> -	inode_lock(inode);
>> +	if (o_direct && nonblocking) {
>> +		if (!inode_trylock(inode))
>> +			return -EAGAIN;
> 
> Why do these all return -EAGAIN instead of -EWOULDBLOCK?  -EAGAIN is already
> used in a number of places, and -EWOULDBLOCK seems more correct in the
> "nonblocking" case?

It is the same :)
#define EWOULDBLOCK     EAGAIN  /* Operation would block */

I didn�t know before I started this work either.
Anyways, I based this on 4.9.9 but there are changes in ext4 code in
4.10-rcx so I need to redo the patches. Thanks for the style reviews.

> 
>> +	} else
>> +		inode_lock(inode);
> 
> (style) "else" blocks should have braces when the "if" block has braces
> 
>> 	ret = generic_write_checks(iocb, from);
>> 	if (ret <= 0)
>> 		goto out;
>> @@ -132,12 +137,18 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> 	if (o_direct) {
>> 		size_t length = iov_iter_count(from);
>> 		loff_t pos = iocb->ki_pos;
>> +		unsigned int blkbits = inode->i_blkbits;
>> +
>> +		if (nonblocking
>> +			&& (pos + length > EXT4_BLOCK_ALIGN(i_size_read(inode), blkbits))) {
> 
> (style) "&&" should go at the end of the previous line
> (style) continued lines should align after '(' on previous line
> (style) no need for parenthesis around that comparison
> 
>> +			ret = -EAGAIN;
>> +			goto out;
>> +		}
>>
>> 		/* check whether we do a DIO overwrite or not */
>> -		if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
>> -		    pos + length <= i_size_read(inode)) {
>> +		if ((ext4_should_dioread_nolock(inode) && !unaligned_aio &&
>> +			    pos + length <= i_size_read(inode)) || nonblocking) {
> 
> (style) continued line should align after second '(' of previous line
> 
>> 			struct ext4_map_blocks map;
>> -			unsigned int blkbits = inode->i_blkbits;
>> 			int err, len;
>>
>> 			map.m_lblk = pos >> blkbits;
>> @@ -157,8 +168,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> 			 * non-flags are returned.  So we should check
>> 			 * these two conditions.
>> 			 */
>> -			if (err == len && (map.m_flags & EXT4_MAP_MAPPED))
>> -				overwrite = 1;
>> +			if (err == len) {
>> +			       if (map.m_flags & EXT4_MAP_MAPPED)
>> +				       overwrite = 1;
>> +			} else if (nonblocking) {
>> +				ret = -EAGAIN;
>> +				goto out;
>> +			}
>> 		}
>> 	}
>>
>> --
>> 2.10.2
>>
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/7] nonblocking aio: ext4
@ 2017-02-15 11:20       ` Goldwyn Rodrigues
  0 siblings, 0 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-15 11:20 UTC (permalink / raw)
  To: Andreas Dilger, Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, linux-ext4



On 02/14/2017 01:52 PM, Andreas Dilger wrote:
> 
>> On Feb 13, 2017, at 7:46 PM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>>
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> Return EAGAIN if any of the following checks fail for direct I/O:
>> + i_rwsem is lockable
>> + Writing beyond end of file (will trigger allocation)
>> + Blocks are allocated at the write location
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> ---
>> fs/ext4/file.c | 28 ++++++++++++++++++++++------
>> 1 file changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> index 2a822d3..c8d1e41 100644
>> --- a/fs/ext4/file.c
>> +++ b/fs/ext4/file.c
>> @@ -93,11 +93,16 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> {
>> 	struct inode *inode = file_inode(iocb->ki_filp);
>> 	int o_direct = iocb->ki_flags & IOCB_DIRECT;
>> +	int nonblocking = iocb->ki_flags & IOCB_NONBLOCKING;
>> 	int unaligned_aio = 0;
>> 	int overwrite = 0;
>> 	ssize_t ret;
>>
>> -	inode_lock(inode);
>> +	if (o_direct && nonblocking) {
>> +		if (!inode_trylock(inode))
>> +			return -EAGAIN;
> 
> Why do these all return -EAGAIN instead of -EWOULDBLOCK?  -EAGAIN is already
> used in a number of places, and -EWOULDBLOCK seems more correct in the
> "nonblocking" case?

It is the same :)
#define EWOULDBLOCK     EAGAIN  /* Operation would block */

I didn’t know before I started this work either.
Anyways, I based this on 4.9.9 but there are changes in ext4 code in
4.10-rcx so I need to redo the patches. Thanks for the style reviews.

> 
>> +	} else
>> +		inode_lock(inode);
> 
> (style) "else" blocks should have braces when the "if" block has braces
> 
>> 	ret = generic_write_checks(iocb, from);
>> 	if (ret <= 0)
>> 		goto out;
>> @@ -132,12 +137,18 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> 	if (o_direct) {
>> 		size_t length = iov_iter_count(from);
>> 		loff_t pos = iocb->ki_pos;
>> +		unsigned int blkbits = inode->i_blkbits;
>> +
>> +		if (nonblocking
>> +			&& (pos + length > EXT4_BLOCK_ALIGN(i_size_read(inode), blkbits))) {
> 
> (style) "&&" should go at the end of the previous line
> (style) continued lines should align after '(' on previous line
> (style) no need for parenthesis around that comparison
> 
>> +			ret = -EAGAIN;
>> +			goto out;
>> +		}
>>
>> 		/* check whether we do a DIO overwrite or not */
>> -		if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
>> -		    pos + length <= i_size_read(inode)) {
>> +		if ((ext4_should_dioread_nolock(inode) && !unaligned_aio &&
>> +			    pos + length <= i_size_read(inode)) || nonblocking) {
> 
> (style) continued line should align after second '(' of previous line
> 
>> 			struct ext4_map_blocks map;
>> -			unsigned int blkbits = inode->i_blkbits;
>> 			int err, len;
>>
>> 			map.m_lblk = pos >> blkbits;
>> @@ -157,8 +168,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> 			 * non-flags are returned.  So we should check
>> 			 * these two conditions.
>> 			 */
>> -			if (err == len && (map.m_flags & EXT4_MAP_MAPPED))
>> -				overwrite = 1;
>> +			if (err == len) {
>> +			       if (map.m_flags & EXT4_MAP_MAPPED)
>> +				       overwrite = 1;
>> +			} else if (nonblocking) {
>> +				ret = -EAGAIN;
>> +				goto out;
>> +			}
>> 		}
>> 	}
>>
>> --
>> 2.10.2
>>
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING
  2017-02-14 18:51   ` Jeff Moyer
@ 2017-02-15 14:51     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-15 14:51 UTC (permalink / raw)
  To: Jeff Moyer, Goldwyn Rodrigues; +Cc: linux-fsdevel, jack



On 02/14/2017 12:51 PM, Jeff Moyer wrote:
> Goldwyn Rodrigues <rgoldwyn@suse.de> writes:
> 
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> This flag informs kernel to bail out if an AIO request will block
>> for reasons such as file allocations, or a writeback triggered
>> by direct I/O.
>>
>> IOCB_FLAG_NONBLOCKING is translated to IOCB_NONBLOCKING for
>> iocb->ki_flags.
> 
> This shouldn't be an option, it should be the default.  This is what
> users want when using this interface to begin with.
> 

:)

While I appreciate the appreciation for the efforts, we can't let
existing applications fail which would receive -EAGAIN for all aio/dio
requests they are sending to the kernel and expecting to complete.



> Cheers,
> Jeff
> 
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> ---
>>  fs/aio.c                     | 3 +++
>>  include/linux/fs.h           | 1 +
>>  include/uapi/linux/aio_abi.h | 3 +++
>>  3 files changed, 7 insertions(+)
>>
>> diff --git a/fs/aio.c b/fs/aio.c
>> index 428484f..0725756 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1551,6 +1551,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>>  		req->common.ki_flags |= IOCB_EVENTFD;
>>  	}
>>  
>> +	if (iocb->aio_flags & IOCB_FLAG_NONBLOCKING)
>> +		req->common.ki_flags |= IOCB_NONBLOCKING;
>> +
>>  	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
>>  	if (unlikely(ret)) {
>>  		pr_debug("EFAULT: aio_key\n");
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index dc0478c..bfbaba6 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -322,6 +322,7 @@ struct writeback_control;
>>  #define IOCB_DSYNC		(1 << 4)
>>  #define IOCB_SYNC		(1 << 5)
>>  #define IOCB_WRITE		(1 << 6)
>> +#define IOCB_NONBLOCKING	(1 << 7)
>>  
>>  struct kiocb {
>>  	struct file		*ki_filp;
>> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
>> index bb2554f..03ce4c2 100644
>> --- a/include/uapi/linux/aio_abi.h
>> +++ b/include/uapi/linux/aio_abi.h
>> @@ -51,8 +51,11 @@ enum {
>>   *
>>   * IOCB_FLAG_RESFD - Set if the "aio_resfd" member of the "struct iocb"
>>   *                   is valid.
>> + * IOCB_FLAG_NOBLOCK - Set if the user wants the iocb to fail if it would block
>> + *			for operations such as disk allocation.
>>   */
>>  #define IOCB_FLAG_RESFD		(1 << 0)
>> +#define IOCB_FLAG_NONBLOCKING	(1 << 1)
>>  
>>  /* read() from /dev/aio returns these structures. */
>>  struct io_event {
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] nonblocking aio: xfs
  2017-02-14  7:43   ` Christoph Hellwig
@ 2017-02-15 15:30     ` Goldwyn Rodrigues
  2017-02-15 16:11       ` Goldwyn Rodrigues
  0 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-15 15:30 UTC (permalink / raw)
  To: Christoph Hellwig, Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, linux-xfs



On 02/14/2017 01:43 AM, Christoph Hellwig wrote:
> On Mon, Feb 13, 2017 at 08:46:02PM -0600, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Please Cc the while series to all lists, otherwiwe it's impossible
> to review the thing.
> 
>>
>> If IOCB_NONBLOCKING is set:
>> 	+ Check if writing beyond end of file, if yes return EAGAIN
>> 	- check if writing to a hole which does not have allocated
>> 	  file blocks.
>> 	- Check if i_rwsem is immediately lockable in xfs_rw_ilock()
> 
> Why the + vs - above? 

A mistake. It does not mean anything besides bullets.

> 
>> -static inline void
>> +static inline int
>>  xfs_rw_ilock(
> 
> This function has been removed a while ago.

And thanks for putting in xfs_ilock_nowait(). This can't be done in a
cleaner way. I was very skeptical of adding yet another flag.

> 
>>  
>> +	if (iocb->ki_flags & IOCB_NONBLOCKING) {
>> +		struct xfs_bmbt_irec	imap;
>> +		xfs_fileoff_t           offset_fsb, end_fsb;
>> +		int			nimaps = 1, ret = 0;
>> +		end_fsb = XFS_B_TO_FSB(ip->i_mount, iocb->ki_pos + count);
>> +		if (XFS_B_TO_FSB(ip->i_mount, i_size_read(inode)) < end_fsb)
>> +			return -EAGAIN;
> 
> Bogus check, XFS supports async write beyond EOF if the space is
> preallocated.

I assume this will be covered by xfs_bmapi_write().

> 
>> +		/* Check if it is an unallocated hole */
>> +		offset_fsb = XFS_B_TO_FSBT(ip->i_mount, iocb->ki_pos);
>> +
>> +		ret = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>> +				&nimaps, 0);
>> +		if (ret)
>> +			return ret;
>> +		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK)
>> +			return -EAGAIN;
> 
> This would need the ilock.  But extent list lookups are expensive,
> so please don't add another one here.  Just return when we hit the
> first unallocated extent in xfs_bmapi_write - either a short write or
> -EAGAIN if nothing has been written.

So in xfs_bmapi_write (and referring to), I would need a new flag, say
XFS_BMAPI_NOALLOC which would bail to error0, setting error=-EAGAIN if
need_alloc or was_delay is set. This flag XFS_BMAPI_NOALLOC is set in
xfs_iomap_write_direct(). I did not understand short writes. Where can I
get a short write?

If I understand correctly, we do add the flag.

> 
>>  	error = xfs_break_layouts(inode, iolock, true);
>>  	if (error)
>>  		return error;
> 
> Additionally this can drop the iolock, so you might get a new
> hole after it.
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] nonblocking aio: xfs
  2017-02-15 15:30     ` Goldwyn Rodrigues
@ 2017-02-15 16:11       ` Goldwyn Rodrigues
  2017-02-16 20:21         ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-15 16:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, jack, linux-xfs



On 02/15/2017 09:30 AM, Goldwyn Rodrigues wrote:
> 
> 
> On 02/14/2017 01:43 AM, Christoph Hellwig wrote:
>> On Mon, Feb 13, 2017 at 08:46:02PM -0600, Goldwyn Rodrigues wrote:
>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> Please Cc the while series to all lists, otherwiwe it's impossible
>> to review the thing.
>>
>>>
>>> If IOCB_NONBLOCKING is set:
>>> 	+ Check if writing beyond end of file, if yes return EAGAIN
>>> 	- check if writing to a hole which does not have allocated
>>> 	  file blocks.
>>> 	- Check if i_rwsem is immediately lockable in xfs_rw_ilock()
>>
>> Why the + vs - above? 
> 
> A mistake. It does not mean anything besides bullets.
> 
>>
>>> -static inline void
>>> +static inline int
>>>  xfs_rw_ilock(
>>
>> This function has been removed a while ago.
> 
> And thanks for putting in xfs_ilock_nowait(). This can't be done in a
> cleaner way. I was very skeptical of adding yet another flag.
> 
>>
>>>  
>>> +	if (iocb->ki_flags & IOCB_NONBLOCKING) {
>>> +		struct xfs_bmbt_irec	imap;
>>> +		xfs_fileoff_t           offset_fsb, end_fsb;
>>> +		int			nimaps = 1, ret = 0;
>>> +		end_fsb = XFS_B_TO_FSB(ip->i_mount, iocb->ki_pos + count);
>>> +		if (XFS_B_TO_FSB(ip->i_mount, i_size_read(inode)) < end_fsb)
>>> +			return -EAGAIN;
>>
>> Bogus check, XFS supports async write beyond EOF if the space is
>> preallocated.
> 
> I assume this will be covered by xfs_bmapi_write().
> 
>>
>>> +		/* Check if it is an unallocated hole */
>>> +		offset_fsb = XFS_B_TO_FSBT(ip->i_mount, iocb->ki_pos);
>>> +
>>> +		ret = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>>> +				&nimaps, 0);
>>> +		if (ret)
>>> +			return ret;
>>> +		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK)
>>> +			return -EAGAIN;
>>
>> This would need the ilock.  But extent list lookups are expensive,
>> so please don't add another one here.  Just return when we hit the
>> first unallocated extent in xfs_bmapi_write - either a short write or
>> -EAGAIN if nothing has been written.
> 
> So in xfs_bmapi_write (and referring to), I would need a new flag, say
> XFS_BMAPI_NOALLOC which would bail to error0, setting error=-EAGAIN if
> need_alloc or was_delay is set. This flag XFS_BMAPI_NOALLOC is set in
> xfs_iomap_write_direct(). I did not understand short writes. Where can I
> get a short write?
> 
> If I understand correctly, we do add the flag.

Replying to myself to correct myself.

On reading a bit more, I figured that we perform
xfs_file_iomap_begin->xfs_iomap_write_direct. At this point we have
already performed xfs_bmapi_read(). So, a check in xfs_file_iomap_begin
should be good enough. So, the flag required would be with iomap flags,
say IOMAP_NONBLOCKING. IOW, we don't need to go all the way to
xfs_bmap_write() and return when imap.br_startblock == HOLESTARTBLOCK.

> 
>>
>>>  	error = xfs_break_layouts(inode, iolock, true);
>>>  	if (error)
>>>  		return error;
>>
>> Additionally this can drop the iolock, so you might get a new
>> hole after it.
>>
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] nonblocking aio: xfs
  2017-02-15 16:11       ` Goldwyn Rodrigues
@ 2017-02-16 20:21         ` Christoph Hellwig
  2017-02-16 20:44           ` Goldwyn Rodrigues
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2017-02-16 20:21 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: Christoph Hellwig, linux-fsdevel, jack, linux-xfs

On Wed, Feb 15, 2017 at 10:11:38AM -0600, Goldwyn Rodrigues wrote:
> > I did not understand short writes. Where can I
> > get a short write?

If you have a write request of N bytes, and you've already wrіtten
M of them you return M from the *write system call instead of -EAGAIN.
This is standard practice on e.g. sockets.

> > 
> > If I understand correctly, we do add the flag.
> 
> Replying to myself to correct myself.
> 
> On reading a bit more, I figured that we perform
> xfs_file_iomap_begin->xfs_iomap_write_direct. At this point we have
> already performed xfs_bmapi_read(). So, a check in xfs_file_iomap_begin
> should be good enough. So, the flag required would be with iomap flags,
> say IOMAP_NONBLOCKING. IOW, we don't need to go all the way to
> xfs_bmap_write() and return when imap.br_startblock == HOLESTARTBLOCK.

Yes, except that reflinked files with shared extents will also need
some additional special casing - for those xfs_bmapi_read can return
an allocated extent, but we might still have to perform a block
allocation for a write.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] nonblocking aio: xfs
  2017-02-16 20:21         ` Christoph Hellwig
@ 2017-02-16 20:44           ` Goldwyn Rodrigues
  0 siblings, 0 replies; 28+ messages in thread
From: Goldwyn Rodrigues @ 2017-02-16 20:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, jack, linux-xfs



On 02/16/2017 02:21 PM, Christoph Hellwig wrote:
> On Wed, Feb 15, 2017 at 10:11:38AM -0600, Goldwyn Rodrigues wrote:
>>> I did not understand short writes. Where can I
>>> get a short write?
> 
> If you have a write request of N bytes, and you've already wrіtten
> M of them you return M from the *write system call instead of -EAGAIN.
> This is standard practice on e.g. sockets.

Oh, I assume that would be taken care of in the existing code, at least
with the modified patch. I will double check that anyways.

> 
>>>
>>> If I understand correctly, we do add the flag.
>>
>> Replying to myself to correct myself.
>>
>> On reading a bit more, I figured that we perform
>> xfs_file_iomap_begin->xfs_iomap_write_direct. At this point we have
>> already performed xfs_bmapi_read(). So, a check in xfs_file_iomap_begin
>> should be good enough. So, the flag required would be with iomap flags,
>> say IOMAP_NONBLOCKING. IOW, we don't need to go all the way to
>> xfs_bmap_write() and return when imap.br_startblock == HOLESTARTBLOCK.
> 
> Yes, except that reflinked files with shared extents will also need
> some additional special casing - for those xfs_bmapi_read can return
> an allocated extent, but we might still have to perform a block
> allocation for a write.
> 

Yes, I forgot that. I will put in a check for reflinks as well.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/7] Non-blocking AIO
  2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
                   ` (6 preceding siblings ...)
  2017-02-14  2:46 ` [PATCH 7/7] nonblocking aio: btrfs Goldwyn Rodrigues
@ 2017-02-20  9:21 ` Christoph Hellwig
  2017-03-03 10:50   ` Michael Kerrisk
  8 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-02-20  9:21 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack

> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> any of these conditions are met. This way userspace can push most
> of the write()s to the kernel to the best of its ability to complete
> and if it returns -EAGAIN, can defer it to another thread.
> 
> In order to enable this, IOCB_FLAG_NONBLOCKING is introduced in 
> uapi/linux/aio_abi.h which translates to IOCB_BLOCKING for struct iocb.

I think this is missing a NON in the second line.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING
  2017-02-14  2:45 ` [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING Goldwyn Rodrigues
  2017-02-14 18:51   ` Jeff Moyer
@ 2017-02-20  9:23   ` Christoph Hellwig
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-02-20  9:23 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues

On Mon, Feb 13, 2017 at 08:45:57PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> This flag informs kernel to bail out if an AIO request will block
> for reasons such as file allocations, or a writeback triggered
> by direct I/O.
> 
> IOCB_FLAG_NONBLOCKING is translated to IOCB_NONBLOCKING for
> iocb->ki_flags.

Back when we did the rwf non0-blocking flag there were some comments
that blocking has a specific meaning in unix, in that we can use poll
or select to wait for it.  Maybe this should have a different name
like NOWAIT?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] noblocking aio: Return if cannot get hold of i_rwsem
  2017-02-14  2:45 ` [PATCH 2/7] noblocking aio: Return if cannot get hold of i_rwsem Goldwyn Rodrigues
@ 2017-02-20  9:24   ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-02-20  9:24 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues

> +	/* Don't sleep on inode rwsem */
> +	if (iocb->ki_flags & IOCB_NONBLOCKING) {
> +		if (!inode_trylock(inode))
> +			return -EAGAIN;
> +	} else
> +		inode_lock(inode);

A way to avoid the additional branch in the fast path would be:

	if (!inode_trylock(inode)) {
		if (iocb->ki_flags & IOCB_NONBLOCKING)
			return -EAGAIN;
		inode_lock(inode);
	}

but otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/7] nonblocking aio: return if direct write will trigger writeback
  2017-02-14  2:45 ` [PATCH 3/7] nonblocking aio: return if direct write will trigger writeback Goldwyn Rodrigues
@ 2017-02-20  9:25   ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-02-20  9:25 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-fsdevel, jack, Goldwyn Rodrigues

On Mon, Feb 13, 2017 at 08:45:59PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Find out if the write will trigger a wait due to writeback. If yes,
> return -EAGAIN.
> 
> This introduces a new function filemap_range_has_page() which
> returns true if the file's mapping has a page within the range
> mentioned.
> 
> Return -EINVAL for buffered AIO: there are multiple causes of
> delay such as page locks, dirty throttling logic, page loading
> from disk etc. which cannot be taken care of.

I would prefer to add a flags argument to filemap_write_and_wait_range
so that it will return -EAGAIN if it encounters a page instead of
introducing a test before use race.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/7] Non-blocking AIO
@ 2017-03-03 10:50   ` Michael Kerrisk
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2017-03-03 10:50 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: Linux-Fsdevel, Jan Kara, Linux API

[CC += linux-api]

Hello Goldwyn,

Since this is a kernel-user-space API change, please CC linux-api@
(and for future iterations of this patch series).

The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Cheers,

Michael



On Tue, Feb 14, 2017 at 3:45 AM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> This series adds nonblocking feature to asynchronous I/O writes.
> io_submit() can be delayed because of a number of reason:
>  - Block allocation for files
>  - Data writebacks for direct I/O
>  - Sleeping because of waiting to acquire i_rwsem
>  - Congested block device
>
> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> any of these conditions are met. This way userspace can push most
> of the write()s to the kernel to the best of its ability to complete
> and if it returns -EAGAIN, can defer it to another thread.
>
> In order to enable this, IOCB_FLAG_NONBLOCKING is introduced in
> uapi/linux/aio_abi.h which translates to IOCB_BLOCKING for struct iocb.
>
> This feature is provided for direct I/O of asynchronous I/O only. I have
> tested it against xfs, ext4, and btrfs.
>
> --
> Goldwyn
>
>



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/7] Non-blocking AIO
@ 2017-03-03 10:50   ` Michael Kerrisk
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2017-03-03 10:50 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: Linux-Fsdevel, Jan Kara, Linux API

[CC += linux-api]

Hello Goldwyn,

Since this is a kernel-user-space API change, please CC linux-api@
(and for future iterations of this patch series).

The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Cheers,

Michael



On Tue, Feb 14, 2017 at 3:45 AM, Goldwyn Rodrigues <rgoldwyn-l3A5Bk7waGM@public.gmane.org> wrote:
> This series adds nonblocking feature to asynchronous I/O writes.
> io_submit() can be delayed because of a number of reason:
>  - Block allocation for files
>  - Data writebacks for direct I/O
>  - Sleeping because of waiting to acquire i_rwsem
>  - Congested block device
>
> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> any of these conditions are met. This way userspace can push most
> of the write()s to the kernel to the best of its ability to complete
> and if it returns -EAGAIN, can defer it to another thread.
>
> In order to enable this, IOCB_FLAG_NONBLOCKING is introduced in
> uapi/linux/aio_abi.h which translates to IOCB_BLOCKING for struct iocb.
>
> This feature is provided for direct I/O of asynchronous I/O only. I have
> tested it against xfs, ext4, and btrfs.
>
> --
> Goldwyn
>
>



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-03-03 10:52 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-14  2:45 [PATCH 0/7] Non-blocking AIO Goldwyn Rodrigues
2017-02-14  2:45 ` [PATCH 1/7] nonblocking aio: Introduce IOCB_FLAG_NONBLOCKING Goldwyn Rodrigues
2017-02-14 18:51   ` Jeff Moyer
2017-02-15 14:51     ` Goldwyn Rodrigues
2017-02-20  9:23   ` Christoph Hellwig
2017-02-14  2:45 ` [PATCH 2/7] noblocking aio: Return if cannot get hold of i_rwsem Goldwyn Rodrigues
2017-02-20  9:24   ` Christoph Hellwig
2017-02-14  2:45 ` [PATCH 3/7] nonblocking aio: return if direct write will trigger writeback Goldwyn Rodrigues
2017-02-20  9:25   ` Christoph Hellwig
2017-02-14  2:46 ` [PATCH 4/7] nonblocking aio: return on congested block device Goldwyn Rodrigues
2017-02-14  3:55   ` Ming Lei
2017-02-15 11:13     ` Goldwyn Rodrigues
2017-02-14  2:46 ` [PATCH 5/7] nonblocking aio: ext4 Goldwyn Rodrigues
2017-02-14 19:52   ` Andreas Dilger
2017-02-15 11:20     ` Goldwyn Rodrigues
2017-02-15 11:20       ` Goldwyn Rodrigues
2017-02-14  2:46 ` [PATCH 6/7] nonblocking aio: xfs Goldwyn Rodrigues
2017-02-14  6:44   ` Darrick J. Wong
2017-02-14  7:43   ` Christoph Hellwig
2017-02-15 15:30     ` Goldwyn Rodrigues
2017-02-15 16:11       ` Goldwyn Rodrigues
2017-02-16 20:21         ` Christoph Hellwig
2017-02-16 20:44           ` Goldwyn Rodrigues
2017-02-14  2:46 ` [PATCH 7/7] nonblocking aio: btrfs Goldwyn Rodrigues
2017-02-14  9:24   ` Nikolay Borisov
2017-02-20  9:21 ` [PATCH 0/7] Non-blocking AIO Christoph Hellwig
2017-03-03 10:50 ` Michael Kerrisk
2017-03-03 10:50   ` Michael Kerrisk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.