linux-api.vger.kernel.org archive mirror
* [PATCH 0/10 v12] No wait AIO
@ 2017-06-15 15:59 Goldwyn Rodrigues
  2017-06-15 15:59 ` [PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags Goldwyn Rodrigues
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm

This series adds a nonblocking feature to asynchronous I/O writes.
io_submit() can be delayed for a number of reasons:
 - Block allocation for files
 - Data writebacks for direct I/O
 - Sleeping because of waiting to acquire i_rwsem
 - Congested block device

The goal of this patch series is to return -EAGAIN/-EWOULDBLOCK if
any of these conditions is met. This way userspace can submit most
of its write()s to the kernel, which completes them as far as it can
without blocking, and defer any write that returns -EAGAIN to another
thread.

In order to enable this, RWF_NOWAIT is introduced in
uapi/linux/fs.h. If set in aio_rw_flags, it translates to
IOCB_NOWAIT for struct kiocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
iomap. aio_rw_flags is a new field replacing aio_reserved1. We could
not use aio_flags because it is not currently checked for invalid
values in the kernel.

This feature is provided only for direct I/O submitted through
asynchronous I/O. I have tested it against xfs, ext4, and btrfs, and I
intend to add more filesystems. The nowait feature is for request-based
devices. In the future, I intend to add support for stacked devices
such as md.

Applications will have to check for support by sending an async direct
write with the flag set; any error other than -EAGAIN means it is not
supported.
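
A minimal sketch of how userspace might act on the result of a nowait
submission, following the description above. The enum and the function
name are hypothetical and not part of this series; only the errno
values come from the patches.

```c
#include <errno.h>

/* Hypothetical dispositions for the result of a nowait AIO write. */
enum nowait_action {
	NOWAIT_SUBMITTED,	/* accepted without blocking */
	NOWAIT_DEFER,		/* would block: hand off to another thread */
	NOWAIT_UNSUPPORTED,	/* any other error: treat nowait as unsupported */
};

/*
 * Classify the result (0 or a negative errno) of a nowait AIO write.
 * EWOULDBLOCK has the same value as EAGAIN on Linux, so one comparison
 * covers both.
 */
enum nowait_action nowait_classify(int err)
{
	if (err == 0)
		return NOWAIT_SUBMITTED;
	if (err == -EAGAIN)
		return NOWAIT_DEFER;
	return NOWAIT_UNSUPPORTED;
}
```

A caller would typically resubmit NOWAIT_DEFER requests without the
flag from a thread that is allowed to block.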

The first two patches are prep patches for nowait I/O.

Changes since v1:
 + changed name from _NONBLOCKING to *_NOWAIT
 + filemap_range_has_page call moved closer to (just before) calling filemap_write_and_wait_range().
 + BIO_NOWAIT limited to get_request()
 + XFS fixes 
	- included reflink 
	- use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag
	- Translate the flag through IOMAP_NOWAIT (iomap) to check for
	  block allocation for the file.
 + ext4 coding style

Changes since v2:
 + Using aio_reserved1 as aio_rw_flags instead of aio_flags
 + blk-mq support
 + xfs uptodate with kernel and reflink changes

 Changes since v3:
  + Added FS_NOWAIT, which is set if the filesystem supports NOWAIT feature.
  + Checks in generic_make_request() to make sure BIO_NOWAIT comes in
    for async direct writes only.
  + Added QUEUE_FLAG_NOWAIT, which is set if the device supports BIO_NOWAIT.
    This is added (rather not set) to block devices such as dm/md currently.

 Changes since v4:
  + Ported AIO code to use RWF_* flags. Check for RWF_* flags in
    generic_file_write_iter().
  + Changed IOCB_RW_FLAGS_NOWAIT to RWF_NOWAIT.

 Changes since v5:
  + BIO_NOWAIT to REQ_NOWAIT
  + Common helper for RWF flags.

 Changes since v6:
  + REQ_NOWAIT will be ignored for request based devices since they
    cannot block. So, removed QUEUE_FLAG_NOWAIT since it is not
    required in the current implementation. It will be resurrected
    when we program for stacked devices.
  + changed kiocb_rw_flags() to kiocb_set_rw_flags() in order to accommodate
    errors. Moved the checks into the function.

 Changes since v7:
  + split patches into prep so the main patches are smaller and easier
    to understand
  + All patches are reviewed or acked!
 
 Changes since v8:
 + Err out AIO reads with -EINVAL flagged as RWF_NOWAIT

 Changes since v9:
 + Retract - Err out AIO reads with -EINVAL flagged as RWF_NOWAIT
 + XFS returns EAGAIN if extent list is not in memory
 + Man page updates to io_submit with iocb description and nowait features.

 Changes since v10:
 + Corrected comment and subject in "return on congested block device"

 Changes since v11:
 + FMODE_AIO_NOWAIT to show AIO NOWAIT support. This is to block
   non-supported filesystems instead of returning ENOTSUPP in each
   individual filesystem

-- 
Goldwyn


* [PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
@ 2017-06-15 15:59 ` Goldwyn Rodrigues
  2017-06-15 15:59 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/read_write.c    | 12 +++---------
 include/linux/fs.h | 14 ++++++++++++++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 47c1d4484df9..53c816c61122 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -678,16 +678,10 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 	struct kiocb kiocb;
 	ssize_t ret;
 
-	if (flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC))
-		return -EOPNOTSUPP;
-
 	init_sync_kiocb(&kiocb, filp);
-	if (flags & RWF_HIPRI)
-		kiocb.ki_flags |= IOCB_HIPRI;
-	if (flags & RWF_DSYNC)
-		kiocb.ki_flags |= IOCB_DSYNC;
-	if (flags & RWF_SYNC)
-		kiocb.ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+	ret = kiocb_set_rw_flags(&kiocb, flags);
+	if (ret)
+		return ret;
 	kiocb.ki_pos = *ppos;
 
 	if (type == READ)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 803e5a9b2654..f53867140f43 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3056,6 +3056,20 @@ static inline int iocb_flags(struct file *file)
 	return res;
 }
 
+static inline int kiocb_set_rw_flags(struct kiocb *ki, int flags)
+{
+	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC)))
+		return -EOPNOTSUPP;
+
+	if (flags & RWF_HIPRI)
+		ki->ki_flags |= IOCB_HIPRI;
+	if (flags & RWF_DSYNC)
+		ki->ki_flags |= IOCB_DSYNC;
+	if (flags & RWF_SYNC)
+		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+	return 0;
+}
+
 static inline ino_t parent_ino(struct dentry *dentry)
 {
 	ino_t res;
-- 
2.12.0


* [PATCH 02/10] fs: Introduce filemap_range_has_page()
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
  2017-06-15 15:59 ` [PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags Goldwyn Rodrigues
@ 2017-06-15 15:59 ` Goldwyn Rodrigues
       [not found]   ` <20170615160002.17233-3-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
  2017-06-15 15:59 ` [PATCH 03/10] fs: Use RWF_* flags for AIO operations Goldwyn Rodrigues
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

filemap_range_has_page() returns true if the file's mapping has
a page within the given range. This function will be used
to check if a write() call will cause a writeback of previous
writes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f53867140f43..dc0ab585cd56 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+				  loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6f1be573a5e6..87aba7698584 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:           address space within which to check
+ * @start_byte:        offset in bytes where the range starts
+ * @end_byte:          offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+			   loff_t start_byte, loff_t end_byte)
+{
+	pgoff_t index = start_byte >> PAGE_SHIFT;
+	pgoff_t end = end_byte >> PAGE_SHIFT;
+	struct pagevec pvec;
+	int ret;
+
+	if (end_byte < start_byte)
+		return 0;
+
+	if (mapping->nrpages == 0)
+		return 0;
+
+	pagevec_init(&pvec, 0);
+	ret = pagevec_lookup(&pvec, mapping, index, 1);
+	if (!ret)
+		return 0;
+	ret = (pvec.pages[0]->index <= end);
+	pagevec_release(&pvec);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
-- 
2.12.0
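
The range check in the patch above can be modeled in userspace as
follows. This is only a sketch: the sorted array stands in for the
page cache, and MODEL_PAGE_SHIFT and the function name are
assumptions made for illustration.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Userspace model of the range check. cached[] holds the page indices
 * present in the mapping, sorted ascending. Byte offsets are assumed
 * to be non-negative. Returns 1 if a cached page overlaps the
 * inclusive byte range [start_byte, end_byte], else 0.
 */
int model_range_has_page(const unsigned long *cached, size_t n,
			 long long start_byte, long long end_byte)
{
	unsigned long index, end;
	size_t i;

	if (end_byte < start_byte || n == 0)
		return 0;

	index = (unsigned long)(start_byte >> MODEL_PAGE_SHIFT);
	end = (unsigned long)(end_byte >> MODEL_PAGE_SHIFT);

	/* Like a lookup starting at index: take the first page at or after it. */
	for (i = 0; i < n; i++) {
		if (cached[i] >= index)
			return cached[i] <= end;
	}
	return 0;
}
```

Only the first page at or after the start index matters: if that page
lies past the end index, no later page can be inside the range.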


* [PATCH 03/10] fs: Use RWF_* flags for AIO operations
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
  2017-06-15 15:59 ` [PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags Goldwyn Rodrigues
  2017-06-15 15:59 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
@ 2017-06-15 15:59 ` Goldwyn Rodrigues
  2017-06-15 15:59 ` [PATCH 04/10] fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT Goldwyn Rodrigues
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will
carry the RWF_* flags. We cannot use aio_flags because it is not
checked for validity, and reusing it may break existing applications.

Note that the only place RWF_HIPRI takes effect is dio_await_one().
At all other locations, the aio code returns -EIOCBQUEUED before the
checks for RWF_HIPRI.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/aio.c                     | 8 +++++++-
 include/uapi/linux/aio_abi.h | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f52d925ee259..020fa0045e3c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1541,7 +1541,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
-	if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2)) {
+	if (unlikely(iocb->aio_reserved2)) {
 		pr_debug("EINVAL: reserve field set\n");
 		return -EINVAL;
 	}
@@ -1586,6 +1586,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		req->common.ki_flags |= IOCB_EVENTFD;
 	}
 
+	ret = kiocb_set_rw_flags(&req->common, iocb->aio_rw_flags);
+	if (unlikely(ret)) {
+		pr_debug("EINVAL: aio_rw_flags\n");
+		goto out_put_req;
+	}
+
 	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
 	if (unlikely(ret)) {
 		pr_debug("EFAULT: aio_key\n");
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index bb2554f7fbd1..a2d4a8ac94ca 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -79,7 +79,7 @@ struct io_event {
 struct iocb {
 	/* these are internal to the kernel/libc. */
 	__u64	aio_data;	/* data to be returned in event's data */
-	__u32	PADDED(aio_key, aio_reserved1);
+	__u32	PADDED(aio_key, aio_rw_flags);
 				/* the kernel sets aio_key to the req # */
 
 	/* common fields */
-- 
2.12.0
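
A small model of the reserved-field check before and after the patch
above, to illustrate why reusing aio_reserved1 is safe. The struct and
function names are hypothetical; only the two fields the check looks
at are modeled.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two iocb fields the check consults. */
struct model_iocb {
	uint32_t rw_flags;	/* aio_reserved1 before, aio_rw_flags after */
	uint64_t reserved2;	/* aio_reserved2, still required to be zero */
};

/* Before the patch: any nonzero reserved field is rejected. */
int model_check_old(const struct model_iocb *cb)
{
	if (cb->rw_flags || cb->reserved2)
		return -EINVAL;
	return 0;
}

/* After the patch: only aio_reserved2 must be zero here. */
int model_check_new(const struct model_iocb *cb)
{
	if (cb->reserved2)
		return -EINVAL;
	return 0;
}
```

Because older kernels already rejected a nonzero aio_reserved1, an
application that sets aio_rw_flags there gets -EINVAL instead of
having the flag silently ignored.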


* [PATCH 04/10] fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
                   ` (2 preceding siblings ...)
  2017-06-15 15:59 ` [PATCH 03/10] fs: Use RWF_* flags for AIO operations Goldwyn Rodrigues
@ 2017-06-15 15:59 ` Goldwyn Rodrigues
       [not found]   ` <20170615160002.17233-5-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
  2017-06-15 15:59 ` [PATCH 05/10] fs: return if direct write will trigger writeback Goldwyn Rodrigues
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

RWF_NOWAIT tells the kernel to bail out if an AIO request would block
for reasons such as file allocations, a triggered writeback,
or blocking while allocating requests during
direct I/O.

RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags.

FMODE_AIO_NOWAIT is a flag which identifies that the opened file is
capable of returning -EAGAIN if the AIO call would block. This must be
set by supporting filesystems in the ->open() call.

Support for xfs, btrfs and ext4 is added in the following patches.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/aio.c                |  6 ++++++
 include/linux/fs.h      | 11 ++++++++++-
 include/uapi/linux/fs.h |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index 020fa0045e3c..34027b67e2f4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1592,6 +1592,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	}
 
+	if ((req->common.ki_flags & IOCB_NOWAIT) &&
+			!(req->common.ki_flags & IOCB_DIRECT)) {
+		ret = -EOPNOTSUPP;
+		goto out_put_req;
+	}
+
 	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
 	if (unlikely(ret)) {
 		pr_debug("EFAULT: aio_key\n");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dc0ab585cd56..017aca01c35e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -142,6 +142,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x4000000)
 
+/* File is capable of returning -EAGAIN if AIO will block */
+#define FMODE_AIO_NOWAIT	((__force fmode_t)0x8000000)
+
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
  * that indicates that they should check the contents of the iovec are
@@ -268,6 +271,7 @@ struct writeback_control;
 #define IOCB_DSYNC		(1 << 4)
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
+#define IOCB_NOWAIT		(1 << 7)
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -3060,9 +3064,14 @@ static inline int iocb_flags(struct file *file)
 
 static inline int kiocb_set_rw_flags(struct kiocb *ki, int flags)
 {
-	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC)))
+	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)))
 		return -EOPNOTSUPP;
 
+	if (flags & RWF_NOWAIT) {
+		if (!(ki->ki_filp->f_mode & FMODE_AIO_NOWAIT))
+			return -EOPNOTSUPP;
+		ki->ki_flags |= IOCB_NOWAIT;
+	}
 	if (flags & RWF_HIPRI)
 		ki->ki_flags |= IOCB_HIPRI;
 	if (flags & RWF_DSYNC)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 24e61a54feaa..29969fb7f9a7 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -360,5 +360,6 @@ struct fscrypt_key {
 #define RWF_HIPRI			0x00000001 /* high priority request, poll if possible */
 #define RWF_DSYNC			0x00000002 /* per-IO O_DSYNC */
 #define RWF_SYNC			0x00000004 /* per-IO O_SYNC */
+#define RWF_NOWAIT			0x00000008 /* per-IO, return -EAGAIN if operation would block */
 
 #endif /* _UAPI_LINUX_FS_H */
-- 
2.12.0
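
The flag translation in the patch above, including the per-file
capability check, can be sketched in userspace as below. The M_RWF_*
values match the RWF_* definitions in this series; the M_IOCB_* and
M_FMODE_* bit values and the function name are model-only assumptions
and do not match the kernel's.

```c
#include <errno.h>

/* RWF_* values as defined in include/uapi/linux/fs.h by this series. */
#define M_RWF_HIPRI		0x1
#define M_RWF_DSYNC		0x2
#define M_RWF_SYNC		0x4
#define M_RWF_NOWAIT		0x8

/* Model-only bits; the kernel's IOCB_* and FMODE_* values differ. */
#define M_IOCB_HIPRI		(1 << 0)
#define M_IOCB_DSYNC		(1 << 1)
#define M_IOCB_SYNC		(1 << 2)
#define M_IOCB_NOWAIT		(1 << 3)
#define M_FMODE_AIO_NOWAIT	(1 << 0)

/*
 * Translate per-I/O RWF_* flags into kiocb-style flags. Unknown flags,
 * or the nowait flag on a file that did not advertise support at
 * open time, yield -EOPNOTSUPP.
 */
int model_set_rw_flags(unsigned int f_mode, int rwf, int *ki_flags)
{
	if (rwf & ~(M_RWF_HIPRI | M_RWF_DSYNC | M_RWF_SYNC | M_RWF_NOWAIT))
		return -EOPNOTSUPP;

	if (rwf & M_RWF_NOWAIT) {
		if (!(f_mode & M_FMODE_AIO_NOWAIT))
			return -EOPNOTSUPP;
		*ki_flags |= M_IOCB_NOWAIT;
	}
	if (rwf & M_RWF_HIPRI)
		*ki_flags |= M_IOCB_HIPRI;
	if (rwf & M_RWF_DSYNC)
		*ki_flags |= M_IOCB_DSYNC;
	if (rwf & M_RWF_SYNC)
		*ki_flags |= M_IOCB_DSYNC | M_IOCB_SYNC;
	return 0;
}
```

Checking the file mode in one common helper is what lets unsupported
filesystems be rejected without touching each of them individually.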


* [PATCH 05/10] fs: return if direct write will trigger writeback
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
                   ` (3 preceding siblings ...)
  2017-06-15 15:59 ` [PATCH 04/10] fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT Goldwyn Rodrigues
@ 2017-06-15 15:59 ` Goldwyn Rodrigues
       [not found] ` <20170615160002.17233-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Find out if the write will trigger a wait due to writeback. If yes,
return -EAGAIN.

Return -EINVAL for buffered AIO: there are multiple causes of
delay such as page locks, dirty throttling logic, page loading
from disk etc. which cannot be taken care of.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 mm/filemap.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 87aba7698584..c6fd1977a280 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2675,6 +2675,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
 
 	pos = iocb->ki_pos;
 
+	if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+		return -EINVAL;
+
 	if (limit != RLIM_INFINITY) {
 		if (iocb->ki_pos >= limit) {
 			send_sig(SIGXFSZ, current, 0);
@@ -2743,9 +2746,17 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	write_len = iov_iter_count(from);
 	end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
-	if (written)
-		goto out;
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		/* If there are pages to writeback, return */
+		if (filemap_range_has_page(inode->i_mapping, pos,
+					   pos + iov_iter_count(from)))
+			return -EAGAIN;
+	} else {
+		written = filemap_write_and_wait_range(mapping, pos,
+							pos + write_len - 1);
+		if (written)
+			goto out;
+	}
 
 	/*
 	 * After a write we want buffered reads to be sure to go to disk to get
-- 
2.12.0


* [PATCH 06/10] fs: Introduce IOMAP_NOWAIT
       [not found] ` <20170615160002.17233-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
@ 2017-06-15 15:59   ` Goldwyn Rodrigues
  2017-06-15 18:25   ` [PATCH 0/10 v12] No wait AIO Andrew Morton
  1 sibling, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps.
This is used by XFS in the XFS patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap.c            | 2 ++
 include/linux/iomap.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index 4b10892967a5..5d85ec6e7b20 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -879,6 +879,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	} else {
 		dio->flags |= IOMAP_DIO_WRITE;
 		flags |= IOMAP_WRITE;
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			flags |= IOMAP_NOWAIT;
 	}
 
 	ret = filemap_write_and_wait_range(mapping, start, end);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f753e788da31..69f4e9470084 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -52,6 +52,7 @@ struct iomap {
 #define IOMAP_REPORT		(1 << 2) /* report extent status, e.g. FIEMAP */
 #define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
 #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
+#define IOMAP_NOWAIT		(1 << 5) /* Don't wait for writeback */
 
 struct iomap_ops {
 	/*
-- 
2.12.0


* [PATCH 07/10] block: return on congested block device
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
                   ` (5 preceding siblings ...)
       [not found] ` <20170615160002.17233-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
@ 2017-06-15 15:59 ` Goldwyn Rodrigues
  2017-06-15 16:42   ` Jens Axboe
  2017-06-15 16:00 ` [PATCH 08/10] ext4: nowait aio support Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 15:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

A new bio operation flag REQ_NOWAIT is introduced to identify bios
originating from an iocb with IOCB_NOWAIT. This flag tells the block
layer to return immediately, instead of retrying, if a request
cannot be made.

Stacked devices such as md (the ones with make_request_fn hooks)
are currently not supported because they may block for housekeeping.
For example, an md can have a part of the device suspended.
For this reason, only request-based devices are supported.
In the future, this feature will be expanded to stacked devices
by teaching them how to handle the REQ_NOWAIT flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 block/blk-core.c          | 23 +++++++++++++++++++++--
 block/blk-mq-sched.c      |  3 +++
 block/blk-mq.c            |  2 ++
 fs/direct-io.c            | 10 ++++++++--
 include/linux/bio.h       |  6 ++++++
 include/linux/blk_types.h |  2 ++
 6 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a7421b772d0e..972d6fdb1432 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1256,6 +1256,11 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 	if (!IS_ERR(rq))
 		return rq;
 
+	if (op & REQ_NOWAIT) {
+		blk_put_rl(rl);
+		return ERR_PTR(-EAGAIN);
+	}
+
 	if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
 		blk_put_rl(rl);
 		return rq;
@@ -1900,6 +1905,16 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
+	/*
+	 * For a REQ_NOWAIT based request, return -EOPNOTSUPP
+	 * if queue is not a request based queue.
+	 */
+
+	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q)) {
+		err = -EOPNOTSUPP;
+		goto end_io;
+	}
+
 	part = bio->bi_bdev->bd_part;
 	if (should_fail_request(part, bio->bi_iter.bi_size) ||
 	    should_fail_request(&part_to_disk(part)->part0,
@@ -2057,7 +2072,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
-		if (likely(blk_queue_enter(q, false) == 0)) {
+		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
@@ -2082,7 +2097,11 @@ blk_qc_t generic_make_request(struct bio *bio)
 			bio_list_merge(&bio_list_on_stack[0], &same);
 			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
 		} else {
-			bio_io_error(bio);
+			if (unlikely(!blk_queue_dying(q) &&
+					(bio->bi_opf & REQ_NOWAIT)))
+				bio_wouldblock_error(bio);
+			else
+				bio_io_error(bio);
 		}
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 1f5b692526ae..9a1dea8b964e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -83,6 +83,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 	if (likely(!data->hctx))
 		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
 
+	if (op & REQ_NOWAIT)
+		data->flags |= BLK_MQ_REQ_NOWAIT;
+
 	if (e) {
 		data->flags |= BLK_MQ_REQ_INTERNAL;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bb66c96850b1..86d86626ef00 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1562,6 +1562,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
 	if (unlikely(!rq)) {
 		__wbt_done(q->rq_wb, wb_acct);
+		if (bio->bi_opf & REQ_NOWAIT)
+			bio_wouldblock_error(bio);
 		return BLK_QC_T_NONE;
 	}
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea77de8..139ebd5ae1c7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -480,8 +480,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	unsigned i;
 	int err;
 
-	if (bio->bi_error)
-		dio->io_error = -EIO;
+	if (bio->bi_error) {
+		if (bio->bi_error == -EAGAIN && (bio->bi_opf & REQ_NOWAIT))
+			dio->io_error = -EAGAIN;
+		else
+			dio->io_error = -EIO;
+	}
 
 	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
 		err = bio->bi_error;
@@ -1197,6 +1201,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	if (iov_iter_rw(iter) == WRITE) {
 		dio->op = REQ_OP_WRITE;
 		dio->op_flags = REQ_SYNC | REQ_IDLE;
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			dio->op_flags |= REQ_NOWAIT;
 	} else {
 		dio->op = REQ_OP_READ;
 	}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0e99cf..cc0aa4315383 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -418,6 +418,12 @@ static inline void bio_io_error(struct bio *bio)
 	bio_endio(bio);
 }
 
+static inline void bio_wouldblock_error(struct bio *bio)
+{
+	bio->bi_error = -EAGAIN;
+	bio_endio(bio);
+}
+
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 61339bc44400..c87990aec66e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -205,6 +205,7 @@ enum req_flag_bits {
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
 
+	__REQ_NOWAIT,           /* Don't wait if request will block */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -223,6 +224,7 @@ enum req_flag_bits {
 #define REQ_BACKGROUND		(1ULL << __REQ_BACKGROUND)
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
+#define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
-- 
2.12.0
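
The completion-side error mapping added to dio_bio_complete() in the
patch above can be modeled as a small pure function. The function name
and the plain int parameters are hypothetical; this is a sketch of the
decision only, not of the bio handling around it.

```c
#include <errno.h>

/*
 * Model of the completion-side mapping. A bio that failed with -EAGAIN
 * and carried the nowait flag is reported as -EAGAIN; every other bio
 * error collapses to -EIO, as before the patch. A bio without an error
 * leaves the result at 0.
 */
int model_dio_io_error(int bi_error, int req_nowait)
{
	if (!bi_error)
		return 0;
	if (bi_error == -EAGAIN && req_nowait)
		return -EAGAIN;
	return -EIO;
}
```

Keeping -EAGAIN distinct from -EIO is what lets the submitter tell a
congested queue apart from a real I/O failure.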


* [PATCH 08/10] ext4: nowait aio support
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
                   ` (6 preceding siblings ...)
  2017-06-15 15:59 ` [PATCH 07/10] block: return on congested block device Goldwyn Rodrigues
@ 2017-06-15 16:00 ` Goldwyn Rodrigues
  2017-06-15 16:00 ` [PATCH 09/10] xfs: " Goldwyn Rodrigues
  2017-06-15 16:00 ` [PATCH 10/10] btrfs: " Goldwyn Rodrigues
  9 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 16:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Return -EAGAIN for direct I/O in any of the following cases:
  + i_rwsem cannot be locked immediately
  + The write goes beyond end of file (will trigger allocation)
  + Blocks are not allocated at the write location

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 02ce7e7bbdf5..cfb4770657fc 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -216,7 +216,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		return ext4_dax_write_iter(iocb, from);
 #endif
 
-	inode_lock(inode);
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!inode_trylock(inode))
+			return -EAGAIN;
+	} else {
+		inode_lock(inode);
+	}
+
 	ret = ext4_write_checks(iocb, from);
 	if (ret <= 0)
 		goto out;
@@ -235,9 +241,15 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	iocb->private = &overwrite;
 	/* Check whether we do a DIO overwrite or not */
-	if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
-	    ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
-		overwrite = 1;
+	if (o_direct && !unaligned_aio) {
+		if (ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from))) {
+			if (ext4_should_dioread_nolock(inode))
+				overwrite = 1;
+		} else if (iocb->ki_flags & IOCB_NOWAIT) {
+			ret = -EAGAIN;
+			goto out;
+		}
+	}
 
 	ret = __generic_file_write_iter(iocb, from);
 	inode_unlock(inode);
@@ -435,6 +447,10 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
 		if (ret < 0)
 			return ret;
 	}
+
+	/* Set the flags to support nowait AIO */
+	filp->f_mode |= FMODE_AIO_NOWAIT;
+
 	return dquot_file_open(inode, filp);
 }
 
-- 
2.12.0
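
The lock acquisition pattern used in the patch above (try once and
bail for nowait, otherwise wait) can be sketched in userspace with a
C11 atomic_flag standing in for the inode lock. The struct and
function names are hypothetical, and the busy-wait replaces the sleep
the kernel would do on i_rwsem.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the inode lock. */
struct model_lock {
	atomic_flag held;
};

/*
 * Take the lock for a write. With nowait set, a single attempt is made
 * and -EAGAIN is returned on contention; otherwise spin until the lock
 * is free (the kernel would sleep instead of spinning).
 */
int model_lock_for_write(struct model_lock *l, int nowait)
{
	if (nowait) {
		if (atomic_flag_test_and_set(&l->held))
			return -EAGAIN;
		return 0;
	}
	while (atomic_flag_test_and_set(&l->held))
		;	/* busy-wait; model only */
	return 0;
}

/* Release the lock taken by model_lock_for_write(). */
void model_unlock(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}
```

A nowait caller that sees -EAGAIN has not taken the lock and must not
call the unlock function.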


* [PATCH 09/10] xfs: nowait aio support
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
                   ` (7 preceding siblings ...)
  2017-06-15 16:00 ` [PATCH 08/10] ext4: nowait aio support Goldwyn Rodrigues
@ 2017-06-15 16:00 ` Goldwyn Rodrigues
  2017-06-15 16:00 ` [PATCH 10/10] btrfs: " Goldwyn Rodrigues
  9 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 16:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
immediately, or if the write would have to wait for other DIOs to
finish.

If IOMAP_NOWAIT is set, return -EAGAIN in xfs_file_iomap_begin()
if the write needs allocation due to file extension, writing to a
hole, or CoW.

Return -EAGAIN if we don't have the extent list in memory.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_file.c  | 20 +++++++++++++++-----
 fs/xfs/xfs_iomap.c | 22 ++++++++++++++++++++++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5fb5a0958a14..e159eb381d9f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -541,8 +541,11 @@ xfs_file_dio_aio_write(
 		iolock = XFS_IOLOCK_SHARED;
 	}
 
-	xfs_ilock(ip, iolock);
-
+	if (!xfs_ilock_nowait(ip, iolock)) {
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			return -EAGAIN;
+		xfs_ilock(ip, iolock);
+	}
 	ret = xfs_file_aio_write_checks(iocb, from, &iolock);
 	if (ret)
 		goto out;
@@ -553,9 +556,15 @@ xfs_file_dio_aio_write(
 	 * otherwise demote the lock if we had to take the exclusive lock
 	 * for other reasons in xfs_file_aio_write_checks.
 	 */
-	if (unaligned_io)
-		inode_dio_wait(inode);
-	else if (iolock == XFS_IOLOCK_EXCL) {
+	if (unaligned_io) {
+		/* If we are going to wait for other DIO to finish, bail */
+		if (iocb->ki_flags & IOCB_NOWAIT) {
+			if (atomic_read(&inode->i_dio_count))
+				return -EAGAIN;
+		} else {
+			inode_dio_wait(inode);
+		}
+	} else if (iolock == XFS_IOLOCK_EXCL) {
 		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
 		iolock = XFS_IOLOCK_SHARED;
 	}
@@ -892,6 +901,7 @@ xfs_file_open(
 		return -EFBIG;
 	if (XFS_FORCED_SHUTDOWN(XFS_M(inode->i_sb)))
 		return -EIO;
+	file->f_mode |= FMODE_AIO_NOWAIT;
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 94e5bdf7304c..05dc87e8c1f5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -995,6 +995,11 @@ xfs_file_iomap_begin(
 		lockmode = xfs_ilock_data_map_shared(ip);
 	}
 
+	if ((flags & IOMAP_NOWAIT) && !(ip->i_df.if_flags & XFS_IFEXTENTS)) {
+		error = -EAGAIN;
+		goto out_unlock;
+	}
+
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 	if ((xfs_fsize_t)offset + length > mp->m_super->s_maxbytes)
 		length = mp->m_super->s_maxbytes - offset;
@@ -1016,6 +1021,15 @@ xfs_file_iomap_begin(
 
 	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
 		if (flags & IOMAP_DIRECT) {
+			/*
+			 * A reflinked inode will result in CoW alloc.
+			 * FIXME: It could still overwrite on unshared extents
+			 * and not need allocation.
+			 */
+			if (flags & IOMAP_NOWAIT) {
+				error = -EAGAIN;
+				goto out_unlock;
+			}
 			/* may drop and re-acquire the ilock */
 			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
 					&lockmode);
@@ -1033,6 +1047,14 @@ xfs_file_iomap_begin(
 
 	if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
 		/*
+		 * If nowait is set bail since we are going to make
+		 * allocations.
+		 */
+		if (flags & IOMAP_NOWAIT) {
+			error = -EAGAIN;
+			goto out_unlock;
+		}
+		/*
 		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
 		 * pages to keep the chunks of work done where somewhat symmetric
 		 * with the work writeback does. This is a completely arbitrary
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/10] btrfs: nowait aio support
  2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
                   ` (8 preceding siblings ...)
  2017-06-15 16:00 ` [PATCH 09/10] xfs: " Goldwyn Rodrigues
@ 2017-06-15 16:00 ` Goldwyn Rodrigues
  9 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 16:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, axboe, linux-api, viro, akpm, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Return -EAGAIN if any of the following holds:
 + i_rwsem cannot be locked immediately
 + neither NODATACOW nor PREALLOC is set
 + cannot nocow at the desired location
 + writing beyond end of file into space which is not allocated

Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c  | 33 +++++++++++++++++++++++++++------
 fs/btrfs/inode.c |  3 +++
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index da1096eb1a40..59e2dccdf75b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1875,12 +1875,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	ssize_t num_written = 0;
 	bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host);
 	ssize_t err;
-	loff_t pos;
-	size_t count;
+	loff_t pos = iocb->ki_pos;
+	size_t count = iov_iter_count(from);
 	loff_t oldsize;
 	int clean_page = 0;
 
-	inode_lock(inode);
+	if ((iocb->ki_flags & IOCB_NOWAIT) &&
+			(iocb->ki_flags & IOCB_DIRECT)) {
+		/* Don't sleep on inode rwsem */
+		if (!inode_trylock(inode))
+			return -EAGAIN;
+		/*
+		 * We will allocate space in case nodatacow is not set,
+		 * so bail
+		 */
+		if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					      BTRFS_INODE_PREALLOC)) ||
+		    check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+			inode_unlock(inode);
+			return -EAGAIN;
+		}
+	} else
+		inode_lock(inode);
+
 	err = generic_write_checks(iocb, from);
 	if (err <= 0) {
 		inode_unlock(inode);
@@ -1914,8 +1931,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	 */
 	update_time_for_write(inode);
 
-	pos = iocb->ki_pos;
-	count = iov_iter_count(from);
 	start_pos = round_down(pos, fs_info->sectorsize);
 	oldsize = i_size_read(inode);
 	if (start_pos > oldsize) {
@@ -3071,13 +3086,19 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
 	return offset;
 }
 
+static int btrfs_file_open(struct inode *inode, struct file *filp)
+{
+	filp->f_mode |= FMODE_AIO_NOWAIT;
+	return generic_file_open(inode, filp);
+}
+
 const struct file_operations btrfs_file_operations = {
 	.llseek		= btrfs_file_llseek,
 	.read_iter      = generic_file_read_iter,
 	.splice_read	= generic_file_splice_read,
 	.write_iter	= btrfs_file_write_iter,
 	.mmap		= btrfs_file_mmap,
-	.open		= generic_file_open,
+	.open		= btrfs_file_open,
 	.release	= btrfs_release_file,
 	.fsync		= btrfs_sync_file,
 	.fallocate	= btrfs_fallocate,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ef3c98c527c1..861979802aa3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8755,6 +8755,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			dio_data.overwrite = 1;
 			inode_unlock(inode);
 			relock = true;
+		} else if (iocb->ki_flags & IOCB_NOWAIT) {
+			ret = -EAGAIN;
+			goto out;
 		}
 		ret = btrfs_delalloc_reserve_space(inode, offset, count);
 		if (ret)
-- 
2.12.0
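
The early checks added to btrfs_file_write_iter() above can be read as
a small decision function. A sketch under stated assumptions: the flag
values below are arbitrary stand-ins for BTRFS_INODE_NODATACOW and
BTRFS_INODE_PREALLOC, and can_nocow stands in for the result of
check_can_nocow() being positive; demo_nowait_precheck() is a
hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>

/* Arbitrary illustrative values, not the real btrfs inode flag bits. */
#define DEMO_INODE_NODATACOW	(1u << 0)
#define DEMO_INODE_PREALLOC	(1u << 1)

/*
 * Returns 0 if a nowait direct write may proceed, -EAGAIN if it would
 * have to allocate space (and could therefore block).
 */
static int demo_nowait_precheck(unsigned int inode_flags, bool can_nocow)
{
	/* Neither flag set: the write would CoW and allocate space. */
	if (!(inode_flags & (DEMO_INODE_NODATACOW | DEMO_INODE_PREALLOC)))
		return -EAGAIN;

	/* A flag is set, but this range cannot be overwritten in place. */
	if (!can_nocow)
		return -EAGAIN;

	return 0;
}
```

Either flag is sufficient to pass the first check; the range-specific
nocow check then decides whether this particular write can avoid
allocation.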

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/10] block: return on congested block device
  2017-06-15 15:59 ` [PATCH 07/10] block: return on congested block device Goldwyn Rodrigues
@ 2017-06-15 16:42   ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-06-15 16:42 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: jack, hch, linux-block, linux-api, viro, akpm, Goldwyn Rodrigues

On 06/15/2017 09:59 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> A new bio operation flag REQ_NOWAIT is introduced to identify bio's
> originating from iocb with IOCB_NOWAIT. This flag indicates
> to return immediately if a request cannot be made instead
> of retrying.
> 
> Stacked devices such as md (the ones with make_request_fn hooks)
> currently are not supported because it may block for housekeeping.
> For example, an md can have a part of the device suspended.
> For this reason, only request based devices are supported.
> In the future, this feature will be expanded to stacked devices
> by teaching them how to handle the REQ_NOWAIT flags.

Looks fine to me now. I don't know what tree will take these patches,
but if it's not block, then you can add:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/10 v12] No wait AIO
       [not found] ` <20170615160002.17233-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
  2017-06-15 15:59   ` [PATCH 06/10] fs: Introduce IOMAP_NOWAIT Goldwyn Rodrigues
@ 2017-06-15 18:25   ` Andrew Morton
  2017-06-15 21:51     ` Goldwyn Rodrigues
  2017-06-16  8:54     ` Jan Kara
  1 sibling, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2017-06-15 18:25 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, jack-IBi9RG/b67k,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-block-u79uwXL29TY76Z2rM5mHXA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Thu, 15 Jun 2017 10:59:52 -0500 Goldwyn Rodrigues <rgoldwyn-l3A5Bk7waGM@public.gmane.org> wrote:

> This series adds nonblocking feature to asynchronous I/O writes.
> io_submit() can be delayed because of a number of reasons:
>  - Block allocation for files
>  - Data writebacks for direct I/O
>  - Sleeping because of waiting to acquire i_rwsem
>  - Congested block device
> 
> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> any of these conditions are met. This way userspace can push most
> of the write()s to the kernel to the best of its ability to complete
> and if it returns -EAGAIN, can defer it to another thread.
> 
> In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
> uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to
> IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
> iomap. aio_rw_flags is a new flag replacing aio_reserved1. We could
> not use aio_flags because it is not currently checked for invalidity
> in the kernel.
> 
> This feature is provided for direct I/O of asynchronous I/O only. I have
> tested it against xfs, ext4, and btrfs while I intend to add more filesystems.
> The nowait feature is for request based devices. In the future, I intend to
> add support to stacked devices such as md.
> 
> Applications will have to check supportability by sending an async direct write
> and any other error besides -EAGAIN would mean it is not supported.
> 

How accurate is this?  For example, the changes to
generic_file_direct_write() appear to greatly reduce the chances of
blocking but there are surely race opportunities which will still
result in userspace unexpectedly experiencing blocking in a succeeding
write() call?

If correct then I think there should be some discussion and perhaps
testing results in the changelog.


I have only minor quibbles - I'll grab the patch series for some -next
testing (at least).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] fs: Introduce filemap_range_has_page()
       [not found]   ` <20170615160002.17233-3-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
@ 2017-06-15 18:25     ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2017-06-15 18:25 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, jack-IBi9RG/b67k,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-block-u79uwXL29TY76Z2rM5mHXA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, Goldwyn Rodrigues

On Thu, 15 Jun 2017 10:59:54 -0500 Goldwyn Rodrigues <rgoldwyn-l3A5Bk7waGM@public.gmane.org> wrote:

> From: Goldwyn Rodrigues <rgoldwyn-IBi9RG/b67k@public.gmane.org>
> 
> filemap_range_has_page() returns true if the file's mapping has
> a page within the range mentioned. This function will be used
> to check if a write() call will cause a writeback of previous
> writes.
> 
> ...
>
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
>  extern void filemap_fdatawait_keep_errors(struct address_space *);
>  extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
>  				   loff_t lend);
> +extern int filemap_range_has_page(struct address_space *, loff_t lstart,
> +				  loff_t lend);
>  extern int filemap_write_and_wait(struct address_space *mapping);
>  extern int filemap_write_and_wait_range(struct address_space *mapping,
>  				        loff_t lstart, loff_t lend);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6f1be573a5e6..87aba7698584 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
>  }
>  EXPORT_SYMBOL(filemap_flush);
>  
> +/**
> + * filemap_range_has_page - check if a page exists in range.
> + * @mapping:           address space structure to wait for

"to wait for" seems wrong.

> + * @start_byte:        offset in bytes where the range starts
> + * @end_byte:          offset in bytes where the range ends (inclusive)
> + *
> + * Find at least one page in the range supplied, usually used to check if
> + * direct writing in this range will trigger a writeback.
> + */
> +int filemap_range_has_page(struct address_space *mapping,
> +			   loff_t start_byte, loff_t end_byte)

Would a bool return type be better?

> +{
> +	pgoff_t index = start_byte >> PAGE_SHIFT;
> +	pgoff_t end = end_byte >> PAGE_SHIFT;
> +	struct pagevec pvec;
> +	int ret;
> +
> +	if (end_byte < start_byte)
> +		return 0;
> +
> +	if (mapping->nrpages == 0)
> +		return 0;
> +
> +	pagevec_init(&pvec, 0);
> +	ret = pagevec_lookup(&pvec, mapping, index, 1);
> +	if (!ret)
> +		return 0;
> +	ret = (pvec.pages[0]->index <= end);
> +	pagevec_release(&pvec);
> +	return ret;
> +}
> +EXPORT_SYMBOL(filemap_range_has_page);
> +
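
The range check in the quoted function comes down to page-index
arithmetic: shift both byte offsets down to page indices, look up the
first cached page at or after the start index, and test whether it
falls at or before the end index. A self-contained sketch, assuming
4 KiB pages and a sorted array of cached page indices in place of the
real page cache (demo_range_has_page() and DEMO_PAGE_SHIFT are
hypothetical names):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * cached[] holds the indices of pages present in a hypothetical page
 * cache, sorted ascending. Byte offsets are assumed non-negative and
 * end_byte is inclusive, as in the kernel-doc of the quoted patch.
 */
static bool demo_range_has_page(const uint64_t *cached, size_t n,
				int64_t start_byte, int64_t end_byte)
{
	uint64_t index, end;
	size_t i;

	if (end_byte < start_byte)
		return false;

	index = (uint64_t)start_byte >> DEMO_PAGE_SHIFT;
	end = (uint64_t)end_byte >> DEMO_PAGE_SHIFT;

	/* Like a one-page lookup: first cached page at or after index. */
	for (i = 0; i < n; i++) {
		if (cached[i] >= index)
			return cached[i] <= end;
	}
	return false;
}
```

With pages 2 and 10 cached, the byte range 0..4095 covers only page 0
and reports no page, while 0..8192 reaches page 2 and reports one.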

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/10 v12] No wait AIO
  2017-06-15 18:25   ` [PATCH 0/10 v12] No wait AIO Andrew Morton
@ 2017-06-15 21:51     ` Goldwyn Rodrigues
       [not found]       ` <1003b3e8-a775-e8ac-d1ca-11055d941a98-l3A5Bk7waGM@public.gmane.org>
  2017-06-16  8:54     ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 21:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel, jack, hch, linux-block, axboe, linux-api, viro



On 06/15/2017 01:25 PM, Andrew Morton wrote:
> On Thu, 15 Jun 2017 10:59:52 -0500 Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> 
>> This series adds nonblocking feature to asynchronous I/O writes.
>> io_submit() can be delayed because of a number of reasons:
>>  - Block allocation for files
>>  - Data writebacks for direct I/O
>>  - Sleeping because of waiting to acquire i_rwsem
>>  - Congested block device
>>
>> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
>> any of these conditions are met. This way userspace can push most
>> of the write()s to the kernel to the best of its ability to complete
>> and if it returns -EAGAIN, can defer it to another thread.
>>
>> In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
>> uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to
>> IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
>> iomap. aio_rw_flags is a new flag replacing aio_reserved1. We could
>> not use aio_flags because it is not currently checked for invalidity
>> in the kernel.
>>
>> This feature is provided for direct I/O of asynchronous I/O only. I have
>> tested it against xfs, ext4, and btrfs while I intend to add more filesystems.
>> The nowait feature is for request based devices. In the future, I intend to
>> add support to stacked devices such as md.
>>
>> Applications will have to check supportability by sending an async direct write
>> and any other error besides -EAGAIN would mean it is not supported.
>>
> 
> How accurate is this?  For example, the changes to
> generic_file_direct_write() appear to greatly reduce the chances of
> blocking but there are surely race opportunities which will still
> result in userspace unexpectedly experiencing blocking in a succeeding
> write() call?

We are not reducing the chance of blocking; we detect whether the call
would block and return to userspace as soon as possible rather than
waiting on the blocking factor. One of the blocking factors is the
rwsem inode->i_rwsem (formerly i_mutex). Any performance gain depends
on how the application uses the feature. Here is an example:

A database application has compute and I/O threads. This effort will
allow the compute threads to push writes without needing a context
switch to an I/O thread, since the thread knows the call will return
soon enough without blocking. If an IOCB would block (and returns
-EAGAIN), it is deferred to the I/O thread. Usually the compute thread
should know the offsets of writes, and be careful not to overwrite
other writes.
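
The compute-thread side of that example could look roughly like the
sketch below. Everything here is hypothetical: the demo_* names, the
fixed-size queue, and the callback that stands in for a real nowait
submission (for instance an io_submit() of an iocb with the nowait
flag set). The queue is not thread-safe; a real application would
protect the hand-off to the I/O thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical description of one pending write. */
struct demo_write {
	const void *buf;
	size_t len;
	off_t offset;
};

/* Attempts the write without blocking; returns bytes or -errno. */
typedef ssize_t (*demo_try_write_fn)(const struct demo_write *w);

#define DEMO_QUEUE_MAX 16

/* Writes handed over to the I/O thread, which is allowed to block. */
struct demo_defer_queue {
	struct demo_write items[DEMO_QUEUE_MAX];
	size_t count;
};

/* Stand-ins for a real nowait submission, used only for illustration. */
static ssize_t demo_fake_ok(const struct demo_write *w)
{
	return (ssize_t)w->len;		/* pretend the write completed */
}

static ssize_t demo_fake_busy(const struct demo_write *w)
{
	(void)w;
	return -EAGAIN;			/* pretend it would have blocked */
}

/*
 * Returns 0 if the write completed inline, 1 if it was deferred to the
 * I/O thread, or a negative errno on any other failure.
 */
static int demo_submit_or_defer(demo_try_write_fn try_write,
				const struct demo_write *w,
				struct demo_defer_queue *q)
{
	ssize_t ret = try_write(w);

	if (ret >= 0)
		return 0;
	if (ret != -EAGAIN)
		return (int)ret;	/* real error: do not retry */
	if (q->count == DEMO_QUEUE_MAX)
		return -ENOBUFS;	/* nowhere to park the write */

	q->items[q->count++] = *w;	/* I/O thread retries, may block */
	return 1;
}
```

Only -EAGAIN leads to deferral; any other error is returned to the
caller unchanged, so a failure is never silently retried.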

> 
> If correct then I think there should be some discussion and perhaps
> testing results in the changelog.

I will be posting one test case to xfstests.


> I have only minor quibbles - I'll grab the patch series for some -next
> testing (at least).
> 

I agree with the quibbles you have on patch 02/10. Should I send the
entire fixed series, just the 02/10 patch, or would you prefer to fix it?

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/10 v12] No wait AIO
       [not found]       ` <1003b3e8-a775-e8ac-d1ca-11055d941a98-l3A5Bk7waGM@public.gmane.org>
@ 2017-06-15 22:01         ` Andrew Morton
       [not found]           ` <20170615150100.52c0387406e6ce5167dc098e-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2017-06-15 22:01 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, jack-IBi9RG/b67k,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-block-u79uwXL29TY76Z2rM5mHXA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Thu, 15 Jun 2017 16:51:41 -0500 Goldwyn Rodrigues <rgoldwyn-l3A5Bk7waGM@public.gmane.org> wrote:

> > I have only minor quibbles - I'll grab the patch series for some -next
> > testing (at least).
> > 
> 
> I agree with the quibbles you have on patch 02/10. Should I send the
> entire fixed series, just the 02/10 patch, or would you prefer to fix it?

This?

--- a/include/linux/fs.h~fs-introduce-filemap_range_has_page-fix
+++ a/include/linux/fs.h
@@ -2517,8 +2517,8 @@ extern int filemap_fdatawait(struct addr
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
-extern int filemap_range_has_page(struct address_space *, loff_t lstart,
-				  loff_t lend);
+extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
+				   loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff -puN mm/filemap.c~fs-introduce-filemap_range_has_page-fix mm/filemap.c
--- a/mm/filemap.c~fs-introduce-filemap_range_has_page-fix
+++ a/mm/filemap.c
@@ -378,31 +378,30 @@ EXPORT_SYMBOL(filemap_flush);
 
 /**
  * filemap_range_has_page - check if a page exists in range.
- * @mapping:           address space structure to wait for
+ * @mapping:           address space within which to check
  * @start_byte:        offset in bytes where the range starts
  * @end_byte:          offset in bytes where the range ends (inclusive)
  *
  * Find at least one page in the range supplied, usually used to check if
  * direct writing in this range will trigger a writeback.
  */
-int filemap_range_has_page(struct address_space *mapping,
-			   loff_t start_byte, loff_t end_byte)
+bool filemap_range_has_page(struct address_space *mapping,
+			    loff_t start_byte, loff_t end_byte)
 {
 	pgoff_t index = start_byte >> PAGE_SHIFT;
 	pgoff_t end = end_byte >> PAGE_SHIFT;
 	struct pagevec pvec;
-	int ret;
+	bool ret;
 
 	if (end_byte < start_byte)
-		return 0;
+		return false;
 
 	if (mapping->nrpages == 0)
-		return 0;
+		return false;
 
 	pagevec_init(&pvec, 0);
-	ret = pagevec_lookup(&pvec, mapping, index, 1);
-	if (!ret)
-		return 0;
+	if (!pagevec_lookup(&pvec, mapping, index, 1))
+		return false;
 	ret = (pvec.pages[0]->index <= end);
 	pagevec_release(&pvec);
 	return ret;
_

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/10 v12] No wait AIO
       [not found]           ` <20170615150100.52c0387406e6ce5167dc098e-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2017-06-15 23:49             ` Goldwyn Rodrigues
  0 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-15 23:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, jack-IBi9RG/b67k,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-block-u79uwXL29TY76Z2rM5mHXA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn



On 06/15/2017 05:01 PM, Andrew Morton wrote:
> On Thu, 15 Jun 2017 16:51:41 -0500 Goldwyn Rodrigues <rgoldwyn-l3A5Bk7waGM@public.gmane.org> wrote:
> 
>>> I have only minor quibbles - I'll grab the patch series for some -next
>>> testing (at least).
>>>
>>
>> I agree with the quibbles you have on patch 02/10. Should I send the
>> entire fixed series, just the 02/10 patch, or would you prefer to fix it?
> 
> This?

Perfect. Thanks!

> 
> --- a/include/linux/fs.h~fs-introduce-filemap_range_has_page-fix
> +++ a/include/linux/fs.h
> @@ -2517,8 +2517,8 @@ extern int filemap_fdatawait(struct addr
>  extern void filemap_fdatawait_keep_errors(struct address_space *);
>  extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
>  				   loff_t lend);
> -extern int filemap_range_has_page(struct address_space *, loff_t lstart,
> -				  loff_t lend);
> +extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
> +				   loff_t lend);
>  extern int filemap_write_and_wait(struct address_space *mapping);
>  extern int filemap_write_and_wait_range(struct address_space *mapping,
>  				        loff_t lstart, loff_t lend);
> diff -puN mm/filemap.c~fs-introduce-filemap_range_has_page-fix mm/filemap.c
> --- a/mm/filemap.c~fs-introduce-filemap_range_has_page-fix
> +++ a/mm/filemap.c
> @@ -378,31 +378,30 @@ EXPORT_SYMBOL(filemap_flush);
>  
>  /**
>   * filemap_range_has_page - check if a page exists in range.
> - * @mapping:           address space structure to wait for
> + * @mapping:           address space within which to check
>   * @start_byte:        offset in bytes where the range starts
>   * @end_byte:          offset in bytes where the range ends (inclusive)
>   *
>   * Find at least one page in the range supplied, usually used to check if
>   * direct writing in this range will trigger a writeback.
>   */
> -int filemap_range_has_page(struct address_space *mapping,
> -			   loff_t start_byte, loff_t end_byte)
> +bool filemap_range_has_page(struct address_space *mapping,
> +			    loff_t start_byte, loff_t end_byte)
>  {
>  	pgoff_t index = start_byte >> PAGE_SHIFT;
>  	pgoff_t end = end_byte >> PAGE_SHIFT;
>  	struct pagevec pvec;
> -	int ret;
> +	bool ret;
>  
>  	if (end_byte < start_byte)
> -		return 0;
> +		return false;
>  
>  	if (mapping->nrpages == 0)
> -		return 0;
> +		return false;
>  
>  	pagevec_init(&pvec, 0);
> -	ret = pagevec_lookup(&pvec, mapping, index, 1);
> -	if (!ret)
> -		return 0;
> +	if (!pagevec_lookup(&pvec, mapping, index, 1))
> +		return false;
>  	ret = (pvec.pages[0]->index <= end);
>  	pagevec_release(&pvec);
>  	return ret;
> _
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/10 v12] No wait AIO
  2017-06-15 18:25   ` [PATCH 0/10 v12] No wait AIO Andrew Morton
  2017-06-15 21:51     ` Goldwyn Rodrigues
@ 2017-06-16  8:54     ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2017-06-16  8:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Goldwyn Rodrigues, linux-fsdevel, jack, hch, linux-block, axboe,
	linux-api, viro

On Thu 15-06-17 11:25:28, Andrew Morton wrote:
> On Thu, 15 Jun 2017 10:59:52 -0500 Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> 
> > This series adds nonblocking feature to asynchronous I/O writes.
> > io_submit() can be delayed because of a number of reasons:
> >  - Block allocation for files
> >  - Data writebacks for direct I/O
> >  - Sleeping because of waiting to acquire i_rwsem
> >  - Congested block device
> > 
> > The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> > any of these conditions are met. This way userspace can push most
> > of the write()s to the kernel to the best of its ability to complete
> > and if it returns -EAGAIN, can defer it to another thread.
> > 
> > In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
> > uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to
> > IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
> > iomap. aio_rw_flags is a new flag replacing aio_reserved1. We could
> > not use aio_flags because it is not currently checked for invalidity
> > in the kernel.
> > 
> > This feature is provided for direct I/O of asynchronous I/O only. I have
> > tested it against xfs, ext4, and btrfs while I intend to add more filesystems.
> > The nowait feature is for request based devices. In the future, I intend to
> > add support to stacked devices such as md.
> > 
> > Applications will have to check supportability by sending an async direct write
> > and any other error besides -EAGAIN would mean it is not supported.
> > 
> 
> How accurate is this?  For example, the changes to
> generic_file_direct_write() appear to greatly reduce the chances of
> blocking but there are surely race opportunities which will still
> result in userspace unexpectedly experiencing blocking in a succeeding
> write() call?

Yes, so you are right that there are still possibilities for blocking -
e.g. we could get blocked in reclaim when allocating memory somewhere. Now
we hope what Goldwyn did will be enough for practical purposes as in the
end this is an API to improve performance and so in the worst case app
won't get the performance it expects (this just has to be rare enough that
it all pays off in the end). Also if we spot some place that ends up
causing blocking in practice, we'll work on improving that...
 
> If correct then I think there should be some discussion and perhaps
> testing results in the changelog.

Probably we could add a note to the first paragraph of the changelog of
patch 4/10 like: Note that we can still block (put the process submitting
IO to sleep) in some rare cases like when there is not enough free memory
or when acquiring some fs-internal sleeping locks.

WRT test results, Goldwyn has some functional tests (for xfstests). We also
have a customer that is working on testing the series with their workload
however that will take some time given it requires updating their software
stack. If you are looking for some synthetic benchmark results, I suppose
we can put something together however it's going to be just a synthetic
benchmark and as such the relevance is limited.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT
       [not found]   ` <20170615160002.17233-5-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
@ 2017-06-17  4:09     ` Al Viro
  2017-06-17 11:53       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2017-06-17  4:09 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, jack-IBi9RG/b67k,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-block-u79uwXL29TY76Z2rM5mHXA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Goldwyn Rodrigues

On Thu, Jun 15, 2017 at 10:59:56AM -0500, Goldwyn Rodrigues wrote:
>  static inline int kiocb_set_rw_flags(struct kiocb *ki, int flags)
>  {
> -	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC)))
> +	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)))

Minor nit: that calls for something like
	if (unlikely(flags & ~RWF_ALL))
>  		return -EOPNOTSUPP;

with corresponding definition.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT
  2017-06-17  4:09     ` Al Viro
@ 2017-06-17 11:53       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-06-17 11:53 UTC (permalink / raw)
  To: Al Viro
  Cc: Goldwyn Rodrigues, linux-fsdevel, jack, hch, linux-block, axboe,
	linux-api, akpm, Goldwyn Rodrigues

On Sat, Jun 17, 2017 at 05:09:44AM +0100, Al Viro wrote:
> On Thu, Jun 15, 2017 at 10:59:56AM -0500, Goldwyn Rodrigues wrote:
> >  static inline int kiocb_set_rw_flags(struct kiocb *ki, int flags)
> >  {
> > -	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC)))
> > +	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)))
> 
> Minor nit: that calls for something like
> 	if (unlikely(flags & ~RWF_ALL))
> >  		return -EOPNOTSUPP;
> 
> with corresponding definition.

Possibly.  Note _ALL is not correct - at least RWF_HIPRI is explicitly
defined as a hint that can be ignored.

Maybe RWF_SUPPORTED.
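
A sketch of what such a mask might look like, written as a standalone
program fragment rather than a kernel patch. The DEMO_ names are local
illustrative copies of the RWF_* flags, with values chosen to be
distinct bits for the example, and demo_check_rw_flags() is a
hypothetical stand-in for the validity check in kiocb_set_rw_flags():

```c
#include <errno.h>

/* Local illustrative copies of the per-call read/write flags. */
#define DEMO_RWF_HIPRI	0x1
#define DEMO_RWF_DSYNC	0x2
#define DEMO_RWF_SYNC	0x4
#define DEMO_RWF_NOWAIT	0x8

/* One mask naming every flag the kernel knows how to handle. */
#define DEMO_RWF_SUPPORTED	(DEMO_RWF_HIPRI | DEMO_RWF_DSYNC | \
				 DEMO_RWF_SYNC | DEMO_RWF_NOWAIT)

/* Reject any bit outside the supported mask. */
static int demo_check_rw_flags(int flags)
{
	if (flags & ~DEMO_RWF_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}
```

Adding a new flag then means touching only the mask definition, not
every call site that validates flags.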

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 02/10] fs: Introduce filemap_range_has_page()
  2017-06-06 11:19 [PATCH 0/10 v11] No wait AIO Goldwyn Rodrigues
@ 2017-06-06 11:19 ` Goldwyn Rodrigues
  0 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-06 11:19 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs,
	axboe, linux-api, adam.manzanares, viro, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

filemap_range_has_page() returns true if the file's mapping has
a page within the range mentioned. This function will be used
to check if a write() call will cause a writeback of previous
writes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f53867140f43..dc0ab585cd56 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+				  loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6f1be573a5e6..87aba7698584 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:           address space structure to wait for
+ * @start_byte:        offset in bytes where the range starts
+ * @end_byte:          offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+			   loff_t start_byte, loff_t end_byte)
+{
+	pgoff_t index = start_byte >> PAGE_SHIFT;
+	pgoff_t end = end_byte >> PAGE_SHIFT;
+	struct pagevec pvec;
+	int ret;
+
+	if (end_byte < start_byte)
+		return 0;
+
+	if (mapping->nrpages == 0)
+		return 0;
+
+	pagevec_init(&pvec, 0);
+	ret = pagevec_lookup(&pvec, mapping, index, 1);
+	if (!ret)
+		return 0;
+	ret = (pvec.pages[0]->index <= end);
+	pagevec_release(&pvec);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/10] fs: Introduce filemap_range_has_page()
  2017-06-05  5:35 [PATCH 0/10 v10] No wait AIO Goldwyn Rodrigues
@ 2017-06-05  5:35 ` Goldwyn Rodrigues
  0 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-06-05  5:35 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs,
	axboe, linux-api, adam.manzanares, viro, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

filemap_range_has_page() returns true if the file's mapping has
a page within the range mentioned. This function will be used
to check if a write() call will cause a writeback of previous
writes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f53867140f43..dc0ab585cd56 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+				  loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6f1be573a5e6..87aba7698584 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:           address space to search for cached pages
+ * @start_byte:        offset in bytes where the range starts
+ * @end_byte:          offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+			   loff_t start_byte, loff_t end_byte)
+{
+	pgoff_t index = start_byte >> PAGE_SHIFT;
+	pgoff_t end = end_byte >> PAGE_SHIFT;
+	struct pagevec pvec;
+	int ret;
+
+	if (end_byte < start_byte)
+		return 0;
+
+	if (mapping->nrpages == 0)
+		return 0;
+
+	pagevec_init(&pvec, 0);
+	ret = pagevec_lookup(&pvec, mapping, index, 1);
+	if (!ret)
+		return 0;
+	ret = (pvec.pages[0]->index <= end);
+	pagevec_release(&pvec);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
-- 
2.12.0
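
The range check in the patch above can be modelled in user space. The following is a minimal sketch, not kernel code: `struct model_mapping`, `model_range_has_page()`, `model_nowait_precheck()` and `MODEL_PAGE_SHIFT` are hypothetical names introduced only for illustration, the page cache is replaced by a sorted array of cached page indices, 4 KiB pages are assumed, and byte offsets are assumed to be non-negative. The caller shown is an assumption about how a nowait direct write might use the check (return -EAGAIN instead of waiting for writeback), based on the cover letter's description.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the page cache of one file. */
struct model_mapping {
	const uint64_t *cached;	/* cached page indices, ascending order */
	size_t nrpages;		/* number of entries in cached[] */
};

/*
 * Same decision as filemap_range_has_page(): look up the first cached
 * page at or after the start index and report whether it lies at or
 * before the end index. end_byte is inclusive.
 */
int model_range_has_page(const struct model_mapping *m,
			 int64_t start_byte, int64_t end_byte)
{
	uint64_t index, end;
	size_t i;

	if (end_byte < start_byte)
		return 0;
	if (m->nrpages == 0)
		return 0;

	index = (uint64_t)start_byte >> MODEL_PAGE_SHIFT;
	end = (uint64_t)end_byte >> MODEL_PAGE_SHIFT;

	for (i = 0; i < m->nrpages; i++) {
		if (m->cached[i] >= index)
			return m->cached[i] <= end;
	}
	return 0;
}

/*
 * Hypothetical caller: before a direct write of len bytes at pos, a
 * nowait request bails out with -EAGAIN if cached pages in the range
 * would first have to be written back.
 */
int model_nowait_precheck(const struct model_mapping *m,
			  int64_t pos, int64_t len, int nowait)
{
	if (len <= 0)
		return 0;
	if (nowait && model_range_has_page(m, pos, pos + len - 1))
		return -EAGAIN;
	return 0;
}
```

With pages 2 and 5 cached, a write covering only page 2 is refused in nowait mode, while a write covering pages 3 and 4 proceeds, because the first cached page at or after index 3 is page 5, which lies past the end of the range.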


* Re: [PATCH 02/10] fs: Introduce filemap_range_has_page()
  2017-05-24 16:41 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
@ 2017-05-25  8:25   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2017-05-25  8:25 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
	linux-xfs, axboe, linux-api, adam.manzanares, viro,
	Goldwyn Rodrigues

On Wed 24-05-17 11:41:42, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> filemap_range_has_page() returns true if the file's mapping has
> a page within the given range. This function will be used
> to check whether a write() call will cause a writeback of previous
> writes.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/fs.h |  2 ++
>  mm/filemap.c       | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f53867140f43..dc0ab585cd56 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
>  extern void filemap_fdatawait_keep_errors(struct address_space *);
>  extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
>  				   loff_t lend);
> +extern int filemap_range_has_page(struct address_space *, loff_t lstart,
> +				  loff_t lend);
>  extern int filemap_write_and_wait(struct address_space *mapping);
>  extern int filemap_write_and_wait_range(struct address_space *mapping,
>  				        loff_t lstart, loff_t lend);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6f1be573a5e6..87aba7698584 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
>  }
>  EXPORT_SYMBOL(filemap_flush);
>  
> +/**
> + * filemap_range_has_page - check if a page exists in range.
> + * @mapping:           address space to search for cached pages
> + * @start_byte:        offset in bytes where the range starts
> + * @end_byte:          offset in bytes where the range ends (inclusive)
> + *
> + * Find at least one page in the range supplied, usually used to check if
> + * direct writing in this range will trigger a writeback.
> + */
> +int filemap_range_has_page(struct address_space *mapping,
> +			   loff_t start_byte, loff_t end_byte)
> +{
> +	pgoff_t index = start_byte >> PAGE_SHIFT;
> +	pgoff_t end = end_byte >> PAGE_SHIFT;
> +	struct pagevec pvec;
> +	int ret;
> +
> +	if (end_byte < start_byte)
> +		return 0;
> +
> +	if (mapping->nrpages == 0)
> +		return 0;
> +
> +	pagevec_init(&pvec, 0);
> +	ret = pagevec_lookup(&pvec, mapping, index, 1);
> +	if (!ret)
> +		return 0;
> +	ret = (pvec.pages[0]->index <= end);
> +	pagevec_release(&pvec);
> +	return ret;
> +}
> +EXPORT_SYMBOL(filemap_range_has_page);
> +
>  static int __filemap_fdatawait_range(struct address_space *mapping,
>  				     loff_t start_byte, loff_t end_byte)
>  {
> -- 
> 2.12.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH 02/10] fs: Introduce filemap_range_has_page()
  2017-05-24 16:41 [PATCH 0/10 v9] No wait AIO Goldwyn Rodrigues
@ 2017-05-24 16:41 ` Goldwyn Rodrigues
  2017-05-25  8:25   ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-05-24 16:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs,
	axboe, linux-api, adam.manzanares, viro, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

filemap_range_has_page() returns true if the file's mapping has
a page within the given range. This function will be used
to check whether a write() call will cause a writeback of previous
writes.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f53867140f43..dc0ab585cd56 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+				  loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6f1be573a5e6..87aba7698584 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:           address space to search for cached pages
+ * @start_byte:        offset in bytes where the range starts
+ * @end_byte:          offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+			   loff_t start_byte, loff_t end_byte)
+{
+	pgoff_t index = start_byte >> PAGE_SHIFT;
+	pgoff_t end = end_byte >> PAGE_SHIFT;
+	struct pagevec pvec;
+	int ret;
+
+	if (end_byte < start_byte)
+		return 0;
+
+	if (mapping->nrpages == 0)
+		return 0;
+
+	pagevec_init(&pvec, 0);
+	ret = pagevec_lookup(&pvec, mapping, index, 1);
+	if (!ret)
+		return 0;
+	ret = (pvec.pages[0]->index <= end);
+	pagevec_release(&pvec);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
-- 
2.12.0



* [PATCH 02/10] fs: Introduce filemap_range_has_page()
  2017-05-11 19:17 [PATCH 0/10 v8] No wait AIO Goldwyn Rodrigues
@ 2017-05-11 19:17 ` Goldwyn Rodrigues
  0 siblings, 0 replies; 25+ messages in thread
From: Goldwyn Rodrigues @ 2017-05-11 19:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
	avi, axboe, linux-api, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

filemap_range_has_page() returns true if the file's mapping has
a page within the given range. This function will be used
to check whether a write() call will cause a writeback of previous
writes.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 869c9a6fe58d..2e6fc6a23f91 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2513,6 +2513,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+				  loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 1694623a6289..fae5a361befb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:           address space to search for cached pages
+ * @start_byte:        offset in bytes where the range starts
+ * @end_byte:          offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+			   loff_t start_byte, loff_t end_byte)
+{
+	pgoff_t index = start_byte >> PAGE_SHIFT;
+	pgoff_t end = end_byte >> PAGE_SHIFT;
+	struct pagevec pvec;
+	int ret;
+
+	if (end_byte < start_byte)
+		return 0;
+
+	if (mapping->nrpages == 0)
+		return 0;
+
+	pagevec_init(&pvec, 0);
+	ret = pagevec_lookup(&pvec, mapping, index, 1);
+	if (!ret)
+		return 0;
+	ret = (pvec.pages[0]->index <= end);
+	pagevec_release(&pvec);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
-- 
2.12.0


end of thread, other threads:[~2017-06-17 11:53 UTC | newest]

Thread overview: 25+ messages
2017-06-15 15:59 [PATCH 0/10 v12] No wait AIO Goldwyn Rodrigues
2017-06-15 15:59 ` [PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags Goldwyn Rodrigues
2017-06-15 15:59 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
     [not found]   ` <20170615160002.17233-3-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
2017-06-15 18:25     ` Andrew Morton
2017-06-15 15:59 ` [PATCH 03/10] fs: Use RWF_* flags for AIO operations Goldwyn Rodrigues
2017-06-15 15:59 ` [PATCH 04/10] fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT Goldwyn Rodrigues
     [not found]   ` <20170615160002.17233-5-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
2017-06-17  4:09     ` Al Viro
2017-06-17 11:53       ` Christoph Hellwig
2017-06-15 15:59 ` [PATCH 05/10] fs: return if direct write will trigger writeback Goldwyn Rodrigues
     [not found] ` <20170615160002.17233-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
2017-06-15 15:59   ` [PATCH 06/10] fs: Introduce IOMAP_NOWAIT Goldwyn Rodrigues
2017-06-15 18:25   ` [PATCH 0/10 v12] No wait AIO Andrew Morton
2017-06-15 21:51     ` Goldwyn Rodrigues
     [not found]       ` <1003b3e8-a775-e8ac-d1ca-11055d941a98-l3A5Bk7waGM@public.gmane.org>
2017-06-15 22:01         ` Andrew Morton
     [not found]           ` <20170615150100.52c0387406e6ce5167dc098e-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2017-06-15 23:49             ` Goldwyn Rodrigues
2017-06-16  8:54     ` Jan Kara
2017-06-15 15:59 ` [PATCH 07/10] block: return on congested block device Goldwyn Rodrigues
2017-06-15 16:42   ` Jens Axboe
2017-06-15 16:00 ` [PATCH 08/10] ext4: nowait aio support Goldwyn Rodrigues
2017-06-15 16:00 ` [PATCH 09/10] xfs: " Goldwyn Rodrigues
2017-06-15 16:00 ` [PATCH 10/10] btrfs: " Goldwyn Rodrigues
  -- strict thread matches above, loose matches on Subject: below --
2017-06-06 11:19 [PATCH 0/10 v11] No wait AIO Goldwyn Rodrigues
2017-06-06 11:19 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
2017-06-05  5:35 [PATCH 0/10 v10] No wait AIO Goldwyn Rodrigues
2017-06-05  5:35 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
2017-05-24 16:41 [PATCH 0/10 v9] No wait AIO Goldwyn Rodrigues
2017-05-24 16:41 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
2017-05-25  8:25   ` Jan Kara
2017-05-11 19:17 [PATCH 0/10 v8] No wait AIO Goldwyn Rodrigues
2017-05-11 19:17 ` [PATCH 02/10] fs: Introduce filemap_range_has_page() Goldwyn Rodrigues
