[parent not found: <20170403185307.6243-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>]
* [PATCH 1/8] nowait aio: Introduce IOCB_RW_FLAG_NOWAIT
[not found] ` <20170403185307.6243-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-04 6:48 ` Christoph Hellwig
2017-04-03 18:53 ` [PATCH 8/8] nowait aio: btrfs Goldwyn Rodrigues
1 sibling, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
Cc: jack-IBi9RG/b67k, hch-wEGCiKHe2LqWVfeAwA7xHQ,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-ext4-u79uwXL29TY76Z2rM5mHXA,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, sagi-NQWnxTmZq1alnMjI0IkVqw,
avi-VrcmuVmyx1hWk0Htik3J/w, axboe-tSWWG44O7X1aa/9Udqfwiw,
linux-api-u79uwXL29TY76Z2rM5mHXA, willy-wEGCiKHe2LqWVfeAwA7xHQ,
tom.leiming-Re5JQEeQqe8AvxtiuMwx3w, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn-IBi9RG/b67k@public.gmane.org>
This flag informs kernel to bail out if an AIO request will block
for reasons such as file allocations, or a writeback triggered,
or would block while allocating requests while performing
direct I/O.
Unfortunately, aio_flags is not checked for validity, which would
break existing applications which have it set to anything besides zero
or IOCB_FLAG_RESFD. So, we are using aio_reserved1 and renaming it
to aio_rw_flags.
IOCB_RW_FLAG_NOWAIT is translated to IOCB_NOWAIT for
iocb->ki_flags.
Added FS_NOWAIT to make sure VFS knows that the filesystem is capable
of performing direct-AIO with IOCB_RW_FLAG_NOWAIT.
---
fs/aio.c | 15 ++++++++++++++-
include/linux/fs.h | 2 ++
include/uapi/linux/aio_abi.h | 9 ++++++++-
3 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index f52d925..25ae59b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1541,11 +1541,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
ssize_t ret;
/* enforce forwards compatibility on users */
- if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2)) {
+ if (unlikely(iocb->aio_reserved2)) {
pr_debug("EINVAL: reserve field set\n");
return -EINVAL;
}
+ if (unlikely(iocb->aio_rw_flags & ~IOCB_RW_FLAG_NOWAIT)) {
+ pr_debug("EINVAL: aio_rw_flags set with incompatible flags\n");
+ return -EINVAL;
+ }
+
/* prevent overflows */
if (unlikely(
(iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
@@ -1586,6 +1591,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
req->common.ki_flags |= IOCB_EVENTFD;
}
+ if (iocb->aio_rw_flags & IOCB_RW_FLAG_NOWAIT) {
+ if (!(file->f_inode->i_sb->s_type->fs_flags & FS_NOWAIT)) {
+ ret = -EOPNOTSUPP;
+ goto out_put_req;
+ }
+ req->common.ki_flags |= IOCB_NOWAIT;
+ }
+
ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
if (unlikely(ret)) {
pr_debug("EFAULT: aio_key\n");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7251f7b..802cfe2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -270,6 +270,7 @@ struct writeback_control;
#define IOCB_DSYNC (1 << 4)
#define IOCB_SYNC (1 << 5)
#define IOCB_WRITE (1 << 6)
+#define IOCB_NOWAIT (1 << 7)
struct kiocb {
struct file *ki_filp;
@@ -2020,6 +2021,7 @@ struct file_system_type {
#define FS_HAS_SUBTYPE 4
#define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
+#define FS_NOWAIT 65536 /* FS supports nowait direct AIO */
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
void (*kill_sb) (struct super_block *);
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index bb2554f..6d98cbe 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -54,6 +54,13 @@ enum {
*/
#define IOCB_FLAG_RESFD (1 << 0)
+/*
+ * Flags for aio_rw_flags member of "struct iocb".
+ * IOCB_RW_FLAG_NOWAIT - Set if the user wants the iocb to fail if it
+ * would block for operations such as disk allocation.
+ */
+#define IOCB_RW_FLAG_NOWAIT (1 << 1)
+
/* read() from /dev/aio returns these structures. */
struct io_event {
__u64 data; /* the data field from the iocb */
@@ -79,7 +86,7 @@ struct io_event {
struct iocb {
/* these are internal to the kernel/libc. */
__u64 aio_data; /* data to be returned in event's data */
- __u32 PADDED(aio_key, aio_reserved1);
+ __u32 PADDED(aio_key, aio_rw_flags);
/* the kernel sets aio_key to the req # */
/* common fields */
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 1/8] nowait aio: Introduce IOCB_RW_FLAG_NOWAIT
2017-04-03 18:53 ` [PATCH 1/8] nowait aio: Introduce IOCB_RW_FLAG_NOWAIT Goldwyn Rodrigues
@ 2017-04-04 6:48 ` Christoph Hellwig
0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2017-04-04 6:48 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On Mon, Apr 03, 2017 at 01:53:00PM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> This flag informs kernel to bail out if an AIO request will block
> for reasons such as file allocations, or a writeback triggered,
> or would block while allocating requests while performing
> direct I/O.
>
> Unfortunately, aio_flags is not checked for validity, which would
> break existing applications which have it set to anything besides zero
> or IOCB_FLAG_RESFD. So, we are using aio_reserved1 and renaming it
> to aio_rw_flags.
>
> IOCB_RW_FLAG_NOWAIT is translated to IOCB_NOWAIT for
> iocb->ki_flags.
Please make this a flag in the RWF_* namespace, and as a preparation
support the existing RWF_* flags for aio.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 8/8] nowait aio: btrfs
[not found] ` <20170403185307.6243-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
2017-04-03 18:53 ` [PATCH 1/8] nowait aio: Introduce IOCB_RW_FLAG_NOWAIT Goldwyn Rodrigues
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
1 sibling, 0 replies; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
Cc: jack-IBi9RG/b67k, hch-wEGCiKHe2LqWVfeAwA7xHQ,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-ext4-u79uwXL29TY76Z2rM5mHXA,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, sagi-NQWnxTmZq1alnMjI0IkVqw,
avi-VrcmuVmyx1hWk0Htik3J/w, axboe-tSWWG44O7X1aa/9Udqfwiw,
linux-api-u79uwXL29TY76Z2rM5mHXA, willy-wEGCiKHe2LqWVfeAwA7xHQ,
tom.leiming-Re5JQEeQqe8AvxtiuMwx3w, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn-IBi9RG/b67k@public.gmane.org>
Return EAGAIN if any of the following checks fail
+ i_rwsem is not lockable
+ NODATACOW or PREALLOC is not set
+ Cannot nocow at the desired location
+ Writing beyond end of file which is not allocated
---
fs/btrfs/file.c | 25 ++++++++++++++++++++-----
fs/btrfs/inode.c | 3 +++
fs/btrfs/super.c | 2 +-
3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 520cb72..a870e5d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1823,12 +1823,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
ssize_t num_written = 0;
bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host);
ssize_t err;
- loff_t pos;
- size_t count;
+ loff_t pos = iocb->ki_pos;
+ size_t count = iov_iter_count(from);
loff_t oldsize;
int clean_page = 0;
- inode_lock(inode);
+ if ((iocb->ki_flags & IOCB_NOWAIT) &&
+ (iocb->ki_flags & IOCB_DIRECT)) {
+ /* Don't sleep on inode rwsem */
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ /*
+ * We will allocate space in case nodatacow is not set,
+ * so bail
+ */
+ if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) ||
+ check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+ inode_unlock(inode);
+ return -EAGAIN;
+ }
+ } else
+ inode_lock(inode);
+
err = generic_write_checks(iocb, from);
if (err <= 0) {
inode_unlock(inode);
@@ -1862,8 +1879,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
*/
update_time_for_write(inode);
- pos = iocb->ki_pos;
- count = iov_iter_count(from);
start_pos = round_down(pos, fs_info->sectorsize);
oldsize = i_size_read(inode);
if (start_pos > oldsize) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a18510b..d91b21a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8627,6 +8627,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
dio_data.overwrite = 1;
inode_unlock(inode);
relock = true;
+ } else if (iocb->ki_flags & IOCB_NOWAIT) {
+ ret = -EAGAIN;
+ goto out;
}
ret = btrfs_delalloc_reserve_space(inode, offset, count);
if (ret)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da687dc..5e60659 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2175,7 +2175,7 @@ static struct file_system_type btrfs_fs_type = {
.name = "btrfs",
.mount = btrfs_mount,
.kill_sb = btrfs_kill_super,
- .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_NOWAIT,
};
MODULE_ALIAS_FS("btrfs");
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 2/8] nowait aio: Return if cannot get hold of i_rwsem
2017-04-03 18:52 [PATCH 0/8 v4] No wait AIO Goldwyn Rodrigues
[not found] ` <20170403185307.6243-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-03 18:53 ` [PATCH 3/8] nowait aio: return if direct write will trigger writeback Goldwyn Rodrigues
` (4 subsequent siblings)
6 siblings, 0 replies; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
A failure to lock i_rwsem would mean there is I/O being performed
by another thread. So, let's bail.
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
mm/filemap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 1694623..e08f3b9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2982,7 +2982,12 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
ssize_t ret;
- inode_lock(inode);
+ if (!inode_trylock(inode)) {
+ /* Don't sleep on inode rwsem */
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+ inode_lock(inode);
+ }
ret = generic_write_checks(iocb, from);
if (ret > 0)
ret = __generic_file_write_iter(iocb, from);
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 3/8] nowait aio: return if direct write will trigger writeback
2017-04-03 18:52 [PATCH 0/8 v4] No wait AIO Goldwyn Rodrigues
[not found] ` <20170403185307.6243-1-rgoldwyn-l3A5Bk7waGM@public.gmane.org>
2017-04-03 18:53 ` [PATCH 2/8] nowait aio: Return if cannot get hold of i_rwsem Goldwyn Rodrigues
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-03 18:53 ` [PATCH 4/8] nowait-aio: Introduce IOMAP_NOWAIT Goldwyn Rodrigues
` (3 subsequent siblings)
6 siblings, 0 replies; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Find out if the write will trigger a wait due to writeback. If yes,
return -EAGAIN.
This introduces a new function filemap_range_has_page() which
returns true if the file's mapping has a page within the range
mentioned.
Return -EINVAL for buffered AIO: there are multiple causes of
delay such as page locks, dirty throttling logic, page loading
from disk etc. which cannot be taken care of.
---
include/linux/fs.h | 2 ++
mm/filemap.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 49 insertions(+), 3 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 802cfe2..4721136 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2515,6 +2515,8 @@ extern int filemap_fdatawait(struct address_space *);
extern void filemap_fdatawait_keep_errors(struct address_space *);
extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+ loff_t lend);
extern int filemap_write_and_wait(struct address_space *mapping);
extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index e08f3b9..c020e23 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
}
EXPORT_SYMBOL(filemap_flush);
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping: address space structure to wait for
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+ loff_t start_byte, loff_t end_byte)
+{
+ pgoff_t index = start_byte >> PAGE_SHIFT;
+ pgoff_t end = end_byte >> PAGE_SHIFT;
+ struct pagevec pvec;
+ int ret;
+
+ if (end_byte < start_byte)
+ return 0;
+
+ if (mapping->nrpages == 0)
+ return 0;
+
+ pagevec_init(&pvec, 0);
+ ret = pagevec_lookup(&pvec, mapping, index, 1);
+ if (!ret)
+ return 0;
+ ret = (pvec.pages[0]->index <= end);
+ pagevec_release(&pvec);
+ return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
static int __filemap_fdatawait_range(struct address_space *mapping,
loff_t start_byte, loff_t end_byte)
{
@@ -2640,6 +2673,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
pos = iocb->ki_pos;
+ if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+ return -EINVAL;
+
if (limit != RLIM_INFINITY) {
if (iocb->ki_pos >= limit) {
send_sig(SIGXFSZ, current, 0);
@@ -2709,9 +2745,17 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
- written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
- if (written)
- goto out;
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ /* If there are pages to writeback, return */
+ if (filemap_range_has_page(inode->i_mapping, pos,
+ pos + iov_iter_count(from)))
+ return -EAGAIN;
+ } else {
+ written = filemap_write_and_wait_range(mapping, pos,
+ pos + write_len - 1);
+ if (written)
+ goto out;
+ }
/*
* After a write we want buffered reads to be sure to go to disk to get
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 4/8] nowait-aio: Introduce IOMAP_NOWAIT
2017-04-03 18:52 [PATCH 0/8 v4] No wait AIO Goldwyn Rodrigues
` (2 preceding siblings ...)
2017-04-03 18:53 ` [PATCH 3/8] nowait aio: return if direct write will trigger writeback Goldwyn Rodrigues
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-03 18:53 ` [PATCH 5/8] nowait aio: return on congested block device Goldwyn Rodrigues
` (2 subsequent siblings)
6 siblings, 0 replies; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps.
This is used by XFS in the XFS patch.
---
fs/iomap.c | 2 ++
include/linux/iomap.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/fs/iomap.c b/fs/iomap.c
index 141c3cd..d1c8175 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -885,6 +885,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
} else {
dio->flags |= IOMAP_DIO_WRITE;
flags |= IOMAP_WRITE;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ flags |= IOMAP_NOWAIT;
}
if (mapping->nrpages) {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7291810..53f6af8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -51,6 +51,7 @@ struct iomap {
#define IOMAP_REPORT (1 << 2) /* report extent status, e.g. FIEMAP */
#define IOMAP_FAULT (1 << 3) /* mapping for page fault */
#define IOMAP_DIRECT (1 << 4) /* direct I/O */
+#define IOMAP_NOWAIT (1 << 5) /* Don't wait for writeback */
struct iomap_ops {
/*
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 5/8] nowait aio: return on congested block device
2017-04-03 18:52 [PATCH 0/8 v4] No wait AIO Goldwyn Rodrigues
` (3 preceding siblings ...)
2017-04-03 18:53 ` [PATCH 4/8] nowait-aio: Introduce IOMAP_NOWAIT Goldwyn Rodrigues
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-04 6:49 ` Christoph Hellwig
2017-04-03 18:53 ` [PATCH 6/8] nowait aio: ext4 Goldwyn Rodrigues
2017-04-03 18:53 ` [PATCH 7/8] nowait aio: xfs Goldwyn Rodrigues
6 siblings, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
A new flag BIO_NOWAIT is introduced to identify bio's
orignating from iocb with IOCB_NOWAIT. This flag indicates
to return immediately if a request cannot be made instead
of retrying.
To facilitate this, QUEUE_FLAG_NOWAIT is set to devices
which support this. While currently this is set to
virtio and sd only. Support to more devices will be added soon.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
block/blk-core.c | 24 ++++++++++++++++++++++--
block/blk-mq-sched.c | 3 +++
block/blk-mq.c | 4 ++++
drivers/block/virtio_blk.c | 3 +++
drivers/scsi/sd.c | 3 +++
fs/direct-io.c | 11 +++++++++--
include/linux/bio.h | 6 ++++++
include/linux/blk_types.h | 1 +
include/linux/blkdev.h | 2 ++
9 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index d772c22..95a9b18 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1232,6 +1232,11 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
if (!IS_ERR(rq))
return rq;
+ if (bio && bio_flagged(bio, BIO_NOWAIT)) {
+ blk_put_rl(rl);
+ return ERR_PTR(-EAGAIN);
+ }
+
if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl);
return rq;
@@ -1870,6 +1875,18 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
+ if (bio_flagged(bio, BIO_NOWAIT)) {
+ if (!blk_queue_nowait(q)) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+ if (!(bio->bi_opf & (REQ_SYNC | REQ_IDLE))) {
+ err = -EINVAL;
+ goto end_io;
+ }
+ }
+
+
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
@@ -2021,7 +2038,7 @@ blk_qc_t generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- if (likely(blk_queue_enter(q, false) == 0)) {
+ if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NOWAIT)) == 0)) {
struct bio_list lower, same;
/* Create a fresh bio_list for all subordinate requests */
@@ -2046,7 +2063,10 @@ blk_qc_t generic_make_request(struct bio *bio)
bio_list_merge(&bio_list_on_stack[0], &same);
bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} else {
- bio_io_error(bio);
+ if (unlikely(!blk_queue_dying(q) && bio_flagged(bio, BIO_NOWAIT)))
+ bio_wouldblock_error(bio);
+ else
+ bio_io_error(bio);
}
bio = bio_list_pop(&bio_list_on_stack[0]);
} while (bio);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 09af8ff..40e78b5 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -119,6 +119,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
+ if (likely(bio) && bio_flagged(bio, BIO_NOWAIT))
+ data->flags |= BLK_MQ_REQ_NOWAIT;
+
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6b6e7bc..2d90b12 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1511,6 +1511,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio && bio_flagged(bio, BIO_NOWAIT))
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
@@ -1635,6 +1637,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio && bio_flagged(bio, BIO_NOWAIT))
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1d4c9f8..7481124 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -731,6 +731,9 @@ static int virtblk_probe(struct virtio_device *vdev)
/* No real sector limit. */
blk_queue_max_hw_sectors(q, -1U);
+ /* Request queue supports BIO_NOWAIT */
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
+
/* Host can optionally specify maximum segment size and number of
* segments. */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc..9df85ee 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3177,6 +3177,9 @@ static int sd_probe(struct device *dev)
SD_MOD_TIMEOUT);
}
+ /* Support BIO_NOWAIT */
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, sdp->request_queue);
+
device_initialize(&sdkp->dev);
sdkp->dev.parent = dev;
sdkp->dev.class = &sd_disk_class;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea..f6835d3 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -386,6 +386,9 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
else
bio->bi_end_io = dio_bio_end_io;
+ if (dio->iocb->ki_flags & IOCB_NOWAIT)
+ bio_set_flag(bio, BIO_NOWAIT);
+
sdio->bio = bio;
sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
}
@@ -480,8 +483,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
unsigned i;
int err;
- if (bio->bi_error)
- dio->io_error = -EIO;
+ if (bio->bi_error) {
+ if (bio_flagged(bio, BIO_NOWAIT))
+ dio->io_error = -EAGAIN;
+ else
+ dio->io_error = -EIO;
+ }
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
err = bio->bi_error;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e52119..1a92707 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -425,6 +425,12 @@ static inline void bio_io_error(struct bio *bio)
bio_endio(bio);
}
+static inline void bio_wouldblock_error(struct bio *bio)
+{
+ bio->bi_error = -EAGAIN;
+ bio_endio(bio);
+}
+
struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb..514c08e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -102,6 +102,7 @@ struct bio {
#define BIO_REFFED 8 /* bio has elevated ->bi_cnt */
#define BIO_THROTTLED 9 /* This bio has already been subjected to
* throttling rules. Don't do it again. */
+#define BIO_NOWAIT 10 /* don't block over blk device congestion */
/*
* Flags starting here get preserved by bio_reset() - this includes
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a7da60..ae38ab6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -611,6 +611,7 @@ struct request_queue {
#define QUEUE_FLAG_DAX 26 /* device supports DAX */
#define QUEUE_FLAG_STATS 27 /* track rq completion times */
#define QUEUE_FLAG_RESTART 28 /* queue needs restart at completion */
+#define QUEUE_FLAG_NOWAIT 29 /* queue supports BIO_NOWAIT */
#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -701,6 +702,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-04-03 18:53 ` [PATCH 5/8] nowait aio: return on congested block device Goldwyn Rodrigues
@ 2017-04-04 6:49 ` Christoph Hellwig
0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2017-04-04 6:49 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
Please make this a REQ_* flag so that it can be passed in the bio,
the request and as an argument to the get_request functions instead
of testing for a bio.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 6/8] nowait aio: ext4
2017-04-03 18:52 [PATCH 0/8 v4] No wait AIO Goldwyn Rodrigues
` (4 preceding siblings ...)
2017-04-03 18:53 ` [PATCH 5/8] nowait aio: return on congested block device Goldwyn Rodrigues
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-04 7:58 ` Jan Kara
2017-04-03 18:53 ` [PATCH 7/8] nowait aio: xfs Goldwyn Rodrigues
6 siblings, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Return EAGAIN if any of the following checks fail for direct I/O:
+ i_rwsem is lockable
+ Writing beyond end of file (will trigger allocation)
+ Blocks are not allocated at the write location
---
fs/ext4/file.c | 48 +++++++++++++++++++++++++++++++-----------------
fs/ext4/super.c | 6 +++---
2 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8210c1f..e223b9f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -127,27 +127,22 @@ ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos)
return 0;
}
-/* Is IO overwriting allocated and initialized blocks? */
-static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len)
+/* Are IO blocks allocated */
+static bool ext4_blocks_mapped(struct inode *inode, loff_t pos, loff_t len,
+ struct ext4_map_blocks *map)
{
- struct ext4_map_blocks map;
unsigned int blkbits = inode->i_blkbits;
int err, blklen;
if (pos + len > i_size_read(inode))
return false;
- map.m_lblk = pos >> blkbits;
- map.m_len = EXT4_MAX_BLOCKS(len, pos, blkbits);
- blklen = map.m_len;
+ map->m_lblk = pos >> blkbits;
+ map->m_len = EXT4_MAX_BLOCKS(len, pos, blkbits);
+ blklen = map->m_len;
- err = ext4_map_blocks(NULL, inode, &map, 0);
- /*
- * 'err==len' means that all of the blocks have been preallocated,
- * regardless of whether they have been initialized or not. To exclude
- * unwritten extents, we need to check m_flags.
- */
- return err == blklen && (map.m_flags & EXT4_MAP_MAPPED);
+ err = ext4_map_blocks(NULL, inode, map, 0);
+ return err == blklen;
}
static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
@@ -204,6 +199,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
int o_direct = iocb->ki_flags & IOCB_DIRECT;
+ int nowait = iocb->ki_flags & IOCB_NOWAIT;
int unaligned_aio = 0;
int overwrite = 0;
ssize_t ret;
@@ -216,7 +212,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
return ext4_dax_write_iter(iocb, from);
#endif
- inode_lock(inode);
+ if (o_direct && nowait) {
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ } else {
+ inode_lock(inode);
+ }
+
ret = ext4_write_checks(iocb, from);
if (ret <= 0)
goto out;
@@ -235,9 +237,21 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
iocb->private = &overwrite;
/* Check whether we do a DIO overwrite or not */
- if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
- ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
- overwrite = 1;
+ if (o_direct && !unaligned_aio) {
+ struct ext4_map_blocks map;
+ if (ext4_blocks_mapped(inode, iocb->ki_pos,
+ iov_iter_count(from), &map)) {
+ /* To exclude unwritten extents, we need to check
+ * m_flags.
+ */
+ if (ext4_should_dioread_nolock(inode) &&
+ (map.m_flags & EXT4_MAP_MAPPED))
+ overwrite = 1;
+ } else if (iocb->ki_flags & IOCB_NOWAIT) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ }
ret = __generic_file_write_iter(iocb, from);
inode_unlock(inode);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a9448db..e1d424a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -117,7 +117,7 @@ static struct file_system_type ext2_fs_type = {
.name = "ext2",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("ext2");
MODULE_ALIAS("ext2");
@@ -132,7 +132,7 @@ static struct file_system_type ext3_fs_type = {
.name = "ext3",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
@@ -5623,7 +5623,7 @@ static struct file_system_type ext4_fs_type = {
.name = "ext4",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("ext4");
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 6/8] nowait aio: ext4
2017-04-03 18:53 ` [PATCH 6/8] nowait aio: ext4 Goldwyn Rodrigues
@ 2017-04-04 7:58 ` Jan Kara
2017-04-04 8:41 ` Christoph Hellwig
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-04-04 7:58 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On Mon 03-04-17 13:53:05, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> Return EAGAIN if any of the following checks fail for direct I/O:
> + i_rwsem is lockable
> + Writing beyond end of file (will trigger allocation)
> + Blocks are not allocated at the write location
Patches seem to be missing your Signed-off-by tag...
> @@ -235,9 +237,21 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>
> iocb->private = &overwrite;
> /* Check whether we do a DIO overwrite or not */
> - if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
> - ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
> - overwrite = 1;
> + if (o_direct && !unaligned_aio) {
> + struct ext4_map_blocks map;
> + if (ext4_blocks_mapped(inode, iocb->ki_pos,
> + iov_iter_count(from), &map)) {
> + /* To exclude unwritten extents, we need to check
> + * m_flags.
> + */
> + if (ext4_should_dioread_nolock(inode) &&
> + (map.m_flags & EXT4_MAP_MAPPED))
> + overwrite = 1;
> + } else if (iocb->ki_flags & IOCB_NOWAIT) {
> + ret = -EAGAIN;
> + goto out;
> + }
> + }
Actually, overwriting unwritten extents is relatively complex in ext4 as
well. In particular we need to start a transaction and split out the
written part of the extent. So I don't think we can easily support this
without blocking. As a result I'd keep the condition for IOCB_NOWAIT the
same as for overwrite IO.
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -117,7 +117,7 @@ static struct file_system_type ext2_fs_type = {
> .name = "ext2",
> .mount = ext4_mount,
> .kill_sb = kill_block_super,
> - .fs_flags = FS_REQUIRES_DEV,
> + .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
Can we call it FS_NOWAIT_IO?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 6/8] nowait aio: ext4
2017-04-04 7:58 ` Jan Kara
@ 2017-04-04 8:41 ` Christoph Hellwig
2017-04-04 18:41 ` Goldwyn Rodrigues
0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2017-04-04 8:41 UTC (permalink / raw)
To: Jan Kara
Cc: Goldwyn Rodrigues, linux-fsdevel, jack, hch, linux-block,
linux-btrfs, linux-ext4, linux-xfs, sagi, avi, axboe, linux-api,
willy, tom.leiming, Goldwyn Rodrigues
On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote:
> FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
> Can we call it FS_NOWAIT_IO?
It's way to generic as it's a feature of the particular file_operations
instance. But once we switch to using RWF_* we can just the existing
per-op feature checks for thos and the per-fs flag should just go away.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 6/8] nowait aio: ext4
2017-04-04 8:41 ` Christoph Hellwig
@ 2017-04-04 18:41 ` Goldwyn Rodrigues
2017-04-10 7:45 ` Christoph Hellwig
0 siblings, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-04 18:41 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: linux-fsdevel, jack, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On 04/04/2017 03:41 AM, Christoph Hellwig wrote:
> On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote:
>> FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
>> Can we call it FS_NOWAIT_IO?
>
> It's way to generic as it's a feature of the particular file_operations
> instance. But once we switch to using RWF_* we can just the existing
> per-op feature checks for thos and the per-fs flag should just go away.
>
I am working on incorporating RWF_* flags. However, I am not sure how
RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of
"blocking" information is with the filesystem, it is a per-filesystem
flag to block out (EOPNOTSUPP) the filesystems which do not support it.
--
Goldwyn
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 6/8] nowait aio: ext4
2017-04-04 18:41 ` Goldwyn Rodrigues
@ 2017-04-10 7:45 ` Christoph Hellwig
[not found] ` <20170410074539.GA18250-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2017-04-10 7:45 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, jack, linux-block,
linux-btrfs, linux-ext4, linux-xfs, sagi, avi, axboe, linux-api,
willy, tom.leiming, Goldwyn Rodrigues
On Tue, Apr 04, 2017 at 01:41:09PM -0500, Goldwyn Rodrigues wrote:
> I am working on incorporating RWF_* flags. However, I am not sure how
> RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of
> "blocking" information is with the filesystem, it is a per-filesystem
> flag to block out (EOPNOTSUPP) the filesystems which do not support it.
You need to check the flag in the actual read/write methods as the
support for features on Linux is not a per-file_system_type thing.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 7/8] nowait aio: xfs
2017-04-03 18:52 [PATCH 0/8 v4] No wait AIO Goldwyn Rodrigues
` (5 preceding siblings ...)
2017-04-03 18:53 ` [PATCH 6/8] nowait aio: ext4 Goldwyn Rodrigues
@ 2017-04-03 18:53 ` Goldwyn Rodrigues
2017-04-04 6:52 ` Christoph Hellwig
6 siblings, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-03 18:53 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
immediately.
IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin
if it needs allocation either due to file extension, writing to a hole,
or COW or waiting for other DIOs to finish.
---
fs/xfs/xfs_file.c | 15 +++++++++++----
fs/xfs/xfs_iomap.c | 13 +++++++++++++
fs/xfs/xfs_super.c | 2 +-
3 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 35703a8..08a5eef 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -541,8 +541,11 @@ xfs_file_dio_aio_write(
iolock = XFS_IOLOCK_SHARED;
}
- xfs_ilock(ip, iolock);
-
+ if (!xfs_ilock_nowait(ip, iolock)) {
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+ xfs_ilock(ip, iolock);
+ }
ret = xfs_file_aio_write_checks(iocb, from, &iolock);
if (ret)
goto out;
@@ -553,9 +556,13 @@ xfs_file_dio_aio_write(
* otherwise demote the lock if we had to take the exclusive lock
* for other reasons in xfs_file_aio_write_checks.
*/
- if (unaligned_io)
+ if (unaligned_io) {
+ /* If we are going to wait for other DIO to finish, bail */
+ if ((iocb->ki_flags & IOCB_NOWAIT) &&
+ atomic_read(&inode->i_dio_count))
+ return -EAGAIN;
inode_dio_wait(inode);
- else if (iolock == XFS_IOLOCK_EXCL) {
+ } else if (iolock == XFS_IOLOCK_EXCL) {
xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
iolock = XFS_IOLOCK_SHARED;
}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 288ee5b..6843725 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1015,6 +1015,11 @@ xfs_file_iomap_begin(
if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
if (flags & IOMAP_DIRECT) {
+ /* A reflinked inode will result in CoW alloc */
+ if (flags & IOMAP_NOWAIT) {
+ error = -EAGAIN;
+ goto out_unlock;
+ }
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &shared,
&lockmode);
@@ -1032,6 +1037,14 @@ xfs_file_iomap_begin(
if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
/*
+ * If nowait is set bail since we are going to make
+ * allocations.
+ */
+ if (flags & IOMAP_NOWAIT) {
+ error = -EAGAIN;
+ goto out_unlock;
+ }
+ /*
* We cap the maximum length we map here to MAX_WRITEBACK_PAGES
* pages to keep the chunks of work done where somewhat symmetric
* with the work writeback does. This is a completely arbitrary
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 685c042..070a30e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1749,7 +1749,7 @@ static struct file_system_type xfs_fs_type = {
.name = "xfs",
.mount = xfs_fs_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("xfs");
--
2.10.2
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 7/8] nowait aio: xfs
2017-04-03 18:53 ` [PATCH 7/8] nowait aio: xfs Goldwyn Rodrigues
@ 2017-04-04 6:52 ` Christoph Hellwig
[not found] ` <20170404065211.GD21942-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2017-04-04 6:52 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
> + if (unaligned_io) {
> + /* If we are going to wait for other DIO to finish, bail */
> + if ((iocb->ki_flags & IOCB_NOWAIT) &&
> + atomic_read(&inode->i_dio_count))
> + return -EAGAIN;
> inode_dio_wait(inode);
This checks i_dio_count twice in the nowait case, I think it should be:
if (iocb->ki_flags & IOCB_NOWAIT) {
if (atomic_read(&inode->i_dio_count))
return -EAGAIN;
} else {
inode_dio_wait(inode);
}
> if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> if (flags & IOMAP_DIRECT) {
> + /* A reflinked inode will result in CoW alloc */
> + if (flags & IOMAP_NOWAIT) {
> + error = -EAGAIN;
> + goto out_unlock;
> + }
This is a bit pessimistic - just because the inode has any shared
extents we could still write into unshared ones. For now I think this
pessimistic check is fine, but the comment should be corrected.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 5/8] nowait aio: return on congested block device
2017-05-09 12:22 [PATCH 0/8 v7] No wait AIO Goldwyn Rodrigues
@ 2017-05-09 12:22 ` Goldwyn Rodrigues
2017-05-11 7:44 ` Christoph Hellwig
0 siblings, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-05-09 12:22 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
A new bio operation flag REQ_NOWAIT is introduced to identify bio's
orignating from iocb with IOCB_NOWAIT. This flag indicates
to return immediately if a request cannot be made instead
of retrying.
Stacked devices such as md (the ones with make_request_fn hooks)
currently are not supported because it may block for housekeeping.
For example, an md can have a part of the device suspended.
For this reason, only request based devices are supported.
In the future, this feature will be expanded to stacked devices
by teaching them how to handle the REQ_NOWAIT flags.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
block/blk-core.c | 24 ++++++++++++++++++++++--
block/blk-mq-sched.c | 3 +++
block/blk-mq.c | 4 ++++
fs/direct-io.c | 10 ++++++++--
include/linux/bio.h | 6 ++++++
include/linux/blk_types.h | 2 ++
6 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index d772c221cc17..effe934b806b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1232,6 +1232,11 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
if (!IS_ERR(rq))
return rq;
+ if (op & REQ_NOWAIT) {
+ blk_put_rl(rl);
+ return ERR_PTR(-EAGAIN);
+ }
+
if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl);
return rq;
@@ -1870,6 +1875,17 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
+ /*
+ * For a REQ_NOWAIT based request, return -EOPNOTSUPP
+ * if queue does not have QUEUE_FLAG_NOWAIT_SUPPORT set
+ * and if it is not a request based queue.
+ */
+
+ if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q)) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
@@ -2021,7 +2037,7 @@ blk_qc_t generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- if (likely(blk_queue_enter(q, false) == 0)) {
+ if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
struct bio_list lower, same;
/* Create a fresh bio_list for all subordinate requests */
@@ -2046,7 +2062,11 @@ blk_qc_t generic_make_request(struct bio *bio)
bio_list_merge(&bio_list_on_stack[0], &same);
bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} else {
- bio_io_error(bio);
+ if (unlikely(!blk_queue_dying(q) &&
+ (bio->bi_opf & REQ_NOWAIT)))
+ bio_wouldblock_error(bio);
+ else
+ bio_io_error(bio);
}
bio = bio_list_pop(&bio_list_on_stack[0]);
} while (bio);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c974a1bbf4cb..019d881d62b7 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -119,6 +119,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
+ if (op & REQ_NOWAIT)
+ data->flags |= BLK_MQ_REQ_NOWAIT;
+
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c7836a1ded97..d7613ae6a269 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1538,6 +1538,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio->bi_opf & REQ_NOWAIT)
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
@@ -1662,6 +1664,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio->bi_opf & REQ_NOWAIT)
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea77de8..139ebd5ae1c7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -480,8 +480,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
unsigned i;
int err;
- if (bio->bi_error)
- dio->io_error = -EIO;
+ if (bio->bi_error) {
+ if (bio->bi_error == -EAGAIN && (bio->bi_opf & REQ_NOWAIT))
+ dio->io_error = -EAGAIN;
+ else
+ dio->io_error = -EIO;
+ }
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
err = bio->bi_error;
@@ -1197,6 +1201,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
if (iov_iter_rw(iter) == WRITE) {
dio->op = REQ_OP_WRITE;
dio->op_flags = REQ_SYNC | REQ_IDLE;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ dio->op_flags |= REQ_NOWAIT;
} else {
dio->op = REQ_OP_READ;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e521194f6fc..1a9270744b1e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -425,6 +425,12 @@ static inline void bio_io_error(struct bio *bio)
bio_endio(bio);
}
+static inline void bio_wouldblock_error(struct bio *bio)
+{
+ bio->bi_error = -EAGAIN;
+ bio_endio(bio);
+}
+
struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb55d0f..5ce4da30ba43 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -187,6 +187,7 @@ enum req_flag_bits {
__REQ_PREFLUSH, /* request for cache flush */
__REQ_RAHEAD, /* read ahead, can fail anytime */
__REQ_BACKGROUND, /* background IO */
+ __REQ_NOWAIT, /* Don't wait if request will block */
__REQ_NR_BITS, /* stops here */
};
@@ -203,6 +204,7 @@ enum req_flag_bits {
#define REQ_PREFLUSH (1ULL << __REQ_PREFLUSH)
#define REQ_RAHEAD (1ULL << __REQ_RAHEAD)
#define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND)
+#define REQ_NOWAIT (1ULL << __REQ_NOWAIT)
#define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
--
2.12.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-05-09 12:22 ` [PATCH 5/8] nowait aio: return on congested block device Goldwyn Rodrigues
@ 2017-05-11 7:44 ` Christoph Hellwig
[not found] ` <20170511074451.GD15626-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2017-05-11 7:44 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
Although lifting the make_request limit is something a lot of users
would appreciate in the near future..
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 5/8] nowait aio: return on congested block device
2017-04-14 12:02 [PATCH 0/8 v6] No wait AIO Goldwyn Rodrigues
@ 2017-04-14 12:02 ` Goldwyn Rodrigues
2017-04-19 6:45 ` Christoph Hellwig
0 siblings, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-14 12:02 UTC (permalink / raw)
To: linux-fsdevel
Cc: jack, hch, linux-block, linux-btrfs, linux-ext4, linux-xfs, sagi,
avi, axboe, linux-api, willy, tom.leiming, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
A new bio operation flag REQ_NOWAIT is introduced to identify bio's
orignating from iocb with IOCB_NOWAIT. This flag indicates
to return immediately if a request cannot be made instead
of retrying.
To facilitate this, QUEUE_FLAG_NOWAIT is set to devices
which support this. While currently this is set to
virtio and sd only. Support to more devices will be added soon
once I am sure they don't block. Currently blocks such as dm/md
block while performing sync.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
block/blk-core.c | 24 ++++++++++++++++++++++--
block/blk-mq-sched.c | 3 +++
block/blk-mq.c | 4 ++++
drivers/block/virtio_blk.c | 3 +++
drivers/scsi/sd.c | 3 +++
fs/direct-io.c | 10 ++++++++--
include/linux/bio.h | 6 ++++++
include/linux/blk_types.h | 2 ++
include/linux/blkdev.h | 3 +++
9 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index d772c221cc17..54698521756b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1232,6 +1232,11 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
if (!IS_ERR(rq))
return rq;
+ if (bio && (bio->bi_opf & REQ_NOWAIT)) {
+ blk_put_rl(rl);
+ return ERR_PTR(-EAGAIN);
+ }
+
if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl);
return rq;
@@ -1870,6 +1875,18 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
+ if (bio->bi_opf & REQ_NOWAIT) {
+ if (!blk_queue_nowait(q)) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+ if (!(bio->bi_opf & REQ_SYNC)) {
+ err = -EINVAL;
+ goto end_io;
+ }
+ }
+
+
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
@@ -2021,7 +2038,7 @@ blk_qc_t generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- if (likely(blk_queue_enter(q, false) == 0)) {
+ if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
struct bio_list lower, same;
/* Create a fresh bio_list for all subordinate requests */
@@ -2046,7 +2063,10 @@ blk_qc_t generic_make_request(struct bio *bio)
bio_list_merge(&bio_list_on_stack[0], &same);
bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} else {
- bio_io_error(bio);
+ if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
+ bio_wouldblock_error(bio);
+ else
+ bio_io_error(bio);
}
bio = bio_list_pop(&bio_list_on_stack[0]);
} while (bio);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c974a1bbf4cb..9f88190ff395 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -119,6 +119,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
+ if (bio && (bio->bi_opf & REQ_NOWAIT))
+ data->flags |= BLK_MQ_REQ_NOWAIT;
+
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 572966f49596..8b9b1a411ce2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1538,6 +1538,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio && (bio->bi_opf & REQ_NOWAIT))
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
@@ -1662,6 +1664,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio && (bio->bi_opf & REQ_NOWAIT))
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1d4c9f8bc1e1..7481124c5025 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -731,6 +731,9 @@ static int virtblk_probe(struct virtio_device *vdev)
/* No real sector limit. */
blk_queue_max_hw_sectors(q, -1U);
+ /* Request queue supports BIO_NOWAIT */
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
+
/* Host can optionally specify maximum segment size and number of
* segments. */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc79331..9df85ee165be 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3177,6 +3177,9 @@ static int sd_probe(struct device *dev)
SD_MOD_TIMEOUT);
}
+ /* Support BIO_NOWAIT */
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, sdp->request_queue);
+
device_initialize(&sdkp->dev);
sdkp->dev.parent = dev;
sdkp->dev.class = &sd_disk_class;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea77de8..a802168284e1 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -480,8 +480,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
unsigned i;
int err;
- if (bio->bi_error)
- dio->io_error = -EIO;
+ if (bio->bi_error) {
+ if (bio->bi_opf & REQ_NOWAIT)
+ dio->io_error = -EAGAIN;
+ else
+ dio->io_error = -EIO;
+ }
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
err = bio->bi_error;
@@ -1197,6 +1201,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
if (iov_iter_rw(iter) == WRITE) {
dio->op = REQ_OP_WRITE;
dio->op_flags = REQ_SYNC | REQ_IDLE;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ dio->op_flags |= REQ_NOWAIT;
} else {
dio->op = REQ_OP_READ;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e521194f6fc..1a9270744b1e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -425,6 +425,12 @@ static inline void bio_io_error(struct bio *bio)
bio_endio(bio);
}
+static inline void bio_wouldblock_error(struct bio *bio)
+{
+ bio->bi_error = -EAGAIN;
+ bio_endio(bio);
+}
+
struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb55d0f..5ce4da30ba43 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -187,6 +187,7 @@ enum req_flag_bits {
__REQ_PREFLUSH, /* request for cache flush */
__REQ_RAHEAD, /* read ahead, can fail anytime */
__REQ_BACKGROUND, /* background IO */
+ __REQ_NOWAIT, /* Don't wait if request will block */
__REQ_NR_BITS, /* stops here */
};
@@ -203,6 +204,7 @@ enum req_flag_bits {
#define REQ_PREFLUSH (1ULL << __REQ_PREFLUSH)
#define REQ_RAHEAD (1ULL << __REQ_RAHEAD)
#define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND)
+#define REQ_NOWAIT (1ULL << __REQ_NOWAIT)
#define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7548f332121a..df0b1245d955 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -610,6 +610,8 @@ struct request_queue {
#define QUEUE_FLAG_FLUSH_NQ 25 /* flush not queueuable */
#define QUEUE_FLAG_DAX 26 /* device supports DAX */
#define QUEUE_FLAG_STATS 27 /* track rq completion times */
+/* can return immediately on congestion (for REQ_NOWAIT) */
+#define QUEUE_FLAG_NOWAIT 28
#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -700,6 +702,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
--
2.12.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-04-14 12:02 ` [PATCH 5/8] nowait aio: return on congested block device Goldwyn Rodrigues
@ 2017-04-19 6:45 ` Christoph Hellwig
2017-04-19 15:21 ` Goldwyn Rodrigues
2017-04-24 21:10 ` Goldwyn Rodrigues
0 siblings, 2 replies; 40+ messages in thread
From: Christoph Hellwig @ 2017-04-19 6:45 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, jack, hch, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On Fri, Apr 14, 2017 at 07:02:54AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> A new bio operation flag REQ_NOWAIT is introduced to identify bio's
s/bio/block/
> @@ -1232,6 +1232,11 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
> if (!IS_ERR(rq))
> return rq;
>
> + if (bio && (bio->bi_opf & REQ_NOWAIT)) {
> + blk_put_rl(rl);
> + return ERR_PTR(-EAGAIN);
> + }
Please check the op argument instead of touching bio.
> + if (bio->bi_opf & REQ_NOWAIT) {
> + if (!blk_queue_nowait(q)) {
> + err = -EOPNOTSUPP;
> + goto end_io;
> + }
> + if (!(bio->bi_opf & REQ_SYNC)) {
I don't understand this check at all..
> + if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
Please break lines after 80 characters.
> @@ -119,6 +119,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
> if (likely(!data->hctx))
> data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
>
> + if (bio && (bio->bi_opf & REQ_NOWAIT))
> + data->flags |= BLK_MQ_REQ_NOWAIT;
Check the op flag again here.
> +++ b/block/blk-mq.c
> @@ -1538,6 +1538,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
> if (unlikely(!rq)) {
> __wbt_done(q->rq_wb, wb_acct);
> + if (bio && (bio->bi_opf & REQ_NOWAIT))
> + bio_wouldblock_error(bio);
bio iѕ dereferences unconditionally above, so you can do the same.
> @@ -1662,6 +1664,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
> rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
> if (unlikely(!rq)) {
> __wbt_done(q->rq_wb, wb_acct);
> + if (bio && (bio->bi_opf & REQ_NOWAIT))
> + bio_wouldblock_error(bio);
Same here. Although blk_sq_make_request is gone anyway in the current
block tree..
> + /* Request queue supports BIO_NOWAIT */
> + queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
BIO_NOWAIT is gone. And the comment would not be needed if the
flag had a more descriptive name, e.g. QUEUE_FLAG_NOWAIT_SUPPORT.
And I think all request based drivers should set the flag implicitly
as ->queuecommand can't sleep, and ->queue_rq only when it's always
offloaded to a workqueue when the BLK_MQ_F_BLOCKING flag is set.
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -480,8 +480,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
> unsigned i;
> int err;
>
> - if (bio->bi_error)
> - dio->io_error = -EIO;
> + if (bio->bi_error) {
> + if (bio->bi_opf & REQ_NOWAIT)
> + dio->io_error = -EAGAIN;
> + else
> + dio->io_error = -EIO;
> + }
Huh? Once REQ_NOWAIT is set all errors are -EAGAIN?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-04-19 6:45 ` Christoph Hellwig
@ 2017-04-19 15:21 ` Goldwyn Rodrigues
2017-04-20 13:43 ` Jan Kara
2017-04-24 21:10 ` Goldwyn Rodrigues
1 sibling, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-19 15:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, jack, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On 04/19/2017 01:45 AM, Christoph Hellwig wrote:
>
>> + if (bio->bi_opf & REQ_NOWAIT) {
>> + if (!blk_queue_nowait(q)) {
>> + err = -EOPNOTSUPP;
>> + goto end_io;
>> + }
>> + if (!(bio->bi_opf & REQ_SYNC)) {
>
> I don't understand this check at all..
It is to ensure at the block layer that NOWAIT comes only for DIRECT
calls only. I should probably change it to REQ_SYNC | REQ_IDLE.
>
>> + if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
>
> Please break lines after 80 characters.
>
>> @@ -119,6 +119,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>> if (likely(!data->hctx))
>> data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
>>
>> + if (bio && (bio->bi_opf & REQ_NOWAIT))
>> + data->flags |= BLK_MQ_REQ_NOWAIT;
>
> Check the op flag again here.
>
>> +++ b/block/blk-mq.c
>> @@ -1538,6 +1538,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>> rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
>> if (unlikely(!rq)) {
>> __wbt_done(q->rq_wb, wb_acct);
>> + if (bio && (bio->bi_opf & REQ_NOWAIT))
>> + bio_wouldblock_error(bio);
>
> bio iѕ dereferences unconditionally above, so you can do the same.
>
>> @@ -1662,6 +1664,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
>> rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
>> if (unlikely(!rq)) {
>> __wbt_done(q->rq_wb, wb_acct);
>> + if (bio && (bio->bi_opf & REQ_NOWAIT))
>> + bio_wouldblock_error(bio);
>
> Same here. Although blk_sq_make_request is gone anyway in the current
> block tree..
>
>> + /* Request queue supports BIO_NOWAIT */
>> + queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
>
> BIO_NOWAIT is gone. And the comment would not be needed if the
> flag had a more descriptive name, e.g. QUEUE_FLAG_NOWAIT_SUPPORT.
>
> And I think all request based drivers should set the flag implicitly
> as ->queuecommand can't sleep, and ->queue_rq only when it's always
> offloaded to a workqueue when the BLK_MQ_F_BLOCKING flag is set.
>
Yes, Do we have a central point (like a probe() function call?) where
this can be done?
--
Goldwyn
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-04-19 15:21 ` Goldwyn Rodrigues
@ 2017-04-20 13:43 ` Jan Kara
0 siblings, 0 replies; 40+ messages in thread
From: Jan Kara @ 2017-04-20 13:43 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: Christoph Hellwig, linux-fsdevel, jack, linux-block, linux-btrfs,
linux-ext4, linux-xfs, sagi, avi, axboe, linux-api, willy,
tom.leiming, Goldwyn Rodrigues
On Wed 19-04-17 10:21:39, Goldwyn Rodrigues wrote:
>
>
> On 04/19/2017 01:45 AM, Christoph Hellwig wrote:
> >
> >> + if (bio->bi_opf & REQ_NOWAIT) {
> >> + if (!blk_queue_nowait(q)) {
> >> + err = -EOPNOTSUPP;
> >> + goto end_io;
> >> + }
> >> + if (!(bio->bi_opf & REQ_SYNC)) {
> >
> > I don't understand this check at all..
>
> It is to ensure at the block layer that NOWAIT comes only for DIRECT
> calls only. I should probably change it to REQ_SYNC | REQ_IDLE.
Ouch. Checking 'REQ_SYNC' for this is
a) unreliable hack
b) layering violation
You just don't care why someone marked bio with REQ_NOWAIT at this place.
Just obey the request if you can, return error if you cannot, but advisory
REQ_SYNC or REQ_IDLE flags have nothing to do with the ability of the block
layer to submit the bio without blocking...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-04-19 6:45 ` Christoph Hellwig
2017-04-19 15:21 ` Goldwyn Rodrigues
@ 2017-04-24 21:10 ` Goldwyn Rodrigues
2017-04-25 2:28 ` Jens Axboe
1 sibling, 1 reply; 40+ messages in thread
From: Goldwyn Rodrigues @ 2017-04-24 21:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, jack, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, axboe, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On 04/19/2017 01:45 AM, Christoph Hellwig wrote:
> On Fri, Apr 14, 2017 at 07:02:54AM -0500, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>
>> + /* Request queue supports BIO_NOWAIT */
>> + queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
>
> BIO_NOWAIT is gone. And the comment would not be needed if the
> flag had a more descriptive name, e.g. QUEUE_FLAG_NOWAIT_SUPPORT.
>
> And I think all request based drivers should set the flag implicitly
> as ->queuecommand can't sleep, and ->queue_rq only when it's always
> offloaded to a workqueue when the BLK_MQ_F_BLOCKING flag is set.
>
We introduced QUEUE_FLAG_NOWAIT for devices which would not wait for
request completions. The ones which wait are MD devices because of sync
or suspend operations.
The only user of BLK_MQ_F_NONBLOCKING seems to be nbd. As you mentioned,
it uses the flag to offload it to a workqueue.
The other way to do it implicitly is to change the flag to
BLK_MAY_BLOCK_REQS and use it for devices which do wait such as md/dm.
Is that what you are hinting at? Or do you have something else in mind?
--
Goldwyn
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 5/8] nowait aio: return on congested block device
2017-04-24 21:10 ` Goldwyn Rodrigues
@ 2017-04-25 2:28 ` Jens Axboe
0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2017-04-25 2:28 UTC (permalink / raw)
To: Goldwyn Rodrigues, Christoph Hellwig
Cc: linux-fsdevel, jack, linux-block, linux-btrfs, linux-ext4,
linux-xfs, sagi, avi, linux-api, willy, tom.leiming,
Goldwyn Rodrigues
On 04/24/2017 03:10 PM, Goldwyn Rodrigues wrote:
>
>
> On 04/19/2017 01:45 AM, Christoph Hellwig wrote:
>> On Fri, Apr 14, 2017 at 07:02:54AM -0500, Goldwyn Rodrigues wrote:
>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>>
>>
>>> + /* Request queue supports BIO_NOWAIT */
>>> + queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
>>
>> BIO_NOWAIT is gone. And the comment would not be needed if the
>> flag had a more descriptive name, e.g. QUEUE_FLAG_NOWAIT_SUPPORT.
>>
>> And I think all request based drivers should set the flag implicitly
>> as ->queuecommand can't sleep, and ->queue_rq only when it's always
>> offloaded to a workqueue when the BLK_MQ_F_BLOCKING flag is set.
>>
>
> We introduced QUEUE_FLAG_NOWAIT for devices which would not wait for
> request completions. The ones which wait are MD devices because of sync
> or suspend operations.
>
> The only user of BLK_MQ_F_NONBLOCKING seems to be nbd. As you mentioned,
> it uses the flag to offload it to a workqueue.
>
> The other way to do it implicitly is to change the flag to
> BLK_MAY_BLOCK_REQS and use it for devices which do wait such as md/dm.
> Is that what you are hinting at? Or do you have something else in mind?
You are misunderstanding. What Christoph (correctly) says is that request
based drivers do not block, so all of those are fine. What you need to
worry about is drivers that are NOT request based. In other words, drivers
that hook into make_request_fn and process bio's.
--
Jens Axboe
^ permalink raw reply [flat|nested] 40+ messages in thread