* [PATCH] zonefs: fix IOCB_NOWAIT handling
@ 2020-02-21 14:37 Christoph Hellwig
2020-02-24 13:48 ` Damien Le Moal
0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2020-02-21 14:37 UTC (permalink / raw)
To: damien.lemoal, naohiro.aota; +Cc: linux-fsdevel
IOCB_NOWAIT can't just be ignored as it breaks applications expecting
it not to block. Just refuse the operation as applications must handle
that (e.g. by falling back to a thread pool).
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/zonefs/super.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8bc6ef82d693..69aee3dfb660 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -601,13 +601,13 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
ssize_t ret;
/*
- * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT
+ * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT
* as this can cause write reordering (e.g. the first aio gets EAGAIN
* on the inode lock but the second goes through but is now unaligned).
*/
- if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb)
- && (iocb->ki_flags & IOCB_NOWAIT))
- iocb->ki_flags &= ~IOCB_NOWAIT;
+ if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) &&
+ (iocb->ki_flags & IOCB_NOWAIT))
+ return -EOPNOTSUPP;
if (iocb->ki_flags & IOCB_NOWAIT) {
if (!inode_trylock(inode))
--
2.24.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] zonefs: fix IOCB_NOWAIT handling
2020-02-21 14:37 [PATCH] zonefs: fix IOCB_NOWAIT handling Christoph Hellwig
@ 2020-02-24 13:48 ` Damien Le Moal
2020-02-24 20:18 ` Christoph Hellwig
0 siblings, 1 reply; 4+ messages in thread
From: Damien Le Moal @ 2020-02-24 13:48 UTC (permalink / raw)
To: Christoph Hellwig, Naohiro Aota; +Cc: linux-fsdevel
Hi Christoph,
On 2020/02/21 23:37, Christoph Hellwig wrote:
> IOCB_NOWAIT can't just be ignored as it breaks applications expecting
> it not to block. Just refuse the operation as applications must handle
> that (e.g. by falling back to a thread pool).
>
> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/zonefs/super.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8bc6ef82d693..69aee3dfb660 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -601,13 +601,13 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
> ssize_t ret;
>
> /*
> - * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT
> + * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT
> * as this can cause write reordering (e.g. the first aio gets EAGAIN
> * on the inode lock but the second goes through but is now unaligned).
> */
> - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb)
> - && (iocb->ki_flags & IOCB_NOWAIT))
> - iocb->ki_flags &= ~IOCB_NOWAIT;
> + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) &&
> + (iocb->ki_flags & IOCB_NOWAIT))
> + return -EOPNOTSUPP;
>
> if (iocb->ki_flags & IOCB_NOWAIT) {
> if (!inode_trylock(inode))
>
The main problem with allowing IOCB_NOWAIT is that for an application that sends
multiple write AIOs with a single io_submit(), all write AIOs after one that is
failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will
have an unaligned write offset. I am wondering if that is really a problem at
all since the application can catch the -EAGAIN and -EINVAL, indicating that the
write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ?
Dave did not think it is. I was split on this. Would be happy to hear your
opinion. Another solution I was thinking of is something like the hack below,
which may be cleaner and easier for the application to handle as that reduces
the number of errors for a multi-io io_submit() call with RWF_NOWAIT. To do so,
this introduces the IOCB_WRITE_FAIL_FAST kiocb flag that is set in
zonefs_file_dio_write() for a nowait kiocb to a sequential zone file. If this
function fails (with -EAGAIN or any other error), this flag instruct aio_write()
to return the error for the failed kiocb instead of 0. This error return of
aio_write() then stops the io_submit() loop on the first failed iocb.
diff --git a/fs/aio.c b/fs/aio.c
index 5f3d3d814928..9f01541c8ecc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1568,6 +1568,8 @@ static int aio_write(struct kiocb *req, const struct iocb
*iocb,
return ret;
ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
if (!ret) {
+ ssize_t err;
+
/*
* Open-code file_start_write here to grab freeze protection,
* which will be released by another thread in
@@ -1580,7 +1582,12 @@ static int aio_write(struct kiocb *req, const struct iocb
*iocb,
__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}
req->ki_flags |= IOCB_WRITE;
- aio_rw_done(req, call_write_iter(file, req, &iter));
+ err = call_write_iter(file, req, &iter);
+ if (err != -EIOCBQUEUED) {
+ aio_rw_done(req, err);
+ if ((req->ki_flags & IOCB_WRITE_FAIL_FAST) && err < 0)
+ ret = err;
+ }
}
kfree(iovec);
return ret;
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8bc6ef82d693..0fa098354e38 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -601,13 +601,14 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb,
struct iov_iter *from)
ssize_t ret;
/*
- * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT
- * as this can cause write reordering (e.g. the first aio gets EAGAIN
- * on the inode lock but the second goes through but is now unaligned).
+ * For async direct IOs to sequential zone files, IOCB_NOWAIT can cause
+ * unaligned writes in case of EAGAIN error or any failure to issue the
+ * direct IO. So tell vfs to stop io_submit() on the first failure to
+ * avoid more failed aios than necessary.
*/
if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb)
&& (iocb->ki_flags & IOCB_NOWAIT))
- iocb->ki_flags &= ~IOCB_NOWAIT;
+ iocb->ki_flags |= IOCB_WRITE_FAIL_FAST;
if (iocb->ki_flags & IOCB_NOWAIT) {
if (!inode_trylock(inode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3cd4fe6b845e..8e7df1cc92e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -314,6 +314,7 @@ enum rw_hint {
#define IOCB_SYNC (1 << 5)
#define IOCB_WRITE (1 << 6)
#define IOCB_NOWAIT (1 << 7)
+#define IOCB_WRITE_FAIL_FAST (1 << 8)
struct kiocb {
struct file *ki_filp;
Of note is that checking the code path for iomap_direct_io(), I noticed that
there are at least 2 places places where the code can block: dio allocation with
GFP_KERNEL and the BIOs allocations for that dio also with GFP_KERNEL. While the
latter may be necessary to avoid partial failures of large direct IOs that need
to be split into multiple BIOs, shouldn't the dio allocation be done with
GFP_NOWAIT when IOMAP_NOWAIT is set ?
Best regards.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] zonefs: fix IOCB_NOWAIT handling
2020-02-24 13:48 ` Damien Le Moal
@ 2020-02-24 20:18 ` Christoph Hellwig
2020-02-25 9:54 ` Damien Le Moal
0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2020-02-24 20:18 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Christoph Hellwig, Naohiro Aota, linux-fsdevel
On Mon, Feb 24, 2020 at 01:48:48PM +0000, Damien Le Moal wrote:
>
> The main problem with allowing IOCB_NOWAIT is that for an application that sends
> multiple write AIOs with a single io_submit(), all write AIOs after one that is
> failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will
> have an unaligned write offset. I am wondering if that is really a problem at
> all since the application can catch the -EAGAIN and -EINVAL, indicating that the
> write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ?
I don't think supporting IOCB_NOWAIT with current zonefs is very useful.
It will be very useful once Zone Append is supported.
But more importantly my patch fixes a bug in the current zonefs
implementation. We need to fix that before 5.6 is released. Any
additional functionaity can still come later if we think that it is
useful.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] zonefs: fix IOCB_NOWAIT handling
2020-02-24 20:18 ` Christoph Hellwig
@ 2020-02-25 9:54 ` Damien Le Moal
0 siblings, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2020-02-25 9:54 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Naohiro Aota, linux-fsdevel
On 2020/02/25 5:19, Christoph Hellwig wrote:
> On Mon, Feb 24, 2020 at 01:48:48PM +0000, Damien Le Moal wrote:
>>
>> The main problem with allowing IOCB_NOWAIT is that for an application that sends
>> multiple write AIOs with a single io_submit(), all write AIOs after one that is
>> failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will
>> have an unaligned write offset. I am wondering if that is really a problem at
>> all since the application can catch the -EAGAIN and -EINVAL, indicating that the
>> write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ?
>
> I don't think supporting IOCB_NOWAIT with current zonefs is very useful.
> It will be very useful once Zone Append is supported.
>
> But more importantly my patch fixes a bug in the current zonefs
> implementation. We need to fix that before 5.6 is released. Any
> additional functionaity can still come later if we think that it is
> useful.
>
OK. Fair enough. I will apply your patch.
Thanks !
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-02-25 9:54 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-21 14:37 [PATCH] zonefs: fix IOCB_NOWAIT handling Christoph Hellwig
2020-02-24 13:48 ` Damien Le Moal
2020-02-24 20:18 ` Christoph Hellwig
2020-02-25 9:54 ` Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).