linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Consolidate DIO behavior on unaligned EOF read
@ 2020-08-19 20:07 Gabriel Krisman Bertazi
  2020-08-19 20:07 ` [PATCH 1/2] direct-io: defer alignment check until after EOF check Gabriel Krisman Bertazi
  2020-08-19 20:07 ` [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read Gabriel Krisman Bertazi
  0 siblings, 2 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-08-19 20:07 UTC (permalink / raw)
  To: viro, jaegeuk, chao
  Cc: linux-fsdevel, linux-f2fs-devel, Gabriel Krisman Bertazi, kernel

When issuing a DIO read past EOF on a file that doesn't end on a block
boundary, behavior varies among filesystems. Most filesystems (including
the ones using iomap for DIO) will return EOF, but the old generic DIO
mechanism changed behavior accidently after commit 9fe55eea7e4b ("Fix
race when checking i_size on direct i/o read") and started returning
-EINVAL.

This can be observed with a simple read over a file that doesn't end on
a block boundary, forcing the last read to be unaligned.

 while (done < total) {
   ssize_t delta = pread(fd, buf + done, total - done, off + done);
   if (!delta)
     break;
   ...
 }

On the final read, delta will be 0 on most filesystems, including ext4,
xfs and btrfs, but it started being -EINVAL after said commit, for
filesystems using do_blockdev_direct_IO.

BTRFS, even though relying on this generic code, doesn't have this
issue, because it does the verification beforehand, on check_direct_IO.
Finally, f2fs requires a specific fix, since it checks for alignment
before calling the generic code.

It is arguable whether filesystems should actually return EOF or
-EINVAL, but since the original ABI returned 0 and so does most
filesystems and original DIO code, it seems reasonable to consolidate on
that.  Therefore, this patchset fixes the filesystems returning EINVAL
to return EOF.

I ran smoke tests over f2fs, and didn't observe regressions.

Gabriel Krisman Bertazi (1):
  f2fs: Return EOF on unaligned end of file DIO read

Jamie Liu (1):
  direct-io: defer alignment check until after EOF check

 fs/direct-io.c | 31 ++++++++++++++++++-------------
 fs/f2fs/data.c |  3 +++
 2 files changed, 21 insertions(+), 13 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] direct-io: defer alignment check until after EOF check
  2020-08-19 20:07 [PATCH 0/2] Consolidate DIO behavior on unaligned EOF read Gabriel Krisman Bertazi
@ 2020-08-19 20:07 ` Gabriel Krisman Bertazi
  2020-08-26 14:28   ` Jan Kara
  2020-08-19 20:07 ` [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read Gabriel Krisman Bertazi
  1 sibling, 1 reply; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-08-19 20:07 UTC (permalink / raw)
  To: viro, jaegeuk, chao
  Cc: linux-fsdevel, linux-f2fs-devel, Jamie Liu, kernel,
	Gabriel Krisman Bertazi

From: Jamie Liu <jamieliu@google.com>

Prior to commit 9fe55eea7e4b ("Fix race when checking i_size on direct
i/o read"), an unaligned direct read past end of file would trigger EOF,
since generic_file_aio_read detected this read-at-EOF condition and
skipped the direct IO read entirely, returning 0. After that change, the
read now reaches dio_generic, which detects the misalignment and returns
EINVAL.

This consolidates the generic direct-io to follow the same behavior of
filesystems.  Apparently, this fix will only affect ocfs2 since other
filesystems do this verification before calling do_blockdev_direct_IO,
with the exception of f2fs, which has the same bug, but is fixed in the
next patch.

it can be verified by a read loop on a file that does a partial read
before EOF (On file that doesn't end at an aligned address).  The
following code fails on an unaligned file on filesystems without
prior validation without this patch, but not on btrfs, ext4, and xfs.

  while (done < total) {
    ssize_t delta = pread(fd, buf + done, total - done, off + done);
    if (!delta)
      break;
    ...
  }

Fix this regression by moving the misalignment check to after the EOF
check added by commit 74cedf9b6c60 ("direct-io: Fix negative return from
dio read beyond eof").

Signed-off-by: Jamie Liu <jamieliu@google.com>
Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/direct-io.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 183299892465..77400b033d63 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1160,19 +1160,6 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	struct blk_plug plug;
 	unsigned long align = offset | iov_iter_alignment(iter);
 
-	/*
-	 * Avoid references to bdev if not absolutely needed to give
-	 * the early prefetch in the caller enough time.
-	 */
-
-	if (align & blocksize_mask) {
-		if (bdev)
-			blkbits = blksize_bits(bdev_logical_block_size(bdev));
-		blocksize_mask = (1 << blkbits) - 1;
-		if (align & blocksize_mask)
-			goto out;
-	}
-
 	/* watch out for a 0 len io from a tricksy fs */
 	if (iov_iter_rw(iter) == READ && !count)
 		return 0;
@@ -1217,6 +1204,24 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		goto out;
 	}
 
+	/*
+	 * Avoid references to bdev if not absolutely needed to give
+	 * the early prefetch in the caller enough time.
+	 */
+
+	if (align & blocksize_mask) {
+		if (bdev)
+			blkbits = blksize_bits(bdev_logical_block_size(bdev));
+		blocksize_mask = (1 << blkbits) - 1;
+		if (align & blocksize_mask) {
+			if (iov_iter_rw(iter) == READ && dio->flags & DIO_LOCKING)
+				inode_unlock(inode);
+			kmem_cache_free(dio_cache, dio);
+			retval = -EINVAL;
+			goto out;
+		}
+	}
+
 	/*
 	 * For file extending writes updating i_size before data writeouts
 	 * complete can expose uninitialized blocks in dumb filesystems.
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read
  2020-08-19 20:07 [PATCH 0/2] Consolidate DIO behavior on unaligned EOF read Gabriel Krisman Bertazi
  2020-08-19 20:07 ` [PATCH 1/2] direct-io: defer alignment check until after EOF check Gabriel Krisman Bertazi
@ 2020-08-19 20:07 ` Gabriel Krisman Bertazi
  2020-08-20 16:13   ` Jaegeuk Kim
  2020-08-21  1:37   ` Chao Yu
  1 sibling, 2 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-08-19 20:07 UTC (permalink / raw)
  To: viro, jaegeuk, chao
  Cc: linux-fsdevel, linux-f2fs-devel, Gabriel Krisman Bertazi, kernel

Reading past end of file returns EOF for aligned reads but -EINVAL for
unaligned reads on f2fs.  While documentation is not strict about this
corner case, most filesystem returns EOF on this case, like iomap
filesystems.  This patch consolidates the behavior for f2fs, by making
it return EOF(0).

it can be verified by a read loop on a file that does a partial read
before EOF (A file that doesn't end at an aligned address).  The
following code fails on an unaligned file on f2fs, but not on
btrfs, ext4, and xfs.

  while (done < total) {
    ssize_t delta = pread(fd, buf + done, total - done, off + done);
    if (!delta)
      break;
    ...
  }

It is arguable whether filesystems should actually return EOF or
-EINVAL, but since iomap filesystems support it, and so does the
original DIO code, it seems reasonable to consolidate on that.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/f2fs/data.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5f527073143e..d9834ffe1da9 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3510,6 +3510,9 @@ static int check_direct_IO(struct inode *inode, struct iov_iter *iter,
 	unsigned long align = offset | iov_iter_alignment(iter);
 	struct block_device *bdev = inode->i_sb->s_bdev;
 
+	if (iov_iter_rw(iter) == READ && offset >= i_size_read(inode))
+		return 1;
+
 	if (align & blocksize_mask) {
 		if (bdev)
 			blkbits = blksize_bits(bdev_logical_block_size(bdev));
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read
  2020-08-19 20:07 ` [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read Gabriel Krisman Bertazi
@ 2020-08-20 16:13   ` Jaegeuk Kim
  2020-08-21  1:37   ` Chao Yu
  1 sibling, 0 replies; 6+ messages in thread
From: Jaegeuk Kim @ 2020-08-20 16:13 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: viro, chao, linux-fsdevel, linux-f2fs-devel, kernel

Hi Gabriel,

Thank you for the patch. Let me take this separately from the patch set.

Thanks,

On 08/19, Gabriel Krisman Bertazi wrote:
> Reading past end of file returns EOF for aligned reads but -EINVAL for
> unaligned reads on f2fs.  While documentation is not strict about this
> corner case, most filesystem returns EOF on this case, like iomap
> filesystems.  This patch consolidates the behavior for f2fs, by making
> it return EOF(0).
> 
> it can be verified by a read loop on a file that does a partial read
> before EOF (A file that doesn't end at an aligned address).  The
> following code fails on an unaligned file on f2fs, but not on
> btrfs, ext4, and xfs.
> 
>   while (done < total) {
>     ssize_t delta = pread(fd, buf + done, total - done, off + done);
>     if (!delta)
>       break;
>     ...
>   }
> 
> It is arguable whether filesystems should actually return EOF or
> -EINVAL, but since iomap filesystems support it, and so does the
> original DIO code, it seems reasonable to consolidate on that.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/f2fs/data.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5f527073143e..d9834ffe1da9 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -3510,6 +3510,9 @@ static int check_direct_IO(struct inode *inode, struct iov_iter *iter,
>  	unsigned long align = offset | iov_iter_alignment(iter);
>  	struct block_device *bdev = inode->i_sb->s_bdev;
>  
> +	if (iov_iter_rw(iter) == READ && offset >= i_size_read(inode))
> +		return 1;
> +
>  	if (align & blocksize_mask) {
>  		if (bdev)
>  			blkbits = blksize_bits(bdev_logical_block_size(bdev));
> -- 
> 2.28.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read
  2020-08-19 20:07 ` [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read Gabriel Krisman Bertazi
  2020-08-20 16:13   ` Jaegeuk Kim
@ 2020-08-21  1:37   ` Chao Yu
  1 sibling, 0 replies; 6+ messages in thread
From: Chao Yu @ 2020-08-21  1:37 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, viro, jaegeuk, chao
  Cc: linux-fsdevel, linux-f2fs-devel, kernel

On 2020/8/20 4:07, Gabriel Krisman Bertazi wrote:
> Reading past end of file returns EOF for aligned reads but -EINVAL for
> unaligned reads on f2fs.  While documentation is not strict about this
> corner case, most filesystem returns EOF on this case, like iomap
> filesystems.  This patch consolidates the behavior for f2fs, by making
> it return EOF(0).
> 
> it can be verified by a read loop on a file that does a partial read
> before EOF (A file that doesn't end at an aligned address).  The
> following code fails on an unaligned file on f2fs, but not on
> btrfs, ext4, and xfs.
> 
>    while (done < total) {
>      ssize_t delta = pread(fd, buf + done, total - done, off + done);
>      if (!delta)
>        break;
>      ...
>    }
> 
> It is arguable whether filesystems should actually return EOF or
> -EINVAL, but since iomap filesystems support it, and so does the
> original DIO code, it seems reasonable to consolidate on that.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] direct-io: defer alignment check until after EOF check
  2020-08-19 20:07 ` [PATCH 1/2] direct-io: defer alignment check until after EOF check Gabriel Krisman Bertazi
@ 2020-08-26 14:28   ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2020-08-26 14:28 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: viro, jaegeuk, chao, linux-fsdevel, linux-f2fs-devel, Jamie Liu, kernel

On Wed 19-08-20 16:07:30, Gabriel Krisman Bertazi wrote:
> From: Jamie Liu <jamieliu@google.com>
> 
> Prior to commit 9fe55eea7e4b ("Fix race when checking i_size on direct
> i/o read"), an unaligned direct read past end of file would trigger EOF,
> since generic_file_aio_read detected this read-at-EOF condition and
> skipped the direct IO read entirely, returning 0. After that change, the
> read now reaches dio_generic, which detects the misalignment and returns
> EINVAL.
> 
> This consolidates the generic direct-io to follow the same behavior of
> filesystems.  Apparently, this fix will only affect ocfs2 since other
> filesystems do this verification before calling do_blockdev_direct_IO,
> with the exception of f2fs, which has the same bug, but is fixed in the
> next patch.
> 
> it can be verified by a read loop on a file that does a partial read
> before EOF (On file that doesn't end at an aligned address).  The
> following code fails on an unaligned file on filesystems without
> prior validation without this patch, but not on btrfs, ext4, and xfs.
> 
>   while (done < total) {
>     ssize_t delta = pread(fd, buf + done, total - done, off + done);
>     if (!delta)
>       break;
>     ...
>   }
> 
> Fix this regression by moving the misalignment check to after the EOF
> check added by commit 74cedf9b6c60 ("direct-io: Fix negative return from
> dio read beyond eof").
> 
> Signed-off-by: Jamie Liu <jamieliu@google.com>
> Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Looks sane to me but I'd note that your patch also makes unaligned 0-length
reads succeed (probably don't care). Also your patch makes unaligned DIO reads
write-out page cache before returning EINVAL - that actually looks a bit
strange. Not that it would be outright bug but it seems strange to wait
couple of seconds doing writeback only to return EINVAL... So I'd maybe
restructure the code like:

	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
		inode_lock(inode)
	dio->i_size = i_size_read(inode);
	... i_size checks ...
	... alignment checks ...
	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
		... writeout ...

What do you think?
								Honza

> ---
>  fs/direct-io.c | 31 ++++++++++++++++++-------------
>  1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 183299892465..77400b033d63 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1160,19 +1160,6 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	struct blk_plug plug;
>  	unsigned long align = offset | iov_iter_alignment(iter);
>  
> -	/*
> -	 * Avoid references to bdev if not absolutely needed to give
> -	 * the early prefetch in the caller enough time.
> -	 */
> -
> -	if (align & blocksize_mask) {
> -		if (bdev)
> -			blkbits = blksize_bits(bdev_logical_block_size(bdev));
> -		blocksize_mask = (1 << blkbits) - 1;
> -		if (align & blocksize_mask)
> -			goto out;
> -	}
> -
>  	/* watch out for a 0 len io from a tricksy fs */
>  	if (iov_iter_rw(iter) == READ && !count)
>  		return 0;
> @@ -1217,6 +1204,24 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  		goto out;
>  	}
>  
> +	/*
> +	 * Avoid references to bdev if not absolutely needed to give
> +	 * the early prefetch in the caller enough time.
> +	 */
> +
> +	if (align & blocksize_mask) {
> +		if (bdev)
> +			blkbits = blksize_bits(bdev_logical_block_size(bdev));
> +		blocksize_mask = (1 << blkbits) - 1;
> +		if (align & blocksize_mask) {
> +			if (iov_iter_rw(iter) == READ && dio->flags & DIO_LOCKING)
> +				inode_unlock(inode);
> +			kmem_cache_free(dio_cache, dio);
> +			retval = -EINVAL;
> +			goto out;
> +		}
> +	}
> +
>  	/*
>  	 * For file extending writes updating i_size before data writeouts
>  	 * complete can expose uninitialized blocks in dumb filesystems.
> -- 
> 2.28.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-08-26 14:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-19 20:07 [PATCH 0/2] Consolidate DIO behavior on unaligned EOF read Gabriel Krisman Bertazi
2020-08-19 20:07 ` [PATCH 1/2] direct-io: defer alignment check until after EOF check Gabriel Krisman Bertazi
2020-08-26 14:28   ` Jan Kara
2020-08-19 20:07 ` [PATCH 2/2] f2fs: Return EOF on unaligned end of file DIO read Gabriel Krisman Bertazi
2020-08-20 16:13   ` Jaegeuk Kim
2020-08-21  1:37   ` Chao Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).