* [PATCH 0/9] f2fs: use iomap for direct I/O
@ 2021-07-16 14:39 ` Eric Biggers
  0 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

This series makes f2fs implement direct I/O using iomap_dio_rw() instead
of __blockdev_direct_IO().  In order to do this, it adds f2fs_iomap_ops,
since this is the first use of iomap in f2fs.

The iomap direct I/O implementation is more efficient than the
fs/direct-io.c implementation.  Switching to iomap also avoids the need
to add new features and optimizations to the old implementation.  E.g.,
see https://lore.kernel.org/r/20200710053406.GA25530@infradead.org and
https://lore.kernel.org/r/YKJBWClI7sUeABDs@infradead.org.

In general, this series preserves existing f2fs behavior (such as the
conditions for falling back to buffered I/O) and is only an
implementation change.

Patches 1-5 contain cleanups and fixes for f2fs_file_write_iter().
Patch 6 adds f2fs_iomap_ops, patches 7 and 8 switch direct I/O reads and
writes to iomap, and patch 9 removes obsolete code.

Careful review is appreciated, as I'm not an expert in all areas here.

This series has been tested with xfstests by running 'gce-xfstests -c
f2fs -g auto -X generic/017' with and without this series; no
regressions were seen.  (Some tests fail both before and after.
generic/017 hangs both before and after, so it had to be excluded.)

This series applies to v5.14-rc1.

Eric Biggers (9):
  f2fs: make f2fs_write_failed() take struct inode
  f2fs: remove allow_outplace_dio()
  f2fs: rework write preallocations
  f2fs: reduce indentation in f2fs_file_write_iter()
  f2fs: fix the f2fs_file_write_iter tracepoint
  f2fs: implement iomap operations
  f2fs: use iomap for direct I/O reads
  f2fs: use iomap for direct I/O writes
  f2fs: remove f2fs_direct_IO()

 fs/f2fs/Kconfig             |   1 +
 fs/f2fs/data.c              | 286 +++++++------------------
 fs/f2fs/f2fs.h              |  29 +--
 fs/f2fs/file.c              | 416 +++++++++++++++++++++++++++++-------
 include/trace/events/f2fs.h |  12 +-
 5 files changed, 421 insertions(+), 323 deletions(-)


base-commit: e73f0f0ee7541171d89f2e2491130c7771ba58d3
-- 
2.32.0


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 1/9] f2fs: make f2fs_write_failed() take struct inode
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Make f2fs_write_failed() take a 'struct inode' directly rather than a
'struct address_space', as this simplifies it slightly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/data.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d2cf48c5a2e4..c478964a5695 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3176,9 +3176,8 @@ static int f2fs_write_data_pages(struct address_space *mapping,
 			FS_CP_DATA_IO : FS_DATA_IO);
 }
 
-static void f2fs_write_failed(struct address_space *mapping, loff_t to)
+static void f2fs_write_failed(struct inode *inode, loff_t to)
 {
-	struct inode *inode = mapping->host;
 	loff_t i_size = i_size_read(inode);
 
 	if (IS_NOQUOTA(inode))
@@ -3410,7 +3409,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
 
 fail:
 	f2fs_put_page(page, 1);
-	f2fs_write_failed(mapping, pos + len);
+	f2fs_write_failed(inode, pos + len);
 	if (drop_atomic)
 		f2fs_drop_inmem_pages_all(sbi, false);
 	return err;
@@ -3600,7 +3599,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
 						count - iov_iter_count(iter));
 		} else if (err < 0) {
-			f2fs_write_failed(mapping, offset + count);
+			f2fs_write_failed(inode, offset + count);
 		}
 	} else {
 		if (err > 0)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 2/9] f2fs: remove allow_outplace_dio()
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

We can just check f2fs_lfs_mode() directly.  The block_unaligned_IO()
check is redundant because in LFS mode, f2fs doesn't do direct I/O
writes that aren't block-aligned (due to f2fs_force_buffered_io()
returning true in this case, triggering the fallback to buffered I/O).

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
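Reviewer note (not part of the commit): the redundancy argument above can
be checked with a small userspace model.  This is only a sketch.  It
assumes just what the commit message states, namely that in LFS mode an
unaligned direct write makes f2fs_force_buffered_io() return true.  The
model_* functions and struct dio_case are hypothetical stand-ins, not the
kernel helpers, and the other conditions that the real
f2fs_force_buffered_io() checks are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical inputs; each field stands in for one kernel-side predicate. */
struct dio_case {
	bool lfs_mode;   /* f2fs_lfs_mode(sbi) */
	bool is_write;   /* rw == WRITE */
	bool unaligned;  /* block_unaligned_IO(inode, iocb, iter) != 0 */
};

/*
 * Assumption taken from the commit message: in LFS mode, an unaligned
 * direct write is forced to buffered I/O.  Other reasons for forcing
 * buffered I/O are not modeled.
 */
static bool model_force_buffered_io(const struct dio_case *c)
{
	return c->lfs_mode && c->is_write && c->unaligned;
}

/* The removed allow_outplace_dio() condition. */
static bool model_old_do_opu(const struct dio_case *c)
{
	return c->lfs_mode && c->is_write && !c->unaligned;
}

/* The replacement condition used by this patch. */
static bool model_new_do_opu(const struct dio_case *c)
{
	return c->is_write && c->lfs_mode;
}

/*
 * Enumerate every input combination that can still reach the do_opu
 * decision (i.e. buffered I/O was not forced) and count how many times
 * the old and new conditions disagree.
 */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct dio_case c = {
			.lfs_mode  = (bits & 1) != 0,
			.is_write  = (bits & 2) != 0,
			.unaligned = (bits & 4) != 0,
		};

		if (model_force_buffered_io(&c))
			continue;	/* falls back to buffered I/O first */
		if (model_old_do_opu(&c) != model_new_do_opu(&c))
			mismatches++;
	}
	return mismatches;
}
```

Under that assumption the two conditions agree on every reachable case,
so count_mismatches() returns zero.  The only combination where they
would differ (LFS mode, write, unaligned) is exactly the one that never
gets this far.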
 fs/f2fs/data.c |  2 +-
 fs/f2fs/f2fs.h | 10 ----------
 fs/f2fs/file.c |  2 +-
 3 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c478964a5695..18cb28a514e6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3551,7 +3551,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (f2fs_force_buffered_io(inode, iocb, iter))
 		return 0;
 
-	do_opu = allow_outplace_dio(inode, iocb, iter);
+	do_opu = (rw == WRITE && f2fs_lfs_mode(sbi));
 
 	trace_f2fs_direct_IO_enter(inode, offset, count, rw);
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ee8eb33e2c25..ad7c1b94e23a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4305,16 +4305,6 @@ static inline int block_unaligned_IO(struct inode *inode,
 	return align & blocksize_mask;
 }
 
-static inline int allow_outplace_dio(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	int rw = iov_iter_rw(iter);
-
-	return (f2fs_lfs_mode(sbi) && (rw == WRITE) &&
-				!block_unaligned_IO(inode, iocb, iter));
-}
-
 static inline bool f2fs_force_buffered_io(struct inode *inode,
 				struct kiocb *iocb, struct iov_iter *iter)
 {
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6afd4562335f..b1cb5b50faac 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4292,7 +4292,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			 * back to buffered IO.
 			 */
 			if (!f2fs_force_buffered_io(inode, iocb, from) &&
-					allow_outplace_dio(inode, iocb, from))
+					f2fs_lfs_mode(F2FS_I_SB(inode)))
 				goto write;
 		}
 		preallocated = true;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 3/9] f2fs: rework write preallocations
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

f2fs_write_begin() assumes that all blocks were preallocated by
default unless FI_NO_PREALLOC is explicitly set.  This invites data
corruption, as there are cases in which not all blocks are preallocated.
Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
buffered_io") fixed one case, but there are others remaining.

Fix up this logic by replacing this flag with FI_PREALLOCATED_ALL, which
only gets set if all blocks for the current write were preallocated.

Also clean up f2fs_preallocate_blocks(), move it to file.c, and make it
handle some of the logic that was previously open-coded in
f2fs_file_write_iter().

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
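Reviewer note (not part of the commit): the return-value and flag
convention at the end of the new f2fs_preallocate_blocks() can be
modeled in userspace as below.  This is a sketch only.
struct prealloc_result and model_prealloc_result() are hypothetical
names; map_err stands in for the return value of f2fs_map_blocks() and
mapped_blocks for map.m_len after the call.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result type: what the caller sees after preallocation. */
struct prealloc_result {
	int ret;                /* >0: blocks may have been preallocated,
	                         *  0: none were, <0: fatal errno */
	bool all_preallocated;  /* models FI_PREALLOCATED_ALL being set */
};

/*
 * Mirror of the tail of the new f2fs_preallocate_blocks():
 *  - any error is fatal, except -ENOSPC after at least one block was mapped;
 *  - the "all preallocated" flag is set only on full success;
 *  - otherwise the number of mapped blocks is returned.
 */
static struct prealloc_result model_prealloc_result(int map_err,
						    int mapped_blocks)
{
	struct prealloc_result r = { .ret = 0, .all_preallocated = false };

	if (map_err < 0 && !(map_err == -ENOSPC && mapped_blocks > 0)) {
		r.ret = map_err;
		return r;
	}
	if (map_err == 0)
		r.all_preallocated = true;
	r.ret = mapped_blocks;
	return r;
}
```

The point of the positive flag is visible in the partial -ENOSPC case:
the write still proceeds, but all_preallocated stays false, so
prepare_write_begin() would not skip the block lookup.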
 fs/f2fs/data.c |  55 ++--------------------
 fs/f2fs/f2fs.h |   3 +-
 fs/f2fs/file.c | 123 ++++++++++++++++++++++++++++++++-----------------
 3 files changed, 84 insertions(+), 97 deletions(-)
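
Reviewer note (not part of the commit): the block-range arithmetic that
the new f2fs_preallocate_blocks() uses for map.m_lblk and map.m_len can
be tried out in userspace.  The sketch below is illustrative only.
struct blk_range and byte_range_to_blocks() are hypothetical names, and
blkbits stands in for inode->i_blkbits (12 for 4 KiB blocks is an
assumption made for the examples).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair standing in for map.m_lblk and map.m_len. */
struct blk_range {
	uint64_t lblk;  /* first logical block touched by the write */
	uint64_t len;   /* number of blocks touched, partial ones included */
};

/*
 * Convert the byte range [pos, pos + count) into the range of blocks it
 * touches, using the same expressions as the patch.  count must be
 * nonzero, which matches the caller only getting here after
 * generic_write_checks() returned a positive value.
 */
static struct blk_range byte_range_to_blocks(uint64_t pos, uint64_t count,
					     unsigned int blkbits)
{
	struct blk_range r;

	assert(count > 0);
	r.lblk = pos >> blkbits;
	r.len = ((pos + count - 1) >> blkbits) - r.lblk + 1;
	return r;
}
```

For example, with blkbits = 12 a 2-byte write at offset 4095 straddles a
block boundary, so it touches blocks 0 and 1 (lblk = 0, len = 2), while
a 1-byte write at offset 4096 touches only block 1 (lblk = 1, len = 1).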

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 18cb28a514e6..cdadaa9daf55 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1370,53 +1370,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
 	return 0;
 }
 
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
-{
-	struct inode *inode = file_inode(iocb->ki_filp);
-	struct f2fs_map_blocks map;
-	int flag;
-	int err = 0;
-	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
-
-	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
-	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
-	if (map.m_len > map.m_lblk)
-		map.m_len -= map.m_lblk;
-	else
-		map.m_len = 0;
-
-	map.m_next_pgofs = NULL;
-	map.m_next_extent = NULL;
-	map.m_seg_type = NO_CHECK_TYPE;
-	map.m_may_create = true;
-
-	if (direct_io) {
-		map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
-		flag = f2fs_force_buffered_io(inode, iocb, from) ?
-					F2FS_GET_BLOCK_PRE_AIO :
-					F2FS_GET_BLOCK_PRE_DIO;
-		goto map_blocks;
-	}
-	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
-		err = f2fs_convert_inline_inode(inode);
-		if (err)
-			return err;
-	}
-	if (f2fs_has_inline_data(inode))
-		return err;
-
-	flag = F2FS_GET_BLOCK_PRE_AIO;
-
-map_blocks:
-	err = f2fs_map_blocks(inode, &map, 1, flag);
-	if (map.m_len > 0 && err == -ENOSPC) {
-		if (!direct_io)
-			set_inode_flag(inode, FI_NO_PREALLOC);
-		err = 0;
-	}
-	return err;
-}
-
 void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
 {
 	if (flag == F2FS_GET_BLOCK_PRE_AIO) {
@@ -3210,12 +3163,10 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
 	int flag;
 
 	/*
-	 * we already allocated all the blocks, so we don't need to get
-	 * the block addresses when there is no need to fill the page.
+	 * If a whole page is being written and we already preallocated all the
+	 * blocks, then there is no need to get a block address now.
 	 */
-	if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
-	    !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
-	    !f2fs_verity_in_progress(inode))
+	if (len == PAGE_SIZE && is_inode_flag_set(inode, FI_PREALLOCATED_ALL))
 		return 0;
 
 	/* f2fs_lock_op avoids race between write CP and convert_inline_page */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ad7c1b94e23a..da1da3111f18 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -699,7 +699,7 @@ enum {
 	FI_INLINE_DOTS,		/* indicate inline dot dentries */
 	FI_DO_DEFRAG,		/* indicate defragment is running */
 	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
-	FI_NO_PREALLOC,		/* indicate skipped preallocated blocks */
+	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
 	FI_HOT_DATA,		/* indicate file is hot */
 	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
 	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
@@ -3604,7 +3604,6 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
 int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
 int f2fs_reserve_new_block(struct dnode_of_data *dn);
 int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
 int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
 struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
 			int op_flags, bool for_write);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index b1cb5b50faac..9b12004e78c6 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4218,10 +4218,72 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	return ret;
 }
 
+/*
+ * Preallocate blocks for a write request, if it is possible and helpful to do
+ * so.  Returns a positive number if blocks may have been preallocated, 0 if no
+ * blocks were preallocated, or a negative errno value if something went
+ * seriously wrong.  Also sets FI_PREALLOCATED_ALL on the inode if *all* the
+ * requested blocks (not just some of them) have been allocated.
+ */
+static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	const loff_t pos = iocb->ki_pos;
+	const size_t count = iov_iter_count(iter);
+	struct f2fs_map_blocks map = {};
+	bool dio = (iocb->ki_flags & IOCB_DIRECT) &&
+		   !f2fs_force_buffered_io(inode, iocb, iter);
+	int flag;
+	int ret;
+
+	/* If it will be an in-place direct write, don't bother. */
+	if (dio && !f2fs_lfs_mode(sbi))
+		return 0;
+
+	/* No-wait I/O can't allocate blocks. */
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return 0;
+
+	/* If it will be a short write, don't bother. */
+	if (iov_iter_fault_in_readable(iter, count) != 0)
+		return 0;
+
+	if (f2fs_has_inline_data(inode)) {
+		/* If the data will fit inline, don't bother. */
+		if (pos + count <= MAX_INLINE_DATA(inode))
+			return 0;
+		ret = f2fs_convert_inline_inode(inode);
+		if (ret)
+			return ret;
+	}
+
+	map.m_lblk = (pos >> inode->i_blkbits);
+	map.m_len = ((pos + count - 1) >> inode->i_blkbits) - map.m_lblk + 1;
+	map.m_may_create = true;
+	if (dio) {
+		map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+		flag = F2FS_GET_BLOCK_PRE_DIO;
+	} else {
+		map.m_seg_type = NO_CHECK_TYPE;
+		flag = F2FS_GET_BLOCK_PRE_AIO;
+	}
+
+	ret = f2fs_map_blocks(inode, &map, 1, flag);
+	/* -ENOSPC is only a fatal error if no blocks could be allocated. */
+	if (ret < 0 && !(ret == -ENOSPC && map.m_len > 0))
+		return ret;
+	if (ret == 0)
+		set_inode_flag(inode, FI_PREALLOCATED_ALL);
+	return map.m_len;
+}
+
 static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
+	loff_t target_size;
+	int preallocated;
 	ssize_t ret;
 
 	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) {
@@ -4245,84 +4307,59 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	if (unlikely(IS_IMMUTABLE(inode))) {
 		ret = -EPERM;
-		goto unlock;
+		goto out_unlock;
 	}
 
 	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
 		ret = -EPERM;
-		goto unlock;
+		goto out_unlock;
 	}
 
 	ret = generic_write_checks(iocb, from);
 	if (ret > 0) {
-		bool preallocated = false;
-		size_t target_size = 0;
-		int err;
-
-		if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
-			set_inode_flag(inode, FI_NO_PREALLOC);
-
-		if ((iocb->ki_flags & IOCB_NOWAIT)) {
+		if (iocb->ki_flags & IOCB_NOWAIT) {
 			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
 						iov_iter_count(from)) ||
 				f2fs_has_inline_data(inode) ||
 				f2fs_force_buffered_io(inode, iocb, from)) {
-				clear_inode_flag(inode, FI_NO_PREALLOC);
-				inode_unlock(inode);
 				ret = -EAGAIN;
-				goto out;
+				goto out_unlock;
 			}
-			goto write;
 		}
-
-		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
-			goto write;
-
 		if (iocb->ki_flags & IOCB_DIRECT) {
 			/*
 			 * Convert inline data for Direct I/O before entering
 			 * f2fs_direct_IO().
 			 */
-			err = f2fs_convert_inline_inode(inode);
-			if (err)
-				goto out_err;
-			/*
-			 * If force_buffere_io() is true, we have to allocate
-			 * blocks all the time, since f2fs_direct_IO will fall
-			 * back to buffered IO.
-			 */
-			if (!f2fs_force_buffered_io(inode, iocb, from) &&
-					f2fs_lfs_mode(F2FS_I_SB(inode)))
-				goto write;
+			ret = f2fs_convert_inline_inode(inode);
+			if (ret)
+				goto out_unlock;
 		}
-		preallocated = true;
-		target_size = iocb->ki_pos + iov_iter_count(from);
 
-		err = f2fs_preallocate_blocks(iocb, from);
-		if (err) {
-out_err:
-			clear_inode_flag(inode, FI_NO_PREALLOC);
-			inode_unlock(inode);
-			ret = err;
-			goto out;
+		/* Possibly preallocate the blocks for the write. */
+		target_size = iocb->ki_pos + iov_iter_count(from);
+		preallocated = f2fs_preallocate_blocks(iocb, from);
+		if (preallocated < 0) {
+			ret = preallocated;
+			goto out_unlock;
 		}
-write:
+
 		ret = __generic_file_write_iter(iocb, from);
-		clear_inode_flag(inode, FI_NO_PREALLOC);
 
-		/* if we couldn't write data, we should deallocate blocks. */
-		if (preallocated && i_size_read(inode) < target_size) {
+		/* Don't leave any preallocated blocks around past i_size. */
+		if (preallocated > 0 && inode->i_size < target_size) {
 			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 			down_write(&F2FS_I(inode)->i_mmap_sem);
 			f2fs_truncate(inode);
 			up_write(&F2FS_I(inode)->i_mmap_sem);
 			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 		}
+		clear_inode_flag(inode, FI_PREALLOCATED_ALL);
 
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
 	}
-unlock:
+out_unlock:
 	inode_unlock(inode);
 out:
 	trace_f2fs_file_write_iter(inode, iocb->ki_pos,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [f2fs-dev] [PATCH 3/9] f2fs: rework write preallocations
@ 2021-07-16 14:39   ` Eric Biggers
  0 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Matthew Bobrowski, Satya Tangirala,
	Changheun Lee

From: Eric Biggers <ebiggers@google.com>

f2fs_write_begin() assumes that all blocks were preallocated by
default unless FI_NO_PREALLOC is explicitly set.  This invites data
corruption, as there are cases in which not all blocks are preallocated.
Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
buffered_io") fixed one case, but there are others remaining.

Fix up this logic by replacing this flag with FI_PREALLOCATED_ALL, which
only gets set if all blocks for the current write were preallocated.

Also clean up f2fs_preallocate_blocks(), move it to file.c, and make it
handle some of the logic that was previously open-coded in
f2fs_file_write_iter().

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/data.c |  55 ++--------------------
 fs/f2fs/f2fs.h |   3 +-
 fs/f2fs/file.c | 123 ++++++++++++++++++++++++++++++++-----------------
 3 files changed, 84 insertions(+), 97 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 18cb28a514e6..cdadaa9daf55 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1370,53 +1370,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
 	return 0;
 }
 
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
-{
-	struct inode *inode = file_inode(iocb->ki_filp);
-	struct f2fs_map_blocks map;
-	int flag;
-	int err = 0;
-	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
-
-	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
-	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
-	if (map.m_len > map.m_lblk)
-		map.m_len -= map.m_lblk;
-	else
-		map.m_len = 0;
-
-	map.m_next_pgofs = NULL;
-	map.m_next_extent = NULL;
-	map.m_seg_type = NO_CHECK_TYPE;
-	map.m_may_create = true;
-
-	if (direct_io) {
-		map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
-		flag = f2fs_force_buffered_io(inode, iocb, from) ?
-					F2FS_GET_BLOCK_PRE_AIO :
-					F2FS_GET_BLOCK_PRE_DIO;
-		goto map_blocks;
-	}
-	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
-		err = f2fs_convert_inline_inode(inode);
-		if (err)
-			return err;
-	}
-	if (f2fs_has_inline_data(inode))
-		return err;
-
-	flag = F2FS_GET_BLOCK_PRE_AIO;
-
-map_blocks:
-	err = f2fs_map_blocks(inode, &map, 1, flag);
-	if (map.m_len > 0 && err == -ENOSPC) {
-		if (!direct_io)
-			set_inode_flag(inode, FI_NO_PREALLOC);
-		err = 0;
-	}
-	return err;
-}
-
 void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
 {
 	if (flag == F2FS_GET_BLOCK_PRE_AIO) {
@@ -3210,12 +3163,10 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
 	int flag;
 
 	/*
-	 * we already allocated all the blocks, so we don't need to get
-	 * the block addresses when there is no need to fill the page.
+	 * If a whole page is being written and we already preallocated all the
+	 * blocks, then there is no need to get a block address now.
 	 */
-	if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
-	    !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
-	    !f2fs_verity_in_progress(inode))
+	if (len == PAGE_SIZE && is_inode_flag_set(inode, FI_PREALLOCATED_ALL))
 		return 0;
 
 	/* f2fs_lock_op avoids race between write CP and convert_inline_page */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ad7c1b94e23a..da1da3111f18 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -699,7 +699,7 @@ enum {
 	FI_INLINE_DOTS,		/* indicate inline dot dentries */
 	FI_DO_DEFRAG,		/* indicate defragment is running */
 	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
-	FI_NO_PREALLOC,		/* indicate skipped preallocated blocks */
+	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
 	FI_HOT_DATA,		/* indicate file is hot */
 	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
 	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
@@ -3604,7 +3604,6 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
 int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
 int f2fs_reserve_new_block(struct dnode_of_data *dn);
 int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
 int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
 struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
 			int op_flags, bool for_write);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index b1cb5b50faac..9b12004e78c6 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4218,10 +4218,72 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	return ret;
 }
 
+/*
+ * Preallocate blocks for a write request, if it is possible and helpful to do
+ * so.  Returns a positive number if blocks may have been preallocated, 0 if no
+ * blocks were preallocated, or a negative errno value if something went
+ * seriously wrong.  Also sets FI_PREALLOCATED_ALL on the inode if *all* the
+ * requested blocks (not just some of them) have been allocated.
+ */
+static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	const loff_t pos = iocb->ki_pos;
+	const size_t count = iov_iter_count(iter);
+	struct f2fs_map_blocks map = {};
+	bool dio = (iocb->ki_flags & IOCB_DIRECT) &&
+		   !f2fs_force_buffered_io(inode, iocb, iter);
+	int flag;
+	int ret;
+
+	/* If it will be an in-place direct write, don't bother. */
+	if (dio && !f2fs_lfs_mode(sbi))
+		return 0;
+
+	/* No-wait I/O can't allocate blocks. */
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return 0;
+
+	/* If it will be a short write, don't bother. */
+	if (iov_iter_fault_in_readable(iter, count) != 0)
+		return 0;
+
+	if (f2fs_has_inline_data(inode)) {
+		/* If the data will fit inline, don't bother. */
+		if (pos + count <= MAX_INLINE_DATA(inode))
+			return 0;
+		ret = f2fs_convert_inline_inode(inode);
+		if (ret)
+			return ret;
+	}
+
+	map.m_lblk = (pos >> inode->i_blkbits);
+	map.m_len = ((pos + count - 1) >> inode->i_blkbits) - map.m_lblk + 1;
+	map.m_may_create = true;
+	if (dio) {
+		map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+		flag = F2FS_GET_BLOCK_PRE_DIO;
+	} else {
+		map.m_seg_type = NO_CHECK_TYPE;
+		flag = F2FS_GET_BLOCK_PRE_AIO;
+	}
+
+	ret = f2fs_map_blocks(inode, &map, 1, flag);
+	/* -ENOSPC is only a fatal error if no blocks could be allocated. */
+	if (ret < 0 && !(ret == -ENOSPC && map.m_len > 0))
+		return ret;
+	if (ret == 0)
+		set_inode_flag(inode, FI_PREALLOCATED_ALL);
+	return map.m_len;
+}
+
 static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
+	loff_t target_size;
+	int preallocated;
 	ssize_t ret;
 
 	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) {
@@ -4245,84 +4307,59 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	if (unlikely(IS_IMMUTABLE(inode))) {
 		ret = -EPERM;
-		goto unlock;
+		goto out_unlock;
 	}
 
 	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
 		ret = -EPERM;
-		goto unlock;
+		goto out_unlock;
 	}
 
 	ret = generic_write_checks(iocb, from);
 	if (ret > 0) {
-		bool preallocated = false;
-		size_t target_size = 0;
-		int err;
-
-		if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
-			set_inode_flag(inode, FI_NO_PREALLOC);
-
-		if ((iocb->ki_flags & IOCB_NOWAIT)) {
+		if (iocb->ki_flags & IOCB_NOWAIT) {
 			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
 						iov_iter_count(from)) ||
 				f2fs_has_inline_data(inode) ||
 				f2fs_force_buffered_io(inode, iocb, from)) {
-				clear_inode_flag(inode, FI_NO_PREALLOC);
-				inode_unlock(inode);
 				ret = -EAGAIN;
-				goto out;
+				goto out_unlock;
 			}
-			goto write;
 		}
-
-		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
-			goto write;
-
 		if (iocb->ki_flags & IOCB_DIRECT) {
 			/*
 			 * Convert inline data for Direct I/O before entering
 			 * f2fs_direct_IO().
 			 */
-			err = f2fs_convert_inline_inode(inode);
-			if (err)
-				goto out_err;
-			/*
-			 * If force_buffere_io() is true, we have to allocate
-			 * blocks all the time, since f2fs_direct_IO will fall
-			 * back to buffered IO.
-			 */
-			if (!f2fs_force_buffered_io(inode, iocb, from) &&
-					f2fs_lfs_mode(F2FS_I_SB(inode)))
-				goto write;
+			ret = f2fs_convert_inline_inode(inode);
+			if (ret)
+				goto out_unlock;
 		}
-		preallocated = true;
-		target_size = iocb->ki_pos + iov_iter_count(from);
 
-		err = f2fs_preallocate_blocks(iocb, from);
-		if (err) {
-out_err:
-			clear_inode_flag(inode, FI_NO_PREALLOC);
-			inode_unlock(inode);
-			ret = err;
-			goto out;
+		/* Possibly preallocate the blocks for the write. */
+		target_size = iocb->ki_pos + iov_iter_count(from);
+		preallocated = f2fs_preallocate_blocks(iocb, from);
+		if (preallocated < 0) {
+			ret = preallocated;
+			goto out_unlock;
 		}
-write:
+
 		ret = __generic_file_write_iter(iocb, from);
-		clear_inode_flag(inode, FI_NO_PREALLOC);
 
-		/* if we couldn't write data, we should deallocate blocks. */
-		if (preallocated && i_size_read(inode) < target_size) {
+		/* Don't leave any preallocated blocks around past i_size. */
+		if (preallocated > 0 && inode->i_size < target_size) {
 			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 			down_write(&F2FS_I(inode)->i_mmap_sem);
 			f2fs_truncate(inode);
 			up_write(&F2FS_I(inode)->i_mmap_sem);
 			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 		}
+		clear_inode_flag(inode, FI_PREALLOCATED_ALL);
 
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
 	}
-unlock:
+out_unlock:
 	inode_unlock(inode);
 out:
 	trace_f2fs_file_write_iter(inode, iocb->ki_pos,
-- 
2.32.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 4/9] f2fs: reduce indentation in f2fs_file_write_iter()
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Replace 'if (ret > 0)' with 'if (ret <= 0) goto out_unlock;'.
No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
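Illustration only, not part of the patch: the change is the usual
guard-clause transformation.  The userspace sketch below uses
hypothetical helpers (nested_style() and early_exit_style() are made-up
names; 'checked' and 'written' stand in for the results of
generic_write_checks() and the write itself) to suggest why the two
shapes should behave the same.

```c
/*
 * Hypothetical stand-ins, not kernel code: 'checked' models the return
 * value of generic_write_checks() and 'written' models the result of the
 * write that follows it.
 */
static long nested_style(long checked, long written)
{
	long ret = checked;

	if (ret > 0) {
		/* The body sits one indentation level inside the check. */
		ret = written;
	}
	return ret;
}

static long early_exit_style(long checked, long written)
{
	long ret = checked;

	if (ret <= 0)
		goto out_unlock;	/* same cases skipped, one less indent */

	ret = written;
out_unlock:
	return ret;
}
```

For any pair of inputs the two functions return the same value; only
the indentation of the main path differs.
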
 fs/f2fs/file.c | 73 +++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9b12004e78c6..878b2460f79b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4316,49 +4316,50 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	}
 
 	ret = generic_write_checks(iocb, from);
-	if (ret > 0) {
-		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
-						iov_iter_count(from)) ||
-				f2fs_has_inline_data(inode) ||
-				f2fs_force_buffered_io(inode, iocb, from)) {
-				ret = -EAGAIN;
-				goto out_unlock;
-			}
-		}
-		if (iocb->ki_flags & IOCB_DIRECT) {
-			/*
-			 * Convert inline data for Direct I/O before entering
-			 * f2fs_direct_IO().
-			 */
-			ret = f2fs_convert_inline_inode(inode);
-			if (ret)
-				goto out_unlock;
-		}
+	if (ret <= 0)
+		goto out_unlock;
 
-		/* Possibly preallocate the blocks for the write. */
-		target_size = iocb->ki_pos + iov_iter_count(from);
-		preallocated = f2fs_preallocate_blocks(iocb, from);
-		if (preallocated < 0) {
-			ret = preallocated;
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!f2fs_overwrite_io(inode, iocb->ki_pos,
+				       iov_iter_count(from)) ||
+		    f2fs_has_inline_data(inode) ||
+		    f2fs_force_buffered_io(inode, iocb, from)) {
+			ret = -EAGAIN;
 			goto out_unlock;
 		}
+	}
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		/*
+		 * Convert inline data for Direct I/O before entering
+		 * f2fs_direct_IO().
+		 */
+		ret = f2fs_convert_inline_inode(inode);
+		if (ret)
+			goto out_unlock;
+	}
 
-		ret = __generic_file_write_iter(iocb, from);
+	/* Possibly preallocate the blocks for the write. */
+	target_size = iocb->ki_pos + iov_iter_count(from);
+	preallocated = f2fs_preallocate_blocks(iocb, from);
+	if (preallocated < 0) {
+		ret = preallocated;
+		goto out_unlock;
+	}
 
-		/* Don't leave any preallocated blocks around past i_size. */
-		if (preallocated > 0 && inode->i_size < target_size) {
-			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
-			down_write(&F2FS_I(inode)->i_mmap_sem);
-			f2fs_truncate(inode);
-			up_write(&F2FS_I(inode)->i_mmap_sem);
-			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
-		}
-		clear_inode_flag(inode, FI_PREALLOCATED_ALL);
+	ret = __generic_file_write_iter(iocb, from);
 
-		if (ret > 0)
-			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
+	/* Don't leave any preallocated blocks around past i_size. */
+	if (preallocated > 0 && inode->i_size < target_size) {
+		down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+		down_write(&F2FS_I(inode)->i_mmap_sem);
+		f2fs_truncate(inode);
+		up_write(&F2FS_I(inode)->i_mmap_sem);
+		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 	}
+	clear_inode_flag(inode, FI_PREALLOCATED_ALL);
+
+	if (ret > 0)
+		f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
 out_unlock:
 	inode_unlock(inode);
 out:
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 5/9] f2fs: fix the f2fs_file_write_iter tracepoint
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Pass in the original position and count rather than the position and
count that were updated by the write.  Also use the correct types for
all arguments, in particular the file offset, which was being truncated
to 32 bits on 32-bit platforms.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
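Illustration only, not part of the patch: a userspace sketch of the two
problems described above.  All names here (struct fake_iocb, struct
trace_rec, fake_write(), traced_write(), offset_as_u32()) are
hypothetical; the sketch assumes that a write advances the file position
and that 'unsigned long' is 32 bits wide on the affected platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a tracepoint record; not the kernel's layout. */
struct trace_rec {
	int64_t offset;
	size_t length;
	int64_t ret;
};

/* Hypothetical stand-in for struct kiocb: only the file position. */
struct fake_iocb {
	int64_t pos;
};

/* Models a write that advances the position, as a real write does. */
static int64_t fake_write(struct fake_iocb *iocb, size_t count)
{
	iocb->pos += (int64_t)count;
	return (int64_t)count;
}

/* Snapshot pos/count before the write so the trace shows the request. */
static struct trace_rec traced_write(struct fake_iocb *iocb, size_t count)
{
	const int64_t orig_pos = iocb->pos;
	const size_t orig_count = count;
	struct trace_rec rec;

	rec.ret = fake_write(iocb, count);
	rec.offset = orig_pos;
	rec.length = orig_count;
	return rec;
}

/* Models storing a 64-bit offset in a 32-bit 'unsigned long' field. */
static uint32_t offset_as_u32(int64_t pos)
{
	return (uint32_t)pos;	/* keeps only the low 32 bits */
}
```

With a starting position of 4096 and a 512-byte write, the record keeps
offset 4096 even though the position has moved on to 4608.  An offset of
5 GiB stored through offset_as_u32() comes back as 1 GiB, which is the
kind of truncation the wider field types avoid.
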
 fs/f2fs/file.c              |  5 +++--
 include/trace/events/f2fs.h | 12 ++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 878b2460f79b..279252c7f7bc 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4282,6 +4282,8 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
+	const loff_t orig_pos = iocb->ki_pos;
+	const size_t orig_count = iov_iter_count(from);
 	loff_t target_size;
 	int preallocated;
 	ssize_t ret;
@@ -4363,8 +4365,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 out_unlock:
 	inode_unlock(inode);
 out:
-	trace_f2fs_file_write_iter(inode, iocb->ki_pos,
-					iov_iter_count(from), ret);
+	trace_f2fs_file_write_iter(inode, orig_pos, orig_count, ret);
 	if (ret > 0)
 		ret = generic_write_sync(iocb, ret);
 	return ret;
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 56b113e3cd6a..bffb38622e9b 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -540,17 +540,17 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
 
 TRACE_EVENT(f2fs_file_write_iter,
 
-	TP_PROTO(struct inode *inode, unsigned long offset,
-		unsigned long length, int ret),
+	TP_PROTO(struct inode *inode, loff_t offset, size_t length,
+		 ssize_t ret),
 
 	TP_ARGS(inode, offset, length, ret),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
 		__field(ino_t,	ino)
-		__field(unsigned long, offset)
-		__field(unsigned long, length)
-		__field(int,	ret)
+		__field(loff_t, offset)
+		__field(size_t, length)
+		__field(ssize_t, ret)
 	),
 
 	TP_fast_assign(
@@ -562,7 +562,7 @@ TRACE_EVENT(f2fs_file_write_iter,
 	),
 
 	TP_printk("dev = (%d,%d), ino = %lu, "
-		"offset = %lu, length = %lu, written(err) = %d",
+		"offset = %lld, length = %zu, written(err) = %zd",
 		show_dev_ino(__entry),
 		__entry->offset,
 		__entry->length,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 6/9] f2fs: implement iomap operations
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Implement 'struct iomap_ops' and 'struct iomap_dio_ops' for f2fs, in
preparation for making f2fs use iomap for direct I/O.

Note that f2fs_iomap_ops may be used for other things besides direct I/O
in the future; however, for now I've only tested it for direct I/O.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
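Illustration only, not part of the patch: a userspace sketch of the
byte-to-block arithmetic that f2fs_iomap_begin() performs before and
after calling f2fs_map_blocks().  The names bytes_to_blk_range(),
hole_length() and SKETCH_BLKBITS are hypothetical, and the 4 KiB block
size is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB filesystem blocks (block bits = 12). */
#define SKETCH_BLKBITS 12

struct blk_range {
	uint64_t lblk;	/* first logical block touched */
	uint64_t len;	/* number of blocks touched */
};

/* Convert a byte range into the inclusive range of blocks it touches. */
static struct blk_range bytes_to_blk_range(uint64_t offset, uint64_t length)
{
	struct blk_range r;

	assert(length > 0);	/* offset + length - 1 needs a nonempty range */
	r.lblk = offset >> SKETCH_BLKBITS;
	r.len = ((offset + length - 1) >> SKETCH_BLKBITS) - r.lblk + 1;
	return r;
}

/* Byte length of a hole that runs from 'offset' up to block 'next_blk'. */
static uint64_t hole_length(uint64_t offset, uint64_t next_blk)
{
	/* The reported extent starts at the block containing 'offset'. */
	uint64_t start = (offset >> SKETCH_BLKBITS) << SKETCH_BLKBITS;

	return (next_blk << SKETCH_BLKBITS) - start;
}
```

A 2-byte request at offset 4095 straddles a block boundary and so maps
to two blocks starting at block 0, while a hole starting inside block 1
and ending at block 4 is reported as 12288 bytes from the start of
block 1.
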
 fs/f2fs/Kconfig |  1 +
 fs/f2fs/data.c  | 95 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/f2fs/f2fs.h  |  2 ++
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 7669de7b49ce..031fbb596450 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -7,6 +7,7 @@ config F2FS_FS
 	select CRYPTO_CRC32
 	select F2FS_FS_XATTR if FS_ENCRYPTION
 	select FS_ENCRYPTION_ALGS if FS_ENCRYPTION
+	select FS_IOMAP
 	select LZ4_COMPRESS if F2FS_FS_LZ4
 	select LZ4_DECOMPRESS if F2FS_FS_LZ4
 	select LZ4HC_COMPRESS if F2FS_FS_LZ4HC
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index cdadaa9daf55..9243159ee753 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -21,6 +21,7 @@
 #include <linux/cleancache.h>
 #include <linux/sched/signal.h>
 #include <linux/fiemap.h>
+#include <linux/iomap.h>
 
 #include "f2fs.h"
 #include "node.h"
@@ -3452,7 +3453,7 @@ static void f2fs_dio_end_io(struct bio *bio)
 	bio_endio(bio);
 }
 
-static void f2fs_dio_submit_bio(struct bio *bio, struct inode *inode,
+static void f2fs_dio_submit_bio_old(struct bio *bio, struct inode *inode,
 							loff_t file_offset)
 {
 	struct f2fs_private_dio *dio;
@@ -3481,6 +3482,35 @@ static void f2fs_dio_submit_bio(struct bio *bio, struct inode *inode,
 	bio_endio(bio);
 }
 
+static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
+				    struct bio *bio, loff_t file_offset)
+{
+	struct f2fs_private_dio *dio;
+	bool write = (bio_op(bio) == REQ_OP_WRITE);
+
+	dio = f2fs_kzalloc(F2FS_I_SB(inode),
+			sizeof(struct f2fs_private_dio), GFP_NOFS);
+	if (!dio)
+		goto out;
+
+	dio->inode = inode;
+	dio->orig_end_io = bio->bi_end_io;
+	dio->orig_private = bio->bi_private;
+	dio->write = write;
+
+	bio->bi_end_io = f2fs_dio_end_io;
+	bio->bi_private = dio;
+
+	inc_page_count(F2FS_I_SB(inode),
+			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
+
+	return submit_bio(bio);
+out:
+	bio->bi_status = BLK_STS_IOERR;
+	bio_endio(bio);
+	return BLK_QC_T_NONE;
+}
+
 static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
@@ -3529,7 +3559,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	err = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
 			iter, rw == WRITE ? get_data_block_dio_write :
-			get_data_block_dio, NULL, f2fs_dio_submit_bio,
+			get_data_block_dio, NULL, f2fs_dio_submit_bio_old,
 			rw == WRITE ? DIO_LOCKING | DIO_SKIP_HOLES :
 			DIO_SKIP_HOLES);
 
@@ -4101,3 +4131,64 @@ void f2fs_destroy_bio_entry_cache(void)
 {
 	kmem_cache_destroy(bio_entry_slab);
 }
+
+static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+			    unsigned int flags, struct iomap *iomap,
+			    struct iomap *srcmap)
+{
+	struct f2fs_map_blocks map = {};
+	pgoff_t next_pgofs = 0;
+	int err;
+
+	map.m_lblk = bytes_to_blks(inode, offset);
+	map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
+	map.m_next_pgofs = &next_pgofs;
+	map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+	if (flags & IOMAP_WRITE)
+		map.m_may_create = true;
+
+	err = f2fs_map_blocks(inode, &map, flags & IOMAP_WRITE,
+			      F2FS_GET_BLOCK_DIO);
+	if (err)
+		return err;
+
+	iomap->offset = blks_to_bytes(inode, map.m_lblk);
+
+	if (map.m_flags & (F2FS_MAP_MAPPED | F2FS_MAP_UNWRITTEN)) {
+		iomap->length = blks_to_bytes(inode, map.m_len);
+		if (map.m_flags & F2FS_MAP_MAPPED) {
+			iomap->type = IOMAP_MAPPED;
+			iomap->flags |= IOMAP_F_MERGED;
+		} else {
+			iomap->type = IOMAP_UNWRITTEN;
+		}
+		if (WARN_ON_ONCE(!__is_valid_data_blkaddr(map.m_pblk)))
+			return -EINVAL;
+		iomap->addr = blks_to_bytes(inode, map.m_pblk);
+
+		if (WARN_ON_ONCE(f2fs_is_multi_device(F2FS_I_SB(inode))))
+			return -EINVAL;
+		iomap->bdev = inode->i_sb->s_bdev;
+	} else {
+		iomap->length = blks_to_bytes(inode, next_pgofs) -
+				iomap->offset;
+		iomap->type = IOMAP_HOLE;
+		iomap->addr = IOMAP_NULL_ADDR;
+	}
+
+	if (map.m_flags & F2FS_MAP_NEW)
+		iomap->flags |= IOMAP_F_NEW;
+	if ((inode->i_state & I_DIRTY_DATASYNC) ||
+	    offset + length > i_size_read(inode))
+		iomap->flags |= IOMAP_F_DIRTY;
+
+	return 0;
+}
+
+const struct iomap_ops f2fs_iomap_ops = {
+	.iomap_begin	= f2fs_iomap_begin,
+};
+
+const struct iomap_dio_ops f2fs_iomap_dio_ops = {
+	.submit_io	= f2fs_dio_submit_bio,
+};
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index da1da3111f18..d2b1ef6976c4 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3639,6 +3639,8 @@ int f2fs_init_post_read_processing(void);
 void f2fs_destroy_post_read_processing(void);
 int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
 void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
+extern const struct iomap_ops f2fs_iomap_ops;
+extern const struct iomap_dio_ops f2fs_iomap_dio_ops;
 
 /*
  * gc.c
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 7/9] f2fs: use iomap for direct I/O reads
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Convert f2fs_file_read_iter() to use iomap_dio_rw() for direct I/O
rather than using f2fs_direct_IO() via generic_file_read_iter().

Besides the new direct I/O implementation being more efficient
(especially with regard to the block mapping), this change retains the
existing f2fs behavior such as the conditions for falling back to
buffered I/O, the locking of i_gc_rwsem[READ], the iostat gathering, and
the f2fs_direct_IO_{enter,exit} tracepoints.  An exception is that we no
longer fall back to a buffered I/O read if a direct I/O read returns a
short read (previously this was done by generic_file_read_iter()), as
this doesn't appear to be a useful thing to do on f2fs.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
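Illustration only, not part of the patch: a userspace sketch of the
alignment rule that f2fs_should_use_dio() applies when deciding whether
to fall back to buffered I/O.  The helper names is_aligned() and
falls_back_to_buffered() are hypothetical; 'iov_align' stands in for the
value returned by iov_iter_alignment(), and both block sizes are assumed
to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'x' is a multiple of 'a'; 'a' must be a power of two. */
static bool is_aligned(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x & (a - 1)) == 0;
}

/*
 * Fall back to buffered I/O only for requests that are aligned to the
 * device's logical block size but not to the filesystem block size.
 */
static bool falls_back_to_buffered(uint64_t pos, uint64_t iov_align,
				   uint32_t fs_block, uint32_t logical_block)
{
	/* OR-ing merges the low bits, so one test covers both values. */
	uint64_t align = pos | iov_align;

	return !is_aligned(align, fs_block) &&
	       is_aligned(align, logical_block);
}
```

With 4096-byte filesystem blocks on a 512-byte-sector device, a request
at offset 512 falls back to buffered I/O, a request at offset 4096 stays
on the direct I/O path, and a request at offset 100 also stays on the
direct I/O path, where the comment in the patch says it is expected to
fail with -EINVAL.
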
 fs/f2fs/f2fs.h |  7 ++---
 fs/f2fs/file.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d2b1ef6976c4..f869c4a2f79f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3243,10 +3243,9 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi,
 			sbi->rw_iostat[APP_WRITE_IO] -
 			sbi->rw_iostat[APP_DIRECT_IO];
 
-	if (type == APP_READ_IO || type == APP_DIRECT_READ_IO)
-		sbi->rw_iostat[APP_BUFFERED_READ_IO] =
-			sbi->rw_iostat[APP_READ_IO] -
-			sbi->rw_iostat[APP_DIRECT_READ_IO];
+	if (type == APP_BUFFERED_READ_IO || type == APP_DIRECT_READ_IO)
+		sbi->rw_iostat[APP_READ_IO] += io_bytes;
+
 	spin_unlock(&sbi->iostat_lock);
 
 	f2fs_record_iostat(sbi);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 279252c7f7bc..52de655ef833 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -23,6 +23,7 @@
 #include <linux/nls.h>
 #include <linux/sched/signal.h>
 #include <linux/fileattr.h>
+#include <linux/iomap.h>
 
 #include "f2fs.h"
 #include "node.h"
@@ -4201,20 +4202,93 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	return __f2fs_ioctl(filp, cmd, arg);
 }
 
-static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+/*
+ * Return %true if the given read or write request should use direct I/O, or
+ * %false if it should use buffered I/O.
+ */
+static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
+				struct iov_iter *iter)
+{
+	unsigned int align;
+
+	if (!(iocb->ki_flags & IOCB_DIRECT))
+		return false;
+
+	if (f2fs_force_buffered_io(inode, iocb, iter))
+		return false;
+
+	/*
+	 * Direct I/O not aligned to the disk's logical_block_size will be
+	 * attempted, but will fail with -EINVAL.
+	 *
+	 * f2fs additionally requires that direct I/O be aligned to the
+	 * filesystem block size, which is often a stricter requirement.
+	 * However, f2fs traditionally falls back to buffered I/O on requests
+	 * that are logical_block_size-aligned but not fs-block aligned.
+	 *
+	 * The below logic implements this behavior.
+	 */
+	align = iocb->ki_pos | iov_iter_alignment(iter);
+	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
+	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
+		return false;
+
+	return true;
+}
+
+static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	int ret;
+	struct f2fs_inode_info *fi = F2FS_I(inode);
+	const loff_t pos = iocb->ki_pos;
+	const size_t count = iov_iter_count(to);
+	ssize_t ret;
+
+	if (count == 0)
+		return 0; /* skip atime update */
+
+	trace_f2fs_direct_IO_enter(inode, pos, count, READ);
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!down_read_trylock(&fi->i_gc_rwsem[READ])) {
+			ret = -EAGAIN;
+			goto out;
+		}
+	} else {
+		down_read(&fi->i_gc_rwsem[READ]);
+	}
+
+	ret = iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
+
+	up_read(&fi->i_gc_rwsem[READ]);
+
+	file_accessed(file);
+
+	if (ret > 0)
+		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO, ret);
+	else if (ret == -EIOCBQUEUED)
+		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
+				   count - iov_iter_count(to));
+out:
+	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
+	return ret;
+}
+
+static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	ssize_t ret;
 
 	if (!f2fs_is_compress_backend_ready(inode))
 		return -EOPNOTSUPP;
 
-	ret = generic_file_read_iter(iocb, iter);
+	if (f2fs_should_use_dio(inode, iocb, to))
+		return f2fs_dio_read_iter(iocb, to);
 
+	ret = filemap_read(iocb, to, 0);
 	if (ret > 0)
-		f2fs_update_iostat(F2FS_I_SB(inode), APP_READ_IO, ret);
-
+		f2fs_update_iostat(F2FS_I_SB(inode), APP_BUFFERED_READ_IO, ret);
 	return ret;
 }
 
-- 
2.32.0


* [PATCH 8/9] f2fs: use iomap for direct I/O writes
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Convert f2fs_file_write_iter() to use iomap_dio_rw() for direct I/O
rather than using f2fs_direct_IO() via __generic_file_write_iter().

This is more complicated than the read-side conversion, but it follows a
similar pattern.  Some logic in __generic_file_write_iter() needed to be
re-implemented, while other things are now handled by iomap_dio_rw().
Existing f2fs behavior, such as the conditions for falling back to
buffered I/O, is retained, except for details that shouldn't matter,
such as the exact time at which the timestamps are updated.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
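
Reviewer aid, not part of the commit: one piece of __generic_file_write_iter() that is re-implemented in f2fs_dio_write_iter() is the range arithmetic after a partial direct write is completed through the page cache.  The sketch below is a user-space model only; struct flush_range, buffered_tail_range() and SKETCH_PAGE_SHIFT (which assumes 4 KiB pages) are hypothetical names.  It shows that the byte range handed to filemap_write_and_wait_range() is inclusive, and how the page indices for invalidate_mapping_pages() follow from it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct flush_range {
	int64_t start_pos;	/* first byte written via the page cache */
	int64_t end_pos;	/* last byte written, inclusive */
	uint64_t first_page;	/* page index containing start_pos */
	uint64_t last_page;	/* page index containing end_pos */
};

/*
 * Given where the buffered fallback started and how many bytes it
 * wrote (the patch's bufio_start_pos and ret2), compute the range to
 * write back and then invalidate.
 */
static struct flush_range buffered_tail_range(int64_t bufio_start_pos,
					      int64_t written)
{
	struct flush_range r;

	assert(bufio_start_pos >= 0 && written > 0);
	r.start_pos = bufio_start_pos;
	r.end_pos = bufio_start_pos + written - 1;
	r.first_page = (uint64_t)r.start_pos >> SKETCH_PAGE_SHIFT;
	r.last_page = (uint64_t)r.end_pos >> SKETCH_PAGE_SHIFT;
	return r;
}
```

For example, a 200-byte buffered tail starting at offset 4000 ends at byte 4199 and touches pages 0 and 1, so both pages are written back and invalidated.
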
 fs/f2fs/data.c |   7 +-
 fs/f2fs/f2fs.h |   7 +-
 fs/f2fs/file.c | 215 ++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 180 insertions(+), 49 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9243159ee753..0d2bb651483d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1363,11 +1363,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
 		f2fs_invalidate_compress_page(sbi, old_blkaddr);
 	}
 	f2fs_update_data_blkaddr(dn, dn->data_blkaddr);
-
-	/*
-	 * i_size will be updated by direct_IO. Otherwise, we'll get stale
-	 * data from unwritten block via dio_read.
-	 */
 	return 0;
 }
 
@@ -3130,7 +3125,7 @@ static int f2fs_write_data_pages(struct address_space *mapping,
 			FS_CP_DATA_IO : FS_DATA_IO);
 }
 
-static void f2fs_write_failed(struct inode *inode, loff_t to)
+void f2fs_write_failed(struct inode *inode, loff_t to)
 {
 	loff_t i_size = i_size_read(inode);
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f869c4a2f79f..6dbbac05a15c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3238,10 +3238,8 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi,
 	spin_lock(&sbi->iostat_lock);
 	sbi->rw_iostat[type] += io_bytes;
 
-	if (type == APP_WRITE_IO || type == APP_DIRECT_IO)
-		sbi->rw_iostat[APP_BUFFERED_IO] =
-			sbi->rw_iostat[APP_WRITE_IO] -
-			sbi->rw_iostat[APP_DIRECT_IO];
+	if (type == APP_BUFFERED_IO || type == APP_DIRECT_IO)
+		sbi->rw_iostat[APP_WRITE_IO] += io_bytes;
 
 	if (type == APP_BUFFERED_READ_IO || type == APP_DIRECT_READ_IO)
 		sbi->rw_iostat[APP_READ_IO] += io_bytes;
@@ -3625,6 +3623,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
 				struct writeback_control *wbc,
 				enum iostat_type io_type,
 				int compr_blocks, bool allow_balance);
+void f2fs_write_failed(struct inode *inode, loff_t to);
 void f2fs_invalidate_page(struct page *page, unsigned int offset,
 			unsigned int length);
 int f2fs_release_page(struct page *page, gfp_t wait);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 52de655ef833..6b8eac6b25d4 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4292,6 +4292,29 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return ret;
 }
 
+static ssize_t f2fs_write_checks(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	ssize_t count;
+	int err;
+
+	if (IS_IMMUTABLE(inode))
+		return -EPERM;
+
+	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED))
+		return -EPERM;
+
+	count = generic_write_checks(iocb, from);
+	if (count <= 0)
+		return count;
+
+	err = file_modified(file);
+	if (err)
+		return err;
+	return count;
+}
+
 /*
  * Preallocate blocks for a write request, if it is possible and helpful to do
  * so.  Returns a positive number if blocks may have been preallocated, 0 if no
@@ -4299,15 +4322,14 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
  * seriously wrong.  Also sets FI_PREALLOCATED_ALL on the inode if *all* the
  * requested blocks (not just some of them) have been allocated.
  */
-static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
+static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
+				   bool dio)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	const loff_t pos = iocb->ki_pos;
 	const size_t count = iov_iter_count(iter);
 	struct f2fs_map_blocks map = {};
-	bool dio = (iocb->ki_flags & IOCB_DIRECT) &&
-		   !f2fs_force_buffered_io(inode, iocb, iter);
 	int flag;
 	int ret;
 
@@ -4352,13 +4374,153 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
 	return map.m_len;
 }
 
-static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
+					struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
+	ssize_t ret;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EOPNOTSUPP;
+
+	current->backing_dev_info = inode_to_bdi(inode);
+	ret = generic_perform_write(file, from, iocb->ki_pos);
+	current->backing_dev_info = NULL;
+
+	if (ret > 0) {
+		iocb->ki_pos += ret;
+		f2fs_update_iostat(F2FS_I_SB(inode), APP_BUFFERED_IO, ret);
+	}
+	return ret;
+}
+
+static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
+				   bool *may_need_sync)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	struct f2fs_inode_info *fi = F2FS_I(inode);
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	const bool do_opu = f2fs_lfs_mode(sbi);
+	const int whint_mode = F2FS_OPTION(sbi).whint_mode;
+	const loff_t pos = iocb->ki_pos;
+	const ssize_t count = iov_iter_count(from);
+	const enum rw_hint hint = iocb->ki_hint;
+	unsigned int dio_flags = 0;
+	ssize_t ret;
+
+	trace_f2fs_direct_IO_enter(inode, pos, count, WRITE);
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		/* f2fs_convert_inline_inode() and block allocation can block */
+		if (f2fs_has_inline_data(inode) ||
+		    !f2fs_overwrite_io(inode, pos, count)) {
+			ret = -EAGAIN;
+			goto out;
+		}
+	} else {
+		ret = f2fs_convert_inline_inode(inode);
+		if (ret)
+			goto out;
+	}
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!down_read_trylock(&fi->i_gc_rwsem[WRITE])) {
+			ret = -EAGAIN;
+			goto out;
+		}
+		if (do_opu && !down_read_trylock(&fi->i_gc_rwsem[READ])) {
+			up_read(&fi->i_gc_rwsem[WRITE]);
+			ret = -EAGAIN;
+			goto out;
+		}
+	} else {
+		down_read(&fi->i_gc_rwsem[WRITE]);
+		if (do_opu)
+			down_read(&fi->i_gc_rwsem[READ]);
+	}
+
+	if (whint_mode == WHINT_MODE_OFF)
+		iocb->ki_hint = WRITE_LIFE_NOT_SET;
+
+	if (pos + count > inode->i_size)
+		dio_flags |= IOMAP_DIO_FORCE_WAIT;
+	ret = iomap_dio_rw(iocb, from, &f2fs_iomap_ops, &f2fs_iomap_dio_ops,
+			   dio_flags);
+	if (ret == -ENOTBLK)
+		ret = 0;
+
+	if (whint_mode == WHINT_MODE_OFF)
+		iocb->ki_hint = hint;
+
+	if (do_opu)
+		up_read(&fi->i_gc_rwsem[READ]);
+
+	up_read(&fi->i_gc_rwsem[WRITE]);
+
+	if (ret < 0) {
+		if (ret == -EIOCBQUEUED)
+			f2fs_update_iostat(sbi, APP_DIRECT_IO,
+					   count - iov_iter_count(from));
+		goto out;
+	}
+	if (pos + ret > inode->i_size)
+		f2fs_i_size_write(inode, pos + ret);
+	f2fs_update_iostat(sbi, APP_DIRECT_IO, ret);
+	if (!do_opu)
+		set_inode_flag(inode, FI_UPDATE_WRITE);
+
+	if (iov_iter_count(from)) {
+		ssize_t ret2;
+		loff_t bufio_start_pos = iocb->ki_pos;
+
+		/*
+		 * The direct write was partial, so we need to fall back to a
+		 * buffered write for the remainder.
+		 */
+
+		ret2 = f2fs_buffered_write_iter(iocb, from);
+		if (iov_iter_count(from))
+			f2fs_write_failed(inode, iocb->ki_pos);
+		if (ret2 < 0)
+			goto out;
+
+		/*
+		 * Ensure that the pagecache pages are written to disk and
+		 * invalidated to preserve the expected O_DIRECT semantics.
+		 */
+		if (ret2 > 0) {
+			loff_t bufio_end_pos = bufio_start_pos + ret2 - 1;
+
+			ret += ret2;
+
+			ret2 = filemap_write_and_wait_range(file->f_mapping,
+							    bufio_start_pos,
+							    bufio_end_pos);
+			if (ret2 < 0)
+				goto out;
+			invalidate_mapping_pages(file->f_mapping,
+						 bufio_start_pos >> PAGE_SHIFT,
+						 bufio_end_pos >> PAGE_SHIFT);
+		}
+	} else {
+		/* iomap_dio_rw() already handled the generic_write_sync(). */
+		*may_need_sync = false;
+	}
+out:
+	trace_f2fs_direct_IO_exit(inode, pos, count, WRITE, ret);
+	return ret;
+}
+
+static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
 	const loff_t orig_pos = iocb->ki_pos;
 	const size_t orig_count = iov_iter_count(from);
 	loff_t target_size;
+	bool dio;
+	bool may_need_sync = true;
 	int preallocated;
 	ssize_t ret;
 
@@ -4381,48 +4543,26 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		inode_lock(inode);
 	}
 
-	if (unlikely(IS_IMMUTABLE(inode))) {
-		ret = -EPERM;
-		goto out_unlock;
-	}
-
-	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
-		ret = -EPERM;
-		goto out_unlock;
-	}
-
-	ret = generic_write_checks(iocb, from);
+	ret = f2fs_write_checks(iocb, from);
 	if (ret <= 0)
 		goto out_unlock;
 
-	if (iocb->ki_flags & IOCB_NOWAIT) {
-		if (!f2fs_overwrite_io(inode, iocb->ki_pos,
-				       iov_iter_count(from)) ||
-		    f2fs_has_inline_data(inode) ||
-		    f2fs_force_buffered_io(inode, iocb, from)) {
-			ret = -EAGAIN;
-			goto out_unlock;
-		}
-	}
-	if (iocb->ki_flags & IOCB_DIRECT) {
-		/*
-		 * Convert inline data for Direct I/O before entering
-		 * f2fs_direct_IO().
-		 */
-		ret = f2fs_convert_inline_inode(inode);
-		if (ret)
-			goto out_unlock;
-	}
+	/* Determine whether we will do a direct write or a buffered write. */
+	dio = f2fs_should_use_dio(inode, iocb, from);
 
 	/* Possibly preallocate the blocks for the write. */
 	target_size = iocb->ki_pos + iov_iter_count(from);
-	preallocated = f2fs_preallocate_blocks(iocb, from);
+	preallocated = f2fs_preallocate_blocks(iocb, from, dio);
 	if (preallocated < 0) {
 		ret = preallocated;
 		goto out_unlock;
 	}
 
-	ret = __generic_file_write_iter(iocb, from);
+	/* Do the actual write. */
+	if (dio)
+		ret = f2fs_dio_write_iter(iocb, from, &may_need_sync);
+	else
+		ret = f2fs_buffered_write_iter(iocb, from);
 
 	/* Don't leave any preallocated blocks around past i_size. */
 	if (preallocated > 0 && inode->i_size < target_size) {
@@ -4433,14 +4573,11 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 	}
 	clear_inode_flag(inode, FI_PREALLOCATED_ALL);
-
-	if (ret > 0)
-		f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
 out_unlock:
 	inode_unlock(inode);
 out:
 	trace_f2fs_file_write_iter(inode, orig_pos, orig_count, ret);
-	if (ret > 0)
+	if (ret > 0 && may_need_sync)
 		ret = generic_write_sync(iocb, ret);
 	return ret;
 }
-- 
2.32.0
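
The f2fs.h hunk above also changes how the write totals are kept: instead of deriving APP_BUFFERED_IO as APP_WRITE_IO minus APP_DIRECT_IO, callers now report buffered or direct bytes and APP_WRITE_IO accumulates both.  A minimal user-space model of the new accounting follows, assuming a simplified counter array without the iostat_lock spinlock; struct sketch_iostat, sketch_update_iostat() and SKETCH_NR_IO are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Only the three write counters are modeled here. */
enum sketch_iostat_type {
	APP_DIRECT_IO,
	APP_BUFFERED_IO,
	APP_WRITE_IO,
	SKETCH_NR_IO,
};

struct sketch_iostat {
	uint64_t rw[SKETCH_NR_IO];
};

/*
 * Mirrors the post-patch logic: bump the counter that was reported,
 * and also bump the total when the report was for buffered or
 * direct writes.
 */
static void sketch_update_iostat(struct sketch_iostat *s,
				 enum sketch_iostat_type type,
				 uint64_t io_bytes)
{
	assert(type < SKETCH_NR_IO);
	s->rw[type] += io_bytes;
	if (type == APP_BUFFERED_IO || type == APP_DIRECT_IO)
		s->rw[APP_WRITE_IO] += io_bytes;
}
```

Reporting 4096 direct bytes and then 100 buffered bytes leaves the direct counter at 4096, the buffered counter at 100, and the total at 4196, which is the case for a partial direct write that is finished through the page cache.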


* [PATCH 9/9] f2fs: remove f2fs_direct_IO()
  2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
@ 2021-07-16 14:39   ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-16 14:39 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

From: Eric Biggers <ebiggers@google.com>

Remove f2fs_direct_IO(), which is no longer used now that f2fs uses
iomap_dio_rw() instead.

Set ->direct_IO to noop_direct_IO rather than NULL.  This is needed to
continue to mark the inodes as supporting direct I/O, as mentioned in
the comment for noop_direct_IO().

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
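
Reviewer aid, not part of the commit: the reason ->direct_IO cannot simply become NULL can be modeled in user space.  As far as I understand the VFS of this era, the open path rejects O_DIRECT with -EINVAL when a_ops->direct_IO is NULL, so a placeholder that is never expected to be called keeps O_DIRECT opens working.  Everything below (struct sketch_aops, sketch_open_check(), sketch_noop_direct_IO()) is a hypothetical stand-in, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced address_space_operations. */
struct sketch_aops {
	long (*direct_IO)(void);
};

/* Placeholder in the spirit of noop_direct_IO(): never meant to run. */
static long sketch_noop_direct_IO(void)
{
	return -EINVAL;
}

/*
 * Assumed model of the open-time check: O_DIRECT is refused unless
 * the mapping advertises a ->direct_IO method.
 */
static int sketch_open_check(const struct sketch_aops *aops, bool o_direct)
{
	if (o_direct && (aops == NULL || aops->direct_IO == NULL))
		return -EINVAL;
	return 0;
}

/* Usage: compare a NULL method with the no-op placeholder. */
static void sketch_demo(void)
{
	const struct sketch_aops without = { .direct_IO = NULL };
	const struct sketch_aops with_noop = {
		.direct_IO = sketch_noop_direct_IO,
	};

	assert(sketch_open_check(&without, true) == -EINVAL);
	assert(sketch_open_check(&with_noop, true) == 0);
	assert(sketch_open_check(&without, false) == 0);
}
```

The actual I/O never goes through the placeholder, since reads and writes are dispatched to iomap_dio_rw() from the ->read_iter and ->write_iter paths.
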
 fs/f2fs/data.c | 180 +------------------------------------------------
 1 file changed, 1 insertion(+), 179 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 0d2bb651483d..4fbf28f5aaab 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1650,47 +1650,6 @@ static inline u64 blks_to_bytes(struct inode *inode, u64 blks)
 	return (blks << inode->i_blkbits);
 }
 
-static int __get_data_block(struct inode *inode, sector_t iblock,
-			struct buffer_head *bh, int create, int flag,
-			pgoff_t *next_pgofs, int seg_type, bool may_write)
-{
-	struct f2fs_map_blocks map;
-	int err;
-
-	map.m_lblk = iblock;
-	map.m_len = bytes_to_blks(inode, bh->b_size);
-	map.m_next_pgofs = next_pgofs;
-	map.m_next_extent = NULL;
-	map.m_seg_type = seg_type;
-	map.m_may_create = may_write;
-
-	err = f2fs_map_blocks(inode, &map, create, flag);
-	if (!err) {
-		map_bh(bh, inode->i_sb, map.m_pblk);
-		bh->b_state = (bh->b_state & ~F2FS_MAP_FLAGS) | map.m_flags;
-		bh->b_size = blks_to_bytes(inode, map.m_len);
-	}
-	return err;
-}
-
-static int get_data_block_dio_write(struct inode *inode, sector_t iblock,
-			struct buffer_head *bh_result, int create)
-{
-	return __get_data_block(inode, iblock, bh_result, create,
-				F2FS_GET_BLOCK_DIO, NULL,
-				f2fs_rw_hint_to_seg_type(inode->i_write_hint),
-				true);
-}
-
-static int get_data_block_dio(struct inode *inode, sector_t iblock,
-			struct buffer_head *bh_result, int create)
-{
-	return __get_data_block(inode, iblock, bh_result, create,
-				F2FS_GET_BLOCK_DIO, NULL,
-				f2fs_rw_hint_to_seg_type(inode->i_write_hint),
-				false);
-}
-
 static int f2fs_xattr_fiemap(struct inode *inode,
 				struct fiemap_extent_info *fieinfo)
 {
@@ -3410,29 +3369,6 @@ static int f2fs_write_end(struct file *file,
 	return copied;
 }
 
-static int check_direct_IO(struct inode *inode, struct iov_iter *iter,
-			   loff_t offset)
-{
-	unsigned i_blkbits = READ_ONCE(inode->i_blkbits);
-	unsigned blkbits = i_blkbits;
-	unsigned blocksize_mask = (1 << blkbits) - 1;
-	unsigned long align = offset | iov_iter_alignment(iter);
-	struct block_device *bdev = inode->i_sb->s_bdev;
-
-	if (iov_iter_rw(iter) == READ && offset >= i_size_read(inode))
-		return 1;
-
-	if (align & blocksize_mask) {
-		if (bdev)
-			blkbits = blksize_bits(bdev_logical_block_size(bdev));
-		blocksize_mask = (1 << blkbits) - 1;
-		if (align & blocksize_mask)
-			return -EINVAL;
-		return 1;
-	}
-	return 0;
-}
-
 static void f2fs_dio_end_io(struct bio *bio)
 {
 	struct f2fs_private_dio *dio = bio->bi_private;
@@ -3448,35 +3384,6 @@ static void f2fs_dio_end_io(struct bio *bio)
 	bio_endio(bio);
 }
 
-static void f2fs_dio_submit_bio_old(struct bio *bio, struct inode *inode,
-							loff_t file_offset)
-{
-	struct f2fs_private_dio *dio;
-	bool write = (bio_op(bio) == REQ_OP_WRITE);
-
-	dio = f2fs_kzalloc(F2FS_I_SB(inode),
-			sizeof(struct f2fs_private_dio), GFP_NOFS);
-	if (!dio)
-		goto out;
-
-	dio->inode = inode;
-	dio->orig_end_io = bio->bi_end_io;
-	dio->orig_private = bio->bi_private;
-	dio->write = write;
-
-	bio->bi_end_io = f2fs_dio_end_io;
-	bio->bi_private = dio;
-
-	inc_page_count(F2FS_I_SB(inode),
-			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
-
-	submit_bio(bio);
-	return;
-out:
-	bio->bi_status = BLK_STS_IOERR;
-	bio_endio(bio);
-}
-
 static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
 				    struct bio *bio, loff_t file_offset)
 {
@@ -3506,91 +3413,6 @@ static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
 	return BLK_QC_T_NONE;
 }
 
-static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct f2fs_inode_info *fi = F2FS_I(inode);
-	size_t count = iov_iter_count(iter);
-	loff_t offset = iocb->ki_pos;
-	int rw = iov_iter_rw(iter);
-	int err;
-	enum rw_hint hint = iocb->ki_hint;
-	int whint_mode = F2FS_OPTION(sbi).whint_mode;
-	bool do_opu;
-
-	err = check_direct_IO(inode, iter, offset);
-	if (err)
-		return err < 0 ? err : 0;
-
-	if (f2fs_force_buffered_io(inode, iocb, iter))
-		return 0;
-
-	do_opu = (rw == WRITE && f2fs_lfs_mode(sbi));
-
-	trace_f2fs_direct_IO_enter(inode, offset, count, rw);
-
-	if (rw == WRITE && whint_mode == WHINT_MODE_OFF)
-		iocb->ki_hint = WRITE_LIFE_NOT_SET;
-
-	if (iocb->ki_flags & IOCB_NOWAIT) {
-		if (!down_read_trylock(&fi->i_gc_rwsem[rw])) {
-			iocb->ki_hint = hint;
-			err = -EAGAIN;
-			goto out;
-		}
-		if (do_opu && !down_read_trylock(&fi->i_gc_rwsem[READ])) {
-			up_read(&fi->i_gc_rwsem[rw]);
-			iocb->ki_hint = hint;
-			err = -EAGAIN;
-			goto out;
-		}
-	} else {
-		down_read(&fi->i_gc_rwsem[rw]);
-		if (do_opu)
-			down_read(&fi->i_gc_rwsem[READ]);
-	}
-
-	err = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
-			iter, rw == WRITE ? get_data_block_dio_write :
-			get_data_block_dio, NULL, f2fs_dio_submit_bio_old,
-			rw == WRITE ? DIO_LOCKING | DIO_SKIP_HOLES :
-			DIO_SKIP_HOLES);
-
-	if (do_opu)
-		up_read(&fi->i_gc_rwsem[READ]);
-
-	up_read(&fi->i_gc_rwsem[rw]);
-
-	if (rw == WRITE) {
-		if (whint_mode == WHINT_MODE_OFF)
-			iocb->ki_hint = hint;
-		if (err > 0) {
-			f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
-									err);
-			if (!do_opu)
-				set_inode_flag(inode, FI_UPDATE_WRITE);
-		} else if (err == -EIOCBQUEUED) {
-			f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
-						count - iov_iter_count(iter));
-		} else if (err < 0) {
-			f2fs_write_failed(inode, offset + count);
-		}
-	} else {
-		if (err > 0)
-			f2fs_update_iostat(sbi, APP_DIRECT_READ_IO, err);
-		else if (err == -EIOCBQUEUED)
-			f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
-						count - iov_iter_count(iter));
-	}
-
-out:
-	trace_f2fs_direct_IO_exit(inode, offset, count, rw, err);
-
-	return err;
-}
-
 void f2fs_invalidate_page(struct page *page, unsigned int offset,
 							unsigned int length)
 {
@@ -4046,7 +3868,7 @@ const struct address_space_operations f2fs_dblock_aops = {
 	.set_page_dirty	= f2fs_set_data_page_dirty,
 	.invalidatepage	= f2fs_invalidate_page,
 	.releasepage	= f2fs_release_page,
-	.direct_IO	= f2fs_direct_IO,
+	.direct_IO	= noop_direct_IO,
 	.bmap		= f2fs_bmap,
 	.swap_activate  = f2fs_swap_activate,
 	.swap_deactivate = f2fs_swap_deactivate,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/9] f2fs: remove allow_outplace_dio()
  2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
@ 2021-07-19  8:41     ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2021-07-19  8:41 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-f2fs-devel, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Fri, Jul 16, 2021 at 09:39:12AM -0500, Eric Biggers wrote:
> +	do_opu = (rw == WRITE && f2fs_lfs_mode(sbi));

Nit: no need for the parentheses.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
@ 2021-07-19  8:59     ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2021-07-19  8:59 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-f2fs-devel, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Fri, Jul 16, 2021 at 09:39:16AM -0500, Eric Biggers wrote:
> +static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> +				    struct bio *bio, loff_t file_offset)
> +{
> +	struct f2fs_private_dio *dio;
> +	bool write = (bio_op(bio) == REQ_OP_WRITE);
> +
> +	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> +			sizeof(struct f2fs_private_dio), GFP_NOFS);
> +	if (!dio)
> +		goto out;
> +
> +	dio->inode = inode;
> +	dio->orig_end_io = bio->bi_end_io;
> +	dio->orig_private = bio->bi_private;
> +	dio->write = write;
> +
> +	bio->bi_end_io = f2fs_dio_end_io;
> +	bio->bi_private = dio;
> +
> +	inc_page_count(F2FS_I_SB(inode),
> +			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> +
> +	return submit_bio(bio);

I don't think there is any need for this mess.  The F2FS_DIO_WRITE /
F2FS_DIO_READ counts are only used to check if there is any inflight
I/O at all.  So instead we can increment them once before calling
iomap_dio_rw, and decrement them in ->end_io or for a failure/noop
exit from iomap_dio_rw.  Untested patch below.  Note that all this
would be much simpler to review if the last three patches were folded
into a single one.

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 4fbf28f5aaab..9f9cc49fbe94 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3369,50 +3369,6 @@ static int f2fs_write_end(struct file *file,
 	return copied;
 }
 
-static void f2fs_dio_end_io(struct bio *bio)
-{
-	struct f2fs_private_dio *dio = bio->bi_private;
-
-	dec_page_count(F2FS_I_SB(dio->inode),
-			dio->write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
-
-	bio->bi_private = dio->orig_private;
-	bio->bi_end_io = dio->orig_end_io;
-
-	kfree(dio);
-
-	bio_endio(bio);
-}
-
-static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
-				    struct bio *bio, loff_t file_offset)
-{
-	struct f2fs_private_dio *dio;
-	bool write = (bio_op(bio) == REQ_OP_WRITE);
-
-	dio = f2fs_kzalloc(F2FS_I_SB(inode),
-			sizeof(struct f2fs_private_dio), GFP_NOFS);
-	if (!dio)
-		goto out;
-
-	dio->inode = inode;
-	dio->orig_end_io = bio->bi_end_io;
-	dio->orig_private = bio->bi_private;
-	dio->write = write;
-
-	bio->bi_end_io = f2fs_dio_end_io;
-	bio->bi_private = dio;
-
-	inc_page_count(F2FS_I_SB(inode),
-			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
-
-	return submit_bio(bio);
-out:
-	bio->bi_status = BLK_STS_IOERR;
-	bio_endio(bio);
-	return BLK_QC_T_NONE;
-}
-
 void f2fs_invalidate_page(struct page *page, unsigned int offset,
 							unsigned int length)
 {
@@ -4006,6 +3962,18 @@ const struct iomap_ops f2fs_iomap_ops = {
 	.iomap_begin	= f2fs_iomap_begin,
 };
 
+static int f2fs_dio_end_io(struct kiocb *iocb, ssize_t size, int error,
+		unsigned flags)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(iocb->ki_filp));
+
+	if (iocb->ki_flags & IOCB_WRITE)
+		dec_page_count(sbi, F2FS_DIO_WRITE);
+	else
+		dec_page_count(sbi, F2FS_DIO_READ);
+	return 0;
+}
+
 const struct iomap_dio_ops f2fs_iomap_dio_ops = {
-	.submit_io	= f2fs_dio_submit_bio,
+	.end_io		= f2fs_dio_end_io,
 };
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6dbbac05a15c..abd521dc504a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1750,13 +1750,6 @@ struct f2fs_sb_info {
 #endif
 };
 
-struct f2fs_private_dio {
-	struct inode *inode;
-	void *orig_private;
-	bio_end_io_t *orig_end_io;
-	bool write;
-};
-
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 #define f2fs_show_injection_info(sbi, type)					\
 	printk_ratelimited("%sF2FS-fs (%s) : inject %s in %s of %pS\n",	\
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6b8eac6b25d4..4fed90cc1462 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4259,6 +4259,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		down_read(&fi->i_gc_rwsem[READ]);
 	}
 
+	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
 	ret = iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
 
 	up_read(&fi->i_gc_rwsem[READ]);
@@ -4270,6 +4271,8 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	else if (ret == -EIOCBQUEUED)
 		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
 				   count - iov_iter_count(to));
+	else
+		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
 out:
 	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
 	return ret;
@@ -4446,6 +4449,7 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 
 	if (pos + count > inode->i_size)
 		dio_flags |= IOMAP_DIO_FORCE_WAIT;
+	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_WRITE);
 	ret = iomap_dio_rw(iocb, from, &f2fs_iomap_ops, &f2fs_iomap_dio_ops,
 			   dio_flags);
 	if (ret == -ENOTBLK)
@@ -4459,6 +4463,9 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 
 	up_read(&fi->i_gc_rwsem[WRITE]);
 
+	if (ret <= 0 && ret != -EIOCBQUEUED)
+		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_WRITE);
+
 	if (ret < 0) {
 		if (ret == -EIOCBQUEUED)
 			f2fs_update_iostat(sbi, APP_DIRECT_IO,
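
The counting scheme proposed in this message (increment once before calling iomap_dio_rw, decrement in ->end_io or on a failure/noop exit) can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: every model_* name is hypothetical, the engine is a stub whose outcome is chosen by the caller, and every failure is assumed to happen before the completion hook would run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_EIOCBQUEUED 529	/* stand-in for the kernel-internal EIOCBQUEUED */

enum model_count { MODEL_DIO_READ, MODEL_DIO_WRITE, MODEL_NR_COUNTS };

struct model_sbi {
	int nr_pages[MODEL_NR_COUNTS];	/* in-flight direct I/O requests */
};

enum model_outcome {
	MODEL_SYNC_DONE,	/* finished before returning; hook already ran */
	MODEL_QUEUED,		/* returns -EIOCBQUEUED; hook runs later */
	MODEL_EARLY_FAIL,	/* failed before I/O was set up; hook never runs */
};

/* Completion hook: the only place a successfully started request is uncounted. */
static void model_end_io(struct model_sbi *sbi, bool write)
{
	sbi->nr_pages[write ? MODEL_DIO_WRITE : MODEL_DIO_READ]--;
}

/* Stub for the I/O engine; the caller picks the outcome to simulate. */
static long model_dio_rw(struct model_sbi *sbi, bool write,
			 enum model_outcome outcome, long bytes)
{
	switch (outcome) {
	case MODEL_SYNC_DONE:
		assert(bytes > 0);	/* the model only covers non-empty transfers */
		model_end_io(sbi, write);
		return bytes;
	case MODEL_QUEUED:
		return -MODEL_EIOCBQUEUED;
	default:
		return -EINVAL;
	}
}

/* Caller side: one increment per request, undone here only if the hook never ran. */
static long model_dio_submit(struct model_sbi *sbi, bool write,
			     enum model_outcome outcome, long bytes)
{
	int idx = write ? MODEL_DIO_WRITE : MODEL_DIO_READ;
	long ret;

	sbi->nr_pages[idx]++;
	ret = model_dio_rw(sbi, write, outcome, bytes);
	if (ret <= 0 && ret != -MODEL_EIOCBQUEUED)
		sbi->nr_pages[idx]--;
	return ret;
}

/* The counters are only consulted to learn whether any direct I/O is in flight. */
static bool model_dio_in_flight(const struct model_sbi *sbi)
{
	return sbi->nr_pages[MODEL_DIO_READ] || sbi->nr_pages[MODEL_DIO_WRITE];
}
```

Because the counters only answer "is anything in flight?", counting one unit per request rather than one per bio is sufficient in this model, and no per-bio private structure is needed to remember which counter to decrement.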

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-19  8:59     ` [f2fs-dev] " Christoph Hellwig
@ 2021-07-22 20:47       ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-22 20:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Biggers, linux-f2fs-devel, Chao Yu, linux-fsdevel,
	linux-xfs, Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/19, Christoph Hellwig wrote:
> On Fri, Jul 16, 2021 at 09:39:16AM -0500, Eric Biggers wrote:
> > +static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> > +				    struct bio *bio, loff_t file_offset)
> > +{
> > +	struct f2fs_private_dio *dio;
> > +	bool write = (bio_op(bio) == REQ_OP_WRITE);
> > +
> > +	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> > +			sizeof(struct f2fs_private_dio), GFP_NOFS);
> > +	if (!dio)
> > +		goto out;
> > +
> > +	dio->inode = inode;
> > +	dio->orig_end_io = bio->bi_end_io;
> > +	dio->orig_private = bio->bi_private;
> > +	dio->write = write;
> > +
> > +	bio->bi_end_io = f2fs_dio_end_io;
> > +	bio->bi_private = dio;
> > +
> > +	inc_page_count(F2FS_I_SB(inode),
> > +			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > +
> > +	return submit_bio(bio);
> 
> I don't think there is any need for this mess.  The F2FS_DIO_WRITE /
> F2FS_DIO_READ counts are only used to check if there is any inflight
> I/O at all.  So instead we can increment them once before calling
> iomap_dio_rw, and decrement them in ->end_io or for a failure/noop
> exit from iomap_dio_rw.  Untested patch below.  Note that all this
> would be much simpler to review if the last three patches were folded
> into a single one.

Eric, wdyt?

I've merged v1 to v5, including Christoph's comment in v2.

> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 4fbf28f5aaab..9f9cc49fbe94 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -3369,50 +3369,6 @@ static int f2fs_write_end(struct file *file,
>  	return copied;
>  }
>  
> -static void f2fs_dio_end_io(struct bio *bio)
> -{
> -	struct f2fs_private_dio *dio = bio->bi_private;
> -
> -	dec_page_count(F2FS_I_SB(dio->inode),
> -			dio->write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> -
> -	bio->bi_private = dio->orig_private;
> -	bio->bi_end_io = dio->orig_end_io;
> -
> -	kfree(dio);
> -
> -	bio_endio(bio);
> -}
> -
> -static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> -				    struct bio *bio, loff_t file_offset)
> -{
> -	struct f2fs_private_dio *dio;
> -	bool write = (bio_op(bio) == REQ_OP_WRITE);
> -
> -	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> -			sizeof(struct f2fs_private_dio), GFP_NOFS);
> -	if (!dio)
> -		goto out;
> -
> -	dio->inode = inode;
> -	dio->orig_end_io = bio->bi_end_io;
> -	dio->orig_private = bio->bi_private;
> -	dio->write = write;
> -
> -	bio->bi_end_io = f2fs_dio_end_io;
> -	bio->bi_private = dio;
> -
> -	inc_page_count(F2FS_I_SB(inode),
> -			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> -
> -	return submit_bio(bio);
> -out:
> -	bio->bi_status = BLK_STS_IOERR;
> -	bio_endio(bio);
> -	return BLK_QC_T_NONE;
> -}
> -
>  void f2fs_invalidate_page(struct page *page, unsigned int offset,
>  							unsigned int length)
>  {
> @@ -4006,6 +3962,18 @@ const struct iomap_ops f2fs_iomap_ops = {
>  	.iomap_begin	= f2fs_iomap_begin,
>  };
>  
> +static int f2fs_dio_end_io(struct kiocb *iocb, ssize_t size, int error,
> +		unsigned flags)
> +{
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(iocb->ki_filp));
> +
> +	if (iocb->ki_flags & IOCB_WRITE)
> +		dec_page_count(sbi, F2FS_DIO_WRITE);
> +	else
> +		dec_page_count(sbi, F2FS_DIO_READ);
> +	return 0;
> +}
> +
>  const struct iomap_dio_ops f2fs_iomap_dio_ops = {
> -	.submit_io	= f2fs_dio_submit_bio,
> +	.end_io		= f2fs_dio_end_io,
>  };
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 6dbbac05a15c..abd521dc504a 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1750,13 +1750,6 @@ struct f2fs_sb_info {
>  #endif
>  };
>  
> -struct f2fs_private_dio {
> -	struct inode *inode;
> -	void *orig_private;
> -	bio_end_io_t *orig_end_io;
> -	bool write;
> -};
> -
>  #ifdef CONFIG_F2FS_FAULT_INJECTION
>  #define f2fs_show_injection_info(sbi, type)					\
>  	printk_ratelimited("%sF2FS-fs (%s) : inject %s in %s of %pS\n",	\
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 6b8eac6b25d4..4fed90cc1462 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -4259,6 +4259,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  		down_read(&fi->i_gc_rwsem[READ]);
>  	}
>  
> +	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
>  	ret = iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
>  
>  	up_read(&fi->i_gc_rwsem[READ]);
> @@ -4270,6 +4271,8 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	else if (ret == -EIOCBQUEUED)
>  		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
>  				   count - iov_iter_count(to));
> +	else
> +		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
>  out:
>  	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
>  	return ret;
> @@ -4446,6 +4449,7 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  
>  	if (pos + count > inode->i_size)
>  		dio_flags |= IOMAP_DIO_FORCE_WAIT;
> +	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_WRITE);
>  	ret = iomap_dio_rw(iocb, from, &f2fs_iomap_ops, &f2fs_iomap_dio_ops,
>  			   dio_flags);
>  	if (ret == -ENOTBLK)
> @@ -4459,6 +4463,9 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  
>  	up_read(&fi->i_gc_rwsem[WRITE]);
>  
> +	if (ret <= 0 && ret != -EIOCBQUEUED)
> +		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_WRITE);
> +
>  	if (ret < 0) {
>  		if (ret == -EIOCBQUEUED)
>  			f2fs_update_iostat(sbi, APP_DIRECT_IO,

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-22 20:47       ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-22 20:49         ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-22 20:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Biggers, linux-f2fs-devel, Chao Yu, linux-fsdevel,
	linux-xfs, Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/22, Jaegeuk Kim wrote:
> On 07/19, Christoph Hellwig wrote:
> > On Fri, Jul 16, 2021 at 09:39:16AM -0500, Eric Biggers wrote:
> > > +static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> > > +				    struct bio *bio, loff_t file_offset)
> > > +{
> > > +	struct f2fs_private_dio *dio;
> > > +	bool write = (bio_op(bio) == REQ_OP_WRITE);
> > > +
> > > +	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> > > +			sizeof(struct f2fs_private_dio), GFP_NOFS);
> > > +	if (!dio)
> > > +		goto out;
> > > +
> > > +	dio->inode = inode;
> > > +	dio->orig_end_io = bio->bi_end_io;
> > > +	dio->orig_private = bio->bi_private;
> > > +	dio->write = write;
> > > +
> > > +	bio->bi_end_io = f2fs_dio_end_io;
> > > +	bio->bi_private = dio;
> > > +
> > > +	inc_page_count(F2FS_I_SB(inode),
> > > +			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > > +
> > > +	return submit_bio(bio);
> > 
> > I don't think there is any need for this mess.  The F2FS_DIO_WRITE /
> > F2FS_DIO_READ counts are only used to check if there is any inflight
> > I/O at all.  So instead we can increment them once before calling
> > iomap_dio_rw, and decrement them in ->end_io or for a failure/noop
> > exit from iomap_dio_rw.  Untested patch below.  Note that all this
> > would be much simpler to review if the last three patches were folded
> > into a single one.
> 
> Eric, wdyt?
> 
> I've merged v1 to v5, including Christoph's comment in v2.

Sorry, I mean patch #1 to #5. You can find them in:
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/log/?h=dev

> 
> > 
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 4fbf28f5aaab..9f9cc49fbe94 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -3369,50 +3369,6 @@ static int f2fs_write_end(struct file *file,
> >  	return copied;
> >  }
> >  
> > -static void f2fs_dio_end_io(struct bio *bio)
> > -{
> > -	struct f2fs_private_dio *dio = bio->bi_private;
> > -
> > -	dec_page_count(F2FS_I_SB(dio->inode),
> > -			dio->write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > -
> > -	bio->bi_private = dio->orig_private;
> > -	bio->bi_end_io = dio->orig_end_io;
> > -
> > -	kfree(dio);
> > -
> > -	bio_endio(bio);
> > -}
> > -
> > -static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> > -				    struct bio *bio, loff_t file_offset)
> > -{
> > -	struct f2fs_private_dio *dio;
> > -	bool write = (bio_op(bio) == REQ_OP_WRITE);
> > -
> > -	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> > -			sizeof(struct f2fs_private_dio), GFP_NOFS);
> > -	if (!dio)
> > -		goto out;
> > -
> > -	dio->inode = inode;
> > -	dio->orig_end_io = bio->bi_end_io;
> > -	dio->orig_private = bio->bi_private;
> > -	dio->write = write;
> > -
> > -	bio->bi_end_io = f2fs_dio_end_io;
> > -	bio->bi_private = dio;
> > -
> > -	inc_page_count(F2FS_I_SB(inode),
> > -			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > -
> > -	return submit_bio(bio);
> > -out:
> > -	bio->bi_status = BLK_STS_IOERR;
> > -	bio_endio(bio);
> > -	return BLK_QC_T_NONE;
> > -}
> > -
> >  void f2fs_invalidate_page(struct page *page, unsigned int offset,
> >  							unsigned int length)
> >  {
> > @@ -4006,6 +3962,18 @@ const struct iomap_ops f2fs_iomap_ops = {
> >  	.iomap_begin	= f2fs_iomap_begin,
> >  };
> >  
> > +static int f2fs_dio_end_io(struct kiocb *iocb, ssize_t size, int error,
> > +		unsigned flags)
> > +{
> > +	struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(iocb->ki_filp));
> > +
> > +	if (iocb->ki_flags & IOCB_WRITE)
> > +		dec_page_count(sbi, F2FS_DIO_WRITE);
> > +	else
> > +		dec_page_count(sbi, F2FS_DIO_READ);
> > +	return 0;
> > +}
> > +
> >  const struct iomap_dio_ops f2fs_iomap_dio_ops = {
> > -	.submit_io	= f2fs_dio_submit_bio,
> > +	.end_io		= f2fs_dio_end_io,
> >  };
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 6dbbac05a15c..abd521dc504a 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1750,13 +1750,6 @@ struct f2fs_sb_info {
> >  #endif
> >  };
> >  
> > -struct f2fs_private_dio {
> > -	struct inode *inode;
> > -	void *orig_private;
> > -	bio_end_io_t *orig_end_io;
> > -	bool write;
> > -};
> > -
> >  #ifdef CONFIG_F2FS_FAULT_INJECTION
> >  #define f2fs_show_injection_info(sbi, type)					\
> >  	printk_ratelimited("%sF2FS-fs (%s) : inject %s in %s of %pS\n",	\
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 6b8eac6b25d4..4fed90cc1462 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -4259,6 +4259,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  		down_read(&fi->i_gc_rwsem[READ]);
> >  	}
> >  
> > +	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
> >  	ret = iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
> >  
> >  	up_read(&fi->i_gc_rwsem[READ]);
> > @@ -4270,6 +4271,8 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  	else if (ret == -EIOCBQUEUED)
> >  		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
> >  				   count - iov_iter_count(to));
> > +	else
> > +		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
> >  out:
> >  	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
> >  	return ret;
> > @@ -4446,6 +4449,7 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
> >  
> >  	if (pos + count > inode->i_size)
> >  		dio_flags |= IOMAP_DIO_FORCE_WAIT;
> > +	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_WRITE);
> >  	ret = iomap_dio_rw(iocb, from, &f2fs_iomap_ops, &f2fs_iomap_dio_ops,
> >  			   dio_flags);
> >  	if (ret == -ENOTBLK)
> > @@ -4459,6 +4463,9 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
> >  
> >  	up_read(&fi->i_gc_rwsem[WRITE]);
> >  
> > +	if (ret <= 0 && ret != -EIOCBQUEUED)
> > +		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_WRITE);
> > +
> >  	if (ret < 0) {
> >  		if (ret == -EIOCBQUEUED)
> >  			f2fs_update_iostat(sbi, APP_DIRECT_IO,

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-22 20:47       ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-22 20:54         ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-22 20:54 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: Christoph Hellwig, linux-f2fs-devel, Chao Yu, linux-fsdevel,
	linux-xfs, Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Thu, Jul 22, 2021 at 01:47:39PM -0700, Jaegeuk Kim wrote:
> On 07/19, Christoph Hellwig wrote:
> > On Fri, Jul 16, 2021 at 09:39:16AM -0500, Eric Biggers wrote:
> > > +static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> > > +				    struct bio *bio, loff_t file_offset)
> > > +{
> > > +	struct f2fs_private_dio *dio;
> > > +	bool write = (bio_op(bio) == REQ_OP_WRITE);
> > > +
> > > +	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> > > +			sizeof(struct f2fs_private_dio), GFP_NOFS);
> > > +	if (!dio)
> > > +		goto out;
> > > +
> > > +	dio->inode = inode;
> > > +	dio->orig_end_io = bio->bi_end_io;
> > > +	dio->orig_private = bio->bi_private;
> > > +	dio->write = write;
> > > +
> > > +	bio->bi_end_io = f2fs_dio_end_io;
> > > +	bio->bi_private = dio;
> > > +
> > > +	inc_page_count(F2FS_I_SB(inode),
> > > +			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > > +
> > > +	return submit_bio(bio);
> > 
> > I don't think there is any need for this mess.  The F2FS_DIO_WRITE /
> > F2FS_DIO_READ counts are only used to check if there is any inflight
> > I/O at all.  So instead we can increment them once before calling
> > iomap_dio_rw, and decrement them in ->end_io or for a failure/noop
> > exit from iomap_dio_rw.  Untested patch below.  Note that all this
> > would be much simpler to review if the last three patches were folded
> > into a single one.
> 
> Eric, wdyt?
> 
> I've merged v1 to v5, including Christoph's comment in v2.
> 

I am planning to do this, but I got caught up by the patch
"f2fs: fix wrong inflight page stats for directIO" that was recently added to
f2fs.git#dev, which makes this suggestion no longer viable.  Hence my review
comment on that patch
(https://lkml.kernel.org/r/YPjNGoFzQojO5Amr@sol.localdomain)
and Chao's new version of that patch
(https://lkml.kernel.org/r/20210722131617.749204-1-chao@kernel.org),
although the new version has some issues too as I commented.

If you could just revert "f2fs: fix wrong inflight page stats for directIO"
for now, that would be helpful, as I don't think we want it.

- Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-22 20:54         ` [f2fs-dev] " Eric Biggers
@ 2021-07-22 21:57           ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-22 21:57 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, linux-f2fs-devel, Chao Yu, linux-fsdevel,
	linux-xfs, Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/22, Eric Biggers wrote:
> On Thu, Jul 22, 2021 at 01:47:39PM -0700, Jaegeuk Kim wrote:
> > On 07/19, Christoph Hellwig wrote:
> > > On Fri, Jul 16, 2021 at 09:39:16AM -0500, Eric Biggers wrote:
> > > > +static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> > > > +				    struct bio *bio, loff_t file_offset)
> > > > +{
> > > > +	struct f2fs_private_dio *dio;
> > > > +	bool write = (bio_op(bio) == REQ_OP_WRITE);
> > > > +
> > > > +	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> > > > +			sizeof(struct f2fs_private_dio), GFP_NOFS);
> > > > +	if (!dio)
> > > > +		goto out;
> > > > +
> > > > +	dio->inode = inode;
> > > > +	dio->orig_end_io = bio->bi_end_io;
> > > > +	dio->orig_private = bio->bi_private;
> > > > +	dio->write = write;
> > > > +
> > > > +	bio->bi_end_io = f2fs_dio_end_io;
> > > > +	bio->bi_private = dio;
> > > > +
> > > > +	inc_page_count(F2FS_I_SB(inode),
> > > > +			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > > > +
> > > > +	return submit_bio(bio);
> > > 
> > > I don't think there is any need for this mess.  The F2FS_DIO_WRITE /
> > > F2FS_DIO_READ counts are only used to check if there is any inflight
> > > I/O at all.  So instead we can increment them once before calling
> > > iomap_dio_rw, and decrement them in ->end_io or for a failure/noop
> > > exit from iomap_dio_rw.  Untested patch below.  Note that all this
> > > would be much simpler to review if the last three patches were folded
> > > into a single one.
> > 
> > Eric, wdyt?
> > 
> > I've merged v1 to v5, including Christoph's comment in v2.
> > 
> 
> I am planning to do this, but I got caught up by the patch
> "f2fs: fix wrong inflight page stats for directIO" that was recently added to
> f2fs.git#dev, which makes this suggestion no longer viable.  Hence my review
> comment on that patch
> (https://lkml.kernel.org/r/YPjNGoFzQojO5Amr@sol.localdomain)
> and Chao's new version of that patch
> (https://lkml.kernel.org/r/20210722131617.749204-1-chao@kernel.org),
> although the new version has some issues too as I commented.
> 
> If you could just revert "f2fs: fix wrong inflight page stats for directIO"
> for now, that would be helpful, as I don't think we want it.

Yup, I dropped it from the dev branch and will wait for Chao's next patch on
top of iomap.

> 
> - Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-19  8:59     ` [f2fs-dev] " Christoph Hellwig
@ 2021-07-23  1:52       ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-23  1:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-f2fs-devel, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

Hi Christoph,

On Mon, Jul 19, 2021 at 10:59:10AM +0200, Christoph Hellwig wrote:
> On Fri, Jul 16, 2021 at 09:39:16AM -0500, Eric Biggers wrote:
> > +static blk_qc_t f2fs_dio_submit_bio(struct inode *inode, struct iomap *iomap,
> > +				    struct bio *bio, loff_t file_offset)
> > +{
> > +	struct f2fs_private_dio *dio;
> > +	bool write = (bio_op(bio) == REQ_OP_WRITE);
> > +
> > +	dio = f2fs_kzalloc(F2FS_I_SB(inode),
> > +			sizeof(struct f2fs_private_dio), GFP_NOFS);
> > +	if (!dio)
> > +		goto out;
> > +
> > +	dio->inode = inode;
> > +	dio->orig_end_io = bio->bi_end_io;
> > +	dio->orig_private = bio->bi_private;
> > +	dio->write = write;
> > +
> > +	bio->bi_end_io = f2fs_dio_end_io;
> > +	bio->bi_private = dio;
> > +
> > +	inc_page_count(F2FS_I_SB(inode),
> > +			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
> > +
> > +	return submit_bio(bio);
> 
> I don't think there is any need for this mess.  The F2FS_DIO_WRITE /
> F2FS_DIO_READ counts are only used to check if there is any inflight
> I/O at all.  So instead we can increment them once before calling
> iomap_dio_rw, and decrement them in ->end_io or for a failure/noop
> exit from iomap_dio_rw.  Untested patch below.  Note that all this
> would be much simpler to review if the last three patches were folded
> into a single one.
> 

I am trying to do this, but unfortunately I don't see a way to make it work
correctly in all cases.

The main problem is that when iomap_dio_rw() returns an error (other than
-EIOCBQUEUED), there is no way to know whether ->end_io() has been called or
not.  This is because iomap_dio_rw() can fail either early, before "starting"
the I/O (in which case ->end_io() won't have been called), or later, after
"starting" the I/O (in which case ->end_io() will have been called).  Note that
this can't be worked around by checking whether the iov_iter has been advanced
or not, since a failure could occur between "starting" the I/O and the iov_iter
being advanced for the first time.
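
The ambiguity can be shown with a small userspace model.  This is only a
sketch, not kernel code: model_dio_rw(), model_end_io(), enum fail_point and
inflight_after() are hypothetical names, and the model assumes the
synchronous case where the caller increments the counter before the call and
the completion hook decrements it.

```c
#include <errno.h>
#include <stdbool.h>

/* Where the modeled direct I/O call fails, if at all. */
enum fail_point { FAIL_NONE, FAIL_EARLY, FAIL_LATE };

/* Stand-in for the F2FS_DIO_READ / F2FS_DIO_WRITE in-flight counter. */
static int inflight;

/* Stand-in for ->end_io(): drops the in-flight count. */
static void model_end_io(void)
{
	inflight--;
}

/*
 * Stand-in for a synchronous iomap_dio_rw() call.  An early failure
 * returns before the completion hook runs; a late failure returns the
 * same error code after the hook has already run.
 */
static int model_dio_rw(enum fail_point fp)
{
	if (fp == FAIL_EARLY)
		return -EIO;
	model_end_io();
	return fp == FAIL_LATE ? -EIO : 0;
}

/*
 * Caller policy under test: increment before the call and, on error,
 * optionally decrement again.  Returns the resulting counter value;
 * 0 means the counter is balanced.
 */
static int inflight_after(enum fail_point fp, bool dec_on_error)
{
	int ret;

	inflight = 0;
	inflight++;
	ret = model_dio_rw(fp);
	if (ret < 0 && dec_on_error)
		inflight--;
	return inflight;
}
```

In this model neither caller policy is balanced for both failure points:
inflight_after(FAIL_LATE, true) underflows to -1, while
inflight_after(FAIL_EARLY, false) leaks a count of 1, and the caller sees the
same return value in both cases.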

Would you be receptive to adding a ->begin_io() callback to struct iomap_dio_ops
in order to allow filesystems to maintain counters like this?
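
As a rough illustration of that idea, the sketch below models such a hook in
userspace.  Everything in it is an assumption: no ->begin_io() callback
exists in struct iomap_dio_ops today, and struct model_dio_ops,
model_dio_rw() and counter_after() are invented for this example.  It
assumes the hook would be invoked at the point after which ->end_io() is
guaranteed to run.

```c
#include <errno.h>

/* Where the modeled direct I/O call fails, if at all. */
enum fail_point { FAIL_NONE, FAIL_EARLY, FAIL_LATE };

/* Hypothetical ops table; begin_io is the proposed, non-existent hook. */
struct model_dio_ops {
	void (*begin_io)(int *counter);
	void (*end_io)(int *counter);
};

/* Filesystem side: count up when I/O starts, down when it ends. */
static void fs_begin_io(int *counter)
{
	(*counter)++;
}

static void fs_end_io(int *counter)
{
	(*counter)--;
}

static const struct model_dio_ops fs_dio_ops = {
	.begin_io	= fs_begin_io,
	.end_io		= fs_end_io,
};

/*
 * Stand-in for a synchronous direct I/O call.  An early failure runs
 * neither hook; once begin_io has run, end_io always runs as well,
 * even if the call then returns an error.
 */
static int model_dio_rw(const struct model_dio_ops *ops, int *counter,
			enum fail_point fp)
{
	if (fp == FAIL_EARLY)
		return -EIO;
	ops->begin_io(counter);
	ops->end_io(counter);
	return fp == FAIL_LATE ? -EIO : 0;
}

/* Returns the counter value left behind by one modeled call. */
static int counter_after(enum fail_point fp)
{
	int counter = 0;

	(void)model_dio_rw(&fs_dio_ops, &counter, fp);
	return counter;
}
```

Because the increment and decrement are both driven by the I/O path itself,
the caller never has to guess from the return value, and counter_after()
is 0 for every failure point in this model.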

Either way, given the problem here, I think I should leave this out of the
initial conversion and just do a dumb translation of the existing f2fs logic to
start with, like I have in this patch.

- Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-23  1:52       ` [f2fs-dev] " Eric Biggers
@ 2021-07-23  5:00         ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2021-07-23  5:00 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, linux-f2fs-devel, Jaegeuk Kim, Chao Yu,
	linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

On Thu, Jul 22, 2021 at 06:52:33PM -0700, Eric Biggers wrote:
> I am trying to do this, but unfortunately I don't see a way to make it work
> correctly in all cases.
> 
> The main problem is that when iomap_dio_rw() returns an error (other than
> -EIOCBQUEUED), there is no way to know whether ->end_io() has been called or
> not.  This is because iomap_dio_rw() can fail either early, before "starting"
> the I/O (in which case ->end_io() won't have been called), or later, after
> "starting" the I/O (in which case ->end_io() will have been called).  Note that
> this can't be worked around by checking whether the iov_iter has been advanced
> or not, since a failure could occur between "starting" the I/O and the iov_iter
> being advanced for the first time.
> 
> Would you be receptive to adding a ->begin_io() callback to struct iomap_dio_ops
> in order to allow filesystems to maintain counters like this?

I think we can trivially fix this by using the slightly lower level
__iomap_dio_rw API.  Incremental patch to my previous one below:

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4fed90cc1462..11844bd0cb7a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4243,6 +4243,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	const loff_t pos = iocb->ki_pos;
 	const size_t count = iov_iter_count(to);
+	struct iomap_dio *dio;
 	ssize_t ret;
 
 	if (count == 0)
@@ -4260,8 +4261,13 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	}
 
 	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
-	ret = iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
-
+	dio = __iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
+	if (IS_ERR_OR_NULL(dio)) {
+		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
+		ret = PTR_ERR_OR_ZERO(dio);
+	} else {
+		ret = iomap_dio_complete(dio);
+	}
 	up_read(&fi->i_gc_rwsem[READ]);
 
 	file_accessed(file);
@@ -4271,8 +4277,6 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	else if (ret == -EIOCBQUEUED)
 		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
 				   count - iov_iter_count(to));
-	else
-		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
 out:
 	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
 	return ret;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] f2fs: implement iomap operations
  2021-07-23  5:00         ` [f2fs-dev] " Christoph Hellwig
@ 2021-07-23  8:05           ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-23  8:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-f2fs-devel, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Fri, Jul 23, 2021 at 06:00:03AM +0100, Christoph Hellwig wrote:
> On Thu, Jul 22, 2021 at 06:52:33PM -0700, Eric Biggers wrote:
> > I am trying to do this, but unfortunately I don't see a way to make it work
> > correctly in all cases.
> > 
> > The main problem is that when iomap_dio_rw() returns an error (other than
> > -EIOCBQUEUED), there is no way to know whether ->end_io() has been called or
> > not.  This is because iomap_dio_rw() can fail either early, before "starting"
> > the I/O (in which case ->end_io() won't have been called), or later, after
> > "starting" the I/O (in which case ->end_io() will have been called).  Note that
> > this can't be worked around by checking whether the iov_iter has been advanced
> > or not, since a failure could occur between "starting" the I/O and the iov_iter
> > being advanced for the first time.
> > 
> > Would you be receptive to adding a ->begin_io() callback to struct iomap_dio_ops
> > in order to allow filesystems to maintain counters like this?
> 
> I think we can trivially fix this by using the slightly lower level
> __iomap_dio_rw API.  Incremental patch to my previous one below:
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 4fed90cc1462..11844bd0cb7a 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -4243,6 +4243,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	struct f2fs_inode_info *fi = F2FS_I(inode);
>  	const loff_t pos = iocb->ki_pos;
>  	const size_t count = iov_iter_count(to);
> +	struct iomap_dio *dio;
>  	ssize_t ret;
>  
>  	if (count == 0)
> @@ -4260,8 +4261,13 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	}
>  
>  	inc_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
> -	ret = iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
> -
> +	dio = __iomap_dio_rw(iocb, to, &f2fs_iomap_ops, &f2fs_iomap_dio_ops, 0);
> +	if (IS_ERR_OR_NULL(dio)) {
> +		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
> +		ret = PTR_ERR_OR_ZERO(dio);
> +	} else {
> +		ret = iomap_dio_complete(dio);
> +	}
>  	up_read(&fi->i_gc_rwsem[READ]);
>  
>  	file_accessed(file);
> @@ -4271,8 +4277,6 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	else if (ret == -EIOCBQUEUED)
>  		f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
>  				   count - iov_iter_count(to));
> -	else
> -		dec_page_count(F2FS_I_SB(inode), F2FS_DIO_READ);
>  out:
>  	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
>  	return ret;

I wouldn't call it trivial, but yes that seems to work (after fixing it to
handle EIOCBQUEUED correctly).  Take a look at the v2 I've sent out.  Thanks!

- Eric
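
The counter handling discussed above can be modelled in user space as follows. This is a sketch under assumptions, not the v2 patch itself: the names (`model_start()`, `model_complete()`, `model_read()`, `MODEL_EIOCBQUEUED`) are hypothetical, the start phase is reduced to an int return code instead of an ERR_PTR-style pointer, and the EIOCBQUEUED fix is assumed to mean "do not decrement when the I/O was queued, because the completion will run later".

```c
#include <assert.h>
#include <errno.h>

/* Model-only value; the kernel-internal EIOCBQUEUED is not in userspace errno.h. */
#define MODEL_EIOCBQUEUED 529

enum start_result { START_FAILED, START_QUEUED, START_READY };

/* Hypothetical in-flight counter, standing in for F2FS_DIO_READ. */
static int dio_in_flight;

/* Stand-in for ->end_io(): the only place a started I/O drops the counter. */
static void model_end_io(void)
{
	assert(dio_in_flight > 0);
	dio_in_flight--;
}

/* Phase 1: set up the I/O.  A negative return means nothing to complete now. */
static int model_start(enum start_result how)
{
	switch (how) {
	case START_FAILED:
		return -EIO;			/* never started */
	case START_QUEUED:
		return -MODEL_EIOCBQUEUED;	/* started, completes later */
	default:
		return 0;			/* started, complete synchronously */
	}
}

/* Phase 2: synchronous completion, which runs end_io. */
static int model_complete(void)
{
	model_end_io();
	return 0;
}

/* Asynchronous completion of a queued I/O, run some time later. */
static void model_async_completion(void)
{
	model_end_io();
}

static int model_read(enum start_result how)
{
	int ret;

	dio_in_flight++;
	ret = model_start(how);
	if (ret == 0)
		ret = model_complete();		/* end_io drops the counter */
	else if (ret != -MODEL_EIOCBQUEUED)
		dio_in_flight--;		/* never started: undo here */
	/* Queued: leave the counter alone; end_io will drop it later. */
	return ret;
}
```

Splitting the call into a start phase and a completion phase is what removes the ambiguity: the caller undoes its increment only when the start phase itself fails, and every started I/O is accounted for exactly once by end_io.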

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/9] f2fs: make f2fs_write_failed() take struct inode
  2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
@ 2021-07-25 10:00     ` Chao Yu
  -1 siblings, 0 replies; 66+ messages in thread
From: Chao Yu @ 2021-07-25 10:00 UTC (permalink / raw)
  To: Eric Biggers, linux-f2fs-devel, Jaegeuk Kim
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

On 2021/7/16 22:39, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Make f2fs_write_failed() take a 'struct inode' directly rather than a
> 'struct address_space', as this simplifies it slightly.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Reviewed-by: Chao Yu <chao@kernel.org>

Thanks,

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
@ 2021-07-25 10:50     ` Chao Yu
  -1 siblings, 0 replies; 66+ messages in thread
From: Chao Yu @ 2021-07-25 10:50 UTC (permalink / raw)
  To: Eric Biggers, linux-f2fs-devel, Jaegeuk Kim
  Cc: linux-fsdevel, linux-xfs, Satya Tangirala, Changheun Lee,
	Matthew Bobrowski

On 2021/7/16 22:39, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> f2fs_write_begin() assumes that all blocks were preallocated by
> default unless FI_NO_PREALLOC is explicitly set.  This invites data
> corruption, as there are cases in which not all blocks are preallocated.
> Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> buffered_io") fixed one case, but there are others remaining.

Could you please explain which cases we previously failed to handle?
Then I can check the related logic before and after the rework.

> -			/*
> -			 * If force_buffere_io() is true, we have to allocate
> -			 * blocks all the time, since f2fs_direct_IO will fall
> -			 * back to buffered IO.
> -			 */
> -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> -					f2fs_lfs_mode(F2FS_I_SB(inode)))

We should keep this OPU (out-of-place update) DIO logic; otherwise, in lfs
mode, a DIO write will always allocate two block addresses for each 4k append
I/O.

I just tested this on the latest f2fs dev-test branch.

rm /mnt/f2fs/dio
dd if=/dev/zero  of=/mnt/f2fs/dio bs=4k count=4 oflag=direct

           <...>-763176  [001] ...1 177258.793370: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 0, start blkaddr = 0xe1a2e, len = 0x1, flags = 48,seg_type = 1, may_create = 1, err = 0
            <...>-763176  [001] ...1 177258.793462: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 0, start blkaddr = 0xe1a2f, len = 0x1, flags = 16,seg_type = 1, may_create = 1, err = 0
               dd-763176  [001] ...1 177258.793575: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 1, start blkaddr = 0xe1a30, len = 0x1, flags = 48,seg_type = 1, may_create = 1, err = 0
               dd-763176  [001] ...1 177258.793599: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 1, start blkaddr = 0xe1a31, len = 0x1, flags = 16,seg_type = 1, may_create = 1, err = 0
               dd-763176  [001] ...1 177258.793735: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 2, start blkaddr = 0xe1a32, len = 0x1, flags = 48,seg_type = 1, may_create = 1, err = 0
               dd-763176  [001] ...1 177258.793769: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 2, start blkaddr = 0xe1a33, len = 0x1, flags = 16,seg_type = 1, may_create = 1, err = 0
               dd-763176  [001] ...1 177258.793859: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 3, start blkaddr = 0xe1a34, len = 0x1, flags = 48,seg_type = 1, may_create = 1, err = 0
               dd-763176  [001] ...1 177258.793885: f2fs_map_blocks: dev = (259,1), ino = 6, file offset = 3, start blkaddr = 0xe1a35, len = 0x1, flags = 16,seg_type = 1, may_create = 1, err = 0

Thanks,

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
@ 2021-07-25 15:35     ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-25 15:35 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-f2fs-devel, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

Note that this patch is failing generic/250.

On 07/16, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> f2fs_write_begin() assumes that all blocks were preallocated by
> default unless FI_NO_PREALLOC is explicitly set.  This invites data
> corruption, as there are cases in which not all blocks are preallocated.
> Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> buffered_io") fixed one case, but there are others remaining.
> 
> Fix up this logic by replacing this flag with FI_PREALLOCATED_ALL, which
> only gets set if all blocks for the current write were preallocated.
> 
> Also clean up f2fs_preallocate_blocks(), move it to file.c, and make it
> handle some of the logic that was previously in write_iter() directly.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  fs/f2fs/data.c |  55 ++--------------------
>  fs/f2fs/f2fs.h |   3 +-
>  fs/f2fs/file.c | 123 ++++++++++++++++++++++++++++++++-----------------
>  3 files changed, 84 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 18cb28a514e6..cdadaa9daf55 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1370,53 +1370,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
>  	return 0;
>  }
>  
> -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> -{
> -	struct inode *inode = file_inode(iocb->ki_filp);
> -	struct f2fs_map_blocks map;
> -	int flag;
> -	int err = 0;
> -	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> -
> -	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
> -	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
> -	if (map.m_len > map.m_lblk)
> -		map.m_len -= map.m_lblk;
> -	else
> -		map.m_len = 0;
> -
> -	map.m_next_pgofs = NULL;
> -	map.m_next_extent = NULL;
> -	map.m_seg_type = NO_CHECK_TYPE;
> -	map.m_may_create = true;
> -
> -	if (direct_io) {
> -		map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
> -		flag = f2fs_force_buffered_io(inode, iocb, from) ?
> -					F2FS_GET_BLOCK_PRE_AIO :
> -					F2FS_GET_BLOCK_PRE_DIO;
> -		goto map_blocks;
> -	}
> -	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
> -		err = f2fs_convert_inline_inode(inode);
> -		if (err)
> -			return err;
> -	}
> -	if (f2fs_has_inline_data(inode))
> -		return err;
> -
> -	flag = F2FS_GET_BLOCK_PRE_AIO;
> -
> -map_blocks:
> -	err = f2fs_map_blocks(inode, &map, 1, flag);
> -	if (map.m_len > 0 && err == -ENOSPC) {
> -		if (!direct_io)
> -			set_inode_flag(inode, FI_NO_PREALLOC);
> -		err = 0;
> -	}
> -	return err;
> -}
> -
>  void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
>  {
>  	if (flag == F2FS_GET_BLOCK_PRE_AIO) {
> @@ -3210,12 +3163,10 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
>  	int flag;
>  
>  	/*
> -	 * we already allocated all the blocks, so we don't need to get
> -	 * the block addresses when there is no need to fill the page.
> +	 * If a whole page is being written and we already preallocated all the
> +	 * blocks, then there is no need to get a block address now.
>  	 */
> -	if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
> -	    !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
> -	    !f2fs_verity_in_progress(inode))
> +	if (len == PAGE_SIZE && is_inode_flag_set(inode, FI_PREALLOCATED_ALL))
>  		return 0;
>  
>  	/* f2fs_lock_op avoids race between write CP and convert_inline_page */
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index ad7c1b94e23a..da1da3111f18 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -699,7 +699,7 @@ enum {
>  	FI_INLINE_DOTS,		/* indicate inline dot dentries */
>  	FI_DO_DEFRAG,		/* indicate defragment is running */
>  	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
> -	FI_NO_PREALLOC,		/* indicate skipped preallocated blocks */
> +	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
>  	FI_HOT_DATA,		/* indicate file is hot */
>  	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
>  	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
> @@ -3604,7 +3604,6 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
>  int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
>  int f2fs_reserve_new_block(struct dnode_of_data *dn);
>  int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
> -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
>  int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
>  struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
>  			int op_flags, bool for_write);
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index b1cb5b50faac..9b12004e78c6 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -4218,10 +4218,72 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  	return ret;
>  }
>  
> +/*
> + * Preallocate blocks for a write request, if it is possible and helpful to do
> + * so.  Returns a positive number if blocks may have been preallocated, 0 if no
> + * blocks were preallocated, or a negative errno value if something went
> + * seriously wrong.  Also sets FI_PREALLOCATED_ALL on the inode if *all* the
> + * requested blocks (not just some of them) have been allocated.
> + */
> +static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
> +{
> +	struct inode *inode = file_inode(iocb->ki_filp);
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	const loff_t pos = iocb->ki_pos;
> +	const size_t count = iov_iter_count(iter);
> +	struct f2fs_map_blocks map = {};
> +	bool dio = (iocb->ki_flags & IOCB_DIRECT) &&
> +		   !f2fs_force_buffered_io(inode, iocb, iter);
> +	int flag;
> +	int ret;
> +
> +	/* If it will be an in-place direct write, don't bother. */
> +	if (dio && !f2fs_lfs_mode(sbi))
> +		return 0;
> +
> +	/* No-wait I/O can't allocate blocks. */
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		return 0;
> +
> +	/* If it will be a short write, don't bother. */
> +	if (iov_iter_fault_in_readable(iter, count) != 0)
> +		return 0;
> +
> +	if (f2fs_has_inline_data(inode)) {
> +		/* If the data will fit inline, don't bother. */
> +		if (pos + count <= MAX_INLINE_DATA(inode))
> +			return 0;
> +		ret = f2fs_convert_inline_inode(inode);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	map.m_lblk = (pos >> inode->i_blkbits);
> +	map.m_len = ((pos + count - 1) >> inode->i_blkbits) - map.m_lblk + 1;
> +	map.m_may_create = true;
> +	if (dio) {
> +		map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
> +		flag = F2FS_GET_BLOCK_PRE_DIO;
> +	} else {
> +		map.m_seg_type = NO_CHECK_TYPE;
> +		flag = F2FS_GET_BLOCK_PRE_AIO;
> +	}
> +
> +	ret = f2fs_map_blocks(inode, &map, 1, flag);
> +	/* -ENOSPC is only a fatal error if no blocks could be allocated. */
> +	if (ret < 0 && !(ret == -ENOSPC && map.m_len > 0))
> +		return ret;
> +	if (ret == 0)
> +		set_inode_flag(inode, FI_PREALLOCATED_ALL);
> +	return map.m_len;
> +}
> +
>  static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file_inode(file);
> +	loff_t target_size;
> +	int preallocated;
>  	ssize_t ret;
>  
>  	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) {
> @@ -4245,84 +4307,59 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  
>  	if (unlikely(IS_IMMUTABLE(inode))) {
>  		ret = -EPERM;
> -		goto unlock;
> +		goto out_unlock;
>  	}
>  
>  	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
>  		ret = -EPERM;
> -		goto unlock;
> +		goto out_unlock;
>  	}
>  
>  	ret = generic_write_checks(iocb, from);
>  	if (ret > 0) {
> -		bool preallocated = false;
> -		size_t target_size = 0;
> -		int err;
> -
> -		if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
> -			set_inode_flag(inode, FI_NO_PREALLOC);
> -
> -		if ((iocb->ki_flags & IOCB_NOWAIT)) {
> +		if (iocb->ki_flags & IOCB_NOWAIT) {
>  			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
>  						iov_iter_count(from)) ||
>  				f2fs_has_inline_data(inode) ||
>  				f2fs_force_buffered_io(inode, iocb, from)) {
> -				clear_inode_flag(inode, FI_NO_PREALLOC);
> -				inode_unlock(inode);
>  				ret = -EAGAIN;
> -				goto out;
> +				goto out_unlock;
>  			}
> -			goto write;
>  		}
> -
> -		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> -			goto write;
> -
>  		if (iocb->ki_flags & IOCB_DIRECT) {
>  			/*
>  			 * Convert inline data for Direct I/O before entering
>  			 * f2fs_direct_IO().
>  			 */
> -			err = f2fs_convert_inline_inode(inode);
> -			if (err)
> -				goto out_err;
> -			/*
> -			 * If force_buffere_io() is true, we have to allocate
> -			 * blocks all the time, since f2fs_direct_IO will fall
> -			 * back to buffered IO.
> -			 */
> -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> -					f2fs_lfs_mode(F2FS_I_SB(inode)))
> -				goto write;
> +			ret = f2fs_convert_inline_inode(inode);
> +			if (ret)
> +				goto out_unlock;
>  		}
> -		preallocated = true;
> -		target_size = iocb->ki_pos + iov_iter_count(from);
>  
> -		err = f2fs_preallocate_blocks(iocb, from);
> -		if (err) {
> -out_err:
> -			clear_inode_flag(inode, FI_NO_PREALLOC);
> -			inode_unlock(inode);
> -			ret = err;
> -			goto out;
> +		/* Possibly preallocate the blocks for the write. */
> +		target_size = iocb->ki_pos + iov_iter_count(from);
> +		preallocated = f2fs_preallocate_blocks(iocb, from);
> +		if (preallocated < 0) {
> +			ret = preallocated;
> +			goto out_unlock;
>  		}
> -write:
> +
>  		ret = __generic_file_write_iter(iocb, from);
> -		clear_inode_flag(inode, FI_NO_PREALLOC);
>  
> -		/* if we couldn't write data, we should deallocate blocks. */
> -		if (preallocated && i_size_read(inode) < target_size) {
> +		/* Don't leave any preallocated blocks around past i_size. */
> +		if (preallocated > 0 && inode->i_size < target_size) {
>  			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  			down_write(&F2FS_I(inode)->i_mmap_sem);
>  			f2fs_truncate(inode);
>  			up_write(&F2FS_I(inode)->i_mmap_sem);
>  			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  		}
> +		clear_inode_flag(inode, FI_PREALLOCATED_ALL);
>  
>  		if (ret > 0)
>  			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
>  	}
> -unlock:
> +out_unlock:
>  	inode_unlock(inode);
>  out:
>  	trace_f2fs_file_write_iter(inode, iocb->ki_pos,
> -- 
> 2.32.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [f2fs-dev] [PATCH 3/9] f2fs: rework write preallocations
@ 2021-07-25 15:35     ` Jaegeuk Kim
  0 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-25 15:35 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Satya Tangirala, linux-f2fs-devel, linux-xfs, Matthew Bobrowski,
	Changheun Lee, linux-fsdevel

Note that this patch is failing generic/250.

On 07/16, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> f2fs_write_begin() assumes that all blocks were preallocated by
> default unless FI_NO_PREALLOC is explicitly set.  This invites data
> corruption, as there are cases in which not all blocks are preallocated.
> Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> buffered_io") fixed one case, but there are others remaining.
> 
> Fix up this logic by replacing this flag with FI_PREALLOCATED_ALL, which
> only gets set if all blocks for the current write were preallocated.
> 
> Also clean up f2fs_preallocate_blocks(), move it to file.c, and make it
> handle some of the logic that was previously in write_iter() directly.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  fs/f2fs/data.c |  55 ++--------------------
>  fs/f2fs/f2fs.h |   3 +-
>  fs/f2fs/file.c | 123 ++++++++++++++++++++++++++++++++-----------------
>  3 files changed, 84 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 18cb28a514e6..cdadaa9daf55 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1370,53 +1370,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
>  	return 0;
>  }
>  
> -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> -{
> -	struct inode *inode = file_inode(iocb->ki_filp);
> -	struct f2fs_map_blocks map;
> -	int flag;
> -	int err = 0;
> -	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> -
> -	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
> -	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
> -	if (map.m_len > map.m_lblk)
> -		map.m_len -= map.m_lblk;
> -	else
> -		map.m_len = 0;
> -
> -	map.m_next_pgofs = NULL;
> -	map.m_next_extent = NULL;
> -	map.m_seg_type = NO_CHECK_TYPE;
> -	map.m_may_create = true;
> -
> -	if (direct_io) {
> -		map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
> -		flag = f2fs_force_buffered_io(inode, iocb, from) ?
> -					F2FS_GET_BLOCK_PRE_AIO :
> -					F2FS_GET_BLOCK_PRE_DIO;
> -		goto map_blocks;
> -	}
> -	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
> -		err = f2fs_convert_inline_inode(inode);
> -		if (err)
> -			return err;
> -	}
> -	if (f2fs_has_inline_data(inode))
> -		return err;
> -
> -	flag = F2FS_GET_BLOCK_PRE_AIO;
> -
> -map_blocks:
> -	err = f2fs_map_blocks(inode, &map, 1, flag);
> -	if (map.m_len > 0 && err == -ENOSPC) {
> -		if (!direct_io)
> -			set_inode_flag(inode, FI_NO_PREALLOC);
> -		err = 0;
> -	}
> -	return err;
> -}
> -
>  void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
>  {
>  	if (flag == F2FS_GET_BLOCK_PRE_AIO) {
> @@ -3210,12 +3163,10 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
>  	int flag;
>  
>  	/*
> -	 * we already allocated all the blocks, so we don't need to get
> -	 * the block addresses when there is no need to fill the page.
> +	 * If a whole page is being written and we already preallocated all the
> +	 * blocks, then there is no need to get a block address now.
>  	 */
> -	if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
> -	    !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
> -	    !f2fs_verity_in_progress(inode))
> +	if (len == PAGE_SIZE && is_inode_flag_set(inode, FI_PREALLOCATED_ALL))
>  		return 0;
>  
>  	/* f2fs_lock_op avoids race between write CP and convert_inline_page */
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index ad7c1b94e23a..da1da3111f18 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -699,7 +699,7 @@ enum {
>  	FI_INLINE_DOTS,		/* indicate inline dot dentries */
>  	FI_DO_DEFRAG,		/* indicate defragment is running */
>  	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
> -	FI_NO_PREALLOC,		/* indicate skipped preallocated blocks */
> +	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
>  	FI_HOT_DATA,		/* indicate file is hot */
>  	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
>  	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
> @@ -3604,7 +3604,6 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
>  int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
>  int f2fs_reserve_new_block(struct dnode_of_data *dn);
>  int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
> -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
>  int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
>  struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
>  			int op_flags, bool for_write);
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index b1cb5b50faac..9b12004e78c6 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -4218,10 +4218,72 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  	return ret;
>  }
>  
> +/*
> + * Preallocate blocks for a write request, if it is possible and helpful to do
> + * so.  Returns a positive number if blocks may have been preallocated, 0 if no
> + * blocks were preallocated, or a negative errno value if something went
> + * seriously wrong.  Also sets FI_PREALLOCATED_ALL on the inode if *all* the
> + * requested blocks (not just some of them) have been allocated.
> + */
> +static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
> +{
> +	struct inode *inode = file_inode(iocb->ki_filp);
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	const loff_t pos = iocb->ki_pos;
> +	const size_t count = iov_iter_count(iter);
> +	struct f2fs_map_blocks map = {};
> +	bool dio = (iocb->ki_flags & IOCB_DIRECT) &&
> +		   !f2fs_force_buffered_io(inode, iocb, iter);
> +	int flag;
> +	int ret;
> +
> +	/* If it will be an in-place direct write, don't bother. */
> +	if (dio && !f2fs_lfs_mode(sbi))
> +		return 0;
> +
> +	/* No-wait I/O can't allocate blocks. */
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		return 0;
> +
> +	/* If it will be a short write, don't bother. */
> +	if (iov_iter_fault_in_readable(iter, count) != 0)
> +		return 0;
> +
> +	if (f2fs_has_inline_data(inode)) {
> +		/* If the data will fit inline, don't bother. */
> +		if (pos + count <= MAX_INLINE_DATA(inode))
> +			return 0;
> +		ret = f2fs_convert_inline_inode(inode);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	map.m_lblk = (pos >> inode->i_blkbits);
> +	map.m_len = ((pos + count - 1) >> inode->i_blkbits) - map.m_lblk + 1;
> +	map.m_may_create = true;
> +	if (dio) {
> +		map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
> +		flag = F2FS_GET_BLOCK_PRE_DIO;
> +	} else {
> +		map.m_seg_type = NO_CHECK_TYPE;
> +		flag = F2FS_GET_BLOCK_PRE_AIO;
> +	}
> +
> +	ret = f2fs_map_blocks(inode, &map, 1, flag);
> +	/* -ENOSPC is only a fatal error if no blocks could be allocated. */
> +	if (ret < 0 && !(ret == -ENOSPC && map.m_len > 0))
> +		return ret;
> +	if (ret == 0)
> +		set_inode_flag(inode, FI_PREALLOCATED_ALL);
> +	return map.m_len;
> +}
> +
>  static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file_inode(file);
> +	loff_t target_size;
> +	int preallocated;
>  	ssize_t ret;
>  
>  	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) {
> @@ -4245,84 +4307,59 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  
>  	if (unlikely(IS_IMMUTABLE(inode))) {
>  		ret = -EPERM;
> -		goto unlock;
> +		goto out_unlock;
>  	}
>  
>  	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
>  		ret = -EPERM;
> -		goto unlock;
> +		goto out_unlock;
>  	}
>  
>  	ret = generic_write_checks(iocb, from);
>  	if (ret > 0) {
> -		bool preallocated = false;
> -		size_t target_size = 0;
> -		int err;
> -
> -		if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
> -			set_inode_flag(inode, FI_NO_PREALLOC);
> -
> -		if ((iocb->ki_flags & IOCB_NOWAIT)) {
> +		if (iocb->ki_flags & IOCB_NOWAIT) {
>  			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
>  						iov_iter_count(from)) ||
>  				f2fs_has_inline_data(inode) ||
>  				f2fs_force_buffered_io(inode, iocb, from)) {
> -				clear_inode_flag(inode, FI_NO_PREALLOC);
> -				inode_unlock(inode);
>  				ret = -EAGAIN;
> -				goto out;
> +				goto out_unlock;
>  			}
> -			goto write;
>  		}
> -
> -		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> -			goto write;
> -
>  		if (iocb->ki_flags & IOCB_DIRECT) {
>  			/*
>  			 * Convert inline data for Direct I/O before entering
>  			 * f2fs_direct_IO().
>  			 */
> -			err = f2fs_convert_inline_inode(inode);
> -			if (err)
> -				goto out_err;
> -			/*
> -			 * If force_buffere_io() is true, we have to allocate
> -			 * blocks all the time, since f2fs_direct_IO will fall
> -			 * back to buffered IO.
> -			 */
> -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> -					f2fs_lfs_mode(F2FS_I_SB(inode)))
> -				goto write;
> +			ret = f2fs_convert_inline_inode(inode);
> +			if (ret)
> +				goto out_unlock;
>  		}
> -		preallocated = true;
> -		target_size = iocb->ki_pos + iov_iter_count(from);
>  
> -		err = f2fs_preallocate_blocks(iocb, from);
> -		if (err) {
> -out_err:
> -			clear_inode_flag(inode, FI_NO_PREALLOC);
> -			inode_unlock(inode);
> -			ret = err;
> -			goto out;
> +		/* Possibly preallocate the blocks for the write. */
> +		target_size = iocb->ki_pos + iov_iter_count(from);
> +		preallocated = f2fs_preallocate_blocks(iocb, from);
> +		if (preallocated < 0) {
> +			ret = preallocated;
> +			goto out_unlock;
>  		}
> -write:
> +
>  		ret = __generic_file_write_iter(iocb, from);
> -		clear_inode_flag(inode, FI_NO_PREALLOC);
>  
> -		/* if we couldn't write data, we should deallocate blocks. */
> -		if (preallocated && i_size_read(inode) < target_size) {
> +		/* Don't leave any preallocated blocks around past i_size. */
> +		if (preallocated > 0 && inode->i_size < target_size) {
>  			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  			down_write(&F2FS_I(inode)->i_mmap_sem);
>  			f2fs_truncate(inode);
>  			up_write(&F2FS_I(inode)->i_mmap_sem);
>  			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  		}
> +		clear_inode_flag(inode, FI_PREALLOCATED_ALL);
>  
>  		if (ret > 0)
>  			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
>  	}
> -unlock:
> +out_unlock:
>  	inode_unlock(inode);
>  out:
>  	trace_f2fs_file_write_iter(inode, iocb->ki_pos,
> -- 
> 2.32.0


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-25 15:35     ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-25 15:47       ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-25 15:47 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-f2fs-devel, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/25, Jaegeuk Kim wrote:
> Note that, this patch is failing generic/250.

Correction: it fails on 4.14 and 4.19 after a simple cherry-pick, but shows no
failure on 5.4, 5.10, or mainline.

> 
> On 07/16, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > f2fs_write_begin() assumes that all blocks were preallocated by
> > default unless FI_NO_PREALLOC is explicitly set.  This invites data
> > corruption, as there are cases in which not all blocks are preallocated.
> > Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> > buffered_io") fixed one case, but there are others remaining.
> > 
> > Fix up this logic by replacing this flag with FI_PREALLOCATED_ALL, which
> > only gets set if all blocks for the current write were preallocated.
> > 
> > Also clean up f2fs_preallocate_blocks(), move it to file.c, and make it
> > handle some of the logic that was previously in write_iter() directly.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> >  fs/f2fs/data.c |  55 ++--------------------
> >  fs/f2fs/f2fs.h |   3 +-
> >  fs/f2fs/file.c | 123 ++++++++++++++++++++++++++++++++-----------------
> >  3 files changed, 84 insertions(+), 97 deletions(-)
> > 
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 18cb28a514e6..cdadaa9daf55 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1370,53 +1370,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
> >  	return 0;
> >  }
> >  
> > -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> > -{
> > -	struct inode *inode = file_inode(iocb->ki_filp);
> > -	struct f2fs_map_blocks map;
> > -	int flag;
> > -	int err = 0;
> > -	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> > -
> > -	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
> > -	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
> > -	if (map.m_len > map.m_lblk)
> > -		map.m_len -= map.m_lblk;
> > -	else
> > -		map.m_len = 0;
> > -
> > -	map.m_next_pgofs = NULL;
> > -	map.m_next_extent = NULL;
> > -	map.m_seg_type = NO_CHECK_TYPE;
> > -	map.m_may_create = true;
> > -
> > -	if (direct_io) {
> > -		map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
> > -		flag = f2fs_force_buffered_io(inode, iocb, from) ?
> > -					F2FS_GET_BLOCK_PRE_AIO :
> > -					F2FS_GET_BLOCK_PRE_DIO;
> > -		goto map_blocks;
> > -	}
> > -	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
> > -		err = f2fs_convert_inline_inode(inode);
> > -		if (err)
> > -			return err;
> > -	}
> > -	if (f2fs_has_inline_data(inode))
> > -		return err;
> > -
> > -	flag = F2FS_GET_BLOCK_PRE_AIO;
> > -
> > -map_blocks:
> > -	err = f2fs_map_blocks(inode, &map, 1, flag);
> > -	if (map.m_len > 0 && err == -ENOSPC) {
> > -		if (!direct_io)
> > -			set_inode_flag(inode, FI_NO_PREALLOC);
> > -		err = 0;
> > -	}
> > -	return err;
> > -}
> > -
> >  void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
> >  {
> >  	if (flag == F2FS_GET_BLOCK_PRE_AIO) {
> > @@ -3210,12 +3163,10 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
> >  	int flag;
> >  
> >  	/*
> > -	 * we already allocated all the blocks, so we don't need to get
> > -	 * the block addresses when there is no need to fill the page.
> > +	 * If a whole page is being written and we already preallocated all the
> > +	 * blocks, then there is no need to get a block address now.
> >  	 */
> > -	if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
> > -	    !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
> > -	    !f2fs_verity_in_progress(inode))
> > +	if (len == PAGE_SIZE && is_inode_flag_set(inode, FI_PREALLOCATED_ALL))
> >  		return 0;
> >  
> >  	/* f2fs_lock_op avoids race between write CP and convert_inline_page */
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index ad7c1b94e23a..da1da3111f18 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -699,7 +699,7 @@ enum {
> >  	FI_INLINE_DOTS,		/* indicate inline dot dentries */
> >  	FI_DO_DEFRAG,		/* indicate defragment is running */
> >  	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
> > -	FI_NO_PREALLOC,		/* indicate skipped preallocated blocks */
> > +	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
> >  	FI_HOT_DATA,		/* indicate file is hot */
> >  	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
> >  	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
> > @@ -3604,7 +3604,6 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
> >  int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
> >  int f2fs_reserve_new_block(struct dnode_of_data *dn);
> >  int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
> > -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
> >  int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
> >  struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
> >  			int op_flags, bool for_write);
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index b1cb5b50faac..9b12004e78c6 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -4218,10 +4218,72 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> >  	return ret;
> >  }
> >  
> > +/*
> > + * Preallocate blocks for a write request, if it is possible and helpful to do
> > + * so.  Returns a positive number if blocks may have been preallocated, 0 if no
> > + * blocks were preallocated, or a negative errno value if something went
> > + * seriously wrong.  Also sets FI_PREALLOCATED_ALL on the inode if *all* the
> > + * requested blocks (not just some of them) have been allocated.
> > + */
> > +static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
> > +{
> > +	struct inode *inode = file_inode(iocb->ki_filp);
> > +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > +	const loff_t pos = iocb->ki_pos;
> > +	const size_t count = iov_iter_count(iter);
> > +	struct f2fs_map_blocks map = {};
> > +	bool dio = (iocb->ki_flags & IOCB_DIRECT) &&
> > +		   !f2fs_force_buffered_io(inode, iocb, iter);
> > +	int flag;
> > +	int ret;
> > +
> > +	/* If it will be an in-place direct write, don't bother. */
> > +	if (dio && !f2fs_lfs_mode(sbi))
> > +		return 0;
> > +
> > +	/* No-wait I/O can't allocate blocks. */
> > +	if (iocb->ki_flags & IOCB_NOWAIT)
> > +		return 0;
> > +
> > +	/* If it will be a short write, don't bother. */
> > +	if (iov_iter_fault_in_readable(iter, count) != 0)
> > +		return 0;
> > +
> > +	if (f2fs_has_inline_data(inode)) {
> > +		/* If the data will fit inline, don't bother. */
> > +		if (pos + count <= MAX_INLINE_DATA(inode))
> > +			return 0;
> > +		ret = f2fs_convert_inline_inode(inode);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	map.m_lblk = (pos >> inode->i_blkbits);
> > +	map.m_len = ((pos + count - 1) >> inode->i_blkbits) - map.m_lblk + 1;
> > +	map.m_may_create = true;
> > +	if (dio) {
> > +		map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
> > +		flag = F2FS_GET_BLOCK_PRE_DIO;
> > +	} else {
> > +		map.m_seg_type = NO_CHECK_TYPE;
> > +		flag = F2FS_GET_BLOCK_PRE_AIO;
> > +	}
> > +
> > +	ret = f2fs_map_blocks(inode, &map, 1, flag);
> > +	/* -ENOSPC is only a fatal error if no blocks could be allocated. */
> > +	if (ret < 0 && !(ret == -ENOSPC && map.m_len > 0))
> > +		return ret;
> > +	if (ret == 0)
> > +		set_inode_flag(inode, FI_PREALLOCATED_ALL);
> > +	return map.m_len;
> > +}
> > +
> >  static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  {
> >  	struct file *file = iocb->ki_filp;
> >  	struct inode *inode = file_inode(file);
> > +	loff_t target_size;
> > +	int preallocated;
> >  	ssize_t ret;
> >  
> >  	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) {
> > @@ -4245,84 +4307,59 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  
> >  	if (unlikely(IS_IMMUTABLE(inode))) {
> >  		ret = -EPERM;
> > -		goto unlock;
> > +		goto out_unlock;
> >  	}
> >  
> >  	if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
> >  		ret = -EPERM;
> > -		goto unlock;
> > +		goto out_unlock;
> >  	}
> >  
> >  	ret = generic_write_checks(iocb, from);
> >  	if (ret > 0) {
> > -		bool preallocated = false;
> > -		size_t target_size = 0;
> > -		int err;
> > -
> > -		if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
> > -			set_inode_flag(inode, FI_NO_PREALLOC);
> > -
> > -		if ((iocb->ki_flags & IOCB_NOWAIT)) {
> > +		if (iocb->ki_flags & IOCB_NOWAIT) {
> >  			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
> >  						iov_iter_count(from)) ||
> >  				f2fs_has_inline_data(inode) ||
> >  				f2fs_force_buffered_io(inode, iocb, from)) {
> > -				clear_inode_flag(inode, FI_NO_PREALLOC);
> > -				inode_unlock(inode);
> >  				ret = -EAGAIN;
> > -				goto out;
> > +				goto out_unlock;
> >  			}
> > -			goto write;
> >  		}
> > -
> > -		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> > -			goto write;
> > -
> >  		if (iocb->ki_flags & IOCB_DIRECT) {
> >  			/*
> >  			 * Convert inline data for Direct I/O before entering
> >  			 * f2fs_direct_IO().
> >  			 */
> > -			err = f2fs_convert_inline_inode(inode);
> > -			if (err)
> > -				goto out_err;
> > -			/*
> > -			 * If force_buffere_io() is true, we have to allocate
> > -			 * blocks all the time, since f2fs_direct_IO will fall
> > -			 * back to buffered IO.
> > -			 */
> > -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> > -					f2fs_lfs_mode(F2FS_I_SB(inode)))
> > -				goto write;
> > +			ret = f2fs_convert_inline_inode(inode);
> > +			if (ret)
> > +				goto out_unlock;
> >  		}
> > -		preallocated = true;
> > -		target_size = iocb->ki_pos + iov_iter_count(from);
> >  
> > -		err = f2fs_preallocate_blocks(iocb, from);
> > -		if (err) {
> > -out_err:
> > -			clear_inode_flag(inode, FI_NO_PREALLOC);
> > -			inode_unlock(inode);
> > -			ret = err;
> > -			goto out;
> > +		/* Possibly preallocate the blocks for the write. */
> > +		target_size = iocb->ki_pos + iov_iter_count(from);
> > +		preallocated = f2fs_preallocate_blocks(iocb, from);
> > +		if (preallocated < 0) {
> > +			ret = preallocated;
> > +			goto out_unlock;
> >  		}
> > -write:
> > +
> >  		ret = __generic_file_write_iter(iocb, from);
> > -		clear_inode_flag(inode, FI_NO_PREALLOC);
> >  
> > -		/* if we couldn't write data, we should deallocate blocks. */
> > -		if (preallocated && i_size_read(inode) < target_size) {
> > +		/* Don't leave any preallocated blocks around past i_size. */
> > +		if (preallocated > 0 && inode->i_size < target_size) {
> >  			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> >  			down_write(&F2FS_I(inode)->i_mmap_sem);
> >  			f2fs_truncate(inode);
> >  			up_write(&F2FS_I(inode)->i_mmap_sem);
> >  			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> >  		}
> > +		clear_inode_flag(inode, FI_PREALLOCATED_ALL);
> >  
> >  		if (ret > 0)
> >  			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
> >  	}
> > -unlock:
> > +out_unlock:
> >  	inode_unlock(inode);
> >  out:
> >  	trace_f2fs_file_write_iter(inode, iocb->ki_pos,
> > -- 
> > 2.32.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-25 10:50     ` [f2fs-dev] " Chao Yu
@ 2021-07-25 17:57       ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-25 17:57 UTC (permalink / raw)
  To: Chao Yu
  Cc: linux-f2fs-devel, Jaegeuk Kim, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Sun, Jul 25, 2021 at 06:50:51PM +0800, Chao Yu wrote:
> On 2021/7/16 22:39, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > f2fs_write_begin() assumes that all blocks were preallocated by
> > default unless FI_NO_PREALLOC is explicitly set.  This invites data
> > corruption, as there are cases in which not all blocks are preallocated.
> > Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> > buffered_io") fixed one case, but there are others remaining.
> 
> Could you please explain which cases we missed to handle previously?
> then I can check those related logic before and after the rework.

Any case where a buffered write happens while not all blocks were preallocated
but FI_NO_PREALLOC wasn't set.  For example when ENOSPC was hit in the middle of
the preallocations for a direct write that will fall back to a buffered write,
e.g. due to f2fs_force_buffered_io() or page cache invalidation failure.

> 
> > -			/*
> > -			 * If force_buffere_io() is true, we have to allocate
> > -			 * blocks all the time, since f2fs_direct_IO will fall
> > -			 * back to buffered IO.
> > -			 */
> > -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> > -					f2fs_lfs_mode(F2FS_I_SB(inode)))
> > -				goto write;
> 
> We should keep this OPU DIO logic, otherwise, in lfs mode, write dio
> will always allocate two block addresses for each 4k append IO.
> 
> I just tested this based on the code of the latest f2fs dev-test branch.

Yes, I had misread that due to the weird goto and misleading comment and
translated it into:

        /* If it will be an in-place direct write, don't bother. */
        if (dio && !f2fs_lfs_mode(sbi))
                return 0;

It should be:

        if (dio && f2fs_lfs_mode(sbi))
                return 0;

Do you have a proper explanation for why preallocations shouldn't be done in
this case?  Note that preallocations are still done for buffered writes, which
may be out-of-place as well; how are those different?

- Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-25 15:47       ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-25 18:01         ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-25 18:01 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: linux-f2fs-devel, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Sun, Jul 25, 2021 at 08:47:51AM -0700, Jaegeuk Kim wrote:
> On 07/25, Jaegeuk Kim wrote:
> > Note that, this patch is failing generic/250.
> 
> correction: it's failing in 4.14 and 4.19 after simple cherry-pick, but
> giving no failure on 5.4, 5.10, and mainline.
> 

For me, generic/250 fails on both mainline and f2fs/dev without my changes.
So it isn't a regression.

- Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-25 18:01         ` [f2fs-dev] " Eric Biggers
@ 2021-07-26 19:04           ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-26 19:04 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-f2fs-devel, Chao Yu, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/25, Eric Biggers wrote:
> On Sun, Jul 25, 2021 at 08:47:51AM -0700, Jaegeuk Kim wrote:
> > On 07/25, Jaegeuk Kim wrote:
> > > Note that, this patch is failing generic/250.
> > 
> > correction: it's failing in 4.14 and 4.19 after simple cherry-pick, but
> > giving no failure on 5.4, 5.10, and mainline.
> > 
> 
> For me, generic/250 fails on both mainline and f2fs/dev without my changes.
> So it isn't a regression.

FYI, I had to change generic/250 like this to make it pass.  I'm digging into the patch.
https://github.com/jaegeuk/xfstests-f2fs/commit/99c11b6550a2a24f831018d2e019eed86e517d44

> 
> - Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-25 17:57       ` [f2fs-dev] " Eric Biggers
@ 2021-07-27  2:00         ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-27  2:00 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Chao Yu, linux-f2fs-devel, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/25, Eric Biggers wrote:
> On Sun, Jul 25, 2021 at 06:50:51PM +0800, Chao Yu wrote:
> > On 2021/7/16 22:39, Eric Biggers wrote:
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > f2fs_write_begin() assumes that all blocks were preallocated by
> > > default unless FI_NO_PREALLOC is explicitly set.  This invites data
> > > corruption, as there are cases in which not all blocks are preallocated.
> > > Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> > > buffered_io") fixed one case, but there are others remaining.
> > 
> > Could you please explain which cases we missed to handle previously?
> > then I can check those related logic before and after the rework.
> 
> Any case where a buffered write happens while not all blocks were preallocated
> but FI_NO_PREALLOC wasn't set.  For example when ENOSPC was hit in the middle of
> the preallocations for a direct write that will fall back to a buffered write,
> e.g. due to f2fs_force_buffered_io() or page cache invalidation failure.
> 
> > 
> > > -			/*
> > > -			 * If force_buffere_io() is true, we have to allocate
> > > -			 * blocks all the time, since f2fs_direct_IO will fall
> > > -			 * back to buffered IO.
> > > -			 */
> > > -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> > > -					f2fs_lfs_mode(F2FS_I_SB(inode)))
> > > -				goto write;
> > 
> > We should keep this OPU DIO logic, otherwise, in lfs mode, write dio
> > will always allocate two block addresses for each 4k append IO.
> > 
> > I just tested this based on the code of the latest f2fs dev-test branch.
> 
> Yes, I had misread that due to the weird goto and misleading comment and
> translated it into:
> 
>         /* If it will be an in-place direct write, don't bother. */
>         if (dio && !f2fs_lfs_mode(sbi))
>                 return 0;
> 
> It should be:
> 
>         if (dio && f2fs_lfs_mode(sbi))
>                 return 0;

Hmm, this fixes my generic/250 failure.  And I think the commit below explains
the case.

commit 47501f87c61ad2aa234add63e1ae231521dbc3f5
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Tue Nov 26 15:01:42 2019 -0800

    f2fs: preallocate DIO blocks when forcing buffered_io

    The previous preallocation and DIO decision like below.

                             allow_outplace_dio              !allow_outplace_dio
    f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
    !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO

    But, Javier reported Case (*) where zoned device bypassed preallocation but
    fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
    being read.

    In order to fix the issue, actually we need to preallocate blocks whenever
    we fall back to buffered IO like this. No change is made in the other cases.

                             allow_outplace_dio              !allow_outplace_dio
    f2fs_force_buffered_io   (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
    !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO

    Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
    Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
    Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Reviewed-by: Javier González <javier@javigon.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
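
As a rough illustration only (a toy model, not kernel code), the two tables in
that commit message can be written as a pair of predicates.  The struct and
function names here are made up for the sketch; only the truth tables come from
the commit message.

```c
#include <stdbool.h>

/* Toy model of the decision tables in commit 47501f87c61a; not kernel code. */
struct write_plan {
	bool prealloc;	/* preallocate blocks before the write? */
	bool buffered;	/* does the write end up as buffered I/O? */
};

/* Before the fix: preallocation depended only on allow_outplace_dio. */
struct write_plan plan_before_fix(bool force_buffered_io,
				  bool allow_outplace_dio)
{
	struct write_plan p = {
		.prealloc = !allow_outplace_dio,
		.buffered = force_buffered_io,
	};
	return p;
}

/* After the fix: any write that falls back to buffered I/O preallocates. */
struct write_plan plan_after_fix(bool force_buffered_io,
				 bool allow_outplace_dio)
{
	struct write_plan p = {
		.prealloc = force_buffered_io || !allow_outplace_dio,
		.buffered = force_buffered_io,
	};
	return p;
}
```

The only cell that differs between the two functions is case (*):
force_buffered_io && allow_outplace_dio.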


> 
> Do you have a proper explanation for why preallocations shouldn't be done in
> this case?  Note that preallocations are still done for buffered writes, which
> may be out-of-place as well; how are those different?
> 
> - Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-27  2:00         ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-27  3:23           ` Chao Yu
  -1 siblings, 0 replies; 66+ messages in thread
From: Chao Yu @ 2021-07-27  3:23 UTC (permalink / raw)
  To: Eric Biggers, Jaegeuk Kim
  Cc: linux-f2fs-devel, linux-fsdevel, linux-xfs, Satya Tangirala,
	Changheun Lee, Matthew Bobrowski

On 2021/7/27 10:00, Jaegeuk Kim wrote:
> On 07/25, Eric Biggers wrote:
>> On Sun, Jul 25, 2021 at 06:50:51PM +0800, Chao Yu wrote:
>>> On 2021/7/16 22:39, Eric Biggers wrote:
>>>> From: Eric Biggers <ebiggers@google.com>
>>>>
>>>> f2fs_write_begin() assumes that all blocks were preallocated by
>>>> default unless FI_NO_PREALLOC is explicitly set.  This invites data
>>>> corruption, as there are cases in which not all blocks are preallocated.
>>>> Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
>>>> buffered_io") fixed one case, but there are others remaining.
>>>
>>> Could you please explain which cases we missed to handle previously?
>>> then I can check those related logic before and after the rework.
>>
>> Any case where a buffered write happens while not all blocks were preallocated
>> but FI_NO_PREALLOC wasn't set.  For example when ENOSPC was hit in the middle of
>> the preallocations for a direct write that will fall back to a buffered write,
>> e.g. due to f2fs_force_buffered_io() or page cache invalidation failure.

Indeed.  IIUC, the buggy code is below: if any preallocation fails, we need to
set the FI_NO_PREALLOC flag, not only in the !direct_io case.

map_blocks:
	err = f2fs_map_blocks(inode, &map, 1, flag);
	if (map.m_len > 0 && err == -ENOSPC) {
		if (!direct_io)         <----
			set_inode_flag(inode, FI_NO_PREALLOC);
		err = 0;
	}
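
To make the failure mode concrete, here is a minimal userspace sketch of that
branch.  It is a toy model under stated assumptions: toy_inode, toy_preallocate()
and the flag_dio_too switch are hypothetical names for illustration, and setting
the flag for DIO too is just the behavior suggested above, not necessarily what
the patch does.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_inode {
	bool no_prealloc;	/* stands in for FI_NO_PREALLOC */
};

/*
 * Try to map `want` blocks with only `free_blocks` available.
 * Returns the number mapped, or -ENOSPC if nothing could be mapped.
 * `flag_dio_too` is a hypothetical switch for the suggested behavior.
 */
int toy_preallocate(struct toy_inode *inode, int want, int free_blocks,
		    bool direct_io, bool flag_dio_too)
{
	int mapped = want < free_blocks ? want : free_blocks;

	if (mapped <= 0)
		return -ENOSPC;
	if (mapped < want) {
		/* Partial mapping: ENOSPC is swallowed, as in the snippet. */
		if (!direct_io || flag_dio_too)
			inode->no_prealloc = true;
	}
	return mapped;
}

/* A later buffered write is safe only if it knows about missing blocks. */
bool toy_buffered_fallback_is_safe(int want, int free_blocks,
				   bool flag_dio_too)
{
	struct toy_inode inode = { .no_prealloc = false };
	int mapped = toy_preallocate(&inode, want, free_blocks, true,
				     flag_dio_too);

	return mapped == want || inode.no_prealloc;
}
```

With flag_dio_too == false, a direct write that maps only some of its blocks
and then falls back to buffered I/O leaves the flag clear, which is the case
described above.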

BTW, would it be better to include the issue details you explained above in the
commit message?

>>
>>>
>>>> -			/*
>>>> -			 * If force_buffere_io() is true, we have to allocate
>>>> -			 * blocks all the time, since f2fs_direct_IO will fall
>>>> -			 * back to buffered IO.
>>>> -			 */
>>>> -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
>>>> -					f2fs_lfs_mode(F2FS_I_SB(inode)))
>>>> -				goto write;
>>>
>>> We should keep this OPU DIO logic, otherwise, in lfs mode, write dio
>>> will always allocate two block addresses for each 4k append IO.
>>>
>>> I just tested this based on the code of the latest f2fs dev-test branch.
>>
>> Yes, I had misread that due to the weird goto and misleading comment and
>> translated it into:
>>
>>          /* If it will be an in-place direct write, don't bother. */
>>          if (dio && !f2fs_lfs_mode(sbi))
>>                  return 0;
>>
>> It should be:
>>
>>          if (dio && f2fs_lfs_mode(sbi))
>>                  return 0;
> 
> Hmm, this addresses my 250 failure. And, I think the below commit can explain
> the case.
> 
> commit 47501f87c61ad2aa234add63e1ae231521dbc3f5
> Author: Jaegeuk Kim <jaegeuk@kernel.org>
> Date:   Tue Nov 26 15:01:42 2019 -0800
> 
>      f2fs: preallocate DIO blocks when forcing buffered_io
> 
>      The previous preallocation and DIO decision like below.
> 
>                               allow_outplace_dio              !allow_outplace_dio
>      f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
>      !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
> 
>      But, Javier reported Case (*) where zoned device bypassed preallocation but
>      fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
>      being read.
> 
>      In order to fix the issue, actually we need to preallocate blocks whenever
>      we fall back to buffered IO like this. No change is made in the other cases.
> 
>                               allow_outplace_dio              !allow_outplace_dio
>      f2fs_force_buffered_io   (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
>      !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
> 
>      Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
>      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>      Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>      Reviewed-by: Chao Yu <yuchao0@huawei.com>
>      Reviewed-by: Javier González <javier@javigon.com>
>      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> 

Thanks for the explanation.

> 
>>
>> Do you have a proper explanation for why preallocations shouldn't be done in

See commit f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode"):
the f2fs_map_blocks() logic was changed to force allocation of a new block address,
whether or not a previous block address exists, when it is called from the DIO write
path.  So in that case, if we preallocate a new block address in
f2fs_file_write_iter(), we hit the problem that my trace shows.

>> this case?  Note that preallocations are still done for buffered writes, which
>> may be out-of-place as well; how are those different?
Got your concern.

For buffered IO, we use F2FS_GET_BLOCK_PRE_AIO.  In this mode, we just reserve the
filesystem block count and tag NEW_ADDR in the dnode block, so it's fine: double
new block address allocation won't happen during data page writeback.

For direct IO, we use F2FS_GET_BLOCK_PRE_DIO.  In this mode, we allocate a
physical block address.  After preallocation, if we fall back to buffered IO, we
may hit the double new block address allocation issue... IIUC.
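
The difference between the two modes can be sketched with a small userspace
model.  This is only an illustration under simplifying assumptions: the toy_*
names are hypothetical, the constants merely mark "hole" and "reserved", and
every write is modeled as out-of-place (as for writeback or a DIO write in LFS
mode).

```c
#include <stdint.h>

/* Toy constants; in this model they only mark "hole" and "reserved". */
#define TOY_NULL_ADDR 0u
#define TOY_NEW_ADDR  UINT32_MAX

struct toy_fs {
	uint32_t next_free;	/* next physical block address to hand out */
	unsigned int reserved;	/* block count reserved without an address */
	unsigned int phys_allocs; /* physical addresses handed out so far */
};

uint32_t toy_alloc_phys(struct toy_fs *fs)
{
	fs->phys_allocs++;
	return fs->next_free++;
}

/* PRE_AIO-like: reserve the block count and tag the slot, no address yet. */
void toy_prealloc_aio(struct toy_fs *fs, uint32_t *slot)
{
	if (*slot == TOY_NULL_ADDR) {
		fs->reserved++;
		*slot = TOY_NEW_ADDR;
	}
}

/* PRE_DIO-like: hand out a physical address right away. */
void toy_prealloc_dio(struct toy_fs *fs, uint32_t *slot)
{
	if (*slot == TOY_NULL_ADDR)
		*slot = toy_alloc_phys(fs);
}

/* Out-of-place write (writeback, or a DIO write in LFS mode): always new. */
void toy_write_out_of_place(struct toy_fs *fs, uint32_t *slot)
{
	*slot = toy_alloc_phys(fs);
}

/* Physical allocations for one 4k append on the buffered path. */
unsigned int toy_allocs_buffered(void)
{
	struct toy_fs fs = { .next_free = 1 };
	uint32_t slot = TOY_NULL_ADDR;

	toy_prealloc_aio(&fs, &slot);
	toy_write_out_of_place(&fs, &slot);
	return fs.phys_allocs;
}

/* Physical allocations for one 4k append with DIO in LFS mode. */
unsigned int toy_allocs_dio_lfs(int preallocate)
{
	struct toy_fs fs = { .next_free = 1 };
	uint32_t slot = TOY_NULL_ADDR;

	if (preallocate)
		toy_prealloc_dio(&fs, &slot);
	toy_write_out_of_place(&fs, &slot);
	return fs.phys_allocs;
}
```

In this model the buffered path consumes one physical address per block, while
a preallocated DIO write in LFS mode consumes two, which matches the two block
addresses per 4k append seen in the trace.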

Well, can we relocate the preallocation into f2fs_direct_IO(), after all the cases
that may cause DIO to fall back to buffered IO?

Thanks,

>>
>> - Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-27  3:23           ` [f2fs-dev] " Chao Yu
@ 2021-07-27  7:38             ` Eric Biggers
  -1 siblings, 0 replies; 66+ messages in thread
From: Eric Biggers @ 2021-07-27  7:38 UTC (permalink / raw)
  To: Chao Yu
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Tue, Jul 27, 2021 at 11:23:03AM +0800, Chao Yu wrote:
> > > 
> > > Do you have a proper explanation for why preallocations shouldn't be done in
> 
> See commit f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode"),
> f2fs_map_blocks() logic was changed to force allocating a new block address no matter
> previous block address was existed if it is called from write path of DIO. So, in such
> condition, if we preallocate new block address in f2fs_file_write_iter(), we will
> suffer the problem which my trace indicates.
> 
> > > this case?  Note that preallocations are still done for buffered writes, which
> > > may be out-of-place as well; how are those different?
> Got your concern.
> 
> For buffered IO, we use F2FS_GET_BLOCK_PRE_AIO, in this mode, we just preserve
> filesystem block count and tag NEW_ADDR in dnode block, so, it's fine, double
> new block address allocation won't happen during data page writeback.
> 
> For direct IO, we use F2FS_GET_BLOCK_PRE_DIO, in this mode, we will allocate
> physical block address, after preallocation, if we fallback to buffered IO, we
> may suffer double new block address allocation issue... IIUC.
> 
> Well, can we relocate preallocation into f2fs_direct_IO() after all cases which
> may cause fallbacking DIO to buffered IO?
> 

That's somewhat helpful, but I've been doing some more investigation and now I'm
even more confused.  How can f2fs support non-overwrite DIO writes at all
(meaning DIO writes in LFS mode as well as DIO writes to holes in non-LFS mode),
given that it has no support for unwritten extents?  AFAICS, as-is users can
easily leak uninitialized disk contents on f2fs by issuing a DIO write that
won't complete fully (or might not complete fully), then reading back the blocks
that got allocated but not written to.
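
To spell out the concern, here is a toy userspace model (not f2fs code; all
toy_* names and the "unwritten" flag are assumptions made for illustration).
It allocates a block for a non-overwrite DIO write, lets the I/O fail, and then
reads the block back, with and without an unwritten-extent style marker.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_BLK 8

struct toy_mapping {
	bool mapped;	/* file block points at a disk block */
	bool unwritten;	/* allocated, but reads must return zeroes */
	int phys;
};

/* Read one file block through the mapping. */
void toy_read(char disk[][TOY_BLK], const struct toy_mapping *m, char *out)
{
	if (!m->mapped || m->unwritten)
		memset(out, 0, TOY_BLK);
	else
		memcpy(out, disk[m->phys], TOY_BLK);
}

/* Allocate-then-write, as a non-overwrite DIO write would. */
void toy_dio_write(char disk[][TOY_BLK], struct toy_mapping *m, int phys,
		   const char *buf, bool io_succeeds, bool use_unwritten)
{
	m->mapped = true;
	m->phys = phys;
	m->unwritten = use_unwritten;
	if (!io_succeeds)
		return;		/* mapping stays, data never arrived */
	memcpy(disk[phys], buf, TOY_BLK);
	m->unwritten = false;	/* convert to written on completion */
}

/* Returns true if a failed write lets the reader see old disk contents. */
bool toy_leaks_stale_data(bool use_unwritten)
{
	char disk[1][TOY_BLK];
	struct toy_mapping m = { 0 };
	char out[TOY_BLK];

	memcpy(disk[0], "SECRET!", TOY_BLK);	/* leftover from a deleted file */
	toy_dio_write(disk, &m, 0, "newdata", false, use_unwritten);
	toy_read(disk, &m, out);
	return memcmp(out, "SECRET!", TOY_BLK) == 0;
}
```

Without the unwritten marker, the failed write leaves a mapping to a block that
still holds the old contents, so the read returns them.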

I think that f2fs will have to take the ext2 approach of not allowing
non-overwrite DIO writes at all...

- Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-27  7:38             ` [f2fs-dev] " Eric Biggers
@ 2021-07-27  8:30               ` Chao Yu
  -1 siblings, 0 replies; 66+ messages in thread
From: Chao Yu @ 2021-07-27  8:30 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 2021/7/27 15:38, Eric Biggers wrote:
> That's somewhat helpful, but I've been doing some more investigation and now I'm
> even more confused.  How can f2fs support non-overwrite DIO writes at all
> (meaning DIO writes in LFS mode as well as DIO writes to holes in non-LFS mode),
> given that it has no support for unwritten extents?  AFAICS, as-is users can

I'm trying to pick up the DAX support patch created by Qiuyang from Huawei, and it
looks like it faces the same issue; it tries to fix this by calling sb_issue_zeroout()
in f2fs_map_blocks() before it returns.

> easily leak uninitialized disk contents on f2fs by issuing a DIO write that
> won't complete fully (or might not complete fully), then reading back the blocks
> that got allocated but not written to.
> 
> I think that f2fs will have to take the ext2 approach of not allowing
> non-overwrite DIO writes at all...
Yes,

Another option is to enhance the scalability of f2fs metadata, which requires changing
the layout of the dnode block or SSA block; after that, we could record the unwritten
status of data blocks there... it's a big change, though...

Thanks,

> 
> - Eric
> 


* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-27  8:30               ` [f2fs-dev] " Chao Yu
@ 2021-07-27 15:33                 ` Darrick J. Wong
  -1 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2021-07-27 15:33 UTC (permalink / raw)
  To: Chao Yu
  Cc: Eric Biggers, Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel,
	linux-xfs, Satya Tangirala, Changheun Lee, Matthew Bobrowski

On Tue, Jul 27, 2021 at 04:30:16PM +0800, Chao Yu wrote:
> On 2021/7/27 15:38, Eric Biggers wrote:
> > That's somewhat helpful, but I've been doing some more investigation and now I'm
> > even more confused.  How can f2fs support non-overwrite DIO writes at all
> > (meaning DIO writes in LFS mode as well as DIO writes to holes in non-LFS mode),
> > given that it has no support for unwritten extents?  AFAICS, as-is users can
> 
> I'm trying to pick up DAX support patch created by Qiuyang from huawei, and it
> looks it faces the same issue, so it tries to fix this by calling sb_issue_zeroout()
> in f2fs_map_blocks() before it returns.

I really hope you don't, because zeroing the region before memcpy'ing it
is absurd.  I don't know if f2fs can do that (xfs can't really) without
pinning resources during a potentially lengthy memcpy operation, but you
/could/ allocate the space in ->iomap_begin, attach some record of that
to iomap->private, and only commit the mapping update in ->iomap_end.
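
The shape of that idea can be sketched as a userspace toy. Everything below is an assumption made for illustration: `toy_inode`, `toy_iomap`, `toy_begin()` and `toy_end()` are hypothetical stand-ins, the real ->iomap_begin and ->iomap_end callbacks take more arguments, and whether f2fs can defer its mapping update this way is exactly the open question.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_UNMAPPED UINT64_MAX

/* Hypothetical stand-ins for the inode block map and for struct iomap. */
struct toy_inode {
	uint64_t map[8];	/* logical block -> physical block, or TOY_UNMAPPED */
	uint64_t next_free;	/* trivial bump allocator */
};

struct toy_iomap {
	void *private;		/* cookie carried from "begin" to "end" */
};

/* Record of an allocation that has not been published yet. */
struct toy_pending {
	uint64_t lblk;
	uint64_t pblk;
};

/* "begin": reserve a physical block, but do not publish it in the block map. */
static int toy_begin(struct toy_inode *inode, uint64_t lblk,
		     struct toy_iomap *iomap)
{
	struct toy_pending *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->lblk = lblk;
	p->pblk = inode->next_free++;
	iomap->private = p;
	return 0;
}

/* "end": publish the mapping only if the data actually reached the block. */
static void toy_end(struct toy_inode *inode, struct toy_iomap *iomap,
		    int written)
{
	struct toy_pending *p = iomap->private;

	if (written)
		inode->map[p->lblk] = p->pblk;
	/* A real filesystem would also hand an unused block back here. */
	free(p);
	iomap->private = NULL;
}
```

Because the block map is only updated in `toy_end()`, a reader can never reach a block whose data was not written; the cost is that the reservation stays pinned for as long as the copy takes.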

--D

> > easily leak uninitialized disk contents on f2fs by issuing a DIO write that
> > won't complete fully (or might not complete fully), then reading back the blocks
> > that got allocated but not written to.
> > 
> > I think that f2fs will have to take the ext2 approach of not allowing
> > non-overwrite DIO writes at all...
> Yes,
> 
> Another option is to enhance f2fs metadata's scalability which needs to update layout
> of dnode block or SSA block, after that we can record the status of unwritten data block
> there... it's a big change though...
> 
> Thanks,
> 
> > 
> > - Eric
> > 


* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-27  2:00         ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-28  2:29           ` Jaegeuk Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Jaegeuk Kim @ 2021-07-28  2:29 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Chao Yu, linux-f2fs-devel, linux-fsdevel, linux-xfs,
	Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 07/26, Jaegeuk Kim wrote:
> On 07/25, Eric Biggers wrote:
> > On Sun, Jul 25, 2021 at 06:50:51PM +0800, Chao Yu wrote:
> > > On 2021/7/16 22:39, Eric Biggers wrote:
> > > > From: Eric Biggers <ebiggers@google.com>
> > > > 
> > > > f2fs_write_begin() assumes that all blocks were preallocated by
> > > > default unless FI_NO_PREALLOC is explicitly set.  This invites data
> > > > corruption, as there are cases in which not all blocks are preallocated.
> > > > Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing
> > > > buffered_io") fixed one case, but there are others remaining.
> > > 
> > > Could you please explain which cases we missed to handle previously?
> > > then I can check those related logic before and after the rework.
> > 
> > Any case where a buffered write happens while not all blocks were preallocated
> > but FI_NO_PREALLOC wasn't set.  For example when ENOSPC was hit in the middle of
> > the preallocations for a direct write that will fall back to a buffered write,
> > e.g. due to f2fs_force_buffered_io() or page cache invalidation failure.
> > 
> > > 
> > > > -			/*
> > > > -			 * If force_buffere_io() is true, we have to allocate
> > > > -			 * blocks all the time, since f2fs_direct_IO will fall
> > > > -			 * back to buffered IO.
> > > > -			 */
> > > > -			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> > > > -					f2fs_lfs_mode(F2FS_I_SB(inode)))
> > > > -				goto write;
> > > 
> > > We should keep this OPU DIO logic, otherwise, in lfs mode, write dio
> > > will always allocate two block addresses for each 4k append IO.
> > > 
> > > > I just tested based on the code of the latest f2fs dev-test branch.
> > 
> > Yes, I had misread that due to the weird goto and misleading comment and
> > translated it into:
> > 
> >         /* If it will be an in-place direct write, don't bother. */
> >         if (dio && !f2fs_lfs_mode(sbi))
> >                 return 0;
> > 
> > It should be:
> > 
> >         if (dio && f2fs_lfs_mode(sbi))
> >                 return 0;
> 
> Hmm, this addresses my 250 failure. And, I think the below commit can explain
> the case.

In addition to this, I got a failure on generic/263, and the change below fixes
it. (I haven't looked at it deeply, though.)

--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4344,8 +4344,13 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter)
                        return ret;
        }

-       map.m_lblk = (pos >> inode->i_blkbits);
-       map.m_len = ((pos + count - 1) >> inode->i_blkbits) - map.m_lblk + 1;
+       map.m_lblk = F2FS_BLK_ALIGN(pos);
+       map.m_len = F2FS_BYTES_TO_BLK(pos + count);
+       if (map.m_len > map.m_lblk)
+               map.m_len -= map.m_lblk;
+       else
+               map.m_len = 0;
+
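
To see what the replacement lines change, the two computations can be compared in a standalone sketch. The macros below are local stand-ins written for this example, assuming 4 KiB blocks and assuming that F2FS_BYTES_TO_BLK() rounds down and F2FS_BLK_ALIGN() rounds up; `range_touched()` and `range_covered()` are hypothetical helper names, not functions from fs/f2fs.

```c
#include <stdint.h>

/* Local stand-ins, assuming 4 KiB blocks; not copied from the kernel tree. */
#define BLKBITS 12
#define BLKSIZE (1ULL << BLKBITS)
#define BYTES_TO_BLK(bytes) ((uint64_t)(bytes) >> BLKBITS)	/* round down */
#define BLK_ALIGN(bytes) BYTES_TO_BLK((uint64_t)(bytes) + BLKSIZE - 1) /* round up */

struct blk_range {
	uint64_t lblk;	/* first block of the range */
	uint64_t len;	/* number of blocks in the range */
};

/*
 * Removed lines: every block the write touches, partial blocks included.
 * Assumes count > 0.
 */
static struct blk_range range_touched(uint64_t pos, uint64_t count)
{
	struct blk_range r;

	r.lblk = pos >> BLKBITS;
	r.len = ((pos + count - 1) >> BLKBITS) - r.lblk + 1;
	return r;
}

/* Added lines: only the blocks that the write covers completely. */
static struct blk_range range_covered(uint64_t pos, uint64_t count)
{
	struct blk_range r;
	uint64_t end = BYTES_TO_BLK(pos + count);

	r.lblk = BLK_ALIGN(pos);
	r.len = end > r.lblk ? end - r.lblk : 0;
	return r;
}
```

For a write of 8192 bytes at offset 1000, `range_touched()` yields blocks 0 through 2, while `range_covered()` yields only block 1; a 200-byte write at offset 100 covers no full block, so the new length is 0.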


* Re: [PATCH 3/9] f2fs: rework write preallocations
  2021-07-27 15:33                 ` [f2fs-dev] " Darrick J. Wong
@ 2021-07-29  0:26                   ` Chao Yu
  -1 siblings, 0 replies; 66+ messages in thread
From: Chao Yu @ 2021-07-29  0:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Eric Biggers, Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel,
	linux-xfs, Satya Tangirala, Changheun Lee, Matthew Bobrowski

On 2021/7/27 23:33, Darrick J. Wong wrote:
> On Tue, Jul 27, 2021 at 04:30:16PM +0800, Chao Yu wrote:
>> On 2021/7/27 15:38, Eric Biggers wrote:
>>> That's somewhat helpful, but I've been doing some more investigation and now I'm
>>> even more confused.  How can f2fs support non-overwrite DIO writes at all
>>> (meaning DIO writes in LFS mode as well as DIO writes to holes in non-LFS mode),
>>> given that it has no support for unwritten extents?  AFAICS, as-is users can
>>
>> I'm trying to pick up DAX support patch created by Qiuyang from huawei, and it
>> looks it faces the same issue, so it tries to fix this by calling sb_issue_zeroout()
>> in f2fs_map_blocks() before it returns.
> 
> I really hope you don't, because zeroing the region before memcpy'ing it
> is absurd.  I don't know if f2fs can do that (xfs can't really) without
> pinning resources during a potentially lengthy memcpy operation, but you
> /could/ allocate the space in ->iomap_begin, attach some record of that
> to iomap->private, and only commit the mapping update in ->iomap_end.

Thanks for the suggestion. Let me check this a little later, since right now I'm
just trying to stabilize the code...

Thanks,

> 
> --D
> 
>>> easily leak uninitialized disk contents on f2fs by issuing a DIO write that
>>> won't complete fully (or might not complete fully), then reading back the blocks
>>> that got allocated but not written to.
>>>
>>> I think that f2fs will have to take the ext2 approach of not allowing
>>> non-overwrite DIO writes at all...
>> Yes,
>>
>> Another option is to enhance f2fs metadata's scalability which needs to update layout
>> of dnode block or SSA block, after that we can record the status of unwritten data block
>> there... it's a big change though...
>>
>> Thanks,
>>
>>>
>>> - Eric
>>>


end of thread, other threads:[~2021-07-29  0:26 UTC | newest]

Thread overview: 66+ messages
2021-07-16 14:39 [PATCH 0/9] f2fs: use iomap for direct I/O Eric Biggers
2021-07-16 14:39 ` [f2fs-dev] " Eric Biggers
2021-07-16 14:39 ` [PATCH 1/9] f2fs: make f2fs_write_failed() take struct inode Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-25 10:00   ` Chao Yu
2021-07-25 10:00     ` [f2fs-dev] " Chao Yu
2021-07-16 14:39 ` [PATCH 2/9] f2fs: remove allow_outplace_dio() Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-19  8:41   ` Christoph Hellwig
2021-07-19  8:41     ` [f2fs-dev] " Christoph Hellwig
2021-07-16 14:39 ` [PATCH 3/9] f2fs: rework write preallocations Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-25 10:50   ` Chao Yu
2021-07-25 10:50     ` [f2fs-dev] " Chao Yu
2021-07-25 17:57     ` Eric Biggers
2021-07-25 17:57       ` [f2fs-dev] " Eric Biggers
2021-07-27  2:00       ` Jaegeuk Kim
2021-07-27  2:00         ` [f2fs-dev] " Jaegeuk Kim
2021-07-27  3:23         ` Chao Yu
2021-07-27  3:23           ` [f2fs-dev] " Chao Yu
2021-07-27  7:38           ` Eric Biggers
2021-07-27  7:38             ` [f2fs-dev] " Eric Biggers
2021-07-27  8:30             ` Chao Yu
2021-07-27  8:30               ` [f2fs-dev] " Chao Yu
2021-07-27 15:33               ` Darrick J. Wong
2021-07-27 15:33                 ` [f2fs-dev] " Darrick J. Wong
2021-07-29  0:26                 ` Chao Yu
2021-07-29  0:26                   ` [f2fs-dev] " Chao Yu
2021-07-28  2:29         ` Jaegeuk Kim
2021-07-28  2:29           ` [f2fs-dev] " Jaegeuk Kim
2021-07-25 15:35   ` Jaegeuk Kim
2021-07-25 15:35     ` [f2fs-dev] " Jaegeuk Kim
2021-07-25 15:47     ` Jaegeuk Kim
2021-07-25 15:47       ` [f2fs-dev] " Jaegeuk Kim
2021-07-25 18:01       ` Eric Biggers
2021-07-25 18:01         ` [f2fs-dev] " Eric Biggers
2021-07-26 19:04         ` Jaegeuk Kim
2021-07-26 19:04           ` [f2fs-dev] " Jaegeuk Kim
2021-07-16 14:39 ` [PATCH 4/9] f2fs: reduce indentation in f2fs_file_write_iter() Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-16 14:39 ` [PATCH 5/9] f2fs: fix the f2fs_file_write_iter tracepoint Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-16 14:39 ` [PATCH 6/9] f2fs: implement iomap operations Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-19  8:59   ` Christoph Hellwig
2021-07-19  8:59     ` [f2fs-dev] " Christoph Hellwig
2021-07-22 20:47     ` Jaegeuk Kim
2021-07-22 20:47       ` [f2fs-dev] " Jaegeuk Kim
2021-07-22 20:49       ` Jaegeuk Kim
2021-07-22 20:49         ` [f2fs-dev] " Jaegeuk Kim
2021-07-22 20:54       ` Eric Biggers
2021-07-22 20:54         ` [f2fs-dev] " Eric Biggers
2021-07-22 21:57         ` Jaegeuk Kim
2021-07-22 21:57           ` [f2fs-dev] " Jaegeuk Kim
2021-07-23  1:52     ` Eric Biggers
2021-07-23  1:52       ` [f2fs-dev] " Eric Biggers
2021-07-23  5:00       ` Christoph Hellwig
2021-07-23  5:00         ` [f2fs-dev] " Christoph Hellwig
2021-07-23  8:05         ` Eric Biggers
2021-07-23  8:05           ` [f2fs-dev] " Eric Biggers
2021-07-16 14:39 ` [PATCH 7/9] f2fs: use iomap for direct I/O reads Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-16 14:39 ` [PATCH 8/9] f2fs: use iomap for direct I/O writes Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
2021-07-16 14:39 ` [PATCH 9/9] f2fs: remove f2fs_direct_IO() Eric Biggers
2021-07-16 14:39   ` [f2fs-dev] " Eric Biggers
