linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8 v6] btrfs direct-io using iomap
@ 2019-12-13 19:57 Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 1/8] fs: Export generic_file_buffered_read() Goldwyn Rodrigues
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel

This is an effort to use iomap for direct I/O in btrfs. This would
change the call from __blockdev_direct_io() to iomap_dio_rw().

The main objective is to lose the buffer head and use bio defined by
iomap code, and hopefully to use more of generic-FS codebase.

These patches are based and tested on v5.5-rc1. I have tested it against
xfstests/btrfs.

The tree is available at
https://github.com/goldwynr/linux/tree/btrfs-iomap-dio

Changes since v1
- Incorporated back the efficiency change for inode locking
- Review comments about coding style and git comments
- Merge related patches into one
- Direct read to go through btrfs_direct_IO()
- Removal of no longer used function dio_end_io()

Changes since v2
- aligning iomap offset/length to the position/length of I/O
- Removed btrfs_dio_data
- Removed BTRFS_INODE_READDIO_NEED_LOCK
- Re-incorporating write efficiency changes caused lockdep_assert() in
  iomap to be triggered, remove that code.

Changes since v3
- Fixed freeze on generic/095. Use iomap_end() to account for
  failed/incomplete dio instead of btrfs_dio_data

Changes since v4
- moved lockdep_assert_held() to functions calling iomap_dio_rw()
  This may be called immidiately after calling inode lock and
  may feel not required, but it seems important.
- Removed comments which are no longer required
- Changed commit comments to make them more appropriate

Changes since v5
- restore inode_dio_wait() in truncate
- Removed lockdep_assert_held() near callers

--
Goldwyn



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/8] fs: Export generic_file_buffered_read()
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Export generic_file_buffered_read() to be used to
supplement incomplete direct reads.

While we are at it, correct the comments and variable names.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 13 +++++++------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98e0349adb52..3883adaa0e01 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3098,6 +3098,8 @@ extern int generic_file_rw_checks(struct file *file_in, struct file *file_out);
 extern int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    size_t *count, unsigned int flags);
+extern ssize_t generic_file_buffered_read(struct kiocb *iocb,
+		struct iov_iter *to, ssize_t already_read);
 extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
diff --git a/mm/filemap.c b/mm/filemap.c
index bf6aa30be58d..dc2c7e3472a0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1994,7 +1994,7 @@ static void shrink_readahead_size_eio(struct file *filp,
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
  * @iter:	data destination
- * @written:	already copied
+ * @copied:	already copied
  *
  * This is a generic file read routine, and uses the
  * mapping->a_ops->readpage() function for the actual low-level stuff.
@@ -2003,11 +2003,11 @@ static void shrink_readahead_size_eio(struct file *filp,
  * of the logic when it comes to error handling etc.
  *
  * Return:
- * * total number of bytes copied, including those the were already @written
+ * * total number of bytes copied, including those that were @copied
  * * negative error code if nothing was copied
  */
-static ssize_t generic_file_buffered_read(struct kiocb *iocb,
-		struct iov_iter *iter, ssize_t written)
+ssize_t generic_file_buffered_read(struct kiocb *iocb,
+		struct iov_iter *iter, ssize_t copied)
 {
 	struct file *filp = iocb->ki_filp;
 	struct address_space *mapping = filp->f_mapping;
@@ -2148,7 +2148,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		prev_offset = offset;
 
 		put_page(page);
-		written += ret;
+		copied += ret;
 		if (!iov_iter_count(iter))
 			goto out;
 		if (ret < nr) {
@@ -2256,8 +2256,9 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
 	file_accessed(filp);
-	return written ? written : error;
+	return copied ? copied : error;
 }
+EXPORT_SYMBOL_GPL(generic_file_buffered_read);
 
 /**
  * generic_file_read_iter - generic filesystem read routine
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 1/8] fs: Export generic_file_buffered_read() Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-14  0:31   ` Darrick J. Wong
  2019-12-18  2:02   ` Darrick J. Wong
  2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

This helps filesystems to perform tasks on the bio while submitting for
I/O. This could be post-write operations such as data CRC or data
replication for fs-handled RAID.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/direct-io.c  | 14 +++++++++-----
 include/linux/iomap.h |  2 ++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 23837926c0c5..1a3bf3bd86fb 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
 EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
 
 static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
-		struct bio *bio)
+		struct bio *bio, loff_t pos)
 {
 	atomic_inc(&dio->ref);
 
@@ -67,7 +67,11 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
 		bio_set_polled(bio, dio->iocb);
 
 	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
-	dio->submit.cookie = submit_bio(bio);
+	if (dio->dops && dio->dops->submit_io)
+		dio->submit.cookie = dio->dops->submit_io(bio,
+				dio->iocb->ki_filp, pos);
+	else
+		dio->submit.cookie = submit_bio(bio);
 }
 
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
@@ -191,7 +195,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
-	iomap_dio_submit_bio(dio, iomap, bio);
+	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }
 
 static loff_t
@@ -299,11 +303,11 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		}
 
 		dio->size += n;
-		pos += n;
 		copied += n;
 
 		nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
-		iomap_dio_submit_bio(dio, iomap, bio);
+		iomap_dio_submit_bio(dio, iomap, bio, pos);
+		pos += n;
 	} while (nr_pages);
 
 	/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b09463dae0d..2b093a23ef1c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -252,6 +252,8 @@ int iomap_writepages(struct address_space *mapping,
 struct iomap_dio_ops {
 	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
 		      unsigned flags);
+	blk_qc_t (*submit_io)(struct bio *bio, struct file *file,
+			  loff_t file_offset);
 };
 
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 1/8] fs: Export generic_file_buffered_read() Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-14  0:32   ` Darrick J. Wong
                     ` (2 more replies)
  2019-12-13 19:57 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Filesystems such as btrfs can perform direct I/O without holding the
inode->i_rwsem in some of the cases like writing within i_size.
So, remove the check for lockdep_assert_held() in iomap_dio_rw()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap/direct-io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1a3bf3bd86fb..41c1e7c20a1f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -415,8 +415,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
-	lockdep_assert_held(&inode->i_rwsem);
-
 	if (!count)
 		return 0;
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
                   ` (2 preceding siblings ...)
  2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-21 14:42   ` Christoph Hellwig
  2019-12-13 19:57 ` [PATCH 5/8] fs: Remove dio_end_io() Goldwyn Rodrigues
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Switch from __blockdev_direct_IO() to iomap_dio_rw().
Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it
as iomap_begin() for iomap direct I/O functions. This function
allocates and locks all the blocks required for the I/O.
btrfs_submit_direct() is used as the submit_io() hook for direct I/O
ops.

Since we need direct I/O reads to go through iomap_dio_rw(), we change
file_operations.read_iter() to a btrfs_file_read_iter() which calls
btrfs_direct_IO() for direct reads and falls back to
generic_file_buffered_read() for incomplete reads and buffered reads.

We don't need address_space.direct_IO() anymore so set it to noop.
Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is
capable of direct I/O reads from a hole, so we don't need to return
-ENOENT.

BTRFS direct I/O is now done under i_rwsem, shared in case of
reads and exclusive in case of writes. This guards against simultaneous
truncates.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h |   1 +
 fs/btrfs/file.c  |  21 +++++-
 fs/btrfs/inode.c | 192 ++++++++++++++++++++++++++-----------------------------
 3 files changed, 110 insertions(+), 104 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b2e8fd8a8e59..113dcd1a11cd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2904,6 +2904,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					  u64 end, int uptodate);
 extern const struct dentry_operations btrfs_dentry_operations;
+ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0cb43b682789..7010dd7beccc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1832,7 +1832,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	loff_t endbyte;
 	int err;
 
-	written = generic_file_direct_write(iocb, from);
+	written = btrfs_direct_IO(iocb, from);
 
 	if (written < 0 || !iov_iter_count(from))
 		return written;
@@ -3444,9 +3444,26 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 	return generic_file_open(inode, filp);
 }
 
+static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	ssize_t ret = 0;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		struct inode *inode = file_inode(iocb->ki_filp);
+
+		inode_lock_shared(inode);
+		ret = btrfs_direct_IO(iocb, to);
+		inode_unlock_shared(inode);
+		if (ret < 0)
+			return ret;
+	}
+
+	return generic_file_buffered_read(iocb, to, ret);
+}
+
 const struct file_operations btrfs_file_operations = {
 	.llseek		= btrfs_file_llseek,
-	.read_iter      = generic_file_read_iter,
+	.read_iter      = btrfs_file_read_iter,
 	.splice_read	= generic_file_splice_read,
 	.write_iter	= btrfs_file_write_iter,
 	.mmap		= btrfs_file_mmap,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 56032c518b26..ff5ee99086f0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -29,6 +29,7 @@
 #include <linux/iversion.h>
 #include <linux/swap.h>
 #include <linux/sched/mm.h>
+#include <linux/iomap.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -7510,7 +7511,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 }
 
 static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
-			      struct extent_state **cached_state, int writing)
+			      struct extent_state **cached_state, bool writing)
 {
 	struct btrfs_ordered_extent *ordered;
 	int ret = 0;
@@ -7648,30 +7649,7 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
 }
 
 
-static int btrfs_get_blocks_direct_read(struct extent_map *em,
-					struct buffer_head *bh_result,
-					struct inode *inode,
-					u64 start, u64 len)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-
-	if (em->block_start == EXTENT_MAP_HOLE ||
-			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-		return -ENOENT;
-
-	len = min(len, em->len - (start - em->start));
-
-	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
-		inode->i_blkbits;
-	bh_result->b_size = len;
-	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
-	set_buffer_mapped(bh_result);
-
-	return 0;
-}
-
 static int btrfs_get_blocks_direct_write(struct extent_map **map,
-					 struct buffer_head *bh_result,
 					 struct inode *inode,
 					 struct btrfs_dio_data *dio_data,
 					 u64 start, u64 len)
@@ -7733,7 +7711,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	}
 
 	/* this will cow the extent */
-	len = bh_result->b_size;
 	free_extent_map(em);
 	*map = em = btrfs_new_extent_direct(inode, start, len);
 	if (IS_ERR(em)) {
@@ -7744,15 +7721,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	len = min(len, em->len - (start - em->start));
 
 skip_cow:
-	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
-		inode->i_blkbits;
-	bh_result->b_size = len;
-	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
-	set_buffer_mapped(bh_result);
-
-	if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-		set_buffer_new(bh_result);
-
 	/*
 	 * Need to update the i_size under the extent lock so buffered
 	 * readers will get the updated i_size when we unlock.
@@ -7768,24 +7736,37 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	return ret;
 }
 
-static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
-				   struct buffer_head *bh_result, int create)
+static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
+		loff_t length, unsigned flags, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
 	struct btrfs_dio_data *dio_data = NULL;
-	u64 start = iblock << inode->i_blkbits;
 	u64 lockstart, lockend;
-	u64 len = bh_result->b_size;
+	bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
+	u64 len = length;
+	bool unlock_extents = false;
 
-	if (!create)
+	if (!write)
 		len = min_t(u64, len, fs_info->sectorsize);
 
 	lockstart = start;
 	lockend = start + len - 1;
 
+	/*
+	 * The generic stuff only does filemap_write_and_wait_range, which
+	 * isn't enough if we've written compressed pages to this area, so
+	 * we need to flush the dirty pages again to make absolutely sure
+	 * that any outstanding dirty pages are on disk.
+	 */
+	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+		     &BTRFS_I(inode)->runtime_flags))
+		ret = filemap_fdatawrite_range(inode->i_mapping, start,
+					 start + length - 1);
+
 	if (current->journal_info) {
 		/*
 		 * Need to pull our outstanding extents and set journal_info to NULL so
@@ -7801,7 +7782,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	 * this range and we need to fallback to buffered.
 	 */
 	if (lock_extent_direct(inode, lockstart, lockend, &cached_state,
-			       create)) {
+			       write)) {
 		ret = -ENOTBLK;
 		goto err;
 	}
@@ -7833,35 +7814,52 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 		goto unlock_err;
 	}
 
-	if (create) {
-		ret = btrfs_get_blocks_direct_write(&em, bh_result, inode,
+	len = min(len, em->len - (start - em->start));
+	if (write) {
+		ret = btrfs_get_blocks_direct_write(&em, inode,
 						    dio_data, start, len);
 		if (ret < 0)
 			goto unlock_err;
-
-		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
-				     lockend, &cached_state);
+		unlock_extents = true;
+		/* Recalc len in case the new em is smaller than requested */
+		len = min(len, em->len - (start - em->start));
+	} else if (em->block_start == EXTENT_MAP_HOLE ||
+			test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
+		/* Unlock in case of direct reading from a hole */
+		unlock_extents = true;
 	} else {
-		ret = btrfs_get_blocks_direct_read(em, bh_result, inode,
-						   start, len);
-		/* Can be negative only if we read from a hole */
-		if (ret < 0) {
-			ret = 0;
-			free_extent_map(em);
-			goto unlock_err;
-		}
 		/*
 		 * We need to unlock only the end area that we aren't using.
 		 * The rest is going to be unlocked by the endio routine.
 		 */
-		lockstart = start + bh_result->b_size;
-		if (lockstart < lockend) {
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     lockstart, lockend, &cached_state);
-		} else {
-			free_extent_state(cached_state);
-		}
+		lockstart = start + len;
+		if (lockstart < lockend)
+			unlock_extents = true;
+	}
+
+	if (unlock_extents)
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				lockstart, lockend, &cached_state);
+	else
+		free_extent_state(cached_state);
+
+	/*
+	 * Translate extent map information to iomap
+	 * We trim the extents (and move the addr) even though
+	 * iomap code does that, since we have locked only the parts
+	 * we are performing I/O in.
+	 */
+	if ((em->block_start == EXTENT_MAP_HOLE) ||
+	    (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) && !write)) {
+		iomap->addr = IOMAP_NULL_ADDR;
+		iomap->type = IOMAP_HOLE;
+	} else {
+		iomap->addr = em->block_start + (start - em->start);
+		iomap->type = IOMAP_MAPPED;
 	}
+	iomap->offset = start;
+	iomap->bdev = fs_info->fs_devices->latest_bdev;
+	iomap->length = len;
 
 	free_extent_map(em);
 
@@ -8229,10 +8227,9 @@ static void btrfs_endio_direct_read(struct bio *bio)
 
 	kfree(dip);
 
-	dio_bio->bi_status = err;
-	dio_end_io(dio_bio);
 	btrfs_io_bio_free_csum(io_bio);
-	bio_put(bio);
+	dio_bio->bi_status = err;
+	bio_endio(dio_bio);
 }
 
 static void __endio_write_update_ordered(struct inode *inode,
@@ -8289,8 +8286,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
 	kfree(dip);
 
 	dio_bio->bi_status = bio->bi_status;
-	dio_end_io(dio_bio);
-	bio_put(bio);
+	bio_endio(dio_bio);
 }
 
 static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
@@ -8522,9 +8518,10 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	return 0;
 }
 
-static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
+static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
 				loff_t file_offset)
 {
+	struct inode *inode = file_inode(file);
 	struct btrfs_dio_private *dip = NULL;
 	struct bio *bio = NULL;
 	struct btrfs_io_bio *io_bio;
@@ -8575,7 +8572,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 
 	ret = btrfs_submit_direct_hook(dip);
 	if (!ret)
-		return;
+		return BLK_QC_T_NONE;
 
 	btrfs_io_bio_free_csum(io_bio);
 
@@ -8594,7 +8591,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		/*
 		 * The end io callbacks free our dip, do the final put on bio
 		 * and all the cleanup and final put for dio_bio (through
-		 * dio_end_io()).
+		 * end_io()).
 		 */
 		dip = NULL;
 		bio = NULL;
@@ -8609,15 +8606,12 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 			      file_offset + dio_bio->bi_iter.bi_size - 1);
 
 		dio_bio->bi_status = BLK_STS_IOERR;
-		/*
-		 * Releases and cleans up our dio_bio, no need to bio_put()
-		 * nor bio_endio()/bio_io_error() against dio_bio.
-		 */
-		dio_end_io(dio_bio);
+		bio_endio(dio_bio);
 	}
 	if (bio)
 		bio_put(bio);
 	kfree(dip);
+	return BLK_QC_T_NONE;
 }
 
 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
@@ -8653,7 +8647,23 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 	return retval;
 }
 
-static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+static const struct iomap_ops btrfs_dio_iomap_ops = {
+	.iomap_begin            = btrfs_dio_iomap_begin,
+};
+
+static const struct iomap_dio_ops btrfs_dops = {
+	.submit_io		= btrfs_submit_direct,
+};
+
+
+/*
+ * btrfs_direct_IO - perform direct I/O
+ * inode->i_rwsem must be locked before calling this function, shared or exclusive.
+ * @iocb - kernel iocb
+ * @iter - iter to/from data is copied
+ */
+
+ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
@@ -8662,28 +8672,15 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct extent_changeset *data_reserved = NULL;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
-	int flags = 0;
-	bool wakeup = true;
 	bool relock = false;
 	ssize_t ret;
 
+	lockdep_assert_held(&inode->i_rwsem);
+
 	if (check_direct_IO(fs_info, iter, offset))
 		return 0;
 
-	inode_dio_begin(inode);
-
-	/*
-	 * The generic stuff only does filemap_write_and_wait_range, which
-	 * isn't enough if we've written compressed pages to this area, so
-	 * we need to flush the dirty pages again to make absolutely sure
-	 * that any outstanding dirty pages are on disk.
-	 */
 	count = iov_iter_count(iter);
-	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-		     &BTRFS_I(inode)->runtime_flags))
-		filemap_fdatawrite_range(inode->i_mapping, offset,
-					 offset + count - 1);
-
 	if (iov_iter_rw(iter) == WRITE) {
 		/*
 		 * If the write DIO is beyond the EOF, we need update
@@ -8714,17 +8711,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
 		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
-	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
-				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_end(inode);
-		flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		wakeup = false;
 	}
 
-	ret = __blockdev_direct_IO(iocb, inode,
-				   fs_info->fs_devices->latest_bdev,
-				   iter, btrfs_get_blocks_direct, NULL,
-				   btrfs_submit_direct, flags);
+	ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
+			is_sync_kiocb(iocb));
+
 	if (iov_iter_rw(iter) == WRITE) {
 		up_read(&BTRFS_I(inode)->dio_sem);
 		current->journal_info = NULL;
@@ -8751,11 +8742,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
 	}
 out:
-	if (wakeup)
-		inode_dio_end(inode);
 	if (relock)
 		inode_lock(inode);
-
 	extent_changeset_free(data_reserved);
 	return ret;
 }
@@ -11045,7 +11033,7 @@ static const struct address_space_operations btrfs_aops = {
 	.writepage	= btrfs_writepage,
 	.writepages	= btrfs_writepages,
 	.readpages	= btrfs_readpages,
-	.direct_IO	= btrfs_direct_IO,
+	.direct_IO	= noop_direct_IO,
 	.invalidatepage = btrfs_invalidatepage,
 	.releasepage	= btrfs_releasepage,
 	.set_page_dirty	= btrfs_set_page_dirty,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 5/8] fs: Remove dio_end_io()
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
                   ` (3 preceding siblings ...)
  2019-12-13 19:57 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 6/8] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Since we removed the last user of dio_end_io(), remove the helper
function dio_end_io().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/direct-io.c     | 19 -------------------
 include/linux/fs.h |  2 --
 2 files changed, 21 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0ec4f270139f..fb19a77bec22 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -384,25 +384,6 @@ static void dio_bio_end_io(struct bio *bio)
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
-/**
- * dio_end_io - handle the end io action for the given bio
- * @bio: The direct io bio thats being completed
- *
- * This is meant to be called by any filesystem that uses their own dio_submit_t
- * so that the DIO specific endio actions are dealt with after the filesystem
- * has done it's completion work.
- */
-void dio_end_io(struct bio *bio)
-{
-	struct dio *dio = bio->bi_private;
-
-	if (dio->is_async)
-		dio_bio_end_aio(bio);
-	else
-		dio_bio_end_io(bio);
-}
-EXPORT_SYMBOL_GPL(dio_end_io);
-
 static inline void
 dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	      struct block_device *bdev,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3883adaa0e01..d708422f77e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3157,8 +3157,6 @@ enum {
 	DIO_SKIP_HOLES	= 0x02,
 };
 
-void dio_end_io(struct bio *bio);
-
 ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 			     struct block_device *bdev, struct iov_iter *iter,
 			     get_block_t get_block,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 6/8] btrfs: Wait for extent bits to release page
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
                   ` (4 preceding siblings ...)
  2019-12-13 19:57 ` [PATCH 5/8] fs: Remove dio_end_io() Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 7/8] btrfs: Use ->iomap_end() instead of btrfs_dio_data Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

While trying to release a page, the extent containing the page may be locked
which would stop the page from being released. Wait for the
extent lock to be cleared, if blocking is allowed and then clear
the bits.

While we are at it, clean the code of try_release_extent_state() to make
it simpler.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 37 ++++++++++++++++---------------------
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     |  4 ++--
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index eb8bd0258360..193785981183 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4360,33 +4360,28 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * are locked or under IO and drops the related state bits if it is safe
  * to drop the page.
  */
-static int try_release_extent_state(struct extent_io_tree *tree,
+static bool try_release_extent_state(struct extent_io_tree *tree,
 				    struct page *page, gfp_t mask)
 {
 	u64 start = page_offset(page);
 	u64 end = start + PAGE_SIZE - 1;
-	int ret = 1;
 
 	if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
-		ret = 0;
-	} else {
-		/*
-		 * at this point we can safely clear everything except the
-		 * locked bit and the nodatasum bit
-		 */
-		ret = __clear_extent_bit(tree, start, end,
-				 ~(EXTENT_LOCKED | EXTENT_NODATASUM),
-				 0, 0, NULL, mask, NULL);
-
-		/* if clear_extent_bit failed for enomem reasons,
-		 * we can't allow the release to continue.
-		 */
-		if (ret < 0)
-			ret = 0;
-		else
-			ret = 1;
+		if (!gfpflags_allow_blocking(mask))
+			return false;
+		wait_extent_bit(tree, start, end, EXTENT_LOCKED);
 	}
-	return ret;
+	/*
+	 * At this point we can safely clear everything except the locked and
+	 * nodatasum bits. If clear_extent_bit failed due to -ENOMEM,
+	 * don't allow release.
+	 */
+	if (__clear_extent_bit(tree, start, end,
+				~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0,
+				NULL, mask, NULL) < 0)
+		return false;
+
+	return true;
 }
 
 /*
@@ -4394,7 +4389,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
  * in the range corresponding to the page, both state records and extent
  * map records are removed
  */
-int try_release_extent_mapping(struct page *page, gfp_t mask)
+bool try_release_extent_mapping(struct page *page, gfp_t mask)
 {
 	struct extent_map *em;
 	u64 start = page_offset(page);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a8551a1f56e2..89cc0cf8a7fd 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -188,7 +188,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
 					  u64 start, u64 len,
 					  int create);
 
-int try_release_extent_mapping(struct page *page, gfp_t mask);
+bool try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ff5ee99086f0..276f3f2f26f3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8809,8 +8809,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
-	int ret = try_release_extent_mapping(page, gfp_flags);
-	if (ret == 1) {
+	bool ret = try_release_extent_mapping(page, gfp_flags);
+	if (ret) {
 		ClearPagePrivate(page);
 		set_page_private(page, 0);
 		put_page(page);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 7/8] btrfs: Use ->iomap_end() instead of btrfs_dio_data
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
                   ` (5 preceding siblings ...)
  2019-12-13 19:57 ` [PATCH 6/8] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-13 19:57 ` [PATCH 8/8] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
  2019-12-16  0:01 ` [PATCH 0/8 v6] btrfs direct-io using iomap Nikolay Borisov
  8 siblings, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Use iomap->iomap_end() to check for failed or incomplete writes and call
__endio_write_update_ordered(). We don't need btrfs_dio_data anymore so
remove that. The bonus is we don't abuse current->journal_info anymore.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 89 ++++++++++----------------------------------------------
 1 file changed, 16 insertions(+), 73 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 276f3f2f26f3..07220194ffb8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -56,13 +56,6 @@ struct btrfs_iget_args {
 	struct btrfs_root *root;
 };
 
-struct btrfs_dio_data {
-	u64 reserve;
-	u64 unsubmitted_oe_range_start;
-	u64 unsubmitted_oe_range_end;
-	int overwrite;
-};
-
 static const struct inode_operations btrfs_dir_inode_operations;
 static const struct inode_operations btrfs_symlink_inode_operations;
 static const struct inode_operations btrfs_dir_ro_inode_operations;
@@ -7651,7 +7644,6 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
 
 static int btrfs_get_blocks_direct_write(struct extent_map **map,
 					 struct inode *inode,
-					 struct btrfs_dio_data *dio_data,
 					 u64 start, u64 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -7725,13 +7717,9 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	 * Need to update the i_size under the extent lock so buffered
 	 * readers will get the updated i_size when we unlock.
 	 */
-	if (!dio_data->overwrite && start + len > i_size_read(inode))
+	if (start + len > i_size_read(inode))
 		i_size_write(inode, start + len);
 
-	WARN_ON(dio_data->reserve < len);
-	dio_data->reserve -= len;
-	dio_data->unsubmitted_oe_range_end = start + len;
-	current->journal_info = dio_data;
 out:
 	return ret;
 }
@@ -7743,7 +7731,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
-	struct btrfs_dio_data *dio_data = NULL;
 	u64 lockstart, lockend;
 	bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
@@ -7767,16 +7754,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 		ret = filemap_fdatawrite_range(inode->i_mapping, start,
 					 start + length - 1);
 
-	if (current->journal_info) {
-		/*
-		 * Need to pull our outstanding extents and set journal_info to NULL so
-		 * that anything that needs to check if there's a transaction doesn't get
-		 * confused.
-		 */
-		dio_data = current->journal_info;
-		current->journal_info = NULL;
-	}
-
 	/*
 	 * If this errors out it's because we couldn't invalidate pagecache for
 	 * this range and we need to fallback to buffered.
@@ -7817,7 +7794,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	len = min(len, em->len - (start - em->start));
 	if (write) {
 		ret = btrfs_get_blocks_direct_write(&em, inode,
-						    dio_data, start, len);
+						    start, len);
 		if (ret < 0)
 			goto unlock_err;
 		unlock_extents = true;
@@ -7869,11 +7846,21 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			     &cached_state);
 err:
-	if (dio_data)
-		current->journal_info = dio_data;
 	return ret;
 }
 
+static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+		ssize_t written, unsigned flags, struct iomap *iomap)
+{
+	if (!(flags & IOMAP_WRITE))
+		return 0;
+
+	if (written < length)
+		__endio_write_update_ordered(inode, pos + written,
+				length - written, false);
+	return 0;
+}
+
 static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
 						 struct bio *bio,
 						 int mirror_num)
@@ -8555,21 +8542,6 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
 		dip->subio_endio = btrfs_subio_endio_read;
 	}
 
-	/*
-	 * Reset the range for unsubmitted ordered extents (to a 0 length range)
-	 * even if we fail to submit a bio, because in such case we do the
-	 * corresponding error handling below and it must not be done a second
-	 * time by btrfs_direct_IO().
-	 */
-	if (write) {
-		struct btrfs_dio_data *dio_data = current->journal_info;
-
-		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
-			dip->bytes;
-		dio_data->unsubmitted_oe_range_start =
-			dio_data->unsubmitted_oe_range_end;
-	}
-
 	ret = btrfs_submit_direct_hook(dip);
 	if (!ret)
 		return BLK_QC_T_NONE;
@@ -8649,6 +8621,7 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 
 static const struct iomap_ops btrfs_dio_iomap_ops = {
 	.iomap_begin            = btrfs_dio_iomap_begin,
+	.iomap_end		= btrfs_dio_iomap_end,
 };
 
 static const struct iomap_dio_ops btrfs_dops = {
@@ -8668,7 +8641,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_dio_data dio_data = { 0 };
 	struct extent_changeset *data_reserved = NULL;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
@@ -8688,7 +8660,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		 * not unlock the i_mutex at this case.
 		 */
 		if (offset + count <= inode->i_size) {
-			dio_data.overwrite = 1;
 			inode_unlock(inode);
 			relock = true;
 		} else if (iocb->ki_flags & IOCB_NOWAIT) {
@@ -8700,16 +8671,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		if (ret)
 			goto out;
 
-		/*
-		 * We need to know how many extents we reserved so that we can
-		 * do the accounting properly if we go over the number we
-		 * originally calculated.  Abuse current->journal_info for this.
-		 */
-		dio_data.reserve = round_up(count,
-					    fs_info->sectorsize);
-		dio_data.unsubmitted_oe_range_start = (u64)offset;
-		dio_data.unsubmitted_oe_range_end = (u64)offset;
-		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
 	}
 
@@ -8718,25 +8679,7 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	if (iov_iter_rw(iter) == WRITE) {
 		up_read(&BTRFS_I(inode)->dio_sem);
-		current->journal_info = NULL;
-		if (ret < 0 && ret != -EIOCBQUEUED) {
-			if (dio_data.reserve)
-				btrfs_delalloc_release_space(inode, data_reserved,
-					offset, dio_data.reserve, true);
-			/*
-			 * On error we might have left some ordered extents
-			 * without submitting corresponding bios for them, so
-			 * cleanup them up to avoid other tasks getting them
-			 * and waiting for them to complete forever.
-			 */
-			if (dio_data.unsubmitted_oe_range_start <
-			    dio_data.unsubmitted_oe_range_end)
-				__endio_write_update_ordered(inode,
-					dio_data.unsubmitted_oe_range_start,
-					dio_data.unsubmitted_oe_range_end -
-					dio_data.unsubmitted_oe_range_start,
-					false);
-		} else if (ret >= 0 && (size_t)ret < count)
+		if (ret >= 0 && (size_t)ret < count)
 			btrfs_delalloc_release_space(inode, data_reserved,
 					offset, count - (size_t)ret, true);
 		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 8/8] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
                   ` (6 preceding siblings ...)
  2019-12-13 19:57 ` [PATCH 7/8] btrfs: Use ->iomap_end() instead of btrfs_dio_data Goldwyn Rodrigues
@ 2019-12-13 19:57 ` Goldwyn Rodrigues
  2019-12-16  0:01 ` [PATCH 0/8 v6] btrfs direct-io using iomap Nikolay Borisov
  8 siblings, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-13 19:57 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Since we now perform direct reads using i_rwsem, we can remove this
inode flag used to co-ordinate unlocked reads.

The truncate call takes i_rwsem. This means it is correctly synchronized
with concurrent direct reads.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jth@kernel.org>
---
 fs/btrfs/btrfs_inode.h | 18 ------------------
 fs/btrfs/inode.c       |  3 ---
 2 files changed, 21 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..cd8f378ed8e7 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -27,7 +27,6 @@ enum {
 	BTRFS_INODE_NEEDS_FULL_SYNC,
 	BTRFS_INODE_COPY_EVERYTHING,
 	BTRFS_INODE_IN_DELALLOC_LIST,
-	BTRFS_INODE_READDIO_NEED_LOCK,
 	BTRFS_INODE_HAS_PROPS,
 	BTRFS_INODE_SNAPSHOT_FLUSH,
 };
@@ -317,23 +316,6 @@ struct btrfs_dio_private {
 			blk_status_t);
 };
 
-/*
- * Disable DIO read nolock optimization, so new dio readers will be forced
- * to grab i_mutex. It is used to avoid the endless truncate due to
- * nonlocked dio read.
- */
-static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
-{
-	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-	smp_mb();
-}
-
-static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
-{
-	smp_mb__before_atomic();
-	clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-}
-
 /* Array of bytes with variable length, hexadecimal format 0x1234 */
 #define CSUM_FMT				"0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes)		size, bytes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 07220194ffb8..68adb7958ef4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5273,10 +5273,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
 		truncate_setsize(inode, newsize);
 
-		/* Disable nonlocked read DIO to avoid the endless truncate */
-		btrfs_inode_block_unlocked_dio(BTRFS_I(inode));
 		inode_dio_wait(inode);
-		btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
 
 		ret = btrfs_truncate(inode, newsize == oldsize);
 		if (ret && inode->i_nlink) {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission
  2019-12-13 19:57 ` [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
@ 2019-12-14  0:31   ` Darrick J. Wong
  2019-12-18  2:02   ` Darrick J. Wong
  1 sibling, 0 replies; 37+ messages in thread
From: Darrick J. Wong @ 2019-12-14  0:31 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

On Fri, Dec 13, 2019 at 01:57:44PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> This helps filesystems to perform tasks on the bio while submitting for
> I/O. This could be post-write operations such as data CRC or data
> replication for fs-handled RAID.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/direct-io.c  | 14 +++++++++-----
>  include/linux/iomap.h |  2 ++
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 23837926c0c5..1a3bf3bd86fb 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
>  EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
>  
>  static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> -		struct bio *bio)
> +		struct bio *bio, loff_t pos)
>  {
>  	atomic_inc(&dio->ref);
>  
> @@ -67,7 +67,11 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
>  		bio_set_polled(bio, dio->iocb);
>  
>  	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
> -	dio->submit.cookie = submit_bio(bio);
> +	if (dio->dops && dio->dops->submit_io)
> +		dio->submit.cookie = dio->dops->submit_io(bio,
> +				dio->iocb->ki_filp, pos);
> +	else
> +		dio->submit.cookie = submit_bio(bio);
>  }
>  
>  static ssize_t iomap_dio_complete(struct iomap_dio *dio)
> @@ -191,7 +195,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
>  	get_page(page);
>  	__bio_add_page(bio, page, len, 0);
>  	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
> -	iomap_dio_submit_bio(dio, iomap, bio);
> +	iomap_dio_submit_bio(dio, iomap, bio, pos);
>  }
>  
>  static loff_t
> @@ -299,11 +303,11 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
>  		}
>  
>  		dio->size += n;
> -		pos += n;
>  		copied += n;
>  
>  		nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
> -		iomap_dio_submit_bio(dio, iomap, bio);
> +		iomap_dio_submit_bio(dio, iomap, bio, pos);
> +		pos += n;
>  	} while (nr_pages);
>  
>  	/*
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 8b09463dae0d..2b093a23ef1c 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -252,6 +252,8 @@ int iomap_writepages(struct address_space *mapping,
>  struct iomap_dio_ops {
>  	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
>  		      unsigned flags);
> +	blk_qc_t (*submit_io)(struct bio *bio, struct file *file,
> +			  loff_t file_offset);
>  };
>  
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> -- 
> 2.16.4
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
  2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
@ 2019-12-14  0:32   ` Darrick J. Wong
  2019-12-18  2:04   ` Darrick J. Wong
  2019-12-21 13:41   ` Christoph Hellwig
  2 siblings, 0 replies; 37+ messages in thread
From: Darrick J. Wong @ 2019-12-14  0:32 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

On Fri, Dec 13, 2019 at 01:57:45PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Filesystems such as btrfs can perform direct I/O without holding the
> inode->i_rwsem in some of the cases like writing within i_size.
> So, remove the check for lockdep_assert_held() in iomap_dio_rw()
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Mildly scary, but OTOH filesystems are supposed to take care of their
own locking before calling iomap...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/direct-io.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1a3bf3bd86fb..41c1e7c20a1f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -415,8 +415,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	struct blk_plug plug;
>  	struct iomap_dio *dio;
>  
> -	lockdep_assert_held(&inode->i_rwsem);
> -
>  	if (!count)
>  		return 0;
>  
> -- 
> 2.16.4
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8 v6] btrfs direct-io using iomap
  2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
                   ` (7 preceding siblings ...)
  2019-12-13 19:57 ` [PATCH 8/8] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
@ 2019-12-16  0:01 ` Nikolay Borisov
  2019-12-16 12:41   ` Goldwyn Rodrigues
  8 siblings, 1 reply; 37+ messages in thread
From: Nikolay Borisov @ 2019-12-16  0:01 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs
  Cc: hch, darrick.wong, fdmanana, dsterba, jthumshirn, linux-fsdevel



On 13.12.19 г. 21:57 ч., Goldwyn Rodrigues wrote:
> This is an effort to use iomap for direct I/O in btrfs. This would
> change the call from __blockdev_direct_io() to iomap_dio_rw().
> 
> The main objective is to lose the buffer head and use bio defined by
> iomap code, and hopefully to use more of generic-FS codebase.
> 
> These patches are based and tested on v5.5-rc1. I have tested it against
> xfstests/btrfs.
> 
> The tree is available at
> https://github.com/goldwynr/linux/tree/btrfs-iomap-dio
> 
> Changes since v1
> - Incorporated back the efficiency change for inode locking
> - Review comments about coding style and git comments
> - Merge related patches into one
> - Direct read to go through btrfs_direct_IO()
> - Removal of no longer used function dio_end_io()
> 
> Changes since v2
> - aligning iomap offset/length to the position/length of I/O
> - Removed btrfs_dio_data
> - Removed BTRFS_INODE_READDIO_NEED_LOCK
> - Re-incorporating write efficiency changes caused lockdep_assert() in
>   iomap to be triggered, remove that code.
> 
> Changes since v3
> - Fixed freeze on generic/095. Use iomap_end() to account for
>   failed/incomplete dio instead of btrfs_dio_data
> 
> Changes since v4
> - moved lockdep_assert_held() to functions calling iomap_dio_rw()
>   This may be called immidiately after calling inode lock and
>   may feel not required, but it seems important.
> - Removed comments which are no longer required
> - Changed commit comments to make them more appropriate
> 
> Changes since v5
> - restore inode_dio_wait() in truncate

I'm confused about this - you no longer call inode_dio_begin after patch
4/8 so inode_dio_wait which is left intact in truncate can never trigger
a wait really. Exclusion is provided by the fact that btrfs_direct_IO is
called with rwsem held shared and truncate holds it exclusive? So what
necessitated restoring inode_dio_wait?


Another point, I don't see whether you have explicitly addressed
concerns raised in:

https://lore.kernel.org/linux-btrfs/20191212003043.31093-1-rgoldwyn@suse.de/T/#me7f96506e5a1d921d05b76d01ecf6ea1ebcea594

> - Removed lockdep_assert_held() near callers
> 
> --
> Goldwyn
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8 v6] btrfs direct-io using iomap
  2019-12-16  0:01 ` [PATCH 0/8 v6] btrfs direct-io using iomap Nikolay Borisov
@ 2019-12-16 12:41   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-16 12:41 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-btrfs, hch, darrick.wong, fdmanana, dsterba, jthumshirn,
	linux-fsdevel

On  2:01 16/12, Nikolay Borisov wrote:
> 
> 
> On 13.12.19 г. 21:57 ч., Goldwyn Rodrigues wrote:
> > This is an effort to use iomap for direct I/O in btrfs. This would
> > change the call from __blockdev_direct_io() to iomap_dio_rw().
> > 
> > The main objective is to lose the buffer head and use bio defined by
> > iomap code, and hopefully to use more of generic-FS codebase.
> > 
> > These patches are based and tested on v5.5-rc1. I have tested it against
> > xfstests/btrfs.
> > 
> > The tree is available at
> > https://github.com/goldwynr/linux/tree/btrfs-iomap-dio
> > 
> > Changes since v1
> > - Incorporated back the efficiency change for inode locking
> > - Review comments about coding style and git comments
> > - Merge related patches into one
> > - Direct read to go through btrfs_direct_IO()
> > - Removal of no longer used function dio_end_io()
> > 
> > Changes since v2
> > - aligning iomap offset/length to the position/length of I/O
> > - Removed btrfs_dio_data
> > - Removed BTRFS_INODE_READDIO_NEED_LOCK
> > - Re-incorporating write efficiency changes caused lockdep_assert() in
> >   iomap to be triggered, remove that code.
> > 
> > Changes since v3
> > - Fixed freeze on generic/095. Use iomap_end() to account for
> >   failed/incomplete dio instead of btrfs_dio_data
> > 
> > Changes since v4
> > - moved lockdep_assert_held() to functions calling iomap_dio_rw()
> >   This may be called immidiately after calling inode lock and
> >   may feel not required, but it seems important.
> > - Removed comments which are no longer required
> > - Changed commit comments to make them more appropriate
> > 
> > Changes since v5
> > - restore inode_dio_wait() in truncate
> 
> I'm confused about this - you no longer call inode_dio_begin after patch
> 4/8 so inode_dio_wait which is left intact in truncate can never trigger
> a wait really. Exclusion is provided by the fact that btrfs_direct_IO is
> called with rwsem held shared and truncate holds it exclusive? So what
> necessitated restoring inode_dio_wait?

The optimization of the write path of direct I/O. The rwsem is
released if write is within i_size. This is also the reason we
removed lockdep_assert_held().

Reference: 38851cc19adb ("Btrfs: implement unlocked dio write")

This was inspired by ext4. However, ext4 after it's switch to
iomap_dio_rw() does not employ this optimization.

> 
> 
> Another point, I don't see whether you have explicitly addressed
> concerns raised in:
> 
> https://lore.kernel.org/linux-btrfs/20191212003043.31093-1-rgoldwyn@suse.de/T/#me7f96506e5a1d921d05b76d01ecf6ea1ebcea594

I did address it by releasing csums (referring io_bio) before calling
bio_endio().


-- 
Goldwyn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission
  2019-12-13 19:57 ` [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
  2019-12-14  0:31   ` Darrick J. Wong
@ 2019-12-18  2:02   ` Darrick J. Wong
  1 sibling, 0 replies; 37+ messages in thread
From: Darrick J. Wong @ 2019-12-18  2:02 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

On Fri, Dec 13, 2019 at 01:57:44PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> This helps filesystems to perform tasks on the bio while submitting for
> I/O. This could be post-write operations such as data CRC or data
> replication for fs-handled RAID.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Seems fine to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/direct-io.c  | 14 +++++++++-----
>  include/linux/iomap.h |  2 ++
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 23837926c0c5..1a3bf3bd86fb 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
>  EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
>  
>  static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> -		struct bio *bio)
> +		struct bio *bio, loff_t pos)
>  {
>  	atomic_inc(&dio->ref);
>  
> @@ -67,7 +67,11 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
>  		bio_set_polled(bio, dio->iocb);
>  
>  	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
> -	dio->submit.cookie = submit_bio(bio);
> +	if (dio->dops && dio->dops->submit_io)
> +		dio->submit.cookie = dio->dops->submit_io(bio,
> +				dio->iocb->ki_filp, pos);
> +	else
> +		dio->submit.cookie = submit_bio(bio);
>  }
>  
>  static ssize_t iomap_dio_complete(struct iomap_dio *dio)
> @@ -191,7 +195,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
>  	get_page(page);
>  	__bio_add_page(bio, page, len, 0);
>  	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
> -	iomap_dio_submit_bio(dio, iomap, bio);
> +	iomap_dio_submit_bio(dio, iomap, bio, pos);
>  }
>  
>  static loff_t
> @@ -299,11 +303,11 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
>  		}
>  
>  		dio->size += n;
> -		pos += n;
>  		copied += n;
>  
>  		nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
> -		iomap_dio_submit_bio(dio, iomap, bio);
> +		iomap_dio_submit_bio(dio, iomap, bio, pos);
> +		pos += n;
>  	} while (nr_pages);
>  
>  	/*
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 8b09463dae0d..2b093a23ef1c 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -252,6 +252,8 @@ int iomap_writepages(struct address_space *mapping,
>  struct iomap_dio_ops {
>  	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
>  		      unsigned flags);
> +	blk_qc_t (*submit_io)(struct bio *bio, struct file *file,
> +			  loff_t file_offset);
>  };
>  
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> -- 
> 2.16.4
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
  2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
  2019-12-14  0:32   ` Darrick J. Wong
@ 2019-12-18  2:04   ` Darrick J. Wong
  2019-12-21 13:41   ` Christoph Hellwig
  2 siblings, 0 replies; 37+ messages in thread
From: Darrick J. Wong @ 2019-12-18  2:04 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

On Fri, Dec 13, 2019 at 01:57:45PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Filesystems such as btrfs can perform direct I/O without holding the
> inode->i_rwsem in some of the cases like writing within i_size.
> So, remove the check for lockdep_assert_held() in iomap_dio_rw()
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

/me wishes there was a way for iomap to verify that the fs has indeed
taken /some/ measure to ensure correct concurrent operations, but that's
probably just asking for liar functions. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/direct-io.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1a3bf3bd86fb..41c1e7c20a1f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -415,8 +415,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	struct blk_plug plug;
>  	struct iomap_dio *dio;
>  
> -	lockdep_assert_held(&inode->i_rwsem);
> -
>  	if (!count)
>  		return 0;
>  
> -- 
> 2.16.4
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
  2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
  2019-12-14  0:32   ` Darrick J. Wong
  2019-12-18  2:04   ` Darrick J. Wong
@ 2019-12-21 13:41   ` Christoph Hellwig
  2019-12-21 13:42     ` Christoph Hellwig
  2019-12-21 18:02     ` Darrick J. Wong
  2 siblings, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-21 13:41 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, darrick.wong, fdmanana, nborisov, dsterba,
	jthumshirn, linux-fsdevel, Goldwyn Rodrigues

On Fri, Dec 13, 2019 at 01:57:45PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Filesystems such as btrfs can perform direct I/O without holding the
> inode->i_rwsem in some of the cases like writing within i_size.
> So, remove the check for lockdep_assert_held() in iomap_dio_rw()

As said last time: in the callers the assert is completely pointless,
as it is always very close to taking the lock.  This was just intended
to deal with callers not adhering to the iomap_dio_rw calling
conventins, and moving the assert to the calllers doesn't help with
that at all.

So if you think you need to remove it do just that and write a changelog
explaining the why much better.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
  2019-12-21 13:41   ` Christoph Hellwig
@ 2019-12-21 13:42     ` Christoph Hellwig
  2019-12-21 18:02     ` Darrick J. Wong
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-21 13:42 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, darrick.wong, fdmanana, nborisov, dsterba,
	jthumshirn, linux-fsdevel, Goldwyn Rodrigues

On Sat, Dec 21, 2019 at 05:41:18AM -0800, Christoph Hellwig wrote:
> On Fri, Dec 13, 2019 at 01:57:45PM -0600, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > Filesystems such as btrfs can perform direct I/O without holding the
> > inode->i_rwsem in some of the cases like writing within i_size.
> > So, remove the check for lockdep_assert_held() in iomap_dio_rw()
> 
> As said last time: in the callers the assert is completely pointless,
> as it is always very close to taking the lock.  This was just intended
> to deal with callers not adhering to the iomap_dio_rw calling
> conventins, and moving the assert to the calllers doesn't help with
> that at all.

And the actual patch even does that, it just is the commit log that
doesn't match it..

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-13 19:57 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
@ 2019-12-21 14:42   ` Christoph Hellwig
  2020-01-02 18:01     ` Goldwyn Rodrigues
  2020-01-07 11:59     ` Goldwyn Rodrigues
  0 siblings, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-21 14:42 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, hch, darrick.wong, fdmanana, nborisov, dsterba,
	jthumshirn, linux-fsdevel, Goldwyn Rodrigues

[-- Attachment #1: Type: text/plain, Size: 580 bytes --]

So Ilooked into the "unlocked" direct I/O case, and I think the current
code using dio_sem is really sketchy.  What btrfs really needs to do is
take i_rwsem shared by default for direct writes, and only upgrade to
the exclusive lock when needed, similar to xfs and the WIP ext4 code.

While looking for that I also noticed two other things:

 - check_direct_IO looks pretty bogus
 - btrfs_direct_IO really should be split and folded into the two
   callers

Untested patches attached.  The first should probably go into a prep
patch, and the second could be folded into this one.

[-- Attachment #2: 0001-btrfs-remove-direct-I-O-aligment-checks.patch --]
[-- Type: text/plain, Size: 3922 bytes --]

From bc285e440a50140beb456f11e545a049bdf51ec1 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sat, 21 Dec 2019 15:17:26 +0100
Subject: btrfs: remove direct I/O aligment checks

The direct I/O code itself already checks for the proper sector
size alignment, so remove the duplicate checks.  The remainder of
check_direct_IO is not ony needed for reads and can be moved to
file.c and outside of i_rwsem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/file.c  | 34 +++++++++++++++++++++++++++-------
 fs/btrfs/inode.c | 37 -------------------------------------
 2 files changed, 27 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a6d41d7bf362..0522f6d45a98 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3444,21 +3444,41 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 	return generic_file_open(inode, filp);
 }
 
-static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+/*
+ * If there are duplicate iov_base's in this iovec, fall back to buffered I/O
+ * to avoid checksum errors.
+ */
+static bool btrfs_direct_read_ok(struct kiocb *iocb, struct iov_iter *iter)
 {
-	ssize_t ret = 0;
+	int seg, i;
 
-	if (iocb->ki_flags & IOCB_DIRECT) {
+	if (!iter_is_iovec(iter))
+		return true;
+
+	for (seg = 0; seg < iter->nr_segs; seg++) {
+		for (i = seg + 1; i < iter->nr_segs; i++) {
+			if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
+				return false;
+		}
+	}
+
+	return true;
+}
+
+
+static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	if ((iocb->ki_flags & IOCB_DIRECT) && btrfs_direct_read_ok(iocb, to)) {
 		struct inode *inode = file_inode(iocb->ki_filp);
+		ssize_t ret;
 
 		inode_lock_shared(inode);
 		ret = btrfs_direct_IO(iocb, to);
 		inode_unlock_shared(inode);
-		if (ret < 0)
-			return ret;
-	}
 
-	return generic_file_buffered_read(iocb, to, ret);
+		return ret;
+	}
+	return generic_file_buffered_read(iocb, to, 0);
 }
 
 const struct file_operations btrfs_file_operations = {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 824f318cee5e..18d153a62655 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8581,39 +8581,6 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
 	return BLK_QC_T_NONE;
 }
 
-static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
-			       const struct iov_iter *iter, loff_t offset)
-{
-	int seg;
-	int i;
-	unsigned int blocksize_mask = fs_info->sectorsize - 1;
-	ssize_t retval = -EINVAL;
-
-	if (offset & blocksize_mask)
-		goto out;
-
-	if (iov_iter_alignment(iter) & blocksize_mask)
-		goto out;
-
-	/* If this is a write we don't need to check anymore */
-	if (iov_iter_rw(iter) != READ || !iter_is_iovec(iter))
-		return 0;
-	/*
-	 * Check to make sure we don't have duplicate iov_base's in this
-	 * iovec, if so return EINVAL, otherwise we'll get csum errors
-	 * when reading back.
-	 */
-	for (seg = 0; seg < iter->nr_segs; seg++) {
-		for (i = seg + 1; i < iter->nr_segs; i++) {
-			if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
-				goto out;
-		}
-	}
-	retval = 0;
-out:
-	return retval;
-}
-
 static const struct iomap_ops btrfs_dio_iomap_ops = {
 	.iomap_begin            = btrfs_dio_iomap_begin,
 	.iomap_end		= btrfs_dio_iomap_end,
@@ -8635,7 +8602,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_changeset *data_reserved = NULL;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
@@ -8644,9 +8610,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	lockdep_assert_held(&inode->i_rwsem);
 
-	if (check_direct_IO(fs_info, iter, offset))
-		return 0;
-
 	count = iov_iter_count(iter);
 	if (iov_iter_rw(iter) == WRITE) {
 		/*
-- 
2.24.0


[-- Attachment #3: 0002-btrfs-split-btrfs_direct_IO.patch --]
[-- Type: text/plain, Size: 6757 bytes --]

From 7194fa1986a48af46d2b01457865066cdbd14e35 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sat, 21 Dec 2019 15:23:41 +0100
Subject: btrfs: split btrfs_direct_IO

The read and write versions don't have anything in common except
for the call to iomap_dio_rw.  So split this function, and merge
each half into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/ctree.h |  4 ++-
 fs/btrfs/file.c  | 44 +++++++++++++++++++++++++----
 fs/btrfs/inode.c | 72 ++++--------------------------------------------
 3 files changed, 48 insertions(+), 72 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8faa069b0a73..fccbbfebdf88 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -28,6 +28,7 @@
 #include <linux/dynamic_debug.h>
 #include <linux/refcount.h>
 #include <linux/crc32c.h>
+#include <linux/iomap.h>
 #include "extent-io-tree.h"
 #include "extent_io.h"
 #include "extent_map.h"
@@ -2904,7 +2905,8 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					  u64 end, int uptodate);
 extern const struct dentry_operations btrfs_dentry_operations;
-ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
+const struct iomap_ops btrfs_dio_iomap_ops;
+const struct iomap_dio_ops btrfs_dio_ops;
 
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0522f6d45a98..ed0b2e015d8d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1822,17 +1822,50 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	return num_written ? num_written : ret;
 }
 
-static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
+static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	loff_t pos;
+	size_t count = iov_iter_count(from);
+	struct extent_changeset *data_reserved = NULL;
+	loff_t pos = iocb->ki_pos;
 	ssize_t written;
 	ssize_t written_buffered;
 	loff_t endbyte;
+	bool relock = false;
 	int err;
 
-	written = btrfs_direct_IO(iocb, from);
+	/*
+	 * If the write DIO is beyond the EOF, we need update the isize, but
+	 * it is protected by i_mutex. So we can not unlock the i_mutex in
+	 * this case.
+	 */
+	if (pos + count <= inode->i_size) {
+		inode_unlock(inode);
+		relock = true;
+	} else {
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			return -EAGAIN;
+	}
+
+	err = btrfs_delalloc_reserve_space(inode, &data_reserved, pos, count);
+	if (err) {
+		if (relock)
+			inode_lock(inode);
+		return err;
+	}
+
+	down_read(&BTRFS_I(inode)->dio_sem);
+	written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
+			is_sync_kiocb(iocb));
+	up_read(&BTRFS_I(inode)->dio_sem);
+	if (written >= 0 && (size_t)written < count)
+		btrfs_delalloc_release_space(inode, data_reserved,
+				pos, count - (size_t)written, true);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), count);
+	if (relock)
+		inode_lock(inode);
+	extent_changeset_free(data_reserved);
 
 	if (written < 0 || !iov_iter_count(from))
 		return written;
@@ -1975,7 +2008,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		atomic_inc(&BTRFS_I(inode)->sync_writers);
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		num_written = __btrfs_direct_write(iocb, from);
+		num_written = btrfs_direct_write(iocb, from);
 	} else {
 		num_written = btrfs_buffered_write(iocb, from);
 		if (num_written > 0)
@@ -3473,7 +3506,8 @@ static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		ssize_t ret;
 
 		inode_lock_shared(inode);
-		ret = btrfs_direct_IO(iocb, to);
+		ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops,
+				   &btrfs_dio_ops, is_sync_kiocb(iocb));
 		inode_unlock_shared(inode);
 
 		return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 18d153a62655..7b747270ec40 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -29,7 +29,6 @@
 #include <linux/iversion.h>
 #include <linux/swap.h>
 #include <linux/sched/mm.h>
-#include <linux/iomap.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -7856,6 +7855,11 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	return 0;
 }
 
+const struct iomap_ops btrfs_dio_iomap_ops = {
+	.iomap_begin            = btrfs_dio_iomap_begin,
+	.iomap_end		= btrfs_dio_iomap_end,
+};
+
 static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
 						 struct bio *bio,
 						 int mirror_num)
@@ -8581,74 +8585,10 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
 	return BLK_QC_T_NONE;
 }
 
-static const struct iomap_ops btrfs_dio_iomap_ops = {
-	.iomap_begin            = btrfs_dio_iomap_begin,
-	.iomap_end		= btrfs_dio_iomap_end,
-};
-
-static const struct iomap_dio_ops btrfs_dops = {
+const struct iomap_dio_ops btrfs_dio_ops = {
 	.submit_io		= btrfs_submit_direct,
 };
 
-
-/*
- * btrfs_direct_IO - perform direct I/O
- * inode->i_rwsem must be locked before calling this function, shared or exclusive.
- * @iocb - kernel iocb
- * @iter - iter to/from data is copied
- */
-
-ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct extent_changeset *data_reserved = NULL;
-	loff_t offset = iocb->ki_pos;
-	size_t count = 0;
-	bool relock = false;
-	ssize_t ret;
-
-	lockdep_assert_held(&inode->i_rwsem);
-
-	count = iov_iter_count(iter);
-	if (iov_iter_rw(iter) == WRITE) {
-		/*
-		 * If the write DIO is beyond the EOF, we need update
-		 * the isize, but it is protected by i_mutex. So we can
-		 * not unlock the i_mutex at this case.
-		 */
-		if (offset + count <= inode->i_size) {
-			inode_unlock(inode);
-			relock = true;
-		} else if (iocb->ki_flags & IOCB_NOWAIT) {
-			ret = -EAGAIN;
-			goto out;
-		}
-		ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
-						   offset, count);
-		if (ret)
-			goto out;
-
-		down_read(&BTRFS_I(inode)->dio_sem);
-	}
-
-	ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
-			is_sync_kiocb(iocb));
-
-	if (iov_iter_rw(iter) == WRITE) {
-		up_read(&BTRFS_I(inode)->dio_sem);
-		if (ret >= 0 && (size_t)ret < count)
-			btrfs_delalloc_release_space(inode, data_reserved,
-					offset, count - (size_t)ret, true);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
-	}
-out:
-	if (relock)
-		inode_lock(inode);
-	extent_changeset_free(data_reserved);
-	return ret;
-}
-
 #define BTRFS_FIEMAP_FLAGS	(FIEMAP_FLAG_SYNC)
 
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
  2019-12-21 13:41   ` Christoph Hellwig
  2019-12-21 13:42     ` Christoph Hellwig
@ 2019-12-21 18:02     ` Darrick J. Wong
  1 sibling, 0 replies; 37+ messages in thread
From: Darrick J. Wong @ 2019-12-21 18:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Goldwyn Rodrigues, linux-btrfs, fdmanana, nborisov, dsterba,
	jthumshirn, linux-fsdevel, Goldwyn Rodrigues

On Sat, Dec 21, 2019 at 05:41:18AM -0800, Christoph Hellwig wrote:
> On Fri, Dec 13, 2019 at 01:57:45PM -0600, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > Filesystems such as btrfs can perform direct I/O without holding the
> > inode->i_rwsem in some of the cases like writing within i_size.
> > So, remove the check for lockdep_assert_held() in iomap_dio_rw()
> 
> As said last time: in the callers the assert is completely pointless,
> as it is always very close to taking the lock.  This was just intended
> to deal with callers not adhering to the iomap_dio_rw calling
> conventins, and moving the assert to the calllers doesn't help with
> that at all.
> 
> So if you think you need to remove it do just that and write a changelog
> explaining the why much better.

Uh... that's exactly what this patch does, and the commit message says
that btrfs doesn't need to hold i_rwsem to guarantee concurrency
correctness.

Hm, but maybe the commit message is simply too subtle here?  How about:

"Filesystems do not necessarily need i_rwsem to ensure correct operation
with multiple threads, so remove the i_rwsem assertion in iomap_dio_rw.
For example, btrfs can perform directio within i_size without needing to
hold i_rwsem."

--D

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-21 14:42   ` Christoph Hellwig
@ 2020-01-02 18:01     ` Goldwyn Rodrigues
  2020-01-07 17:23       ` Christoph Hellwig
  2020-01-07 11:59     ` Goldwyn Rodrigues
  1 sibling, 1 reply; 37+ messages in thread
From: Goldwyn Rodrigues @ 2020-01-02 18:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-btrfs, darrick.wong, fdmanana, nborisov, dsterba,
	jthumshirn, linux-fsdevel

On  6:42 21/12, Christoph Hellwig wrote:
> So Ilooked into the "unlocked" direct I/O case, and I think the current
> code using dio_sem is really sketchy.  What btrfs really needs to do is
> take i_rwsem shared by default for direct writes, and only upgrade to
> the exclusive lock when needed, similar to xfs and the WIP ext4 code.

Sketchy in what sense? I am not trying to second-guess, but I want to
know where it could fail. I would want it to be simpler as well, but if
we can perform direct writes without locking, why should we introduce
locks.

> 
> While looking for that I also noticed two other things:
> 
>  - check_direct_IO looks pretty bogus
>  - btrfs_direct_IO really should be split and folded into the two
>    callers

Thanks for the cleanups. I will incorporate these.

> 
> Untested patches attached.  The first should probably go into a prep
> patch, and the second could be folded into this one.

> From bc285e440a50140beb456f11e545a049bdf51ec1 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Sat, 21 Dec 2019 15:17:26 +0100
> Subject: btrfs: remove direct I/O aligment checks
> 
> The direct I/O code itself already checks for the proper sector
> size alignment, so remove the duplicate checks.  The remainder of
> check_direct_IO is not ony needed for reads and can be moved to
> file.c and outside of i_rwsem.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/file.c  | 34 +++++++++++++++++++++++++++-------
>  fs/btrfs/inode.c | 37 -------------------------------------
>  2 files changed, 27 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a6d41d7bf362..0522f6d45a98 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -3444,21 +3444,41 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
>  	return generic_file_open(inode, filp);
>  }
>  
> -static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> +/*
> + * If there are duplicate iov_base's in this iovec, fall back to buffered I/O
> + * to avoid checksum errors.
> + */
> +static bool btrfs_direct_read_ok(struct kiocb *iocb, struct iov_iter *iter)
>  {
> -	ssize_t ret = 0;
> +	int seg, i;
>  
> -	if (iocb->ki_flags & IOCB_DIRECT) {
> +	if (!iter_is_iovec(iter))
> +		return true;
> +
> +	for (seg = 0; seg < iter->nr_segs; seg++) {
> +		for (i = seg + 1; i < iter->nr_segs; i++) {
> +			if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> +
> +static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> +{
> +	if ((iocb->ki_flags & IOCB_DIRECT) && btrfs_direct_read_ok(iocb, to)) {
>  		struct inode *inode = file_inode(iocb->ki_filp);
> +		ssize_t ret;
>  
>  		inode_lock_shared(inode);
>  		ret = btrfs_direct_IO(iocb, to);
>  		inode_unlock_shared(inode);
> -		if (ret < 0)
> -			return ret;
> -	}
>  
> -	return generic_file_buffered_read(iocb, to, ret);
> +		return ret;
> +	}
> +	return generic_file_buffered_read(iocb, to, 0);
>  }
>  
>  const struct file_operations btrfs_file_operations = {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 824f318cee5e..18d153a62655 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8581,39 +8581,6 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
>  	return BLK_QC_T_NONE;
>  }
>  
> -static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
> -			       const struct iov_iter *iter, loff_t offset)
> -{
> -	int seg;
> -	int i;
> -	unsigned int blocksize_mask = fs_info->sectorsize - 1;
> -	ssize_t retval = -EINVAL;
> -
> -	if (offset & blocksize_mask)
> -		goto out;
> -
> -	if (iov_iter_alignment(iter) & blocksize_mask)
> -		goto out;
> -
> -	/* If this is a write we don't need to check anymore */
> -	if (iov_iter_rw(iter) != READ || !iter_is_iovec(iter))
> -		return 0;
> -	/*
> -	 * Check to make sure we don't have duplicate iov_base's in this
> -	 * iovec, if so return EINVAL, otherwise we'll get csum errors
> -	 * when reading back.
> -	 */
> -	for (seg = 0; seg < iter->nr_segs; seg++) {
> -		for (i = seg + 1; i < iter->nr_segs; i++) {
> -			if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
> -				goto out;
> -		}
> -	}
> -	retval = 0;
> -out:
> -	return retval;
> -}
> -
>  static const struct iomap_ops btrfs_dio_iomap_ops = {
>  	.iomap_begin            = btrfs_dio_iomap_begin,
>  	.iomap_end		= btrfs_dio_iomap_end,
> @@ -8635,7 +8602,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file->f_mapping->host;
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct extent_changeset *data_reserved = NULL;
>  	loff_t offset = iocb->ki_pos;
>  	size_t count = 0;
> @@ -8644,9 +8610,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  
>  	lockdep_assert_held(&inode->i_rwsem);
>  
> -	if (check_direct_IO(fs_info, iter, offset))
> -		return 0;
> -
>  	count = iov_iter_count(iter);
>  	if (iov_iter_rw(iter) == WRITE) {
>  		/*
> -- 
> 2.24.0
> 

> From 7194fa1986a48af46d2b01457865066cdbd14e35 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Sat, 21 Dec 2019 15:23:41 +0100
> Subject: btrfs: split btrfs_direct_IO
> 
> The read and write versions don't have anything in common except
> for the call to iomap_dio_rw.  So split this function, and merge
> each half into its only caller.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/ctree.h |  4 ++-
>  fs/btrfs/file.c  | 44 +++++++++++++++++++++++++----
>  fs/btrfs/inode.c | 72 ++++--------------------------------------------
>  3 files changed, 48 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 8faa069b0a73..fccbbfebdf88 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -28,6 +28,7 @@
>  #include <linux/dynamic_debug.h>
>  #include <linux/refcount.h>
>  #include <linux/crc32c.h>
> +#include <linux/iomap.h>
>  #include "extent-io-tree.h"
>  #include "extent_io.h"
>  #include "extent_map.h"
> @@ -2904,7 +2905,8 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
>  void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>  					  u64 end, int uptodate);
>  extern const struct dentry_operations btrfs_dentry_operations;
> -ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
> +const struct iomap_ops btrfs_dio_iomap_ops;
> +const struct iomap_dio_ops btrfs_dio_ops;
>  
>  /* ioctl.c */
>  long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0522f6d45a98..ed0b2e015d8d 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1822,17 +1822,50 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  	return num_written ? num_written : ret;
>  }
>  
> -static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
> +static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file_inode(file);
> -	loff_t pos;
> +	size_t count = iov_iter_count(from);
> +	struct extent_changeset *data_reserved = NULL;
> +	loff_t pos = iocb->ki_pos;
>  	ssize_t written;
>  	ssize_t written_buffered;
>  	loff_t endbyte;
> +	bool relock = false;
>  	int err;
>  
> -	written = btrfs_direct_IO(iocb, from);
> +	/*
> +	 * If the write DIO is beyond the EOF, we need update the isize, but
> +	 * it is protected by i_mutex. So we can not unlock the i_mutex in
> +	 * this case.
> +	 */
> +	if (pos + count <= inode->i_size) {
> +		inode_unlock(inode);
> +		relock = true;
> +	} else {
> +		if (iocb->ki_flags & IOCB_NOWAIT)
> +			return -EAGAIN;
> +	}
> +
> +	err = btrfs_delalloc_reserve_space(inode, &data_reserved, pos, count);
> +	if (err) {
> +		if (relock)
> +			inode_lock(inode);
> +		return err;
> +	}
> +
> +	down_read(&BTRFS_I(inode)->dio_sem);
> +	written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
> +			is_sync_kiocb(iocb));
> +	up_read(&BTRFS_I(inode)->dio_sem);
> +	if (written >= 0 && (size_t)written < count)
> +		btrfs_delalloc_release_space(inode, data_reserved,
> +				pos, count - (size_t)written, true);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), count);
> +	if (relock)
> +		inode_lock(inode);
> +	extent_changeset_free(data_reserved);
>  
>  	if (written < 0 || !iov_iter_count(from))
>  		return written;
> @@ -1975,7 +2008,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>  		atomic_inc(&BTRFS_I(inode)->sync_writers);
>  
>  	if (iocb->ki_flags & IOCB_DIRECT) {
> -		num_written = __btrfs_direct_write(iocb, from);
> +		num_written = btrfs_direct_write(iocb, from);
>  	} else {
>  		num_written = btrfs_buffered_write(iocb, from);
>  		if (num_written > 0)
> @@ -3473,7 +3506,8 @@ static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  		ssize_t ret;
>  
>  		inode_lock_shared(inode);
> -		ret = btrfs_direct_IO(iocb, to);
> +		ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops,
> +				   &btrfs_dio_ops, is_sync_kiocb(iocb));
>  		inode_unlock_shared(inode);
>  
>  		return ret;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 18d153a62655..7b747270ec40 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -29,7 +29,6 @@
>  #include <linux/iversion.h>
>  #include <linux/swap.h>
>  #include <linux/sched/mm.h>
> -#include <linux/iomap.h>
>  #include <asm/unaligned.h>
>  #include "misc.h"
>  #include "ctree.h"
> @@ -7856,6 +7855,11 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
>  	return 0;
>  }
>  
> +const struct iomap_ops btrfs_dio_iomap_ops = {
> +	.iomap_begin            = btrfs_dio_iomap_begin,
> +	.iomap_end		= btrfs_dio_iomap_end,
> +};
> +
>  static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
>  						 struct bio *bio,
>  						 int mirror_num)
> @@ -8581,74 +8585,10 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
>  	return BLK_QC_T_NONE;
>  }
>  
> -static const struct iomap_ops btrfs_dio_iomap_ops = {
> -	.iomap_begin            = btrfs_dio_iomap_begin,
> -	.iomap_end		= btrfs_dio_iomap_end,
> -};
> -
> -static const struct iomap_dio_ops btrfs_dops = {
> +const struct iomap_dio_ops btrfs_dio_ops = {
>  	.submit_io		= btrfs_submit_direct,
>  };
>  
> -
> -/*
> - * btrfs_direct_IO - perform direct I/O
> - * inode->i_rwsem must be locked before calling this function, shared or exclusive.
> - * @iocb - kernel iocb
> - * @iter - iter to/from data is copied
> - */
> -
> -ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> -{
> -	struct file *file = iocb->ki_filp;
> -	struct inode *inode = file->f_mapping->host;
> -	struct extent_changeset *data_reserved = NULL;
> -	loff_t offset = iocb->ki_pos;
> -	size_t count = 0;
> -	bool relock = false;
> -	ssize_t ret;
> -
> -	lockdep_assert_held(&inode->i_rwsem);
> -
> -	count = iov_iter_count(iter);
> -	if (iov_iter_rw(iter) == WRITE) {
> -		/*
> -		 * If the write DIO is beyond the EOF, we need update
> -		 * the isize, but it is protected by i_mutex. So we can
> -		 * not unlock the i_mutex at this case.
> -		 */
> -		if (offset + count <= inode->i_size) {
> -			inode_unlock(inode);
> -			relock = true;
> -		} else if (iocb->ki_flags & IOCB_NOWAIT) {
> -			ret = -EAGAIN;
> -			goto out;
> -		}
> -		ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
> -						   offset, count);
> -		if (ret)
> -			goto out;
> -
> -		down_read(&BTRFS_I(inode)->dio_sem);
> -	}
> -
> -	ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
> -			is_sync_kiocb(iocb));
> -
> -	if (iov_iter_rw(iter) == WRITE) {
> -		up_read(&BTRFS_I(inode)->dio_sem);
> -		if (ret >= 0 && (size_t)ret < count)
> -			btrfs_delalloc_release_space(inode, data_reserved,
> -					offset, count - (size_t)ret, true);
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
> -	}
> -out:
> -	if (relock)
> -		inode_lock(inode);
> -	extent_changeset_free(data_reserved);
> -	return ret;
> -}
> -
>  #define BTRFS_FIEMAP_FLAGS	(FIEMAP_FLAG_SYNC)
>  
>  static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> -- 
> 2.24.0
> 


-- 
Goldwyn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-21 14:42   ` Christoph Hellwig
  2020-01-02 18:01     ` Goldwyn Rodrigues
@ 2020-01-07 11:59     ` Goldwyn Rodrigues
  2020-01-07 17:21       ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Goldwyn Rodrigues @ 2020-01-07 11:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-btrfs, darrick.wong, fdmanana, nborisov, dsterba,
	jthumshirn, linux-fsdevel

On  6:42 21/12, Christoph Hellwig wrote:
> So Ilooked into the "unlocked" direct I/O case, and I think the current
> code using dio_sem is really sketchy.  What btrfs really needs to do is
> take i_rwsem shared by default for direct writes, and only upgrade to
> the exclusive lock when needed, similar to xfs and the WIP ext4 code.
> 
> While looking for that I also noticed two other things:
> 
>  - check_direct_IO looks pretty bogus
>  - btrfs_direct_IO really should be split and folded into the two
>    callers
> 
> Untested patches attached.  The first should probably go into a prep
> patch, and the second could be folded into this one.

Testing revealed that removing check_direct_IO will not work. We try and
reserve space as a whole for the entire direct write. These checks
safeguard from requests unaligned to fs_info->sectorsize.

I liked the patch to split and fold the direct_IO code. However to merge
it into this will make it difficult to understand the changes since we
are moving it to a different file rather than changing in-place. A
separate patch would better serve as a cleanup.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2020-01-07 11:59     ` Goldwyn Rodrigues
@ 2020-01-07 17:21       ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2020-01-07 17:21 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: Christoph Hellwig, linux-btrfs, darrick.wong, fdmanana, nborisov,
	dsterba, jthumshirn, linux-fsdevel

On Tue, Jan 07, 2020 at 05:59:09AM -0600, Goldwyn Rodrigues wrote:
> Testing revealed that removing check_direct_IO will not work. We try and
> reserve space as a whole for the entire direct write. These checks
> safeguard from requests unaligned to fs_info->sectorsize.

Ok.  The fact that a wrong sector size falls back to buffered I/O
instead of failing the I/O is still bogus, though.  Btrfs should align
with all other file systems there.

> 
> I liked the patch to split and fold the direct_IO code. However to merge
> it into this will make it difficult to understand the changes since we
> are moving it to a different file rather than changing in-place. A
> separate patch would better serve as a cleanup.

Sure, this can be added on top.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2020-01-02 18:01     ` Goldwyn Rodrigues
@ 2020-01-07 17:23       ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2020-01-07 17:23 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: Christoph Hellwig, linux-btrfs, darrick.wong, fdmanana, nborisov,
	dsterba, jthumshirn, linux-fsdevel

On Thu, Jan 02, 2020 at 12:01:27PM -0600, Goldwyn Rodrigues wrote:
> On  6:42 21/12, Christoph Hellwig wrote:
> > So Ilooked into the "unlocked" direct I/O case, and I think the current
> > code using dio_sem is really sketchy.  What btrfs really needs to do is
> > take i_rwsem shared by default for direct writes, and only upgrade to
> > the exclusive lock when needed, similar to xfs and the WIP ext4 code.
> 
> Sketchy in what sense? I am not trying to second-guess, but I want to
> know where it could fail. I would want it to be simpler as well, but if
> we can perform direct writes without locking, why should we introduce
> locks.

In that it needs yet another lock which doesn't really provide
exclusion guarantees in its own.  In many ways this lock plus the
historic i_mutex were abused to provide the shared/exclusiv lock
that now exists natively with i_rwsem.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-10 23:01 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
  2019-12-11  8:58   ` Filipe Manana
@ 2019-12-11 10:43   ` Nikolay Borisov
  1 sibling, 0 replies; 37+ messages in thread
From: Nikolay Borisov @ 2019-12-11 10:43 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs
  Cc: hch, darrick.wong, fdmanana, dsterba, jthumshirn, linux-fsdevel,
	Goldwyn Rodrigues



On 11.12.19 г. 1:01 ч., Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Switch from __blockdev_direct_IO() to iomap_dio_rw().
> Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it
> as iomap_begin() for iomap direct I/O functions. This function
> allocates and locks all the blocks required for the I/O.
> btrfs_submit_direct() is used as the submit_io() hook for direct I/O
> ops.
> 
> Since we need direct I/O reads to go through iomap_dio_rw(), we change
> file_operations.read_iter() to a btrfs_file_read_iter() which calls
> btrfs_direct_IO() for direct reads and falls back to
> generic_file_buffered_read() for incomplete reads and buffered reads.
> 
> We don't need address_space.direct_IO() anymore so set it to noop.
> Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is
> capable of direct I/O reads from a hole, so we don't need to return
> -ENOENT.
> 
> BTRFS direct I/O is now done under i_rwsem, shared in case of
> reads and exclusive in case of writes. This guards against simultaneous
> truncates.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/ctree.h |   1 +
>  fs/btrfs/file.c  |  21 ++++++-
>  fs/btrfs/inode.c | 184 ++++++++++++++++++++++++++-----------------------------
>  3 files changed, 107 insertions(+), 99 deletions(-)
> 

<snip>

> +
> +/*
> + * btrfs_direct_IO - perform direct I/O
> + * inode->i_rwsem must be locked before calling this function, shared or exclusive.

Instead of having this as a comment which cannot easily be checked
automatically I'd rather it gets codified via lockdep_assert_held. I
believe this is in line with Dave's comment comment on patch 3.

> + * @iocb - kernel iocb
> + * @iter - iter to/from data is copied
> + */
> +
> +ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file->f_mapping->host;
> @@ -8662,28 +8676,13 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  	struct extent_changeset *data_reserved = NULL;
>  	loff_t offset = iocb->ki_pos;
>  	size_t count = 0;
> -	int flags = 0;
> -	bool wakeup = true;
>  	bool relock = false;
>  	ssize_t ret;
>  
>  	if (check_direct_IO(fs_info, iter, offset))
>  		return 0;
>  
> -	inode_dio_begin(inode);
> -
> -	/*
> -	 * The generic stuff only does filemap_write_and_wait_range, which
> -	 * isn't enough if we've written compressed pages to this area, so
> -	 * we need to flush the dirty pages again to make absolutely sure
> -	 * that any outstanding dirty pages are on disk.
> -	 */
>  	count = iov_iter_count(iter);
> -	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
> -		     &BTRFS_I(inode)->runtime_flags))
> -		filemap_fdatawrite_range(inode->i_mapping, offset,
> -					 offset + count - 1);
> -
>  	if (iov_iter_rw(iter) == WRITE) {
>  		/*
>  		 * If the write DIO is beyond the EOF, we need update
> @@ -8714,17 +8713,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  		dio_data.unsubmitted_oe_range_end = (u64)offset;
>  		current->journal_info = &dio_data;
>  		down_read(&BTRFS_I(inode)->dio_sem);
> -	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
> -				     &BTRFS_I(inode)->runtime_flags)) {
> -		inode_dio_end(inode);
> -		flags = DIO_LOCKING | DIO_SKIP_HOLES;
> -		wakeup = false;
>  	}
>  
> -	ret = __blockdev_direct_IO(iocb, inode,
> -				   fs_info->fs_devices->latest_bdev,
> -				   iter, btrfs_get_blocks_direct, NULL,
> -				   btrfs_submit_direct, flags);
> +	ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
> +			is_sync_kiocb(iocb));
> +
>  	if (iov_iter_rw(iter) == WRITE) {
>  		up_read(&BTRFS_I(inode)->dio_sem);
>  		current->journal_info = NULL;
> @@ -8751,11 +8744,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
>  	}
>  out:
> -	if (wakeup)
> -		inode_dio_end(inode);
>  	if (relock)
>  		inode_lock(inode);
> -
>  	extent_changeset_free(data_reserved);
>  	return ret;
>  }
> @@ -11045,7 +11035,7 @@ static const struct address_space_operations btrfs_aops = {
>  	.writepage	= btrfs_writepage,
>  	.writepages	= btrfs_writepages,
>  	.readpages	= btrfs_readpages,
> -	.direct_IO	= btrfs_direct_IO,
> +	.direct_IO	= noop_direct_IO,
>  	.invalidatepage = btrfs_invalidatepage,
>  	.releasepage	= btrfs_releasepage,
>  	.set_page_dirty	= btrfs_set_page_dirty,
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-10 23:01 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
@ 2019-12-11  8:58   ` Filipe Manana
  2019-12-11 10:43   ` Nikolay Borisov
  1 sibling, 0 replies; 37+ messages in thread
From: Filipe Manana @ 2019-12-11  8:58 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, Christoph Hellwig, Darrick J. Wong, Nikolay Borisov,
	dsterba, Johannes Thumshirn, linux-fsdevel, Goldwyn Rodrigues

On Tue, Dec 10, 2019 at 11:02 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> Switch from __blockdev_direct_IO() to iomap_dio_rw().
> Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it
> as iomap_begin() for iomap direct I/O functions. This function
> allocates and locks all the blocks required for the I/O.
> btrfs_submit_direct() is used as the submit_io() hook for direct I/O
> ops.
>
> Since we need direct I/O reads to go through iomap_dio_rw(), we change
> file_operations.read_iter() to a btrfs_file_read_iter() which calls
> btrfs_direct_IO() for direct reads and falls back to
> generic_file_buffered_read() for incomplete reads and buffered reads.
>
> We don't need address_space.direct_IO() anymore so set it to noop.
> Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is
> capable of direct I/O reads from a hole, so we don't need to return
> -ENOENT.
>
> BTRFS direct I/O is now done under i_rwsem, shared in case of
> reads and exclusive in case of writes. This guards against simultaneous
> truncates.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/ctree.h |   1 +
>  fs/btrfs/file.c  |  21 ++++++-
>  fs/btrfs/inode.c | 184 ++++++++++++++++++++++++++-----------------------------
>  3 files changed, 107 insertions(+), 99 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b2e8fd8a8e59..113dcd1a11cd 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2904,6 +2904,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
>  void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>                                           u64 end, int uptodate);
>  extern const struct dentry_operations btrfs_dentry_operations;
> +ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
>
>  /* ioctl.c */
>  long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0cb43b682789..7010dd7beccc 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1832,7 +1832,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>         loff_t endbyte;
>         int err;
>
> -       written = generic_file_direct_write(iocb, from);
> +       written = btrfs_direct_IO(iocb, from);
>
>         if (written < 0 || !iov_iter_count(from))
>                 return written;
> @@ -3444,9 +3444,26 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
>         return generic_file_open(inode, filp);
>  }
>
> +static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> +{
> +       ssize_t ret = 0;
> +
> +       if (iocb->ki_flags & IOCB_DIRECT) {
> +               struct inode *inode = file_inode(iocb->ki_filp);
> +
> +               inode_lock_shared(inode);
> +               ret = btrfs_direct_IO(iocb, to);
> +               inode_unlock_shared(inode);
> +               if (ret < 0)
> +                       return ret;
> +       }
> +
> +       return generic_file_buffered_read(iocb, to, ret);
> +}
> +
>  const struct file_operations btrfs_file_operations = {
>         .llseek         = btrfs_file_llseek,
> -       .read_iter      = generic_file_read_iter,
> +       .read_iter      = btrfs_file_read_iter,
>         .splice_read    = generic_file_splice_read,
>         .write_iter     = btrfs_file_write_iter,
>         .mmap           = btrfs_file_mmap,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 56032c518b26..86730f2f2bf6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -29,6 +29,7 @@
>  #include <linux/iversion.h>
>  #include <linux/swap.h>
>  #include <linux/sched/mm.h>
> +#include <linux/iomap.h>
>  #include <asm/unaligned.h>
>  #include "misc.h"
>  #include "ctree.h"
> @@ -7510,7 +7511,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>  }
>
>  static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
> -                             struct extent_state **cached_state, int writing)
> +                             struct extent_state **cached_state, bool writing)
>  {
>         struct btrfs_ordered_extent *ordered;
>         int ret = 0;
> @@ -7648,30 +7649,7 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
>  }
>
>
> -static int btrfs_get_blocks_direct_read(struct extent_map *em,
> -                                       struct buffer_head *bh_result,
> -                                       struct inode *inode,
> -                                       u64 start, u64 len)
> -{
> -       struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -
> -       if (em->block_start == EXTENT_MAP_HOLE ||
> -                       test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
> -               return -ENOENT;
> -
> -       len = min(len, em->len - (start - em->start));
> -
> -       bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
> -               inode->i_blkbits;
> -       bh_result->b_size = len;
> -       bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
> -       set_buffer_mapped(bh_result);
> -
> -       return 0;
> -}
> -
>  static int btrfs_get_blocks_direct_write(struct extent_map **map,
> -                                        struct buffer_head *bh_result,
>                                          struct inode *inode,
>                                          struct btrfs_dio_data *dio_data,
>                                          u64 start, u64 len)
> @@ -7733,7 +7711,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>         }
>
>         /* this will cow the extent */
> -       len = bh_result->b_size;
>         free_extent_map(em);
>         *map = em = btrfs_new_extent_direct(inode, start, len);
>         if (IS_ERR(em)) {
> @@ -7744,15 +7721,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>         len = min(len, em->len - (start - em->start));
>
>  skip_cow:
> -       bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
> -               inode->i_blkbits;
> -       bh_result->b_size = len;
> -       bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
> -       set_buffer_mapped(bh_result);
> -
> -       if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
> -               set_buffer_new(bh_result);
> -
>         /*
>          * Need to update the i_size under the extent lock so buffered
>          * readers will get the updated i_size when we unlock.
> @@ -7768,24 +7736,37 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>         return ret;
>  }
>
> -static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
> -                                  struct buffer_head *bh_result, int create)
> +static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
> +               loff_t length, unsigned flags, struct iomap *iomap,
> +               struct iomap *srcmap)
>  {
>         struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>         struct extent_map *em;
>         struct extent_state *cached_state = NULL;
>         struct btrfs_dio_data *dio_data = NULL;
> -       u64 start = iblock << inode->i_blkbits;
>         u64 lockstart, lockend;
> -       u64 len = bh_result->b_size;
> +       bool write = !!(flags & IOMAP_WRITE);
>         int ret = 0;
> +       u64 len = length;
> +       bool unlock_extents = false;
>
> -       if (!create)
> +       if (!write)
>                 len = min_t(u64, len, fs_info->sectorsize);
>
>         lockstart = start;
>         lockend = start + len - 1;
>
> +       /*
> +        * The generic stuff only does filemap_write_and_wait_range, which
> +        * isn't enough if we've written compressed pages to this area, so
> +        * we need to flush the dirty pages again to make absolutely sure
> +        * that any outstanding dirty pages are on disk.
> +        */
> +       if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
> +                    &BTRFS_I(inode)->runtime_flags))
> +               ret = filemap_fdatawrite_range(inode->i_mapping, start,
> +                                        start + length - 1);
> +
>         if (current->journal_info) {
>                 /*
>                  * Need to pull our outstanding extents and set journal_info to NULL so
> @@ -7801,7 +7782,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>          * this range and we need to fallback to buffered.
>          */
>         if (lock_extent_direct(inode, lockstart, lockend, &cached_state,
> -                              create)) {
> +                              write)) {
>                 ret = -ENOTBLK;
>                 goto err;
>         }
> @@ -7833,36 +7814,53 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>                 goto unlock_err;
>         }
>
> -       if (create) {
> -               ret = btrfs_get_blocks_direct_write(&em, bh_result, inode,
> +       len = min(len, em->len - (start - em->start));
> +       if (write) {
> +               ret = btrfs_get_blocks_direct_write(&em, inode,
>                                                     dio_data, start, len);
>                 if (ret < 0)
>                         goto unlock_err;
> -
> -               unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
> -                                    lockend, &cached_state);
> +               unlock_extents = true;
> +               /* Recalc len in case the new em is smaller than requested */
> +               len = min(len, em->len - (start - em->start));
> +       } else if (em->block_start == EXTENT_MAP_HOLE ||
> +                       test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
> +               /* Unlock in case of direct reading from a hole */
> +               unlock_extents = true;
>         } else {
> -               ret = btrfs_get_blocks_direct_read(em, bh_result, inode,
> -                                                  start, len);
> -               /* Can be negative only if we read from a hole */
> -               if (ret < 0) {
> -                       ret = 0;
> -                       free_extent_map(em);
> -                       goto unlock_err;
> -               }
>                 /*
>                  * We need to unlock only the end area that we aren't using.
>                  * The rest is going to be unlocked by the endio routine.
>                  */
> -               lockstart = start + bh_result->b_size;
> -               if (lockstart < lockend) {
> -                       unlock_extent_cached(&BTRFS_I(inode)->io_tree,
> -                                            lockstart, lockend, &cached_state);
> -               } else {
> -                       free_extent_state(cached_state);
> -               }
> +               lockstart = start + len;
> +               if (lockstart < lockend)
> +                       unlock_extents = true;
>         }
>
> +       if (unlock_extents)
> +               unlock_extent_cached(&BTRFS_I(inode)->io_tree,
> +                               lockstart, lockend, &cached_state);
> +       else
> +               free_extent_state(cached_state);
> +
> +       /*
> +        * Translate extent map information to iomap
> +        * We trim the extents (and move the addr) even though
> +        * iomap code does that, since we have locked only the parts
> +        * we are performing I/O in.
> +        */
> +       if ((em->block_start == EXTENT_MAP_HOLE) ||
> +           (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) && !write)) {
> +               iomap->addr = IOMAP_NULL_ADDR;
> +               iomap->type = IOMAP_HOLE;
> +       } else {
> +               iomap->addr = em->block_start + (start - em->start);
> +               iomap->type = IOMAP_MAPPED;
> +       }
> +       iomap->offset = start;
> +       iomap->bdev = fs_info->fs_devices->latest_bdev;
> +       iomap->length = len;
> +
>         free_extent_map(em);
>
>         return 0;
> @@ -8230,9 +8228,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
>         kfree(dip);
>
>         dio_bio->bi_status = err;
> -       dio_end_io(dio_bio);
> +       bio_endio(dio_bio);
>         btrfs_io_bio_free_csum(io_bio);
> -       bio_put(bio);
>  }
>
>  static void __endio_write_update_ordered(struct inode *inode,
> @@ -8289,8 +8286,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
>         kfree(dip);
>
>         dio_bio->bi_status = bio->bi_status;
> -       dio_end_io(dio_bio);
> -       bio_put(bio);
> +       bio_endio(dio_bio);
>  }
>
>  static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
> @@ -8522,9 +8518,10 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
>         return 0;
>  }
>
> -static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
> +static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
>                                 loff_t file_offset)
>  {
> +       struct inode *inode = file_inode(file);
>         struct btrfs_dio_private *dip = NULL;
>         struct bio *bio = NULL;
>         struct btrfs_io_bio *io_bio;
> @@ -8575,7 +8572,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
>
>         ret = btrfs_submit_direct_hook(dip);
>         if (!ret)
> -               return;
> +               return BLK_QC_T_NONE;
>
>         btrfs_io_bio_free_csum(io_bio);
>
> @@ -8594,7 +8591,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
>                 /*
>                  * The end io callbacks free our dip, do the final put on bio
>                  * and all the cleanup and final put for dio_bio (through
> -                * dio_end_io()).
> +                * end_io()).
>                  */
>                 dip = NULL;
>                 bio = NULL;
> @@ -8613,11 +8610,12 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
>                  * Releases and cleans up our dio_bio, no need to bio_put()
>                  * nor bio_endio()/bio_io_error() against dio_bio.
>                  */
> -               dio_end_io(dio_bio);
> +               bio_endio(dio_bio);

Same as before, the comment is now stale. It mentions no need to call
bio_endio(), yet the change makes it call bio_endio().

thanks

>         }
>         if (bio)
>                 bio_put(bio);
>         kfree(dip);
> +       return BLK_QC_T_NONE;
>  }
>
>  static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
> @@ -8653,7 +8651,23 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
>         return retval;
>  }
>
> -static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> +static const struct iomap_ops btrfs_dio_iomap_ops = {
> +       .iomap_begin            = btrfs_dio_iomap_begin,
> +};
> +
> +static const struct iomap_dio_ops btrfs_dops = {
> +       .submit_io              = btrfs_submit_direct,
> +};
> +
> +
> +/*
> + * btrfs_direct_IO - perform direct I/O
> + * inode->i_rwsem must be locked before calling this function, shared or exclusive.
> + * @iocb - kernel iocb
> + * @iter - iter to/from data is copied
> + */
> +
> +ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  {
>         struct file *file = iocb->ki_filp;
>         struct inode *inode = file->f_mapping->host;
> @@ -8662,28 +8676,13 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>         struct extent_changeset *data_reserved = NULL;
>         loff_t offset = iocb->ki_pos;
>         size_t count = 0;
> -       int flags = 0;
> -       bool wakeup = true;
>         bool relock = false;
>         ssize_t ret;
>
>         if (check_direct_IO(fs_info, iter, offset))
>                 return 0;
>
> -       inode_dio_begin(inode);
> -
> -       /*
> -        * The generic stuff only does filemap_write_and_wait_range, which
> -        * isn't enough if we've written compressed pages to this area, so
> -        * we need to flush the dirty pages again to make absolutely sure
> -        * that any outstanding dirty pages are on disk.
> -        */
>         count = iov_iter_count(iter);
> -       if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
> -                    &BTRFS_I(inode)->runtime_flags))
> -               filemap_fdatawrite_range(inode->i_mapping, offset,
> -                                        offset + count - 1);
> -
>         if (iov_iter_rw(iter) == WRITE) {
>                 /*
>                  * If the write DIO is beyond the EOF, we need update
> @@ -8714,17 +8713,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>                 dio_data.unsubmitted_oe_range_end = (u64)offset;
>                 current->journal_info = &dio_data;
>                 down_read(&BTRFS_I(inode)->dio_sem);
> -       } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
> -                                    &BTRFS_I(inode)->runtime_flags)) {
> -               inode_dio_end(inode);
> -               flags = DIO_LOCKING | DIO_SKIP_HOLES;
> -               wakeup = false;
>         }
>
> -       ret = __blockdev_direct_IO(iocb, inode,
> -                                  fs_info->fs_devices->latest_bdev,
> -                                  iter, btrfs_get_blocks_direct, NULL,
> -                                  btrfs_submit_direct, flags);
> +       ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
> +                       is_sync_kiocb(iocb));
> +
>         if (iov_iter_rw(iter) == WRITE) {
>                 up_read(&BTRFS_I(inode)->dio_sem);
>                 current->journal_info = NULL;
> @@ -8751,11 +8744,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>                 btrfs_delalloc_release_extents(BTRFS_I(inode), count);
>         }
>  out:
> -       if (wakeup)
> -               inode_dio_end(inode);
>         if (relock)
>                 inode_lock(inode);
> -
>         extent_changeset_free(data_reserved);
>         return ret;
>  }
> @@ -11045,7 +11035,7 @@ static const struct address_space_operations btrfs_aops = {
>         .writepage      = btrfs_writepage,
>         .writepages     = btrfs_writepages,
>         .readpages      = btrfs_readpages,
> -       .direct_IO      = btrfs_direct_IO,
> +       .direct_IO      = noop_direct_IO,
>         .invalidatepage = btrfs_invalidatepage,
>         .releasepage    = btrfs_releasepage,
>         .set_page_dirty = btrfs_set_page_dirty,
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-10 23:01 [PATCH 0/8 v4] " Goldwyn Rodrigues
@ 2019-12-10 23:01 ` Goldwyn Rodrigues
  2019-12-11  8:58   ` Filipe Manana
  2019-12-11 10:43   ` Nikolay Borisov
  0 siblings, 2 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-10 23:01 UTC (permalink / raw)
  To: linux-btrfs
  Cc: hch, darrick.wong, fdmanana, nborisov, dsterba, jthumshirn,
	linux-fsdevel, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Switch from __blockdev_direct_IO() to iomap_dio_rw().
Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it
as iomap_begin() for iomap direct I/O functions. This function
allocates and locks all the blocks required for the I/O.
btrfs_submit_direct() is used as the submit_io() hook for direct I/O
ops.

Since we need direct I/O reads to go through iomap_dio_rw(), we change
file_operations.read_iter() to a btrfs_file_read_iter() which calls
btrfs_direct_IO() for direct reads and falls back to
generic_file_buffered_read() for incomplete reads and buffered reads.

We don't need address_space.direct_IO() anymore so set it to noop.
Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is
capable of direct I/O reads from a hole, so we don't need to return
-ENOENT.

BTRFS direct I/O is now done under i_rwsem, shared in case of
reads and exclusive in case of writes. This guards against simultaneous
truncates.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h |   1 +
 fs/btrfs/file.c  |  21 ++++++-
 fs/btrfs/inode.c | 184 ++++++++++++++++++++++++++-----------------------------
 3 files changed, 107 insertions(+), 99 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b2e8fd8a8e59..113dcd1a11cd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2904,6 +2904,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					  u64 end, int uptodate);
 extern const struct dentry_operations btrfs_dentry_operations;
+ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0cb43b682789..7010dd7beccc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1832,7 +1832,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	loff_t endbyte;
 	int err;
 
-	written = generic_file_direct_write(iocb, from);
+	written = btrfs_direct_IO(iocb, from);
 
 	if (written < 0 || !iov_iter_count(from))
 		return written;
@@ -3444,9 +3444,26 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 	return generic_file_open(inode, filp);
 }
 
+static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	ssize_t ret = 0;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		struct inode *inode = file_inode(iocb->ki_filp);
+
+		inode_lock_shared(inode);
+		ret = btrfs_direct_IO(iocb, to);
+		inode_unlock_shared(inode);
+		if (ret < 0)
+			return ret;
+	}
+
+	return generic_file_buffered_read(iocb, to, ret);
+}
+
 const struct file_operations btrfs_file_operations = {
 	.llseek		= btrfs_file_llseek,
-	.read_iter      = generic_file_read_iter,
+	.read_iter      = btrfs_file_read_iter,
 	.splice_read	= generic_file_splice_read,
 	.write_iter	= btrfs_file_write_iter,
 	.mmap		= btrfs_file_mmap,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 56032c518b26..86730f2f2bf6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -29,6 +29,7 @@
 #include <linux/iversion.h>
 #include <linux/swap.h>
 #include <linux/sched/mm.h>
+#include <linux/iomap.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -7510,7 +7511,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 }
 
 static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
-			      struct extent_state **cached_state, int writing)
+			      struct extent_state **cached_state, bool writing)
 {
 	struct btrfs_ordered_extent *ordered;
 	int ret = 0;
@@ -7648,30 +7649,7 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
 }
 
 
-static int btrfs_get_blocks_direct_read(struct extent_map *em,
-					struct buffer_head *bh_result,
-					struct inode *inode,
-					u64 start, u64 len)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-
-	if (em->block_start == EXTENT_MAP_HOLE ||
-			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-		return -ENOENT;
-
-	len = min(len, em->len - (start - em->start));
-
-	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
-		inode->i_blkbits;
-	bh_result->b_size = len;
-	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
-	set_buffer_mapped(bh_result);
-
-	return 0;
-}
-
 static int btrfs_get_blocks_direct_write(struct extent_map **map,
-					 struct buffer_head *bh_result,
 					 struct inode *inode,
 					 struct btrfs_dio_data *dio_data,
 					 u64 start, u64 len)
@@ -7733,7 +7711,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	}
 
 	/* this will cow the extent */
-	len = bh_result->b_size;
 	free_extent_map(em);
 	*map = em = btrfs_new_extent_direct(inode, start, len);
 	if (IS_ERR(em)) {
@@ -7744,15 +7721,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	len = min(len, em->len - (start - em->start));
 
 skip_cow:
-	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
-		inode->i_blkbits;
-	bh_result->b_size = len;
-	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
-	set_buffer_mapped(bh_result);
-
-	if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-		set_buffer_new(bh_result);
-
 	/*
 	 * Need to update the i_size under the extent lock so buffered
 	 * readers will get the updated i_size when we unlock.
@@ -7768,24 +7736,37 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	return ret;
 }
 
-static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
-				   struct buffer_head *bh_result, int create)
+static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
+		loff_t length, unsigned flags, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
 	struct btrfs_dio_data *dio_data = NULL;
-	u64 start = iblock << inode->i_blkbits;
 	u64 lockstart, lockend;
-	u64 len = bh_result->b_size;
+	bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
+	u64 len = length;
+	bool unlock_extents = false;
 
-	if (!create)
+	if (!write)
 		len = min_t(u64, len, fs_info->sectorsize);
 
 	lockstart = start;
 	lockend = start + len - 1;
 
+	/*
+	 * The generic stuff only does filemap_write_and_wait_range, which
+	 * isn't enough if we've written compressed pages to this area, so
+	 * we need to flush the dirty pages again to make absolutely sure
+	 * that any outstanding dirty pages are on disk.
+	 */
+	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+		     &BTRFS_I(inode)->runtime_flags))
+		ret = filemap_fdatawrite_range(inode->i_mapping, start,
+					 start + length - 1);
+
 	if (current->journal_info) {
 		/*
 		 * Need to pull our outstanding extents and set journal_info to NULL so
@@ -7801,7 +7782,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	 * this range and we need to fallback to buffered.
 	 */
 	if (lock_extent_direct(inode, lockstart, lockend, &cached_state,
-			       create)) {
+			       write)) {
 		ret = -ENOTBLK;
 		goto err;
 	}
@@ -7833,36 +7814,53 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 		goto unlock_err;
 	}
 
-	if (create) {
-		ret = btrfs_get_blocks_direct_write(&em, bh_result, inode,
+	len = min(len, em->len - (start - em->start));
+	if (write) {
+		ret = btrfs_get_blocks_direct_write(&em, inode,
 						    dio_data, start, len);
 		if (ret < 0)
 			goto unlock_err;
-
-		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
-				     lockend, &cached_state);
+		unlock_extents = true;
+		/* Recalc len in case the new em is smaller than requested */
+		len = min(len, em->len - (start - em->start));
+	} else if (em->block_start == EXTENT_MAP_HOLE ||
+			test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
+		/* Unlock in case of direct reading from a hole */
+		unlock_extents = true;
 	} else {
-		ret = btrfs_get_blocks_direct_read(em, bh_result, inode,
-						   start, len);
-		/* Can be negative only if we read from a hole */
-		if (ret < 0) {
-			ret = 0;
-			free_extent_map(em);
-			goto unlock_err;
-		}
 		/*
 		 * We need to unlock only the end area that we aren't using.
 		 * The rest is going to be unlocked by the endio routine.
 		 */
-		lockstart = start + bh_result->b_size;
-		if (lockstart < lockend) {
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     lockstart, lockend, &cached_state);
-		} else {
-			free_extent_state(cached_state);
-		}
+		lockstart = start + len;
+		if (lockstart < lockend)
+			unlock_extents = true;
 	}
 
+	if (unlock_extents)
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				lockstart, lockend, &cached_state);
+	else
+		free_extent_state(cached_state);
+
+	/*
+	 * Translate extent map information to iomap
+	 * We trim the extents (and move the addr) even though
+	 * iomap code does that, since we have locked only the parts
+	 * we are performing I/O in.
+	 */
+	if ((em->block_start == EXTENT_MAP_HOLE) ||
+	    (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) && !write)) {
+		iomap->addr = IOMAP_NULL_ADDR;
+		iomap->type = IOMAP_HOLE;
+	} else {
+		iomap->addr = em->block_start + (start - em->start);
+		iomap->type = IOMAP_MAPPED;
+	}
+	iomap->offset = start;
+	iomap->bdev = fs_info->fs_devices->latest_bdev;
+	iomap->length = len;
+
 	free_extent_map(em);
 
 	return 0;
@@ -8230,9 +8228,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
 	kfree(dip);
 
 	dio_bio->bi_status = err;
-	dio_end_io(dio_bio);
+	bio_endio(dio_bio);
 	btrfs_io_bio_free_csum(io_bio);
-	bio_put(bio);
 }
 
 static void __endio_write_update_ordered(struct inode *inode,
@@ -8289,8 +8286,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
 	kfree(dip);
 
 	dio_bio->bi_status = bio->bi_status;
-	dio_end_io(dio_bio);
-	bio_put(bio);
+	bio_endio(dio_bio);
 }
 
 static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
@@ -8522,9 +8518,10 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	return 0;
 }
 
-static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
+static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
 				loff_t file_offset)
 {
+	struct inode *inode = file_inode(file);
 	struct btrfs_dio_private *dip = NULL;
 	struct bio *bio = NULL;
 	struct btrfs_io_bio *io_bio;
@@ -8575,7 +8572,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 
 	ret = btrfs_submit_direct_hook(dip);
 	if (!ret)
-		return;
+		return BLK_QC_T_NONE;
 
 	btrfs_io_bio_free_csum(io_bio);
 
@@ -8594,7 +8591,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		/*
 		 * The end io callbacks free our dip, do the final put on bio
 		 * and all the cleanup and final put for dio_bio (through
-		 * dio_end_io()).
+		 * end_io()).
 		 */
 		dip = NULL;
 		bio = NULL;
@@ -8613,11 +8610,12 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		 * Releases and cleans up our dio_bio, no need to bio_put()
 		 * nor bio_endio()/bio_io_error() against dio_bio.
 		 */
-		dio_end_io(dio_bio);
+		bio_endio(dio_bio);
 	}
 	if (bio)
 		bio_put(bio);
 	kfree(dip);
+	return BLK_QC_T_NONE;
 }
 
 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
@@ -8653,7 +8651,23 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 	return retval;
 }
 
-static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+static const struct iomap_ops btrfs_dio_iomap_ops = {
+	.iomap_begin            = btrfs_dio_iomap_begin,
+};
+
+static const struct iomap_dio_ops btrfs_dops = {
+	.submit_io		= btrfs_submit_direct,
+};
+
+
+/*
+ * btrfs_direct_IO - perform direct I/O
+ * inode->i_rwsem must be locked before calling this function, shared or exclusive.
+ * @iocb - kernel iocb
+ * @iter - iter to/from data is copied
+ */
+
+ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
@@ -8662,28 +8676,13 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct extent_changeset *data_reserved = NULL;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
-	int flags = 0;
-	bool wakeup = true;
 	bool relock = false;
 	ssize_t ret;
 
 	if (check_direct_IO(fs_info, iter, offset))
 		return 0;
 
-	inode_dio_begin(inode);
-
-	/*
-	 * The generic stuff only does filemap_write_and_wait_range, which
-	 * isn't enough if we've written compressed pages to this area, so
-	 * we need to flush the dirty pages again to make absolutely sure
-	 * that any outstanding dirty pages are on disk.
-	 */
 	count = iov_iter_count(iter);
-	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-		     &BTRFS_I(inode)->runtime_flags))
-		filemap_fdatawrite_range(inode->i_mapping, offset,
-					 offset + count - 1);
-
 	if (iov_iter_rw(iter) == WRITE) {
 		/*
 		 * If the write DIO is beyond the EOF, we need update
@@ -8714,17 +8713,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
 		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
-	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
-				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_end(inode);
-		flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		wakeup = false;
 	}
 
-	ret = __blockdev_direct_IO(iocb, inode,
-				   fs_info->fs_devices->latest_bdev,
-				   iter, btrfs_get_blocks_direct, NULL,
-				   btrfs_submit_direct, flags);
+	ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
+			is_sync_kiocb(iocb));
+
 	if (iov_iter_rw(iter) == WRITE) {
 		up_read(&BTRFS_I(inode)->dio_sem);
 		current->journal_info = NULL;
@@ -8751,11 +8744,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
 	}
 out:
-	if (wakeup)
-		inode_dio_end(inode);
 	if (relock)
 		inode_lock(inode);
-
 	extent_changeset_free(data_reserved);
 	return ret;
 }
@@ -11045,7 +11035,7 @@ static const struct address_space_operations btrfs_aops = {
 	.writepage	= btrfs_writepage,
 	.writepages	= btrfs_writepages,
 	.readpages	= btrfs_readpages,
-	.direct_IO	= btrfs_direct_IO,
+	.direct_IO	= noop_direct_IO,
 	.invalidatepage = btrfs_invalidatepage,
 	.releasepage	= btrfs_releasepage,
 	.set_page_dirty	= btrfs_set_page_dirty,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 15:56 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
  2019-12-05 17:18   ` Johannes Thumshirn
@ 2019-12-05 22:59   ` Nikolay Borisov
  1 sibling, 0 replies; 37+ messages in thread
From: Nikolay Borisov @ 2019-12-05 22:59 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs
  Cc: linux-fsdevel, hch, darrick.wong, fdmanana, dsterba, jthumshirn,
	Goldwyn Rodrigues



On 5.12.19 г. 17:56 ч., Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Switch from __blockdev_direct_IO() to iomap_dio_rw().
> Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it
> as iomap_begin() for iomap direct I/O functions. This function
> allocates and locks all the blocks required for the I/O.
> btrfs_submit_direct() is used as the submit_io() hook for direct I/O
> ops.
> 
> Since we need direct I/O reads to go through iomap_dio_rw(), we change
> file_operations.read_iter() to a btrfs_file_read_iter() which calls
> btrfs_direct_IO() for direct reads and falls back to
> generic_file_buffered_read() for incomplete reads and buffered reads.
> 
> We don't need address_space.direct_IO() anymore so set it to noop.
> Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is
> capable of direct I/O reads from a hole, so we don't need to return
> -ENOENT.

IMO it will be good to document in the changelog the lock context of the
current scheme. E.g. it's done with inode lock held shared meaning it
allows other DIO reads but not writes/truncates. (This is different than
what we had up until now).

<nit>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:32       ` Johannes Thumshirn
  2019-12-05 17:33         ` Christoph Hellwig
@ 2019-12-05 17:44         ` Goldwyn Rodrigues
  1 sibling, 0 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-05 17:44 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, linux-btrfs, linux-fsdevel, darrick.wong,
	fdmanana, nborisov, dsterba

On 18:32 05/12, Johannes Thumshirn wrote:
> On Thu, Dec 05, 2019 at 09:19:59AM -0800, Christoph Hellwig wrote:
> > I actually much prefer exporting generic_file_buffered_read and will
> > gladly switch other callers not needing the messy direct I/O handling
> > in generic_file_read_iter over to generic_file_buffered_read once this
> > series is merged.
> 
> I think you misunderstood me here, I meant the code to be:
> 
> static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> {
> 	ssize_t ret = 0;
> 
> 	if (iocb->ki_flags & IOCB_DIRECT) {
> 		struct inode *inode = file_inode(iocb->ki_filp);
> 
> 		inode_lock_shared(inode);
> 		ret = btrfs_direct_IO(iocb, to);
> 		inode_unlock_shared(inode);
> 		if (ret < 0)
> 			return ret;
> 		}
> 	}
> 
> 	return generic_file_read_iter(icob, to);
> }
> 
> This way an iocb that is no dio will end in generic_file_read_iter():
> 
> generic_file_read_iter(iocb, to)
> {
> 	size_t count = iov_iter_count(iter);
>         ssize_t retval = 0;
> 
>         if (!count)
>                 goto out; /* skip atime */
> 
> 	if (iocb->ki_flags & IOCB_DIRECT) {
> 		skipped as flag is not set
> 	}
> 
> 	retval = generic_file_buffered_read(iocb, iter, retval);
> out:
> 	return retval;
> }
> 
> Meaning we do not need to export generic_file_buffered_read() and still can
> skip the generic DIO madness.
> 
> Makes sense?

For btrfs, DIO and buffered I/O is not mutually exclusive, since we fall
back to buffered I/O in case of incomplete Direct I/O. In this case,
the control will not skip IOCB_DIRECT branch.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:37               ` Christoph Hellwig
@ 2019-12-05 17:40                 ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2019-12-05 17:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Goldwyn Rodrigues, linux-btrfs, linux-fsdevel, darrick.wong,
	fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 09:37:43AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 05, 2019 at 09:37:28AM -0800, Christoph Hellwig wrote:
> > On Thu, Dec 05, 2019 at 06:36:48PM +0100, Johannes Thumshirn wrote:
> > > To hide the implementation details and not provoke someone yelling layering
> > > violation?
> > 
> > The only layering is ->direct_IO, and this is a step to get rid of that
> > junk..
> 
> s/layering/layering violation/

OK I admit defeat.

Byte,
	Johannes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:37             ` Christoph Hellwig
@ 2019-12-05 17:37               ` Christoph Hellwig
  2019-12-05 17:40                 ` Johannes Thumshirn
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-05 17:37 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-btrfs, linux-fsdevel,
	darrick.wong, fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 09:37:28AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 05, 2019 at 06:36:48PM +0100, Johannes Thumshirn wrote:
> > To hide the implementation details and not provoke someone yelling layering
> > violation?
> 
> The only layering is ->direct_IO, and this is a step to get rid of that
> junk..

s/layering/layering violation/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:36           ` Johannes Thumshirn
@ 2019-12-05 17:37             ` Christoph Hellwig
  2019-12-05 17:37               ` Christoph Hellwig
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-05 17:37 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-btrfs, linux-fsdevel,
	darrick.wong, fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 06:36:48PM +0100, Johannes Thumshirn wrote:
> To hide the implementation details and not provoke someone yelling layering
> violation?

The only layering is ->direct_IO, and this is a step to get rid of that
junk..

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:33         ` Christoph Hellwig
@ 2019-12-05 17:36           ` Johannes Thumshirn
  2019-12-05 17:37             ` Christoph Hellwig
  0 siblings, 1 reply; 37+ messages in thread
From: Johannes Thumshirn @ 2019-12-05 17:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Goldwyn Rodrigues, linux-btrfs, linux-fsdevel, darrick.wong,
	fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 09:33:46AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 05, 2019 at 06:32:42PM +0100, Johannes Thumshirn wrote:
> > Meaning we do not need to export generic_file_buffered_read() and still can
> > skip the generic DIO madness.
> 
> But why not export it?  That way we call the function we want directly
> instead of through a wrapper that is entirely pointless for this case.

To hide the implementation details and not provoke someone yelling layering
violation?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:32       ` Johannes Thumshirn
@ 2019-12-05 17:33         ` Christoph Hellwig
  2019-12-05 17:36           ` Johannes Thumshirn
  2019-12-05 17:44         ` Goldwyn Rodrigues
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-05 17:33 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-btrfs, linux-fsdevel,
	darrick.wong, fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 06:32:42PM +0100, Johannes Thumshirn wrote:
> Meaning we do not need to export generic_file_buffered_read() and still can
> skip the generic DIO madness.

But why not export it?  That way we call the function we want directly
instead of through a wrapper that is entirely pointless for this case.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:19     ` Christoph Hellwig
@ 2019-12-05 17:32       ` Johannes Thumshirn
  2019-12-05 17:33         ` Christoph Hellwig
  2019-12-05 17:44         ` Goldwyn Rodrigues
  0 siblings, 2 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2019-12-05 17:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Goldwyn Rodrigues, linux-btrfs, linux-fsdevel, darrick.wong,
	fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 09:19:59AM -0800, Christoph Hellwig wrote:
> I actually much prefer exporting generic_file_buffered_read and will
> gladly switch other callers not needing the messy direct I/O handling
> in generic_file_read_iter over to generic_file_buffered_read once this
> series is merged.

I think you misunderstood me here, I meant the code to be:

static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	ssize_t ret = 0;

	if (iocb->ki_flags & IOCB_DIRECT) {
		struct inode *inode = file_inode(iocb->ki_filp);

		inode_lock_shared(inode);
		ret = btrfs_direct_IO(iocb, to);
		inode_unlock_shared(inode);
		if (ret < 0)
			return ret;
		}
	}

	return generic_file_read_iter(icob, to);
}

This way an iocb that is no dio will end in generic_file_read_iter():

generic_file_read_iter(iocb, to)
{
	size_t count = iov_iter_count(iter);
        ssize_t retval = 0;

        if (!count)
                goto out; /* skip atime */

	if (iocb->ki_flags & IOCB_DIRECT) {
		skipped as flag is not set
	}

	retval = generic_file_buffered_read(iocb, iter, retval);
out:
	return retval;
}

Meaning we do not need to export generic_file_buffered_read() and still can
skip the generic DIO madness.

Makes sense?

	Johannes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 17:18   ` Johannes Thumshirn
@ 2019-12-05 17:19     ` Christoph Hellwig
  2019-12-05 17:32       ` Johannes Thumshirn
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2019-12-05 17:19 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Goldwyn Rodrigues, linux-btrfs, linux-fsdevel, hch, darrick.wong,
	fdmanana, nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 06:18:15PM +0100, Johannes Thumshirn wrote:
> > +
> > +	return generic_file_buffered_read(iocb, to, ret);
> 
> This could as well call generic_file_read_iter() and would thus make patch 1
> of this series obsolete. I think this is cleaner.

I actually much prefer exporting generic_file_buffered_read and will
gladly switch other callers not needing the messy direct I/O handling
in generic_file_read_iter over to generic_file_buffered_read once this
series is merged.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 15:56 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
@ 2019-12-05 17:18   ` Johannes Thumshirn
  2019-12-05 17:19     ` Christoph Hellwig
  2019-12-05 22:59   ` Nikolay Borisov
  1 sibling, 1 reply; 37+ messages in thread
From: Johannes Thumshirn @ 2019-12-05 17:18 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, linux-fsdevel, hch, darrick.wong, fdmanana,
	nborisov, dsterba, Goldwyn Rodrigues

On Thu, Dec 05, 2019 at 09:56:26AM -0600, Goldwyn Rodrigues wrote:
[...]
> +static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> +{
> +	ssize_t ret = 0;
> +
> +	if (iocb->ki_flags & IOCB_DIRECT) {
> +		struct inode *inode = file_inode(iocb->ki_filp);
> +
> +		inode_lock_shared(inode);
> +		ret = btrfs_direct_IO(iocb, to);
> +		inode_unlock_shared(inode);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	return generic_file_buffered_read(iocb, to, ret);

This could as well call generic_file_read_iter() and would thus make patch 1
of this series obsolete. I think this is cleaner.

Thanks,
	Johannes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
  2019-12-05 15:56 [PATCH 0/8 v3] btrfs direct-io using iomap Goldwyn Rodrigues
@ 2019-12-05 15:56 ` Goldwyn Rodrigues
  2019-12-05 17:18   ` Johannes Thumshirn
  2019-12-05 22:59   ` Nikolay Borisov
  0 siblings, 2 replies; 37+ messages in thread
From: Goldwyn Rodrigues @ 2019-12-05 15:56 UTC (permalink / raw)
  To: linux-btrfs
  Cc: linux-fsdevel, hch, darrick.wong, fdmanana, nborisov, dsterba,
	jthumshirn, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Switch from __blockdev_direct_IO() to iomap_dio_rw().
Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it
as iomap_begin() for iomap direct I/O functions. This function
allocates and locks all the blocks required for the I/O.
btrfs_submit_direct() is used as the submit_io() hook for direct I/O
ops.

Since we need direct I/O reads to go through iomap_dio_rw(), we change
file_operations.read_iter() to a btrfs_file_read_iter() which calls
btrfs_direct_IO() for direct reads and falls back to
generic_file_buffered_read() for incomplete reads and buffered reads.

We don't need address_space.direct_IO() anymore so set it to noop.
Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is
capable of direct I/O reads from a hole, so we don't need to return
-ENOENT.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h |   1 +
 fs/btrfs/file.c  |  21 ++++++-
 fs/btrfs/inode.c | 182 ++++++++++++++++++++++++++-----------------------------
 3 files changed, 107 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fe2b8765d9e6..cde8423673fc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2903,6 +2903,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					  u64 end, int uptodate);
 extern const struct dentry_operations btrfs_dentry_operations;
+ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 435a502a3226..c9c83a1711ee 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1834,7 +1834,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	loff_t endbyte;
 	int err;
 
-	written = generic_file_direct_write(iocb, from);
+	written = btrfs_direct_IO(iocb, from);
 
 	if (written < 0 || !iov_iter_count(from))
 		return written;
@@ -3452,9 +3452,26 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 	return generic_file_open(inode, filp);
 }
 
+static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	ssize_t ret = 0;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		struct inode *inode = file_inode(iocb->ki_filp);
+
+		inode_lock_shared(inode);
+		ret = btrfs_direct_IO(iocb, to);
+		inode_unlock_shared(inode);
+		if (ret < 0)
+			return ret;
+	}
+
+	return generic_file_buffered_read(iocb, to, ret);
+}
+
 const struct file_operations btrfs_file_operations = {
 	.llseek		= btrfs_file_llseek,
-	.read_iter      = generic_file_read_iter,
+	.read_iter      = btrfs_file_read_iter,
 	.splice_read	= generic_file_splice_read,
 	.write_iter	= btrfs_file_write_iter,
 	.mmap		= btrfs_file_mmap,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 015910079e73..b4a297407672 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -29,6 +29,7 @@
 #include <linux/iversion.h>
 #include <linux/swap.h>
 #include <linux/sched/mm.h>
+#include <linux/iomap.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -7479,7 +7480,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 }
 
 static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
-			      struct extent_state **cached_state, int writing)
+			      struct extent_state **cached_state, bool writing)
 {
 	struct btrfs_ordered_extent *ordered;
 	int ret = 0;
@@ -7619,28 +7620,7 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
 }
 
 
-static int btrfs_get_blocks_direct_read(struct extent_map *em,
-					struct buffer_head *bh_result,
-					struct inode *inode,
-					u64 start, u64 len)
-{
-	if (em->block_start == EXTENT_MAP_HOLE ||
-			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-		return -ENOENT;
-
-	len = min(len, em->len - (start - em->start));
-
-	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
-		inode->i_blkbits;
-	bh_result->b_size = len;
-	bh_result->b_bdev = em->bdev;
-	set_buffer_mapped(bh_result);
-
-	return 0;
-}
-
 static int btrfs_get_blocks_direct_write(struct extent_map **map,
-					 struct buffer_head *bh_result,
 					 struct inode *inode,
 					 struct btrfs_dio_data *dio_data,
 					 u64 start, u64 len)
@@ -7702,7 +7682,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	}
 
 	/* this will cow the extent */
-	len = bh_result->b_size;
 	free_extent_map(em);
 	*map = em = btrfs_new_extent_direct(inode, start, len);
 	if (IS_ERR(em)) {
@@ -7713,15 +7692,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	len = min(len, em->len - (start - em->start));
 
 skip_cow:
-	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
-		inode->i_blkbits;
-	bh_result->b_size = len;
-	bh_result->b_bdev = em->bdev;
-	set_buffer_mapped(bh_result);
-
-	if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-		set_buffer_new(bh_result);
-
 	/*
 	 * Need to update the i_size under the extent lock so buffered
 	 * readers will get the updated i_size when we unlock.
@@ -7737,24 +7707,37 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	return ret;
 }
 
-static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
-				   struct buffer_head *bh_result, int create)
+static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
+		loff_t length, unsigned flags, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
 	struct btrfs_dio_data *dio_data = NULL;
-	u64 start = iblock << inode->i_blkbits;
 	u64 lockstart, lockend;
-	u64 len = bh_result->b_size;
+	bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
+	u64 len = length;
+	bool unlock_extents = false;
 
-	if (!create)
+	if (!write)
 		len = min_t(u64, len, fs_info->sectorsize);
 
 	lockstart = start;
 	lockend = start + len - 1;
 
+	/*
+	 * The generic stuff only does filemap_write_and_wait_range, which
+	 * isn't enough if we've written compressed pages to this area, so
+	 * we need to flush the dirty pages again to make absolutely sure
+	 * that any outstanding dirty pages are on disk.
+	 */
+	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+		     &BTRFS_I(inode)->runtime_flags))
+		ret = filemap_fdatawrite_range(inode->i_mapping, start,
+					 start + length - 1);
+
 	if (current->journal_info) {
 		/*
 		 * Need to pull our outstanding extents and set journal_info to NULL so
@@ -7770,7 +7753,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	 * this range and we need to fallback to buffered.
 	 */
 	if (lock_extent_direct(inode, lockstart, lockend, &cached_state,
-			       create)) {
+			       write)) {
 		ret = -ENOTBLK;
 		goto err;
 	}
@@ -7802,36 +7785,53 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 		goto unlock_err;
 	}
 
-	if (create) {
-		ret = btrfs_get_blocks_direct_write(&em, bh_result, inode,
+	len = min(len, em->len - (start - em->start));
+	if (write) {
+		ret = btrfs_get_blocks_direct_write(&em, inode,
 						    dio_data, start, len);
 		if (ret < 0)
 			goto unlock_err;
-
-		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
-				     lockend, &cached_state);
+		unlock_extents = true;
+		/* Recalc len in case the new em is smaller than requested */
+		len = min(len, em->len - (start - em->start));
+	} else if (em->block_start == EXTENT_MAP_HOLE ||
+			test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
+		/* Unlock in case of direct reading from a hole */
+		unlock_extents = true;
 	} else {
-		ret = btrfs_get_blocks_direct_read(em, bh_result, inode,
-						   start, len);
-		/* Can be negative only if we read from a hole */
-		if (ret < 0) {
-			ret = 0;
-			free_extent_map(em);
-			goto unlock_err;
-		}
 		/*
 		 * We need to unlock only the end area that we aren't using.
 		 * The rest is going to be unlocked by the endio routine.
 		 */
-		lockstart = start + bh_result->b_size;
-		if (lockstart < lockend) {
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     lockstart, lockend, &cached_state);
-		} else {
-			free_extent_state(cached_state);
-		}
+		lockstart = start + len;
+		if (lockstart < lockend)
+			unlock_extents = true;
 	}
 
+	if (unlock_extents)
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				lockstart, lockend, &cached_state);
+	else
+		free_extent_state(cached_state);
+
+	/*
+	 * Translate extent map information to iomap
+	 * We trim the extents (and move the addr) even though
+	 * iomap code does that, since we have locked only the parts
+	 * we are performing I/O in.
+	 */
+	if ((em->block_start == EXTENT_MAP_HOLE) ||
+	    (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) && !write)) {
+		iomap->addr = IOMAP_NULL_ADDR;
+		iomap->type = IOMAP_HOLE;
+	} else {
+		iomap->addr = em->block_start + (start - em->start);
+		iomap->type = IOMAP_MAPPED;
+	}
+	iomap->offset = start;
+	iomap->bdev = em->bdev;
+	iomap->length = len;
+
 	free_extent_map(em);
 
 	return 0;
@@ -8199,9 +8199,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
 	kfree(dip);
 
 	dio_bio->bi_status = err;
-	dio_end_io(dio_bio);
+	bio_endio(dio_bio);
 	btrfs_io_bio_free_csum(io_bio);
-	bio_put(bio);
 }
 
 static void __endio_write_update_ordered(struct inode *inode,
@@ -8263,8 +8262,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
 	kfree(dip);
 
 	dio_bio->bi_status = bio->bi_status;
-	dio_end_io(dio_bio);
-	bio_put(bio);
+	bio_endio(dio_bio);
 }
 
 static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
@@ -8496,9 +8494,10 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	return 0;
 }
 
-static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
+static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
 				loff_t file_offset)
 {
+	struct inode *inode = file_inode(file);
 	struct btrfs_dio_private *dip = NULL;
 	struct bio *bio = NULL;
 	struct btrfs_io_bio *io_bio;
@@ -8549,7 +8548,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 
 	ret = btrfs_submit_direct_hook(dip);
 	if (!ret)
-		return;
+		return BLK_QC_T_NONE;
 
 	btrfs_io_bio_free_csum(io_bio);
 
@@ -8568,7 +8567,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		/*
 		 * The end io callbacks free our dip, do the final put on bio
 		 * and all the cleanup and final put for dio_bio (through
-		 * dio_end_io()).
+		 * end_io()).
 		 */
 		dip = NULL;
 		bio = NULL;
@@ -8587,11 +8586,12 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		 * Releases and cleans up our dio_bio, no need to bio_put()
 		 * nor bio_endio()/bio_io_error() against dio_bio.
 		 */
-		dio_end_io(dio_bio);
+		bio_endio(dio_bio);
 	}
 	if (bio)
 		bio_put(bio);
 	kfree(dip);
+	return BLK_QC_T_NONE;
 }
 
 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
@@ -8627,7 +8627,23 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 	return retval;
 }
 
-static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+static const struct iomap_ops btrfs_dio_iomap_ops = {
+	.iomap_begin            = btrfs_dio_iomap_begin,
+};
+
+static const struct iomap_dio_ops btrfs_dops = {
+	.submit_io		= btrfs_submit_direct,
+};
+
+
+/*
+ * btrfs_direct_IO - perform direct I/O
+ * inode->i_rwsem must be locked before calling this function, shared or exclusive.
+ * @iocb - kernel iocb
+ * @iter - iter to/from data is copied
+ */
+
+ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
@@ -8636,28 +8652,13 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct extent_changeset *data_reserved = NULL;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
-	int flags = 0;
-	bool wakeup = true;
 	bool relock = false;
 	ssize_t ret;
 
 	if (check_direct_IO(fs_info, iter, offset))
 		return 0;
 
-	inode_dio_begin(inode);
-
-	/*
-	 * The generic stuff only does filemap_write_and_wait_range, which
-	 * isn't enough if we've written compressed pages to this area, so
-	 * we need to flush the dirty pages again to make absolutely sure
-	 * that any outstanding dirty pages are on disk.
-	 */
 	count = iov_iter_count(iter);
-	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-		     &BTRFS_I(inode)->runtime_flags))
-		filemap_fdatawrite_range(inode->i_mapping, offset,
-					 offset + count - 1);
-
 	if (iov_iter_rw(iter) == WRITE) {
 		/*
 		 * If the write DIO is beyond the EOF, we need update
@@ -8688,17 +8689,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
 		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
-	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
-				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_end(inode);
-		flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		wakeup = false;
 	}
 
-	ret = __blockdev_direct_IO(iocb, inode,
-				   fs_info->fs_devices->latest_bdev,
-				   iter, btrfs_get_blocks_direct, NULL,
-				   btrfs_submit_direct, flags);
+	ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
+			is_sync_kiocb(iocb));
+
 	if (iov_iter_rw(iter) == WRITE) {
 		up_read(&BTRFS_I(inode)->dio_sem);
 		current->journal_info = NULL;
@@ -8725,11 +8720,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
 	}
 out:
-	if (wakeup)
-		inode_dio_end(inode);
 	if (relock)
 		inode_lock(inode);
-
 	extent_changeset_free(data_reserved);
 	return ret;
 }
@@ -11019,7 +11011,7 @@ static const struct address_space_operations btrfs_aops = {
 	.writepage	= btrfs_writepage,
 	.writepages	= btrfs_writepages,
 	.readpages	= btrfs_readpages,
-	.direct_IO	= btrfs_direct_IO,
+	.direct_IO	= noop_direct_IO,
 	.invalidatepage = btrfs_invalidatepage,
 	.releasepage	= btrfs_releasepage,
 	.set_page_dirty	= btrfs_set_page_dirty,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-01-07 17:23 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 1/8] fs: Export generic_file_buffered_read() Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
2019-12-14  0:31   ` Darrick J. Wong
2019-12-18  2:02   ` Darrick J. Wong
2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
2019-12-14  0:32   ` Darrick J. Wong
2019-12-18  2:04   ` Darrick J. Wong
2019-12-21 13:41   ` Christoph Hellwig
2019-12-21 13:42     ` Christoph Hellwig
2019-12-21 18:02     ` Darrick J. Wong
2019-12-13 19:57 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2019-12-21 14:42   ` Christoph Hellwig
2020-01-02 18:01     ` Goldwyn Rodrigues
2020-01-07 17:23       ` Christoph Hellwig
2020-01-07 11:59     ` Goldwyn Rodrigues
2020-01-07 17:21       ` Christoph Hellwig
2019-12-13 19:57 ` [PATCH 5/8] fs: Remove dio_end_io() Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 6/8] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 7/8] btrfs: Use ->iomap_end() instead of btrfs_dio_data Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 8/8] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
2019-12-16  0:01 ` [PATCH 0/8 v6] btrfs direct-io using iomap Nikolay Borisov
2019-12-16 12:41   ` Goldwyn Rodrigues
  -- strict thread matches above, loose matches on Subject: below --
2019-12-10 23:01 [PATCH 0/8 v4] " Goldwyn Rodrigues
2019-12-10 23:01 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2019-12-11  8:58   ` Filipe Manana
2019-12-11 10:43   ` Nikolay Borisov
2019-12-05 15:56 [PATCH 0/8 v3] btrfs direct-io using iomap Goldwyn Rodrigues
2019-12-05 15:56 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2019-12-05 17:18   ` Johannes Thumshirn
2019-12-05 17:19     ` Christoph Hellwig
2019-12-05 17:32       ` Johannes Thumshirn
2019-12-05 17:33         ` Christoph Hellwig
2019-12-05 17:36           ` Johannes Thumshirn
2019-12-05 17:37             ` Christoph Hellwig
2019-12-05 17:37               ` Christoph Hellwig
2019-12-05 17:40                 ` Johannes Thumshirn
2019-12-05 17:44         ` Goldwyn Rodrigues
2019-12-05 22:59   ` Nikolay Borisov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).