All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix
@ 2020-09-21 14:43 Goldwyn Rodrigues
  2020-09-21 14:43 ` [PATCH 01/15] fs: remove dio_end_io() Goldwyn Rodrigues
                   ` (14 more replies)
  0 siblings, 15 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef

I have merged two series here:
(1) Using inode_lock_shared for <EOF DIO writes
(2) Fix O_DSYNC | O_DIRECT

Both require inode locking to be pushed closer to I/O calls and thought
was relevant. (2) requires changes in iomap code. If there is a need to 
separate the two please let me know. I don't have any ulterior motives
here ;)

Testing: xfstests on both btrfs and xfs

Git: https://github.com/goldwynr/linux/tree/btrfs-inode-lock

Changes since v1:

 - Changed fix for deadlock due to O_DSYNC (iomap patches added)
 - btrfs_inode_lock() shifted to inode.c
 - Reinstated lockdep_assert_held() in iomap_dio_rw()

-- 
Goldwyn

 fs/btrfs/btrfs_inode.h |   28 --
 fs/btrfs/ctree.h       |   13 +
 fs/btrfs/file.c        |  497 +++++++++++++++++++++++++++----------------------
 fs/btrfs/inode.c       |  176 +++--------------
 fs/btrfs/transaction.h |    1
 fs/direct-io.c         |   19 -
 fs/iomap/direct-io.c   |   38 ++-
 include/linux/fs.h     |    2
 include/linux/iomap.h  |    5
 9 files changed, 355 insertions(+), 424 deletions(-)




^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 01/15] fs: remove dio_end_io()
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 14:17   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues, Nikolay Borisov,
	Johannes Thumshirn

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Since we removed the last user of dio_end_io(), remove the helper
function dio_end_io().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/direct-io.c     | 19 -------------------
 include/linux/fs.h |  2 --
 2 files changed, 21 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 183299892465..abf535b036ab 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -386,25 +386,6 @@ static void dio_bio_end_io(struct bio *bio)
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
-/**
- * dio_end_io - handle the end io action for the given bio
- * @bio: The direct io bio thats being completed
- *
- * This is meant to be called by any filesystem that uses their own dio_submit_t
- * so that the DIO specific endio actions are dealt with after the filesystem
- * has done it's completion work.
- */
-void dio_end_io(struct bio *bio)
-{
-	struct dio *dio = bio->bi_private;
-
-	if (dio->is_async)
-		dio_bio_end_aio(bio);
-	else
-		dio_bio_end_io(bio);
-}
-EXPORT_SYMBOL_GPL(dio_end_io);
-
 static inline void
 dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	      struct block_device *bdev,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..8e2842a9c0b3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3079,8 +3079,6 @@ enum {
 	DIO_SKIP_HOLES	= 0x02,
 };
 
-void dio_end_io(struct bio *bio);
-
 ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 			     struct block_device *bdev, struct iov_iter *iter,
 			     get_block_t get_block,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
  2020-09-21 14:43 ` [PATCH 01/15] fs: remove dio_end_io() Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 13:18   ` Christoph Hellwig
  2020-09-22 14:17   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Goldwyn Rodrigues
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues, Nikolay Borisov,
	Johannes Thumshirn

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Since we now perform direct reads using i_rwsem, we can remove this
inode flag used to co-ordinate unlocked reads.

The truncate call takes i_rwsem. This means it is correctly synchronized
with concurrent direct reads.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/btrfs_inode.h | 18 ------------------
 fs/btrfs/inode.c       |  3 ---
 2 files changed, 21 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 6fdb46d58299..738009a22320 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -33,7 +33,6 @@ enum {
 	BTRFS_INODE_NEEDS_FULL_SYNC,
 	BTRFS_INODE_COPY_EVERYTHING,
 	BTRFS_INODE_IN_DELALLOC_LIST,
-	BTRFS_INODE_READDIO_NEED_LOCK,
 	BTRFS_INODE_HAS_PROPS,
 	BTRFS_INODE_SNAPSHOT_FLUSH,
 };
@@ -334,23 +333,6 @@ struct btrfs_dio_private {
 	u8 csums[];
 };
 
-/*
- * Disable DIO read nolock optimization, so new dio readers will be forced
- * to grab i_mutex. It is used to avoid the endless truncate due to
- * nonlocked dio read.
- */
-static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
-{
-	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-	smp_mb();
-}
-
-static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
-{
-	smp_mb__before_atomic();
-	clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-}
-
 /* Array of bytes with variable length, hexadecimal format 0x1234 */
 #define CSUM_FMT				"0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes)		size, bytes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 18d171b2d544..3db4697ff1ba 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4855,10 +4855,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
 		truncate_setsize(inode, newsize);
 
-		/* Disable nonlocked read DIO to avoid the endless truncate */
-		btrfs_inode_block_unlocked_dio(BTRFS_I(inode));
 		inode_dio_wait(inode);
-		btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
 
 		ret = btrfs_truncate(inode, newsize == oldsize);
 		if (ret && inode->i_nlink) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
  2020-09-21 14:43 ` [PATCH 01/15] fs: remove dio_end_io() Goldwyn Rodrigues
  2020-09-21 14:43 ` [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-21 15:09   ` Johannes Thumshirn
                     ` (2 more replies)
  2020-09-21 14:43 ` [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Goldwyn Rodrigues
                   ` (11 subsequent siblings)
  14 siblings, 3 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Christoph Hellwig <hch@lst.de>

This is to avoid the deadlock caused in btrfs because of O_DIRECT |
O_DSYNC.

Filesystems such as btrfs require i_rwsem while performing sync on a
file. iomap_dio_rw() is called under i_rw_sem. This leads to a
deadlock because of:

iomap_dio_complete()
  generic_write_sync()
    btrfs_sync_file()

Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems
can call iomap_dio_complete() after unlocking i_rwsem.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap/direct-io.c  | 34 +++++++++++++++++++++++++---------
 include/linux/iomap.h |  5 +++++
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c1aafb2ab990..d970c6bbbe11 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -76,7 +76,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
 		dio->submit.cookie = submit_bio(bio);
 }
 
-static ssize_t iomap_dio_complete(struct iomap_dio *dio)
+ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	const struct iomap_dio_ops *dops = dio->dops;
 	struct kiocb *iocb = dio->iocb;
@@ -130,6 +130,7 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
 static void iomap_dio_complete_work(struct work_struct *work)
 {
@@ -406,8 +407,8 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
  * Returns -ENOTBLK In case of a page invalidation invalidation failure for
  * writes.  The callers needs to fall back to buffered I/O in this case.
  */
-ssize_t
-iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+struct iomap_dio *
+__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		bool wait_for_completion)
 {
@@ -421,14 +422,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	struct iomap_dio *dio;
 
 	if (!count)
-		return 0;
+		return NULL;
 
 	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
-		return -EIO;
+		return ERR_PTR(-EIO);
 
 	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
 	if (!dio)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	dio->iocb = iocb;
 	atomic_set(&dio->ref, 1);
@@ -558,7 +559,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->wait_for_completion = wait_for_completion;
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!wait_for_completion)
-			return -EIOCBQUEUED;
+			return ERR_PTR(-EIOCBQUEUED);
 
 		for (;;) {
 			set_current_state(TASK_UNINTERRUPTIBLE);
@@ -574,10 +575,25 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
-	return iomap_dio_complete(dio);
+	return dio;
 
 out_free_dio:
 	kfree(dio);
-	return ret;
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(__iomap_dio_rw);
+
+ssize_t
+iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
+		bool wait_for_completion)
+{
+	struct iomap_dio *dio;
+
+	dio = __iomap_dio_rw(iocb, iter, ops, dops, wait_for_completion);
+	if (IS_ERR_OR_NULL(dio))
+		return PTR_ERR_OR_ZERO(dio);
+	return iomap_dio_complete(dio);
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
+
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4d1d3c3469e9..172b3397a1a3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -13,6 +13,7 @@
 struct address_space;
 struct fiemap_extent_info;
 struct inode;
+struct iomap_dio;
 struct iomap_writepage_ctx;
 struct iov_iter;
 struct kiocb;
@@ -258,6 +259,10 @@ struct iomap_dio_ops {
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		bool wait_for_completion);
+struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
+		bool wait_for_completion);
+ssize_t iomap_dio_complete(struct iomap_dio *dio);
 int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
 
 #ifdef CONFIG_SWAP
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (2 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-21 15:11   ` Johannes Thumshirn
                     ` (2 more replies)
  2020-09-21 14:43 ` [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write Goldwyn Rodrigues
                   ` (10 subsequent siblings)
  14 siblings, 3 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

iomap complete routine can deadlock with btrfs_fallocate because of the
call to generic_write_sync().

P0                      P1
inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
__iomap_dio_rw()        inode_lock()
                        <block>
<submits IO>
<completes IO>
inode_unlock()
                        <gets inode_lock()>
                        inode_dio_wait()
iomap_dio_complete()
  generic_write_sync()
    btrfs_file_fsync()
      inode_lock()
      <deadlock>

inode_dio_end() is used to notify the end of DIO data in order
to synchronize with truncate. Call inode_dio_end() before calling
generic_write_sync(), so filesystems can lock i_rwsem during a sync.

---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index d970c6bbbe11..e01f81e7b76f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 			dio_warn_stale_pagecache(iocb->ki_filp);
 	}
 
+	inode_dio_end(file_inode(iocb->ki_filp));
 	/*
 	 * If this is a DSYNC write, make sure we push it to stable storage now
 	 * that we've written data.
@@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
 		ret = generic_write_sync(iocb, ret);
 
-	inode_dio_end(file_inode(iocb->ki_filp));
 	kfree(dio);
 
 	return ret;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (3 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 13:22   ` Christoph Hellwig
  2020-09-22 14:27   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write() Goldwyn Rodrigues
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

The read and write DIO don't have anything in common except for the
call to iomap_dio_rw. Extract the write call into a new function to get
rid of conditional statements for direct write.

Originally proposed by Christoph Hellwig <hch@lst.de>

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h |  5 ++-
 fs/btrfs/file.c  | 98 +++++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/inode.c | 91 ++------------------------------------------
 3 files changed, 95 insertions(+), 99 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index cd644c755142..b47a8dcff028 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -28,6 +28,7 @@
 #include <linux/dynamic_debug.h>
 #include <linux/refcount.h>
 #include <linux/crc32c.h>
+#include <linux/iomap.h>
 #include "extent-io-tree.h"
 #include "extent_io.h"
 #include "extent_map.h"
@@ -3048,7 +3049,9 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					  u64 end, int uptodate);
 extern const struct dentry_operations btrfs_dentry_operations;
-ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
+extern const struct iomap_ops btrfs_dio_iomap_ops;
+extern const struct iomap_dio_ops btrfs_dio_ops;
+extern const struct iomap_dio_ops btrfs_sync_dops;
 
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 038e0afaf3d0..910e2fd234a9 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1855,21 +1855,68 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	return num_written ? num_written : ret;
 }
 
-static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
+static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
+		const struct iov_iter *iter, loff_t offset)
+{
+	const unsigned int blocksize_mask = fs_info->sectorsize - 1;
+
+	if (offset & blocksize_mask)
+		return -EINVAL;
+
+	if (iov_iter_alignment(iter) & blocksize_mask)
+		return -EINVAL;
+
+	return 0;
+}
+
+static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	loff_t pos;
-	ssize_t written;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	loff_t pos = iocb->ki_pos;
+	ssize_t written = 0;
+	bool relock = false;
 	ssize_t written_buffered;
 	loff_t endbyte;
 	int err;
 
-	written = btrfs_direct_IO(iocb, from);
+	if (check_direct_IO(fs_info, from, pos))
+		goto buffered;
+
+	/*
+	 * If the write DIO is beyond the EOF, we need update
+	 * the isize, but it is protected by i_mutex. So we can
+	 * not unlock the i_mutex at this case.
+	 */
+	if (pos + iov_iter_count(from) <= inode->i_size) {
+		inode_unlock(inode);
+		relock = true;
+	}
+	down_read(&BTRFS_I(inode)->dio_sem);
+
+	/*
+	 * We have are actually a sync iocb, so we need our fancy endio to know
+	 * if we need to sync.
+	 */
+	if (current->journal_info)
+		written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
+				&btrfs_sync_dops, is_sync_kiocb(iocb));
+	else
+		written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
+				&btrfs_dio_ops, is_sync_kiocb(iocb));
+
+	if (written == -ENOTBLK)
+		written = 0;
+
+	up_read(&BTRFS_I(inode)->dio_sem);
+	if (relock)
+		inode_lock(inode);
 
 	if (written < 0 || !iov_iter_count(from))
 		return written;
 
+buffered:
 	pos = iocb->ki_pos;
 	written_buffered = btrfs_buffered_write(iocb, from);
 	if (written_buffered < 0) {
@@ -2043,7 +2090,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 			iocb->ki_flags &= ~IOCB_DSYNC;
 			current->journal_info = BTRFS_DIO_SYNC_STUB;
 		}
-		num_written = __btrfs_direct_write(iocb, from);
+		num_written = btrfs_direct_write(iocb, from);
 
 		/*
 		 * As stated above, we cleared journal_info, so we need to do
@@ -3618,16 +3665,47 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 	return generic_file_open(inode, filp);
 }
 
+static int check_direct_read(struct btrfs_fs_info *fs_info,
+		const struct iov_iter *iter, loff_t offset)
+{
+	int ret;
+	int i, seg;
+
+	ret = check_direct_IO(fs_info, iter, offset);
+	if (ret < 0)
+		return ret;
+
+	if (!iter_is_iovec(iter))
+		return 0;
+
+	for (seg = 0; seg < iter->nr_segs; seg++)
+		for (i = seg + 1; i < iter->nr_segs; i++)
+			if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
+				return -EINVAL;
+	return 0;
+}
+
+static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	ssize_t ret;
+
+	if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos))
+		return 0;
+
+	inode_lock_shared(inode);
+	ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
+			is_sync_kiocb(iocb));
+	inode_unlock_shared(inode);
+	return ret;
+}
+
 static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	ssize_t ret = 0;
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		struct inode *inode = file_inode(iocb->ki_filp);
-
-		inode_lock_shared(inode);
-		ret = btrfs_direct_IO(iocb, to);
-		inode_unlock_shared(inode);
+		ret = btrfs_direct_read(iocb, to);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3db4697ff1ba..0730131b6590 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7898,39 +7898,6 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 	return BLK_QC_T_NONE;
 }
 
-static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
-			       const struct iov_iter *iter, loff_t offset)
-{
-	int seg;
-	int i;
-	unsigned int blocksize_mask = fs_info->sectorsize - 1;
-	ssize_t retval = -EINVAL;
-
-	if (offset & blocksize_mask)
-		goto out;
-
-	if (iov_iter_alignment(iter) & blocksize_mask)
-		goto out;
-
-	/* If this is a write we don't need to check anymore */
-	if (iov_iter_rw(iter) != READ || !iter_is_iovec(iter))
-		return 0;
-	/*
-	 * Check to make sure we don't have duplicate iov_base's in this
-	 * iovec, if so return EINVAL, otherwise we'll get csum errors
-	 * when reading back.
-	 */
-	for (seg = 0; seg < iter->nr_segs; seg++) {
-		for (i = seg + 1; i < iter->nr_segs; i++) {
-			if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
-				goto out;
-		}
-	}
-	retval = 0;
-out:
-	return retval;
-}
-
 static inline int btrfs_maybe_fsync_end_io(struct kiocb *iocb, ssize_t size,
 					   int error, unsigned flags)
 {
@@ -7955,72 +7922,20 @@ static inline int btrfs_maybe_fsync_end_io(struct kiocb *iocb, ssize_t size,
 	return 0;
 }
 
-static const struct iomap_ops btrfs_dio_iomap_ops = {
+const struct iomap_ops btrfs_dio_iomap_ops = {
 	.iomap_begin            = btrfs_dio_iomap_begin,
 	.iomap_end              = btrfs_dio_iomap_end,
 };
 
-static const struct iomap_dio_ops btrfs_dio_ops = {
+const struct iomap_dio_ops btrfs_dio_ops = {
 	.submit_io		= btrfs_submit_direct,
 };
 
-static const struct iomap_dio_ops btrfs_sync_dops = {
+const struct iomap_dio_ops btrfs_sync_dops = {
 	.submit_io		= btrfs_submit_direct,
 	.end_io			= btrfs_maybe_fsync_end_io,
 };
 
-ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct extent_changeset *data_reserved = NULL;
-	loff_t offset = iocb->ki_pos;
-	size_t count = 0;
-	bool relock = false;
-	ssize_t ret;
-
-	if (check_direct_IO(fs_info, iter, offset))
-		return 0;
-
-	count = iov_iter_count(iter);
-	if (iov_iter_rw(iter) == WRITE) {
-		/*
-		 * If the write DIO is beyond the EOF, we need update
-		 * the isize, but it is protected by i_mutex. So we can
-		 * not unlock the i_mutex at this case.
-		 */
-		if (offset + count <= inode->i_size) {
-			inode_unlock(inode);
-			relock = true;
-		}
-		down_read(&BTRFS_I(inode)->dio_sem);
-	}
-
-	/*
-	 * We have are actually a sync iocb, so we need our fancy endio to know
-	 * if we need to sync.
-	 */
-	if (current->journal_info)
-		ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops,
-				   &btrfs_sync_dops, is_sync_kiocb(iocb));
-	else
-		ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops,
-				   &btrfs_dio_ops, is_sync_kiocb(iocb));
-
-	if (ret == -ENOTBLK)
-		ret = 0;
-
-	if (iov_iter_rw(iter) == WRITE)
-		up_read(&BTRFS_I(inode)->dio_sem);
-
-	if (relock)
-		inode_lock(inode);
-
-	extent_changeset_free(data_reserved);
-	return ret;
-}
-
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			u64 start, u64 len)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write()
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (4 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 13:22   ` Christoph Hellwig
  2020-09-22 14:30   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 07/15] btrfs: Move FS error state bit early during write Goldwyn Rodrigues
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

While we do this, correct the call to pagecache_isize_extended():
 - pagecache_isisze_extended needs to be called to the starting of the
   write as opposed to i_size
 - We don't need to check range before the call, this is done in the
   function

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 910e2fd234a9..4c40a2742aab 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1632,6 +1632,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	int ret = 0;
 	bool only_release_metadata = false;
 	bool force_page_uptodate = false;
+	loff_t old_isize = i_size_read(inode);
 
 	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
 			PAGE_SIZE / (sizeof(struct page *)));
@@ -1852,6 +1853,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	}
 
 	extent_changeset_free(data_reserved);
+	if (num_written > 0) {
+		pagecache_isize_extended(inode, old_isize, iocb->ki_pos);
+		iocb->ki_pos += num_written;
+	}
 	return num_written ? num_written : ret;
 }
 
@@ -1975,7 +1980,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	loff_t pos;
 	size_t count;
 	loff_t oldsize;
-	int clean_page = 0;
 
 	if (!(iocb->ki_flags & IOCB_DIRECT) &&
 	    (iocb->ki_flags & IOCB_NOWAIT))
@@ -2057,8 +2061,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 			inode_unlock(inode);
 			goto out;
 		}
-		if (start_pos > round_up(oldsize, fs_info->sectorsize))
-			clean_page = 1;
 	}
 
 	if (sync)
@@ -2101,11 +2103,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		current->journal_info = NULL;
 	} else {
 		num_written = btrfs_buffered_write(iocb, from);
-		if (num_written > 0)
-			iocb->ki_pos = pos + num_written;
-		if (clean_page)
-			pagecache_isize_extended(inode, oldsize,
-						i_size_read(inode));
 	}
 
 	inode_unlock(inode);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 07/15] btrfs: Move FS error state bit early during write
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (5 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write() Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 14:38   ` Josef Bacik
  2020-09-23  9:10   ` Nikolay Borisov
  2020-09-21 14:43 ` [PATCH 08/15] btrfs: Introduce btrfs_write_check() Goldwyn Rodrigues
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

fs_info->fs_state is a filesystem bit check as opposed to inode
and can be performed before we begin with write checks. This eliminates
inode lock/unlock in case of error bit is set.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4c40a2742aab..ca374cb5ffc9 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1981,6 +1981,15 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	size_t count;
 	loff_t oldsize;
 
+	/*
+	 * If BTRFS flips readonly due to some impossible error
+	 * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR),
+	 * although we have opened a file as writable, we have
+	 * to stop this write operation to ensure FS consistency.
+	 */
+	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+		return -EROFS;
+
 	if (!(iocb->ki_flags & IOCB_DIRECT) &&
 	    (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
@@ -2030,18 +2039,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		goto out;
 	}
 
-	/*
-	 * If BTRFS flips readonly due to some impossible error
-	 * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR),
-	 * although we have opened a file as writable, we have
-	 * to stop this write operation to ensure FS consistency.
-	 */
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
-		inode_unlock(inode);
-		err = -EROFS;
-		goto out;
-	}
-
 	/*
 	 * We reserve space for updating the inode when we reserve space for the
 	 * extent we are going to write, so we will enospc out there.  We don't
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 08/15] btrfs: Introduce btrfs_write_check()
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (6 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 07/15] btrfs: Move FS error state bit early during write Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 13:26   ` Christoph Hellwig
  2020-09-22 14:42   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock() Goldwyn Rodrigues
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs_write_check() checks for all parameters in one place before
beginning a write. This does away with inode_unlock() after every check.
In the later patches, it will help push inode_lock/unlock() in buffered
and direct write functions respectively.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 159 ++++++++++++++++++++++++------------------------
 1 file changed, 81 insertions(+), 78 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ca374cb5ffc9..0f961ce1fa98 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1615,6 +1615,85 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
 }
 
+static void update_time_for_write(struct inode *inode)
+{
+	struct timespec64 now;
+
+	if (IS_NOCMTIME(inode))
+		return;
+
+	now = current_time(inode);
+	if (!timespec64_equal(&inode->i_mtime, &now))
+		inode->i_mtime = now;
+
+	if (!timespec64_equal(&inode->i_ctime, &now))
+		inode->i_ctime = now;
+
+	if (IS_I_VERSION(inode))
+		inode_inc_iversion(inode);
+}
+
+static size_t btrfs_write_check(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	loff_t pos = iocb->ki_pos;
+	size_t count = iov_iter_count(from);
+	int err;
+	loff_t oldsize;
+	loff_t start_pos;
+
+	err = generic_write_checks(iocb, from);
+	if (err <= 0)
+		return err;
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		size_t nocow_bytes = count;
+
+		/*
+		 * We will allocate space in case nodatacow is not set,
+		 * so bail
+		 */
+		if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes)
+		    <= 0)
+			return -EAGAIN;
+		/*
+		 * There are holes in the range or parts of the range that must
+		 * be COWed (shared extents, RO block groups, etc), so just bail
+		 * out.
+		 */
+		if (nocow_bytes < count)
+			return -EAGAIN;
+	}
+
+	current->backing_dev_info = inode_to_bdi(inode);
+	err = file_remove_privs(file);
+	if (err)
+		return err;
+
+	/*
+	 * We reserve space for updating the inode when we reserve space for the
+	 * extent we are going to write, so we will enospc out there.  We don't
+	 * need to start yet another transaction to update the inode as we will
+	 * update the inode when we finish writing whatever data we write.
+	 */
+	update_time_for_write(inode);
+
+	start_pos = round_down(pos, fs_info->sectorsize);
+	oldsize = i_size_read(inode);
+	if (start_pos > oldsize) {
+		/* Expand hole size to cover write data, preventing empty gap */
+		loff_t end_pos = round_up(pos + count,
+					  fs_info->sectorsize);
+		err = btrfs_cont_expand(inode, oldsize, end_pos);
+		if (err)
+			return err;
+	}
+
+	return count;
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
@@ -1947,24 +2026,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	return written ? written : err;
 }
 
-static void update_time_for_write(struct inode *inode)
-{
-	struct timespec64 now;
-
-	if (IS_NOCMTIME(inode))
-		return;
-
-	now = current_time(inode);
-	if (!timespec64_equal(&inode->i_mtime, &now))
-		inode->i_mtime = now;
-
-	if (!timespec64_equal(&inode->i_ctime, &now))
-		inode->i_ctime = now;
-
-	if (IS_I_VERSION(inode))
-		inode_inc_iversion(inode);
-}
-
 static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 				    struct iov_iter *from)
 {
@@ -1972,14 +2033,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	u64 start_pos;
-	u64 end_pos;
 	ssize_t num_written = 0;
 	const bool sync = iocb->ki_flags & IOCB_DSYNC;
 	ssize_t err;
-	loff_t pos;
-	size_t count;
-	loff_t oldsize;
 
 	/*
 	 * If BTRFS flips readonly due to some impossible error
@@ -2001,65 +2057,12 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		inode_lock(inode);
 	}
 
-	err = generic_write_checks(iocb, from);
+	err = btrfs_write_check(iocb, from);
 	if (err <= 0) {
 		inode_unlock(inode);
 		return err;
 	}
 
-	pos = iocb->ki_pos;
-	count = iov_iter_count(from);
-	if (iocb->ki_flags & IOCB_NOWAIT) {
-		size_t nocow_bytes = count;
-
-		/*
-		 * We will allocate space in case nodatacow is not set,
-		 * so bail
-		 */
-		if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes)
-		    <= 0) {
-			inode_unlock(inode);
-			return -EAGAIN;
-		}
-		/*
-		 * There are holes in the range or parts of the range that must
-		 * be COWed (shared extents, RO block groups, etc), so just bail
-		 * out.
-		 */
-		if (nocow_bytes < count) {
-			inode_unlock(inode);
-			return -EAGAIN;
-		}
-	}
-
-	current->backing_dev_info = inode_to_bdi(inode);
-	err = file_remove_privs(file);
-	if (err) {
-		inode_unlock(inode);
-		goto out;
-	}
-
-	/*
-	 * We reserve space for updating the inode when we reserve space for the
-	 * extent we are going to write, so we will enospc out there.  We don't
-	 * need to start yet another transaction to update the inode as we will
-	 * update the inode when we finish writing whatever data we write.
-	 */
-	update_time_for_write(inode);
-
-	start_pos = round_down(pos, fs_info->sectorsize);
-	oldsize = i_size_read(inode);
-	if (start_pos > oldsize) {
-		/* Expand hole size to cover write data, preventing empty gap */
-		end_pos = round_up(pos + count,
-				   fs_info->sectorsize);
-		err = btrfs_cont_expand(inode, oldsize, end_pos);
-		if (err) {
-			inode_unlock(inode);
-			goto out;
-		}
-	}
-
 	if (sync)
 		atomic_inc(&BTRFS_I(inode)->sync_writers);
 
@@ -2117,7 +2120,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 
 	if (sync)
 		atomic_dec(&BTRFS_I(inode)->sync_writers);
-out:
+
 	current->backing_dev_info = NULL;
 	return num_written ? num_written : err;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock()
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (7 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 08/15] btrfs: Introduce btrfs_write_check() Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 14:45   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write Goldwyn Rodrigues
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Helper functions for locking/unlocking i_rwsem.
btrfs_inode_lock/unlock() acquires the inode->i_rwsem depending on the
flags passed. ilock_flags determines the type of lock to be taken:

BTRFS_ILOCK_SHARED - for shared locks, for possible parallel DIO
BTRFS_ILOCK_TRY - for the RWF_NOWAIT sequence

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h |  7 +++++++
 fs/btrfs/file.c  | 31 ++++++++++++++++---------------
 fs/btrfs/inode.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b47a8dcff028..ea15771bf3da 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3053,6 +3053,13 @@ extern const struct iomap_ops btrfs_dio_iomap_ops;
 extern const struct iomap_dio_ops btrfs_dio_ops;
 extern const struct iomap_dio_ops btrfs_sync_dops;
 
+/* ilock flags definition */
+#define BTRFS_ILOCK_SHARED	(1 << 0)
+#define BTRFS_ILOCK_TRY 	(1 << 1)
+
+int btrfs_inode_lock(struct inode *inode, int ilock_flags);
+void btrfs_inode_unlock(struct inode *inode, int ilock_flags);
+
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 long btrfs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0f961ce1fa98..7e18334e8121 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1974,7 +1974,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	 * not unlock the i_mutex at this case.
 	 */
 	if (pos + iov_iter_count(from) <= inode->i_size) {
-		inode_unlock(inode);
+		btrfs_inode_unlock(inode, 0);
 		relock = true;
 	}
 	down_read(&BTRFS_I(inode)->dio_sem);
@@ -1995,7 +1995,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 
 	up_read(&BTRFS_I(inode)->dio_sem);
 	if (relock)
-		inode_lock(inode);
+		btrfs_inode_lock(inode, 0);
 
 	if (written < 0 || !iov_iter_count(from))
 		return written;
@@ -2036,6 +2036,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	ssize_t num_written = 0;
 	const bool sync = iocb->ki_flags & IOCB_DSYNC;
 	ssize_t err;
+	int ilock_flags = 0;
 
 	/*
 	 * If BTRFS flips readonly due to some impossible error
@@ -2050,16 +2051,16 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	    (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
 
-	if (iocb->ki_flags & IOCB_NOWAIT) {
-		if (!inode_trylock(inode))
-			return -EAGAIN;
-	} else {
-		inode_lock(inode);
-	}
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		ilock_flags |= BTRFS_ILOCK_TRY;
+
+	err = btrfs_inode_lock(inode, ilock_flags);
+	if (err < 0)
+		return err;
 
 	err = btrfs_write_check(iocb, from);
 	if (err <= 0) {
-		inode_unlock(inode);
+		btrfs_inode_unlock(inode, ilock_flags);
 		return err;
 	}
 
@@ -2105,7 +2106,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		num_written = btrfs_buffered_write(iocb, from);
 	}
 
-	inode_unlock(inode);
+	btrfs_inode_unlock(inode, ilock_flags);
 
 	/*
 	 * We also have to set last_sub_trans to the current log transid,
@@ -3405,7 +3406,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 			return ret;
 	}
 
-	inode_lock(inode);
+	btrfs_inode_lock(inode, 0);
 
 	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size) {
 		ret = inode_newsize_ok(inode, offset + len);
@@ -3644,9 +3645,9 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
 		return generic_file_llseek(file, offset, whence);
 	case SEEK_DATA:
 	case SEEK_HOLE:
-		inode_lock_shared(inode);
+		btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
 		offset = find_desired_extent(inode, offset, whence);
-		inode_unlock_shared(inode);
+		btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 		break;
 	}
 
@@ -3690,10 +3691,10 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to)
 	if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos))
 		return 0;
 
-	inode_lock_shared(inode);
+	btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
 	ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
 			is_sync_kiocb(iocb));
-	inode_unlock_shared(inode);
+	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0730131b6590..f305efac75ae 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -96,6 +96,37 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode,
 					 const u64 offset, const u64 bytes,
 					 const bool uptodate);
 
+int btrfs_inode_lock(struct inode *inode, int ilock_flags)
+{
+	if (ilock_flags & BTRFS_ILOCK_SHARED) {
+		if (ilock_flags & BTRFS_ILOCK_TRY) {
+			if (!inode_trylock_shared(inode))
+				return -EAGAIN;
+			else
+				return 0;
+		}
+		inode_lock_shared(inode);
+	} else {
+		if (ilock_flags & BTRFS_ILOCK_TRY) {
+			if (!inode_trylock(inode))
+				return -EAGAIN;
+			else
+				return 0;
+		}
+		inode_lock(inode);
+	}
+	return 0;
+}
+
+void btrfs_inode_unlock(struct inode *inode, int ilock_flags)
+{
+	if (ilock_flags & BTRFS_ILOCK_SHARED)
+		inode_unlock_shared(inode);
+	else
+		inode_unlock(inode);
+
+}
+
 /*
  * Cleanup all submitted ordered extents in specified range to handle errors
  * from the btrfs_run_delalloc_range() callback.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (8 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock() Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22  9:26     ` Dan Carpenter
  2020-09-22 14:48   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF Goldwyn Rodrigues
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Push inode locking and unlocking closer to where we perform the I/O. For
this we need to move the write checks inside the respective functions as
well.

pos is assigned after write checks because generic_write_check can
modify iocb->ki_pos.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 67 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7e18334e8121..d9c3be19d7b3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1698,7 +1698,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
 	struct file *file = iocb->ki_filp;
-	loff_t pos = iocb->ki_pos;
+	loff_t pos;
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct page **pages = NULL;
@@ -1712,14 +1712,29 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	bool only_release_metadata = false;
 	bool force_page_uptodate = false;
 	loff_t old_isize = i_size_read(inode);
+	int ilock_flags = 0;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		ilock_flags |= BTRFS_ILOCK_TRY;
+
+	ret = btrfs_inode_lock(inode, ilock_flags);
+	if (ret < 0)
+		return ret;
+
+	ret = btrfs_write_check(iocb, i);
+	if (ret <= 0)
+		goto out;
 
+	pos = iocb->ki_pos;
 	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
 			PAGE_SIZE / (sizeof(struct page *)));
 	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
 	nrptrs = max(nrptrs, 8);
 	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
+	if (!pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	while (iov_iter_count(i) > 0) {
 		struct extent_state *cached_state = NULL;
@@ -1936,6 +1951,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		pagecache_isize_extended(inode, old_isize, iocb->ki_pos);
 		iocb->ki_pos += num_written;
 	}
+out:
+	btrfs_inode_unlock(inode, ilock_flags);
 	return num_written ? num_written : ret;
 }
 
@@ -1958,15 +1975,33 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	loff_t pos = iocb->ki_pos;
+	loff_t pos;
 	ssize_t written = 0;
 	bool relock = false;
 	ssize_t written_buffered;
 	loff_t endbyte;
 	int err;
+	int ilock_flags = 0;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		ilock_flags |= BTRFS_ILOCK_TRY;
+
+	err = btrfs_inode_lock(inode, ilock_flags);
+	if (err < 0)
+		return err;
+
+	err = btrfs_write_check(iocb, from);
+	if (err <= 0) {
+		btrfs_inode_unlock(inode, ilock_flags);
+		goto out;
+	}
+
+	pos = iocb->ki_pos;
 
-	if (check_direct_IO(fs_info, from, pos))
+	if (check_direct_IO(fs_info, from, pos)) {
+		btrfs_inode_unlock(inode, ilock_flags);
 		goto buffered;
+	}
 
 	/*
 	 * If the write DIO is beyond the EOF, we need update
@@ -1997,8 +2032,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	if (relock)
 		btrfs_inode_lock(inode, 0);
 
-	if (written < 0 || !iov_iter_count(from))
-		return written;
+	if (written < 0 || !iov_iter_count(from)) {
+		err = written;
+		goto out;
+	}
 
 buffered:
 	pos = iocb->ki_pos;
@@ -2036,7 +2073,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	ssize_t num_written = 0;
 	const bool sync = iocb->ki_flags & IOCB_DSYNC;
 	ssize_t err;
-	int ilock_flags = 0;
 
 	/*
 	 * If BTRFS flips readonly due to some impossible error
@@ -2051,19 +2087,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	    (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
 
-	if (iocb->ki_flags & IOCB_NOWAIT)
-		ilock_flags |= BTRFS_ILOCK_TRY;
-
-	err = btrfs_inode_lock(inode, ilock_flags);
-	if (err < 0)
-		return err;
-
-	err = btrfs_write_check(iocb, from);
-	if (err <= 0) {
-		btrfs_inode_unlock(inode, ilock_flags);
-		return err;
-	}
-
 	if (sync)
 		atomic_inc(&BTRFS_I(inode)->sync_writers);
 
@@ -2106,8 +2129,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		num_written = btrfs_buffered_write(iocb, from);
 	}
 
-	btrfs_inode_unlock(inode, ilock_flags);
-
 	/*
 	 * We also have to set last_sub_trans to the current log transid,
 	 * otherwise subsequent syncs to a file that's been synced in this
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (9 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 14:52   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 12/15] btrfs: Remove dio_sem Goldwyn Rodrigues
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Direct writes within EOF are safe to be performed with inode shared lock
to improve parallelization with other direct writes or reads because EOF
is not changed and there is no race with truncate().

Direct reads are already performed under shared inode lock.

This patch is precursor to removing btrfs_inode->dio_sem.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d9c3be19d7b3..50092d24eee2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1977,7 +1977,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	loff_t pos;
 	ssize_t written = 0;
-	bool relock = false;
 	ssize_t written_buffered;
 	loff_t endbyte;
 	int err;
@@ -1986,6 +1985,15 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		ilock_flags |= BTRFS_ILOCK_TRY;
 
+	/*
+	 * If the write DIO within EOF,  use a shared lock
+	 */
+	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode))
+		ilock_flags |= BTRFS_ILOCK_SHARED;
+	else if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+relock:
 	err = btrfs_inode_lock(inode, ilock_flags);
 	if (err < 0)
 		return err;
@@ -1997,21 +2005,23 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	}
 
 	pos = iocb->ki_pos;
+	/*
+	 * Re-check since file size may have changed
+	 * just before taking the lock or pos may have changed
+	 * because of O_APPEND in generic_write_check()
+	 */
+	if ((ilock_flags & BTRFS_ILOCK_SHARED) &&
+	    pos + iov_iter_count(from) > i_size_read(inode)) {
+		btrfs_inode_unlock(inode, ilock_flags);
+		ilock_flags &= ~BTRFS_ILOCK_SHARED;
+		goto relock;
+	}
 
 	if (check_direct_IO(fs_info, from, pos)) {
 		btrfs_inode_unlock(inode, ilock_flags);
 		goto buffered;
 	}
 
-	/*
-	 * If the write DIO is beyond the EOF, we need update
-	 * the isize, but it is protected by i_mutex. So we can
-	 * not unlock the i_mutex at this case.
-	 */
-	if (pos + iov_iter_count(from) <= inode->i_size) {
-		btrfs_inode_unlock(inode, 0);
-		relock = true;
-	}
 	down_read(&BTRFS_I(inode)->dio_sem);
 
 	/*
@@ -2029,8 +2039,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		written = 0;
 
 	up_read(&BTRFS_I(inode)->dio_sem);
-	if (relock)
-		btrfs_inode_lock(inode, 0);
+	btrfs_inode_unlock(inode, ilock_flags);
 
 	if (written < 0 || !iov_iter_count(from)) {
 		err = written;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 12/15] btrfs: Remove dio_sem
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (10 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 14:52   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

dio_sem can be eliminated because all DIO synchronization is performed
through inode->i_rwsem now.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/btrfs_inode.h | 10 ----------
 fs/btrfs/file.c        | 13 -------------
 fs/btrfs/inode.c       |  1 -
 3 files changed, 24 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 738009a22320..176abce59467 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -203,16 +203,6 @@ struct btrfs_inode {
 	/* Hook into fs_info->delayed_iputs */
 	struct list_head delayed_iput;
 
-	/*
-	 * To avoid races between lockless (i_mutex not held) direct IO writes
-	 * and concurrent fsync requests. Direct IO writes must acquire read
-	 * access on this semaphore for creating an extent map and its
-	 * corresponding ordered extent. The fast fsync path must acquire write
-	 * access on this semaphore before it collects ordered extents and
-	 * extent maps.
-	 */
-	struct rw_semaphore dio_sem;
-
 	struct inode vfs_inode;
 };
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 50092d24eee2..193af84f5405 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2022,8 +2022,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		goto buffered;
 	}
 
-	down_read(&BTRFS_I(inode)->dio_sem);
-
 	/*
 	 * We have are actually a sync iocb, so we need our fancy endio to know
 	 * if we need to sync.
@@ -2038,7 +2036,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	if (written == -ENOTBLK)
 		written = 0;
 
-	up_read(&BTRFS_I(inode)->dio_sem);
 	btrfs_inode_unlock(inode, ilock_flags);
 
 	if (written < 0 || !iov_iter_count(from)) {
@@ -2248,13 +2245,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 	inode_lock(inode);
 
-	/*
-	 * We take the dio_sem here because the tree log stuff can race with
-	 * lockless dio writes and get an extent map logged for an extent we
-	 * never waited on.  We need it this high up for lockdep reasons.
-	 */
-	down_write(&BTRFS_I(inode)->dio_sem);
-
 	atomic_inc(&root->log_batch);
 
 	/*
@@ -2285,7 +2275,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	ret = start_ordered_ops(inode, start, end);
 	if (ret) {
-		up_write(&BTRFS_I(inode)->dio_sem);
 		inode_unlock(inode);
 		goto out;
 	}
@@ -2382,7 +2371,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * file again, but that will end up using the synchronization
 	 * inside btrfs_sync_log to keep things safe.
 	 */
-	up_write(&BTRFS_I(inode)->dio_sem);
 	inode_unlock(inode);
 
 	if (ret != BTRFS_NO_LOG_SYNC) {
@@ -2413,7 +2401,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 out_release_extents:
 	btrfs_release_log_ctx_extents(&ctx);
-	up_write(&BTRFS_I(inode)->dio_sem);
 	inode_unlock(inode);
 	goto out;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f305efac75ae..96ff8e4a3b45 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8580,7 +8580,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->delayed_iput);
 	RB_CLEAR_NODE(&ei->rb_node);
-	init_rwsem(&ei->dio_sem);
 
 	return inode;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (11 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 12/15] btrfs: Remove dio_sem Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 15:11   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround") Goldwyn Rodrigues
  2020-09-21 14:43 ` [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw() Goldwyn Rodrigues
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

If direct writes are called with O_DIRECT | O_DSYNC, it will result in a
deadlock because iomap_dio_rw() is called under i_rwsem which calls
iomap_dio_complete()
  generic_write_sync()
    btrfs_sync_file().

btrfs_sync_file() requires i_rwsem, so call __iomap_dio_rw() with the
i_rwsem locked, and call iomap_dio_complete() after unlocking i_rwsem.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 193af84f5405..9c7a2d4b4148 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1981,6 +1981,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	loff_t endbyte;
 	int err;
 	int ilock_flags = 0;
+	struct iomap_dio *dio = NULL;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		ilock_flags |= BTRFS_ILOCK_TRY;
@@ -2022,22 +2023,19 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		goto buffered;
 	}
 
-	/*
-	 * We have are actually a sync iocb, so we need our fancy endio to know
-	 * if we need to sync.
-	 */
-	if (current->journal_info)
-		written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
-				&btrfs_sync_dops, is_sync_kiocb(iocb));
-	else
-		written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
-				&btrfs_dio_ops, is_sync_kiocb(iocb));
-
-	if (written == -ENOTBLK)
-		written = 0;
+	dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
+			&btrfs_dio_ops, is_sync_kiocb(iocb));
 
 	btrfs_inode_unlock(inode, ilock_flags);
 
+	if (IS_ERR_OR_NULL(dio)) {
+		err = PTR_ERR_OR_ZERO(dio);
+		if (err < 0 && err != -ENOTBLK)
+			goto out;
+	} else {
+		written = iomap_dio_complete(dio);
+	}
+
 	if (written < 0 || !iov_iter_count(from)) {
 		err = written;
 		goto out;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround")
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (12 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 15:12   ` Josef Bacik
  2020-09-21 14:43 ` [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw() Goldwyn Rodrigues
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

iomap_dio_complete()->generic_write_sync() is not called under i_rwsem
anymore, revert the workaround.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h       |  1 -
 fs/btrfs/file.c        | 38 ++------------------------------
 fs/btrfs/inode.c       | 50 ------------------------------------------
 fs/btrfs/transaction.h |  1 -
 4 files changed, 2 insertions(+), 88 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ea15771bf3da..bc96c52021b2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3051,7 +3051,6 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 extern const struct dentry_operations btrfs_dentry_operations;
 extern const struct iomap_ops btrfs_dio_iomap_ops;
 extern const struct iomap_dio_ops btrfs_dio_ops;
-extern const struct iomap_dio_ops btrfs_sync_dops;
 
 /* ilock flags definition */
 #define BTRFS_ILOCK_SHARED	(1 << 0)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9c7a2d4b4148..1d3363e7b09d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2094,44 +2094,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	if (sync)
 		atomic_inc(&BTRFS_I(inode)->sync_writers);
 
-	if (iocb->ki_flags & IOCB_DIRECT) {
-		/*
-		 * 1. We must always clear IOCB_DSYNC in order to not deadlock
-		 *    in iomap, as it calls generic_write_sync() in this case.
-		 * 2. If we are async, we can call iomap_dio_complete() either
-		 *    in
-		 *
-		 *    2.1. A worker thread from the last bio completed.  In this
-		 *	   case we need to mark the btrfs_dio_data that it is
-		 *	   async in order to call generic_write_sync() properly.
-		 *	   This is handled by setting BTRFS_DIO_SYNC_STUB in the
-		 *	   current->journal_info.
-		 *    2.2  The submitter context, because all IO completed
-		 *         before we exited iomap_dio_rw().  In this case we can
-		 *         just re-set the IOCB_DSYNC on the iocb and we'll do
-		 *         the sync below.  If our ->end_io() gets called and
-		 *         current->journal_info is set, then we know we're in
-		 *         our current context and we will clear
-		 *         current->journal_info to indicate that we need to
-		 *         sync below.
-		 */
-		if (sync) {
-			ASSERT(current->journal_info == NULL);
-			iocb->ki_flags &= ~IOCB_DSYNC;
-			current->journal_info = BTRFS_DIO_SYNC_STUB;
-		}
+	if (iocb->ki_flags & IOCB_DIRECT)
 		num_written = btrfs_direct_write(iocb, from);
-
-		/*
-		 * As stated above, we cleared journal_info, so we need to do
-		 * the sync ourselves.
-		 */
-		if (sync && current->journal_info == NULL)
-			iocb->ki_flags |= IOCB_DSYNC;
-		current->journal_info = NULL;
-	} else {
+	else
 		num_written = btrfs_buffered_write(iocb, from);
-	}
 
 	/*
 	 * We also have to set last_sub_trans to the current log transid,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 96ff8e4a3b45..41e32ef6b6ef 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -62,7 +62,6 @@ struct btrfs_dio_data {
 	loff_t length;
 	ssize_t submitted;
 	struct extent_changeset *data_reserved;
-	bool sync;
 };
 
 static const struct inode_operations btrfs_dir_inode_operations;
@@ -7362,17 +7361,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	int ret = 0;
 	u64 len = length;
 	bool unlock_extents = false;
-	bool sync = (current->journal_info == BTRFS_DIO_SYNC_STUB);
-
-	/*
-	 * We used current->journal_info here to see if we were sync, but
-	 * there's a lot of tests in the enospc machinery to not do flushing if
-	 * we have a journal_info set, so we need to clear this out and re-set
-	 * it in iomap_end.
-	 */
-	ASSERT(current->journal_info == NULL ||
-	       current->journal_info == BTRFS_DIO_SYNC_STUB);
-	current->journal_info = NULL;
 
 	if (!write)
 		len = min_t(u64, len, fs_info->sectorsize);
@@ -7398,7 +7386,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	if (!dio_data)
 		return -ENOMEM;
 
-	dio_data->sync = sync;
 	dio_data->length = length;
 	if (write) {
 		dio_data->reserve = round_up(length, fs_info->sectorsize);
@@ -7546,14 +7533,6 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 		extent_changeset_free(dio_data->data_reserved);
 	}
 out:
-	/*
-	 * We're all done, we can re-set the current->journal_info now safely
-	 * for our endio.
-	 */
-	if (dio_data->sync) {
-		ASSERT(current->journal_info == NULL);
-		current->journal_info = BTRFS_DIO_SYNC_STUB;
-	}
 	kfree(dio_data);
 	iomap->private = NULL;
 
@@ -7929,30 +7908,6 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 	return BLK_QC_T_NONE;
 }
 
-static inline int btrfs_maybe_fsync_end_io(struct kiocb *iocb, ssize_t size,
-					   int error, unsigned flags)
-{
-	/*
-	 * Now if we're still in the context of our submitter we know we can't
-	 * safely run generic_write_sync(), so clear our flag here so that the
-	 * caller knows to follow up with a sync.
-	 */
-	if (current->journal_info == BTRFS_DIO_SYNC_STUB) {
-		current->journal_info = NULL;
-		return error;
-	}
-
-	if (error)
-		return error;
-
-	if (size) {
-		iocb->ki_flags |= IOCB_DSYNC;
-		return generic_write_sync(iocb, size);
-	}
-
-	return 0;
-}
-
 const struct iomap_ops btrfs_dio_iomap_ops = {
 	.iomap_begin            = btrfs_dio_iomap_begin,
 	.iomap_end              = btrfs_dio_iomap_end,
@@ -7962,11 +7917,6 @@ const struct iomap_dio_ops btrfs_dio_ops = {
 	.submit_io		= btrfs_submit_direct,
 };
 
-const struct iomap_dio_ops btrfs_sync_dops = {
-	.submit_io		= btrfs_submit_direct,
-	.end_io			= btrfs_maybe_fsync_end_io,
-};
-
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			u64 start, u64 len)
 {
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 858d9153a1cd..8241c050ba71 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -112,7 +112,6 @@ struct btrfs_transaction {
 #define TRANS_EXTWRITERS	(__TRANS_START | __TRANS_ATTACH)
 
 #define BTRFS_SEND_TRANS_STUB	((void *)1)
-#define BTRFS_DIO_SYNC_STUB	((void *)2)
 
 struct btrfs_trans_handle {
 	u64 transid;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw()
  2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
                   ` (13 preceding siblings ...)
  2020-09-21 14:43 ` [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround") Goldwyn Rodrigues
@ 2020-09-21 14:43 ` Goldwyn Rodrigues
  2020-09-22 13:26   ` Christoph Hellwig
  14 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-21 14:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs holds inode_lock_shared() while performing DIO within EOF, so
lockdep_assert_held() check can be re-instated.

Revert 3ad99bec6e82 ("iomap: remove lockdep_assert_held()")

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap/direct-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index e01f81e7b76f..b5e030971001 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -421,6 +421,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
+	lockdep_assert_held(&inode->i_rwsem);
+
 	if (!count)
 		return NULL;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-21 14:43 ` [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Goldwyn Rodrigues
@ 2020-09-21 15:09   ` Johannes Thumshirn
  2020-09-22 13:19     ` hch
  2020-09-22  9:24     ` Dan Carpenter
  2020-09-22 14:17   ` Josef Bacik
  2 siblings, 1 reply; 57+ messages in thread
From: Johannes Thumshirn @ 2020-09-21 15:09 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, dsterba, darrick.wong, josef, Goldwyn Rodrigues

On 21/09/2020 16:44, Goldwyn Rodrigues wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> This is to avoid the deadlock caused in btrfs because of O_DIRECT |
> O_DSYNC.
> 
> Filesystems such as btrfs require i_rwsem while performing sync on a
> file. iomap_dio_rw() is called under i_rw_sem. This leads to a
> deadlock because of:
> 
> iomap_dio_complete()
>   generic_write_sync()
>     btrfs_sync_file()
> 
> Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems
> can call iomap_dio_complete() after unlocking i_rwsem.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>

This is missing Christoph's S-o-b

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-21 14:43 ` [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Goldwyn Rodrigues
@ 2020-09-21 15:11   ` Johannes Thumshirn
  2020-09-22 13:21   ` Christoph Hellwig
  2020-09-22 14:20   ` Josef Bacik
  2 siblings, 0 replies; 57+ messages in thread
From: Johannes Thumshirn @ 2020-09-21 15:11 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, dsterba, darrick.wong, josef, Goldwyn Rodrigues

On 21/09/2020 16:44, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> iomap complete routine can deadlock with btrfs_fallocate because of the
> call to generic_write_sync().
> 
> P0                      P1
> inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
> __iomap_dio_rw()        inode_lock()
>                         <block>
> <submits IO>
> <completes IO>
> inode_unlock()
>                         <gets inode_lock()>
>                         inode_dio_wait()
> iomap_dio_complete()
>   generic_write_sync()
>     btrfs_file_fsync()
>       inode_lock()
>       <deadlock>
> 
> inode_dio_end() is used to notify the end of DIO data in order
> to synchronize with truncate. Call inode_dio_end() before calling
> generic_write_sync(), so filesystems can lock i_rwsem during a sync.
> 
> ---
>  fs/iomap/direct-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


Missing S-o-b as well.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-21 14:43 ` [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Goldwyn Rodrigues
@ 2020-09-22  9:24     ` Dan Carpenter
  2020-09-22  9:24     ` Dan Carpenter
  2020-09-22 14:17   ` Josef Bacik
  2 siblings, 0 replies; 57+ messages in thread
From: Dan Carpenter @ 2020-09-22  9:24 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 14901 bytes --]

Hi Goldwyn,

url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-m001-20200920 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/iomap/direct-io.c:582 __iomap_dio_rw() warn: passing zero to 'ERR_PTR'

# https://github.com/0day-ci/linux/commit/072d65c9ae6669b5dd8ae4d8cccc49f2046afeb4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
git checkout 072d65c9ae6669b5dd8ae4d8cccc49f2046afeb4
vim +/ERR_PTR +582 fs/iomap/direct-io.c

072d65c9ae6669 Christoph Hellwig  2020-09-21  410  struct iomap_dio *
072d65c9ae6669 Christoph Hellwig  2020-09-21  411  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
13ef954445df4f Jan Kara           2019-10-15  412  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
13ef954445df4f Jan Kara           2019-10-15  413  		bool wait_for_completion)
db074436f42196 Darrick J. Wong    2019-07-15  414  {
db074436f42196 Darrick J. Wong    2019-07-15  415  	struct address_space *mapping = iocb->ki_filp->f_mapping;
db074436f42196 Darrick J. Wong    2019-07-15  416  	struct inode *inode = file_inode(iocb->ki_filp);
db074436f42196 Darrick J. Wong    2019-07-15  417  	size_t count = iov_iter_count(iter);
88cfd30e188fcf Johannes Thumshirn 2019-11-26  418  	loff_t pos = iocb->ki_pos;
db074436f42196 Darrick J. Wong    2019-07-15  419  	loff_t end = iocb->ki_pos + count - 1, ret = 0;
db074436f42196 Darrick J. Wong    2019-07-15  420  	unsigned int flags = IOMAP_DIRECT;
db074436f42196 Darrick J. Wong    2019-07-15  421  	struct blk_plug plug;
db074436f42196 Darrick J. Wong    2019-07-15  422  	struct iomap_dio *dio;
db074436f42196 Darrick J. Wong    2019-07-15  423  
db074436f42196 Darrick J. Wong    2019-07-15  424  	if (!count)
072d65c9ae6669 Christoph Hellwig  2020-09-21  425  		return NULL;
db074436f42196 Darrick J. Wong    2019-07-15  426  
13ef954445df4f Jan Kara           2019-10-15  427  	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
072d65c9ae6669 Christoph Hellwig  2020-09-21  428  		return ERR_PTR(-EIO);
13ef954445df4f Jan Kara           2019-10-15  429  
db074436f42196 Darrick J. Wong    2019-07-15  430  	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
db074436f42196 Darrick J. Wong    2019-07-15  431  	if (!dio)
072d65c9ae6669 Christoph Hellwig  2020-09-21  432  		return ERR_PTR(-ENOMEM);
db074436f42196 Darrick J. Wong    2019-07-15  433  
db074436f42196 Darrick J. Wong    2019-07-15  434  	dio->iocb = iocb;
db074436f42196 Darrick J. Wong    2019-07-15  435  	atomic_set(&dio->ref, 1);
db074436f42196 Darrick J. Wong    2019-07-15  436  	dio->size = 0;
db074436f42196 Darrick J. Wong    2019-07-15  437  	dio->i_size = i_size_read(inode);
838c4f3d7515ef Christoph Hellwig  2019-09-19  438  	dio->dops = dops;
db074436f42196 Darrick J. Wong    2019-07-15  439  	dio->error = 0;
db074436f42196 Darrick J. Wong    2019-07-15  440  	dio->flags = 0;
db074436f42196 Darrick J. Wong    2019-07-15  441  
db074436f42196 Darrick J. Wong    2019-07-15  442  	dio->submit.iter = iter;
db074436f42196 Darrick J. Wong    2019-07-15  443  	dio->submit.waiter = current;
db074436f42196 Darrick J. Wong    2019-07-15  444  	dio->submit.cookie = BLK_QC_T_NONE;
db074436f42196 Darrick J. Wong    2019-07-15  445  	dio->submit.last_queue = NULL;
db074436f42196 Darrick J. Wong    2019-07-15  446  
db074436f42196 Darrick J. Wong    2019-07-15  447  	if (iov_iter_rw(iter) == READ) {
db074436f42196 Darrick J. Wong    2019-07-15  448  		if (pos >= dio->i_size)
db074436f42196 Darrick J. Wong    2019-07-15  449  			goto out_free_dio;

ret needs to be set on this patch to prevent an Oops in the caller.

db074436f42196 Darrick J. Wong    2019-07-15  450  
a901004214994f Joseph Qi          2019-10-29  451  		if (iter_is_iovec(iter))
db074436f42196 Darrick J. Wong    2019-07-15  452  			dio->flags |= IOMAP_DIO_DIRTY;
db074436f42196 Darrick J. Wong    2019-07-15  453  	} else {
db074436f42196 Darrick J. Wong    2019-07-15  454  		flags |= IOMAP_WRITE;
db074436f42196 Darrick J. Wong    2019-07-15  455  		dio->flags |= IOMAP_DIO_WRITE;
db074436f42196 Darrick J. Wong    2019-07-15  456  
db074436f42196 Darrick J. Wong    2019-07-15  457  		/* for data sync or sync, we need sync completion processing */
db074436f42196 Darrick J. Wong    2019-07-15  458  		if (iocb->ki_flags & IOCB_DSYNC)
db074436f42196 Darrick J. Wong    2019-07-15  459  			dio->flags |= IOMAP_DIO_NEED_SYNC;
db074436f42196 Darrick J. Wong    2019-07-15  460  
db074436f42196 Darrick J. Wong    2019-07-15  461  		/*
db074436f42196 Darrick J. Wong    2019-07-15  462  		 * For datasync only writes, we optimistically try using FUA for
db074436f42196 Darrick J. Wong    2019-07-15  463  		 * this IO.  Any non-FUA write that occurs will clear this flag,
db074436f42196 Darrick J. Wong    2019-07-15  464  		 * hence we know before completion whether a cache flush is
db074436f42196 Darrick J. Wong    2019-07-15  465  		 * necessary.
db074436f42196 Darrick J. Wong    2019-07-15  466  		 */
db074436f42196 Darrick J. Wong    2019-07-15  467  		if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
db074436f42196 Darrick J. Wong    2019-07-15  468  			dio->flags |= IOMAP_DIO_WRITE_FUA;
db074436f42196 Darrick J. Wong    2019-07-15  469  	}
db074436f42196 Darrick J. Wong    2019-07-15  470  
db074436f42196 Darrick J. Wong    2019-07-15  471  	if (iocb->ki_flags & IOCB_NOWAIT) {
88cfd30e188fcf Johannes Thumshirn 2019-11-26  472  		if (filemap_range_has_page(mapping, pos, end)) {
db074436f42196 Darrick J. Wong    2019-07-15  473  			ret = -EAGAIN;
db074436f42196 Darrick J. Wong    2019-07-15  474  			goto out_free_dio;
db074436f42196 Darrick J. Wong    2019-07-15  475  		}
db074436f42196 Darrick J. Wong    2019-07-15  476  		flags |= IOMAP_NOWAIT;
db074436f42196 Darrick J. Wong    2019-07-15  477  	}
db074436f42196 Darrick J. Wong    2019-07-15  478  
88cfd30e188fcf Johannes Thumshirn 2019-11-26  479  	ret = filemap_write_and_wait_range(mapping, pos, end);
db074436f42196 Darrick J. Wong    2019-07-15  480  	if (ret)
db074436f42196 Darrick J. Wong    2019-07-15  481  		goto out_free_dio;
db074436f42196 Darrick J. Wong    2019-07-15  482  
54752de928c400 Dave Chinner       2020-07-23  483  	if (iov_iter_rw(iter) == WRITE) {
db074436f42196 Darrick J. Wong    2019-07-15  484  		/*
54752de928c400 Dave Chinner       2020-07-23  485  		 * Try to invalidate cache pages for the range we are writing.
60263d5889e6dc Christoph Hellwig  2020-07-23  486  		 * If this invalidation fails, let the caller fall back to
60263d5889e6dc Christoph Hellwig  2020-07-23  487  		 * buffered I/O.
db074436f42196 Darrick J. Wong    2019-07-15  488  		 */
54752de928c400 Dave Chinner       2020-07-23  489  		if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
60263d5889e6dc Christoph Hellwig  2020-07-23  490  				end >> PAGE_SHIFT)) {
60263d5889e6dc Christoph Hellwig  2020-07-23  491  			trace_iomap_dio_invalidate_fail(inode, pos, count);
60263d5889e6dc Christoph Hellwig  2020-07-23  492  			ret = -ENOTBLK;
60263d5889e6dc Christoph Hellwig  2020-07-23  493  			goto out_free_dio;
60263d5889e6dc Christoph Hellwig  2020-07-23  494  		}
db074436f42196 Darrick J. Wong    2019-07-15  495  
54752de928c400 Dave Chinner       2020-07-23  496  		if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) {
db074436f42196 Darrick J. Wong    2019-07-15  497  			ret = sb_init_dio_done_wq(inode->i_sb);
db074436f42196 Darrick J. Wong    2019-07-15  498  			if (ret < 0)
db074436f42196 Darrick J. Wong    2019-07-15  499  				goto out_free_dio;
db074436f42196 Darrick J. Wong    2019-07-15  500  		}
54752de928c400 Dave Chinner       2020-07-23  501  	}
db074436f42196 Darrick J. Wong    2019-07-15  502  
db074436f42196 Darrick J. Wong    2019-07-15  503  	inode_dio_begin(inode);
db074436f42196 Darrick J. Wong    2019-07-15  504  
db074436f42196 Darrick J. Wong    2019-07-15  505  	blk_start_plug(&plug);
db074436f42196 Darrick J. Wong    2019-07-15  506  	do {
db074436f42196 Darrick J. Wong    2019-07-15  507  		ret = iomap_apply(inode, pos, count, flags, ops, dio,
db074436f42196 Darrick J. Wong    2019-07-15  508  				iomap_dio_actor);
db074436f42196 Darrick J. Wong    2019-07-15  509  		if (ret <= 0) {
db074436f42196 Darrick J. Wong    2019-07-15  510  			/* magic error code to fall back to buffered I/O */
db074436f42196 Darrick J. Wong    2019-07-15  511  			if (ret == -ENOTBLK) {
db074436f42196 Darrick J. Wong    2019-07-15  512  				wait_for_completion = true;
db074436f42196 Darrick J. Wong    2019-07-15  513  				ret = 0;
db074436f42196 Darrick J. Wong    2019-07-15  514  			}
db074436f42196 Darrick J. Wong    2019-07-15  515  			break;
db074436f42196 Darrick J. Wong    2019-07-15  516  		}
db074436f42196 Darrick J. Wong    2019-07-15  517  		pos += ret;
db074436f42196 Darrick J. Wong    2019-07-15  518  
419e9c38aa075e Jan Kara           2019-11-21  519  		if (iov_iter_rw(iter) == READ && pos >= dio->i_size) {
419e9c38aa075e Jan Kara           2019-11-21  520  			/*
419e9c38aa075e Jan Kara           2019-11-21  521  			 * We only report that we've read data up to i_size.
419e9c38aa075e Jan Kara           2019-11-21  522  			 * Revert iter to a state corresponding to that as
419e9c38aa075e Jan Kara           2019-11-21  523  			 * some callers (such as splice code) rely on it.
419e9c38aa075e Jan Kara           2019-11-21  524  			 */
419e9c38aa075e Jan Kara           2019-11-21  525  			iov_iter_revert(iter, pos - dio->i_size);
db074436f42196 Darrick J. Wong    2019-07-15  526  			break;
419e9c38aa075e Jan Kara           2019-11-21  527  		}
db074436f42196 Darrick J. Wong    2019-07-15  528  	} while ((count = iov_iter_count(iter)) > 0);
db074436f42196 Darrick J. Wong    2019-07-15  529  	blk_finish_plug(&plug);
db074436f42196 Darrick J. Wong    2019-07-15  530  
db074436f42196 Darrick J. Wong    2019-07-15  531  	if (ret < 0)
db074436f42196 Darrick J. Wong    2019-07-15  532  		iomap_dio_set_error(dio, ret);
db074436f42196 Darrick J. Wong    2019-07-15  533  
db074436f42196 Darrick J. Wong    2019-07-15  534  	/*
db074436f42196 Darrick J. Wong    2019-07-15  535  	 * If all the writes we issued were FUA, we don't need to flush the
db074436f42196 Darrick J. Wong    2019-07-15  536  	 * cache on IO completion. Clear the sync flag for this case.
db074436f42196 Darrick J. Wong    2019-07-15  537  	 */
db074436f42196 Darrick J. Wong    2019-07-15  538  	if (dio->flags & IOMAP_DIO_WRITE_FUA)
db074436f42196 Darrick J. Wong    2019-07-15  539  		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
db074436f42196 Darrick J. Wong    2019-07-15  540  
db074436f42196 Darrick J. Wong    2019-07-15  541  	WRITE_ONCE(iocb->ki_cookie, dio->submit.cookie);
db074436f42196 Darrick J. Wong    2019-07-15  542  	WRITE_ONCE(iocb->private, dio->submit.last_queue);
db074436f42196 Darrick J. Wong    2019-07-15  543  
db074436f42196 Darrick J. Wong    2019-07-15  544  	/*
db074436f42196 Darrick J. Wong    2019-07-15  545  	 * We are about to drop our additional submission reference, which
d9973ce2fe5bcd yangerkun          2020-03-18  546  	 * might be the last reference to the dio.  There are three different
d9973ce2fe5bcd yangerkun          2020-03-18  547  	 * ways we can progress here:
db074436f42196 Darrick J. Wong    2019-07-15  548  	 *
db074436f42196 Darrick J. Wong    2019-07-15  549  	 *  (a) If this is the last reference we will always complete and free
db074436f42196 Darrick J. Wong    2019-07-15  550  	 *	the dio ourselves.
db074436f42196 Darrick J. Wong    2019-07-15  551  	 *  (b) If this is not the last reference, and we serve an asynchronous
db074436f42196 Darrick J. Wong    2019-07-15  552  	 *	iocb, we must never touch the dio after the decrement, the
db074436f42196 Darrick J. Wong    2019-07-15  553  	 *	I/O completion handler will complete and free it.
db074436f42196 Darrick J. Wong    2019-07-15  554  	 *  (c) If this is not the last reference, but we serve a synchronous
db074436f42196 Darrick J. Wong    2019-07-15  555  	 *	iocb, the I/O completion handler will wake us up on the drop
db074436f42196 Darrick J. Wong    2019-07-15  556  	 *	of the final reference, and we will complete and free it here
db074436f42196 Darrick J. Wong    2019-07-15  557  	 *	after we got woken by the I/O completion handler.
db074436f42196 Darrick J. Wong    2019-07-15  558  	 */
db074436f42196 Darrick J. Wong    2019-07-15  559  	dio->wait_for_completion = wait_for_completion;
db074436f42196 Darrick J. Wong    2019-07-15  560  	if (!atomic_dec_and_test(&dio->ref)) {
db074436f42196 Darrick J. Wong    2019-07-15  561  		if (!wait_for_completion)
072d65c9ae6669 Christoph Hellwig  2020-09-21  562  			return ERR_PTR(-EIOCBQUEUED);
db074436f42196 Darrick J. Wong    2019-07-15  563  
db074436f42196 Darrick J. Wong    2019-07-15  564  		for (;;) {
db074436f42196 Darrick J. Wong    2019-07-15  565  			set_current_state(TASK_UNINTERRUPTIBLE);
db074436f42196 Darrick J. Wong    2019-07-15  566  			if (!READ_ONCE(dio->submit.waiter))
db074436f42196 Darrick J. Wong    2019-07-15  567  				break;
db074436f42196 Darrick J. Wong    2019-07-15  568  
db074436f42196 Darrick J. Wong    2019-07-15  569  			if (!(iocb->ki_flags & IOCB_HIPRI) ||
db074436f42196 Darrick J. Wong    2019-07-15  570  			    !dio->submit.last_queue ||
db074436f42196 Darrick J. Wong    2019-07-15  571  			    !blk_poll(dio->submit.last_queue,
db074436f42196 Darrick J. Wong    2019-07-15  572  					 dio->submit.cookie, true))
e6249cdd46e43a Ming Lei           2020-05-03  573  				blk_io_schedule();
db074436f42196 Darrick J. Wong    2019-07-15  574  		}
db074436f42196 Darrick J. Wong    2019-07-15  575  		__set_current_state(TASK_RUNNING);
db074436f42196 Darrick J. Wong    2019-07-15  576  	}
db074436f42196 Darrick J. Wong    2019-07-15  577  
072d65c9ae6669 Christoph Hellwig  2020-09-21  578  	return dio;
db074436f42196 Darrick J. Wong    2019-07-15  579  
db074436f42196 Darrick J. Wong    2019-07-15  580  out_free_dio:
db074436f42196 Darrick J. Wong    2019-07-15  581  	kfree(dio);
072d65c9ae6669 Christoph Hellwig  2020-09-21 @582  	return ERR_PTR(ret);
072d65c9ae6669 Christoph Hellwig  2020-09-21  583  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 39250 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
@ 2020-09-22  9:24     ` Dan Carpenter
  0 siblings, 0 replies; 57+ messages in thread
From: Dan Carpenter @ 2020-09-22  9:24 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 14901 bytes --]

Hi Goldwyn,

url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-m001-20200920 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/iomap/direct-io.c:582 __iomap_dio_rw() warn: passing zero to 'ERR_PTR'

# https://github.com/0day-ci/linux/commit/072d65c9ae6669b5dd8ae4d8cccc49f2046afeb4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
git checkout 072d65c9ae6669b5dd8ae4d8cccc49f2046afeb4
vim +/ERR_PTR +582 fs/iomap/direct-io.c

072d65c9ae6669 Christoph Hellwig  2020-09-21  410  struct iomap_dio *
072d65c9ae6669 Christoph Hellwig  2020-09-21  411  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
13ef954445df4f Jan Kara           2019-10-15  412  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
13ef954445df4f Jan Kara           2019-10-15  413  		bool wait_for_completion)
db074436f42196 Darrick J. Wong    2019-07-15  414  {
db074436f42196 Darrick J. Wong    2019-07-15  415  	struct address_space *mapping = iocb->ki_filp->f_mapping;
db074436f42196 Darrick J. Wong    2019-07-15  416  	struct inode *inode = file_inode(iocb->ki_filp);
db074436f42196 Darrick J. Wong    2019-07-15  417  	size_t count = iov_iter_count(iter);
88cfd30e188fcf Johannes Thumshirn 2019-11-26  418  	loff_t pos = iocb->ki_pos;
db074436f42196 Darrick J. Wong    2019-07-15  419  	loff_t end = iocb->ki_pos + count - 1, ret = 0;
db074436f42196 Darrick J. Wong    2019-07-15  420  	unsigned int flags = IOMAP_DIRECT;
db074436f42196 Darrick J. Wong    2019-07-15  421  	struct blk_plug plug;
db074436f42196 Darrick J. Wong    2019-07-15  422  	struct iomap_dio *dio;
db074436f42196 Darrick J. Wong    2019-07-15  423  
db074436f42196 Darrick J. Wong    2019-07-15  424  	if (!count)
072d65c9ae6669 Christoph Hellwig  2020-09-21  425  		return NULL;
db074436f42196 Darrick J. Wong    2019-07-15  426  
13ef954445df4f Jan Kara           2019-10-15  427  	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
072d65c9ae6669 Christoph Hellwig  2020-09-21  428  		return ERR_PTR(-EIO);
13ef954445df4f Jan Kara           2019-10-15  429  
db074436f42196 Darrick J. Wong    2019-07-15  430  	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
db074436f42196 Darrick J. Wong    2019-07-15  431  	if (!dio)
072d65c9ae6669 Christoph Hellwig  2020-09-21  432  		return ERR_PTR(-ENOMEM);
db074436f42196 Darrick J. Wong    2019-07-15  433  
db074436f42196 Darrick J. Wong    2019-07-15  434  	dio->iocb = iocb;
db074436f42196 Darrick J. Wong    2019-07-15  435  	atomic_set(&dio->ref, 1);
db074436f42196 Darrick J. Wong    2019-07-15  436  	dio->size = 0;
db074436f42196 Darrick J. Wong    2019-07-15  437  	dio->i_size = i_size_read(inode);
838c4f3d7515ef Christoph Hellwig  2019-09-19  438  	dio->dops = dops;
db074436f42196 Darrick J. Wong    2019-07-15  439  	dio->error = 0;
db074436f42196 Darrick J. Wong    2019-07-15  440  	dio->flags = 0;
db074436f42196 Darrick J. Wong    2019-07-15  441  
db074436f42196 Darrick J. Wong    2019-07-15  442  	dio->submit.iter = iter;
db074436f42196 Darrick J. Wong    2019-07-15  443  	dio->submit.waiter = current;
db074436f42196 Darrick J. Wong    2019-07-15  444  	dio->submit.cookie = BLK_QC_T_NONE;
db074436f42196 Darrick J. Wong    2019-07-15  445  	dio->submit.last_queue = NULL;
db074436f42196 Darrick J. Wong    2019-07-15  446  
db074436f42196 Darrick J. Wong    2019-07-15  447  	if (iov_iter_rw(iter) == READ) {
db074436f42196 Darrick J. Wong    2019-07-15  448  		if (pos >= dio->i_size)
db074436f42196 Darrick J. Wong    2019-07-15  449  			goto out_free_dio;

ret needs to be set on this patch to prevent an Oops in the caller.

db074436f42196 Darrick J. Wong    2019-07-15  450  
a901004214994f Joseph Qi          2019-10-29  451  		if (iter_is_iovec(iter))
db074436f42196 Darrick J. Wong    2019-07-15  452  			dio->flags |= IOMAP_DIO_DIRTY;
db074436f42196 Darrick J. Wong    2019-07-15  453  	} else {
db074436f42196 Darrick J. Wong    2019-07-15  454  		flags |= IOMAP_WRITE;
db074436f42196 Darrick J. Wong    2019-07-15  455  		dio->flags |= IOMAP_DIO_WRITE;
db074436f42196 Darrick J. Wong    2019-07-15  456  
db074436f42196 Darrick J. Wong    2019-07-15  457  		/* for data sync or sync, we need sync completion processing */
db074436f42196 Darrick J. Wong    2019-07-15  458  		if (iocb->ki_flags & IOCB_DSYNC)
db074436f42196 Darrick J. Wong    2019-07-15  459  			dio->flags |= IOMAP_DIO_NEED_SYNC;
db074436f42196 Darrick J. Wong    2019-07-15  460  
db074436f42196 Darrick J. Wong    2019-07-15  461  		/*
db074436f42196 Darrick J. Wong    2019-07-15  462  		 * For datasync only writes, we optimistically try using FUA for
db074436f42196 Darrick J. Wong    2019-07-15  463  		 * this IO.  Any non-FUA write that occurs will clear this flag,
db074436f42196 Darrick J. Wong    2019-07-15  464  		 * hence we know before completion whether a cache flush is
db074436f42196 Darrick J. Wong    2019-07-15  465  		 * necessary.
db074436f42196 Darrick J. Wong    2019-07-15  466  		 */
db074436f42196 Darrick J. Wong    2019-07-15  467  		if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
db074436f42196 Darrick J. Wong    2019-07-15  468  			dio->flags |= IOMAP_DIO_WRITE_FUA;
db074436f42196 Darrick J. Wong    2019-07-15  469  	}
db074436f42196 Darrick J. Wong    2019-07-15  470  
db074436f42196 Darrick J. Wong    2019-07-15  471  	if (iocb->ki_flags & IOCB_NOWAIT) {
88cfd30e188fcf Johannes Thumshirn 2019-11-26  472  		if (filemap_range_has_page(mapping, pos, end)) {
db074436f42196 Darrick J. Wong    2019-07-15  473  			ret = -EAGAIN;
db074436f42196 Darrick J. Wong    2019-07-15  474  			goto out_free_dio;
db074436f42196 Darrick J. Wong    2019-07-15  475  		}
db074436f42196 Darrick J. Wong    2019-07-15  476  		flags |= IOMAP_NOWAIT;
db074436f42196 Darrick J. Wong    2019-07-15  477  	}
db074436f42196 Darrick J. Wong    2019-07-15  478  
88cfd30e188fcf Johannes Thumshirn 2019-11-26  479  	ret = filemap_write_and_wait_range(mapping, pos, end);
db074436f42196 Darrick J. Wong    2019-07-15  480  	if (ret)
db074436f42196 Darrick J. Wong    2019-07-15  481  		goto out_free_dio;
db074436f42196 Darrick J. Wong    2019-07-15  482  
54752de928c400 Dave Chinner       2020-07-23  483  	if (iov_iter_rw(iter) == WRITE) {
db074436f42196 Darrick J. Wong    2019-07-15  484  		/*
54752de928c400 Dave Chinner       2020-07-23  485  		 * Try to invalidate cache pages for the range we are writing.
60263d5889e6dc Christoph Hellwig  2020-07-23  486  		 * If this invalidation fails, let the caller fall back to
60263d5889e6dc Christoph Hellwig  2020-07-23  487  		 * buffered I/O.
db074436f42196 Darrick J. Wong    2019-07-15  488  		 */
54752de928c400 Dave Chinner       2020-07-23  489  		if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
60263d5889e6dc Christoph Hellwig  2020-07-23  490  				end >> PAGE_SHIFT)) {
60263d5889e6dc Christoph Hellwig  2020-07-23  491  			trace_iomap_dio_invalidate_fail(inode, pos, count);
60263d5889e6dc Christoph Hellwig  2020-07-23  492  			ret = -ENOTBLK;
60263d5889e6dc Christoph Hellwig  2020-07-23  493  			goto out_free_dio;
60263d5889e6dc Christoph Hellwig  2020-07-23  494  		}
db074436f42196 Darrick J. Wong    2019-07-15  495  
54752de928c400 Dave Chinner       2020-07-23  496  		if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) {
db074436f42196 Darrick J. Wong    2019-07-15  497  			ret = sb_init_dio_done_wq(inode->i_sb);
db074436f42196 Darrick J. Wong    2019-07-15  498  			if (ret < 0)
db074436f42196 Darrick J. Wong    2019-07-15  499  				goto out_free_dio;
db074436f42196 Darrick J. Wong    2019-07-15  500  		}
54752de928c400 Dave Chinner       2020-07-23  501  	}
db074436f42196 Darrick J. Wong    2019-07-15  502  
db074436f42196 Darrick J. Wong    2019-07-15  503  	inode_dio_begin(inode);
db074436f42196 Darrick J. Wong    2019-07-15  504  
db074436f42196 Darrick J. Wong    2019-07-15  505  	blk_start_plug(&plug);
db074436f42196 Darrick J. Wong    2019-07-15  506  	do {
db074436f42196 Darrick J. Wong    2019-07-15  507  		ret = iomap_apply(inode, pos, count, flags, ops, dio,
db074436f42196 Darrick J. Wong    2019-07-15  508  				iomap_dio_actor);
db074436f42196 Darrick J. Wong    2019-07-15  509  		if (ret <= 0) {
db074436f42196 Darrick J. Wong    2019-07-15  510  			/* magic error code to fall back to buffered I/O */
db074436f42196 Darrick J. Wong    2019-07-15  511  			if (ret == -ENOTBLK) {
db074436f42196 Darrick J. Wong    2019-07-15  512  				wait_for_completion = true;
db074436f42196 Darrick J. Wong    2019-07-15  513  				ret = 0;
db074436f42196 Darrick J. Wong    2019-07-15  514  			}
db074436f42196 Darrick J. Wong    2019-07-15  515  			break;
db074436f42196 Darrick J. Wong    2019-07-15  516  		}
db074436f42196 Darrick J. Wong    2019-07-15  517  		pos += ret;
db074436f42196 Darrick J. Wong    2019-07-15  518  
419e9c38aa075e Jan Kara           2019-11-21  519  		if (iov_iter_rw(iter) == READ && pos >= dio->i_size) {
419e9c38aa075e Jan Kara           2019-11-21  520  			/*
419e9c38aa075e Jan Kara           2019-11-21  521  			 * We only report that we've read data up to i_size.
419e9c38aa075e Jan Kara           2019-11-21  522  			 * Revert iter to a state corresponding to that as
419e9c38aa075e Jan Kara           2019-11-21  523  			 * some callers (such as splice code) rely on it.
419e9c38aa075e Jan Kara           2019-11-21  524  			 */
419e9c38aa075e Jan Kara           2019-11-21  525  			iov_iter_revert(iter, pos - dio->i_size);
db074436f42196 Darrick J. Wong    2019-07-15  526  			break;
419e9c38aa075e Jan Kara           2019-11-21  527  		}
db074436f42196 Darrick J. Wong    2019-07-15  528  	} while ((count = iov_iter_count(iter)) > 0);
db074436f42196 Darrick J. Wong    2019-07-15  529  	blk_finish_plug(&plug);
db074436f42196 Darrick J. Wong    2019-07-15  530  
db074436f42196 Darrick J. Wong    2019-07-15  531  	if (ret < 0)
db074436f42196 Darrick J. Wong    2019-07-15  532  		iomap_dio_set_error(dio, ret);
db074436f42196 Darrick J. Wong    2019-07-15  533  
db074436f42196 Darrick J. Wong    2019-07-15  534  	/*
db074436f42196 Darrick J. Wong    2019-07-15  535  	 * If all the writes we issued were FUA, we don't need to flush the
db074436f42196 Darrick J. Wong    2019-07-15  536  	 * cache on IO completion. Clear the sync flag for this case.
db074436f42196 Darrick J. Wong    2019-07-15  537  	 */
db074436f42196 Darrick J. Wong    2019-07-15  538  	if (dio->flags & IOMAP_DIO_WRITE_FUA)
db074436f42196 Darrick J. Wong    2019-07-15  539  		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
db074436f42196 Darrick J. Wong    2019-07-15  540  
db074436f42196 Darrick J. Wong    2019-07-15  541  	WRITE_ONCE(iocb->ki_cookie, dio->submit.cookie);
db074436f42196 Darrick J. Wong    2019-07-15  542  	WRITE_ONCE(iocb->private, dio->submit.last_queue);
db074436f42196 Darrick J. Wong    2019-07-15  543  
db074436f42196 Darrick J. Wong    2019-07-15  544  	/*
db074436f42196 Darrick J. Wong    2019-07-15  545  	 * We are about to drop our additional submission reference, which
d9973ce2fe5bcd yangerkun          2020-03-18  546  	 * might be the last reference to the dio.  There are three different
d9973ce2fe5bcd yangerkun          2020-03-18  547  	 * ways we can progress here:
db074436f42196 Darrick J. Wong    2019-07-15  548  	 *
db074436f42196 Darrick J. Wong    2019-07-15  549  	 *  (a) If this is the last reference we will always complete and free
db074436f42196 Darrick J. Wong    2019-07-15  550  	 *	the dio ourselves.
db074436f42196 Darrick J. Wong    2019-07-15  551  	 *  (b) If this is not the last reference, and we serve an asynchronous
db074436f42196 Darrick J. Wong    2019-07-15  552  	 *	iocb, we must never touch the dio after the decrement, the
db074436f42196 Darrick J. Wong    2019-07-15  553  	 *	I/O completion handler will complete and free it.
db074436f42196 Darrick J. Wong    2019-07-15  554  	 *  (c) If this is not the last reference, but we serve a synchronous
db074436f42196 Darrick J. Wong    2019-07-15  555  	 *	iocb, the I/O completion handler will wake us up on the drop
db074436f42196 Darrick J. Wong    2019-07-15  556  	 *	of the final reference, and we will complete and free it here
db074436f42196 Darrick J. Wong    2019-07-15  557  	 *	after we got woken by the I/O completion handler.
db074436f42196 Darrick J. Wong    2019-07-15  558  	 */
db074436f42196 Darrick J. Wong    2019-07-15  559  	dio->wait_for_completion = wait_for_completion;
db074436f42196 Darrick J. Wong    2019-07-15  560  	if (!atomic_dec_and_test(&dio->ref)) {
db074436f42196 Darrick J. Wong    2019-07-15  561  		if (!wait_for_completion)
072d65c9ae6669 Christoph Hellwig  2020-09-21  562  			return ERR_PTR(-EIOCBQUEUED);
db074436f42196 Darrick J. Wong    2019-07-15  563  
db074436f42196 Darrick J. Wong    2019-07-15  564  		for (;;) {
db074436f42196 Darrick J. Wong    2019-07-15  565  			set_current_state(TASK_UNINTERRUPTIBLE);
db074436f42196 Darrick J. Wong    2019-07-15  566  			if (!READ_ONCE(dio->submit.waiter))
db074436f42196 Darrick J. Wong    2019-07-15  567  				break;
db074436f42196 Darrick J. Wong    2019-07-15  568  
db074436f42196 Darrick J. Wong    2019-07-15  569  			if (!(iocb->ki_flags & IOCB_HIPRI) ||
db074436f42196 Darrick J. Wong    2019-07-15  570  			    !dio->submit.last_queue ||
db074436f42196 Darrick J. Wong    2019-07-15  571  			    !blk_poll(dio->submit.last_queue,
db074436f42196 Darrick J. Wong    2019-07-15  572  					 dio->submit.cookie, true))
e6249cdd46e43a Ming Lei           2020-05-03  573  				blk_io_schedule();
db074436f42196 Darrick J. Wong    2019-07-15  574  		}
db074436f42196 Darrick J. Wong    2019-07-15  575  		__set_current_state(TASK_RUNNING);
db074436f42196 Darrick J. Wong    2019-07-15  576  	}
db074436f42196 Darrick J. Wong    2019-07-15  577  
072d65c9ae6669 Christoph Hellwig  2020-09-21  578  	return dio;
db074436f42196 Darrick J. Wong    2019-07-15  579  
db074436f42196 Darrick J. Wong    2019-07-15  580  out_free_dio:
db074436f42196 Darrick J. Wong    2019-07-15  581  	kfree(dio);
072d65c9ae6669 Christoph Hellwig  2020-09-21 @582  	return ERR_PTR(ret);
072d65c9ae6669 Christoph Hellwig  2020-09-21  583  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 39250 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write
  2020-09-21 14:43 ` [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write Goldwyn Rodrigues
@ 2020-09-22  9:26     ` Dan Carpenter
  2020-09-22 14:48   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Dan Carpenter @ 2020-09-22  9:26 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 8499 bytes --]

Hi Goldwyn,

url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-m001-20200920 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/btrfs/file.c:2147 btrfs_file_write_iter() error: uninitialized symbol 'err'.

Old smatch warnings:
fs/btrfs/file.c:2902 btrfs_replace_file_extents() error: uninitialized symbol 'drop_end'.

# https://github.com/0day-ci/linux/commit/2e4ab0a0ed53f87e6fe8f369c2bf5d809f455ac4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
git checkout 2e4ab0a0ed53f87e6fe8f369c2bf5d809f455ac4
vim +/err +2147 fs/btrfs/file.c

b30ac0fc4109701 Al Viro           2014-04-03  2066  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
b30ac0fc4109701 Al Viro           2014-04-03  2067  				    struct iov_iter *from)
d0215f3e5ebb580 Josef Bacik       2011-01-25  2068  {
d0215f3e5ebb580 Josef Bacik       2011-01-25  2069  	struct file *file = iocb->ki_filp;
496ad9aa8ef4480 Al Viro           2013-01-23  2070  	struct inode *inode = file_inode(file);
0b246afa62b0cf5 Jeff Mahoney      2016-06-22  2071  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
d0215f3e5ebb580 Josef Bacik       2011-01-25  2072  	struct btrfs_root *root = BTRFS_I(inode)->root;
d0215f3e5ebb580 Josef Bacik       2011-01-25  2073  	ssize_t num_written = 0;
f50cb7aff964599 Omar Sandoval     2019-08-15  2074  	const bool sync = iocb->ki_flags & IOCB_DSYNC;
3309dd04cbcd2cd Al Viro           2015-04-09  2075  	ssize_t err;
                                                        ^^^^^^^^^^^
This is never initialized.

d0215f3e5ebb580 Josef Bacik       2011-01-25  2076  
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2077  	/*
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2078  	 * If BTRFS flips readonly due to some impossible error
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2079  	 * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR),
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2080  	 * although we have opened a file as writable, we have
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2081  	 * to stop this write operation to ensure FS consistency.
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2082  	 */
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2083  	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2084  		return -EROFS;
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2085  
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2086  	if (!(iocb->ki_flags & IOCB_DIRECT) &&
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2087  	    (iocb->ki_flags & IOCB_NOWAIT))
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2088  		return -EOPNOTSUPP;
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2089  
b812ce28796f746 Josef Bacik       2012-11-16  2090  	if (sync)
b812ce28796f746 Josef Bacik       2012-11-16  2091  		atomic_inc(&BTRFS_I(inode)->sync_writers);
b812ce28796f746 Josef Bacik       2012-11-16  2092  
2ba48ce513c4e54 Al Viro           2015-04-09  2093  	if (iocb->ki_flags & IOCB_DIRECT) {
09745ff88d93537 Josef Bacik       2020-09-03  2094  		/*
09745ff88d93537 Josef Bacik       2020-09-03  2095  		 * 1. We must always clear IOCB_DSYNC in order to not deadlock
09745ff88d93537 Josef Bacik       2020-09-03  2096  		 *    in iomap, as it calls generic_write_sync() in this case.
09745ff88d93537 Josef Bacik       2020-09-03  2097  		 * 2. If we are async, we can call iomap_dio_complete() either
09745ff88d93537 Josef Bacik       2020-09-03  2098  		 *    in
09745ff88d93537 Josef Bacik       2020-09-03  2099  		 *
09745ff88d93537 Josef Bacik       2020-09-03  2100  		 *    2.1. A worker thread from the last bio completed.  In this
09745ff88d93537 Josef Bacik       2020-09-03  2101  		 *	   case we need to mark the btrfs_dio_data that it is
09745ff88d93537 Josef Bacik       2020-09-03  2102  		 *	   async in order to call generic_write_sync() properly.
09745ff88d93537 Josef Bacik       2020-09-03  2103  		 *	   This is handled by setting BTRFS_DIO_SYNC_STUB in the
09745ff88d93537 Josef Bacik       2020-09-03  2104  		 *	   current->journal_info.
09745ff88d93537 Josef Bacik       2020-09-03  2105  		 *    2.2  The submitter context, because all IO completed
09745ff88d93537 Josef Bacik       2020-09-03  2106  		 *         before we exited iomap_dio_rw().  In this case we can
09745ff88d93537 Josef Bacik       2020-09-03  2107  		 *         just re-set the IOCB_DSYNC on the iocb and we'll do
09745ff88d93537 Josef Bacik       2020-09-03  2108  		 *         the sync below.  If our ->end_io() gets called and
09745ff88d93537 Josef Bacik       2020-09-03  2109  		 *         current->journal_info is set, then we know we're in
09745ff88d93537 Josef Bacik       2020-09-03  2110  		 *         our current context and we will clear
09745ff88d93537 Josef Bacik       2020-09-03  2111  		 *         current->journal_info to indicate that we need to
09745ff88d93537 Josef Bacik       2020-09-03  2112  		 *         sync below.
09745ff88d93537 Josef Bacik       2020-09-03  2113  		 */
09745ff88d93537 Josef Bacik       2020-09-03  2114  		if (sync) {
09745ff88d93537 Josef Bacik       2020-09-03  2115  			ASSERT(current->journal_info == NULL);
09745ff88d93537 Josef Bacik       2020-09-03  2116  			iocb->ki_flags &= ~IOCB_DSYNC;
09745ff88d93537 Josef Bacik       2020-09-03  2117  			current->journal_info = BTRFS_DIO_SYNC_STUB;
09745ff88d93537 Josef Bacik       2020-09-03  2118  		}
d8a13486adc39bc Goldwyn Rodrigues 2020-09-21  2119  		num_written = btrfs_direct_write(iocb, from);
09745ff88d93537 Josef Bacik       2020-09-03  2120  
09745ff88d93537 Josef Bacik       2020-09-03  2121  		/*
09745ff88d93537 Josef Bacik       2020-09-03  2122  		 * As stated above, we cleared journal_info, so we need to do
09745ff88d93537 Josef Bacik       2020-09-03  2123  		 * the sync ourselves.
09745ff88d93537 Josef Bacik       2020-09-03  2124  		 */
09745ff88d93537 Josef Bacik       2020-09-03  2125  		if (sync && current->journal_info == NULL)
09745ff88d93537 Josef Bacik       2020-09-03  2126  			iocb->ki_flags |= IOCB_DSYNC;
09745ff88d93537 Josef Bacik       2020-09-03  2127  		current->journal_info = NULL;
d0215f3e5ebb580 Josef Bacik       2011-01-25  2128  	} else {
e4af400a9c5081e Goldwyn Rodrigues 2018-06-17  2129  		num_written = btrfs_buffered_write(iocb, from);
d0215f3e5ebb580 Josef Bacik       2011-01-25  2130  	}
d0215f3e5ebb580 Josef Bacik       2011-01-25  2131  
5a3f23d515a2ebf Chris Mason       2009-03-31  2132  	/*
6c760c072403f44 Josef Bacik       2012-11-09  2133  	 * We also have to set last_sub_trans to the current log transid,
6c760c072403f44 Josef Bacik       2012-11-09  2134  	 * otherwise subsequent syncs to a file that's been synced in this
bb7ab3b92e46da0 Adam Buchbinder   2016-03-04  2135  	 * transaction will appear to have already occurred.
5a3f23d515a2ebf Chris Mason       2009-03-31  2136  	 */
2f2ff0ee5e4303e Filipe Manana     2015-03-20  2137  	spin_lock(&BTRFS_I(inode)->lock);
6c760c072403f44 Josef Bacik       2012-11-09  2138  	BTRFS_I(inode)->last_sub_trans = root->log_transid;
2f2ff0ee5e4303e Filipe Manana     2015-03-20  2139  	spin_unlock(&BTRFS_I(inode)->lock);
e259221763a4040 Christoph Hellwig 2016-04-07  2140  	if (num_written > 0)
e259221763a4040 Christoph Hellwig 2016-04-07  2141  		num_written = generic_write_sync(iocb, num_written);
0a3404dcff29a75 Miao Xie          2013-01-28  2142  
b812ce28796f746 Josef Bacik       2012-11-16  2143  	if (sync)
b812ce28796f746 Josef Bacik       2012-11-16  2144  		atomic_dec(&BTRFS_I(inode)->sync_writers);
81be057069d906f Goldwyn Rodrigues 2020-09-21  2145  
39279cc3d2704cf Chris Mason       2007-06-12  2146  	current->backing_dev_info = NULL;
39279cc3d2704cf Chris Mason       2007-06-12 @2147  	return num_written ? num_written : err;
39279cc3d2704cf Chris Mason       2007-06-12  2148  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 39250 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write
@ 2020-09-22  9:26     ` Dan Carpenter
  0 siblings, 0 replies; 57+ messages in thread
From: Dan Carpenter @ 2020-09-22  9:26 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8499 bytes --]

Hi Goldwyn,

url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-m001-20200920 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/btrfs/file.c:2147 btrfs_file_write_iter() error: uninitialized symbol 'err'.

Old smatch warnings:
fs/btrfs/file.c:2902 btrfs_replace_file_extents() error: uninitialized symbol 'drop_end'.

# https://github.com/0day-ci/linux/commit/2e4ab0a0ed53f87e6fe8f369c2bf5d809f455ac4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
git checkout 2e4ab0a0ed53f87e6fe8f369c2bf5d809f455ac4
vim +/err +2147 fs/btrfs/file.c

b30ac0fc4109701 Al Viro           2014-04-03  2066  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
b30ac0fc4109701 Al Viro           2014-04-03  2067  				    struct iov_iter *from)
d0215f3e5ebb580 Josef Bacik       2011-01-25  2068  {
d0215f3e5ebb580 Josef Bacik       2011-01-25  2069  	struct file *file = iocb->ki_filp;
496ad9aa8ef4480 Al Viro           2013-01-23  2070  	struct inode *inode = file_inode(file);
0b246afa62b0cf5 Jeff Mahoney      2016-06-22  2071  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
d0215f3e5ebb580 Josef Bacik       2011-01-25  2072  	struct btrfs_root *root = BTRFS_I(inode)->root;
d0215f3e5ebb580 Josef Bacik       2011-01-25  2073  	ssize_t num_written = 0;
f50cb7aff964599 Omar Sandoval     2019-08-15  2074  	const bool sync = iocb->ki_flags & IOCB_DSYNC;
3309dd04cbcd2cd Al Viro           2015-04-09  2075  	ssize_t err;
                                                        ^^^^^^^^^^^
This is never initialized.

d0215f3e5ebb580 Josef Bacik       2011-01-25  2076  
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2077  	/*
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2078  	 * If BTRFS flips readonly due to some impossible error
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2079  	 * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR),
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2080  	 * although we have opened a file as writable, we have
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2081  	 * to stop this write operation to ensure FS consistency.
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2082  	 */
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2083  	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2084  		return -EROFS;
1b4761d18914c44 Goldwyn Rodrigues 2020-09-21  2085  
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2086  	if (!(iocb->ki_flags & IOCB_DIRECT) &&
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2087  	    (iocb->ki_flags & IOCB_NOWAIT))
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2088  		return -EOPNOTSUPP;
91f9943e1c7b663 Christoph Hellwig 2017-08-29  2089  
b812ce28796f746 Josef Bacik       2012-11-16  2090  	if (sync)
b812ce28796f746 Josef Bacik       2012-11-16  2091  		atomic_inc(&BTRFS_I(inode)->sync_writers);
b812ce28796f746 Josef Bacik       2012-11-16  2092  
2ba48ce513c4e54 Al Viro           2015-04-09  2093  	if (iocb->ki_flags & IOCB_DIRECT) {
09745ff88d93537 Josef Bacik       2020-09-03  2094  		/*
09745ff88d93537 Josef Bacik       2020-09-03  2095  		 * 1. We must always clear IOCB_DSYNC in order to not deadlock
09745ff88d93537 Josef Bacik       2020-09-03  2096  		 *    in iomap, as it calls generic_write_sync() in this case.
09745ff88d93537 Josef Bacik       2020-09-03  2097  		 * 2. If we are async, we can call iomap_dio_complete() either
09745ff88d93537 Josef Bacik       2020-09-03  2098  		 *    in
09745ff88d93537 Josef Bacik       2020-09-03  2099  		 *
09745ff88d93537 Josef Bacik       2020-09-03  2100  		 *    2.1. A worker thread from the last bio completed.  In this
09745ff88d93537 Josef Bacik       2020-09-03  2101  		 *	   case we need to mark the btrfs_dio_data that it is
09745ff88d93537 Josef Bacik       2020-09-03  2102  		 *	   async in order to call generic_write_sync() properly.
09745ff88d93537 Josef Bacik       2020-09-03  2103  		 *	   This is handled by setting BTRFS_DIO_SYNC_STUB in the
09745ff88d93537 Josef Bacik       2020-09-03  2104  		 *	   current->journal_info.
09745ff88d93537 Josef Bacik       2020-09-03  2105  		 *    2.2  The submitter context, because all IO completed
09745ff88d93537 Josef Bacik       2020-09-03  2106  		 *         before we exited iomap_dio_rw().  In this case we can
09745ff88d93537 Josef Bacik       2020-09-03  2107  		 *         just re-set the IOCB_DSYNC on the iocb and we'll do
09745ff88d93537 Josef Bacik       2020-09-03  2108  		 *         the sync below.  If our ->end_io() gets called and
09745ff88d93537 Josef Bacik       2020-09-03  2109  		 *         current->journal_info is set, then we know we're in
09745ff88d93537 Josef Bacik       2020-09-03  2110  		 *         our current context and we will clear
09745ff88d93537 Josef Bacik       2020-09-03  2111  		 *         current->journal_info to indicate that we need to
09745ff88d93537 Josef Bacik       2020-09-03  2112  		 *         sync below.
09745ff88d93537 Josef Bacik       2020-09-03  2113  		 */
09745ff88d93537 Josef Bacik       2020-09-03  2114  		if (sync) {
09745ff88d93537 Josef Bacik       2020-09-03  2115  			ASSERT(current->journal_info == NULL);
09745ff88d93537 Josef Bacik       2020-09-03  2116  			iocb->ki_flags &= ~IOCB_DSYNC;
09745ff88d93537 Josef Bacik       2020-09-03  2117  			current->journal_info = BTRFS_DIO_SYNC_STUB;
09745ff88d93537 Josef Bacik       2020-09-03  2118  		}
d8a13486adc39bc Goldwyn Rodrigues 2020-09-21  2119  		num_written = btrfs_direct_write(iocb, from);
09745ff88d93537 Josef Bacik       2020-09-03  2120  
09745ff88d93537 Josef Bacik       2020-09-03  2121  		/*
09745ff88d93537 Josef Bacik       2020-09-03  2122  		 * As stated above, we cleared journal_info, so we need to do
09745ff88d93537 Josef Bacik       2020-09-03  2123  		 * the sync ourselves.
09745ff88d93537 Josef Bacik       2020-09-03  2124  		 */
09745ff88d93537 Josef Bacik       2020-09-03  2125  		if (sync && current->journal_info == NULL)
09745ff88d93537 Josef Bacik       2020-09-03  2126  			iocb->ki_flags |= IOCB_DSYNC;
09745ff88d93537 Josef Bacik       2020-09-03  2127  		current->journal_info = NULL;
d0215f3e5ebb580 Josef Bacik       2011-01-25  2128  	} else {
e4af400a9c5081e Goldwyn Rodrigues 2018-06-17  2129  		num_written = btrfs_buffered_write(iocb, from);
d0215f3e5ebb580 Josef Bacik       2011-01-25  2130  	}
d0215f3e5ebb580 Josef Bacik       2011-01-25  2131  
5a3f23d515a2ebf Chris Mason       2009-03-31  2132  	/*
6c760c072403f44 Josef Bacik       2012-11-09  2133  	 * We also have to set last_sub_trans to the current log transid,
6c760c072403f44 Josef Bacik       2012-11-09  2134  	 * otherwise subsequent syncs to a file that's been synced in this
bb7ab3b92e46da0 Adam Buchbinder   2016-03-04  2135  	 * transaction will appear to have already occurred.
5a3f23d515a2ebf Chris Mason       2009-03-31  2136  	 */
2f2ff0ee5e4303e Filipe Manana     2015-03-20  2137  	spin_lock(&BTRFS_I(inode)->lock);
6c760c072403f44 Josef Bacik       2012-11-09  2138  	BTRFS_I(inode)->last_sub_trans = root->log_transid;
2f2ff0ee5e4303e Filipe Manana     2015-03-20  2139  	spin_unlock(&BTRFS_I(inode)->lock);
e259221763a4040 Christoph Hellwig 2016-04-07  2140  	if (num_written > 0)
e259221763a4040 Christoph Hellwig 2016-04-07  2141  		num_written = generic_write_sync(iocb, num_written);
0a3404dcff29a75 Miao Xie          2013-01-28  2142  
b812ce28796f746 Josef Bacik       2012-11-16  2143  	if (sync)
b812ce28796f746 Josef Bacik       2012-11-16  2144  		atomic_dec(&BTRFS_I(inode)->sync_writers);
81be057069d906f Goldwyn Rodrigues 2020-09-21  2145  
39279cc3d2704cf Chris Mason       2007-06-12  2146  	current->backing_dev_info = NULL;
39279cc3d2704cf Chris Mason       2007-06-12 @2147  	return num_written ? num_written : err;
39279cc3d2704cf Chris Mason       2007-06-12  2148  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 39250 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  2020-09-21 14:43 ` [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
@ 2020-09-22 13:18   ` Christoph Hellwig
  2020-09-22 14:17   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-22 13:18 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues, Nikolay Borisov,
	Johannes Thumshirn

On Mon, Sep 21, 2020 at 09:43:40AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Since we now perform direct reads using i_rwsem, we can remove this
> inode flag used to co-ordinate unlocked reads.
> 
> The truncate call takes i_rwsem. This means it is correctly synchronized
> with concurrent direct reads.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Johannes Thumshirn <jth@kernel.org>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-21 15:09   ` Johannes Thumshirn
@ 2020-09-22 13:19     ` hch
  0 siblings, 0 replies; 57+ messages in thread
From: hch @ 2020-09-22 13:19 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Goldwyn Rodrigues, linux-fsdevel, linux-btrfs, david, hch,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues

On Mon, Sep 21, 2020 at 03:09:29PM +0000, Johannes Thumshirn wrote:
> On 21/09/2020 16:44, Goldwyn Rodrigues wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > This is to avoid the deadlock caused in btrfs because of O_DIRECT |
> > O_DSYNC.
> > 
> > Filesystems such as btrfs require i_rwsem while performing sync on a
> > file. iomap_dio_rw() is called under i_rw_sem. This leads to a
> > deadlock because of:
> > 
> > iomap_dio_complete()
> >   generic_write_sync()
> >     btrfs_sync_file()
> > 
> > Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems
> > can call iomap_dio_complete() after unlocking i_rwsem.
> > 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >
> 
> This is missing Christoph's S-o-b

Goldwyn picked this up for a rfc.  But this looks like my patch with
a sane commit log, so:

Signed-off-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-21 14:43 ` [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Goldwyn Rodrigues
  2020-09-21 15:11   ` Johannes Thumshirn
@ 2020-09-22 13:21   ` Christoph Hellwig
  2020-09-22 14:20   ` Josef Bacik
  2 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-22 13:21 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues

On Mon, Sep 21, 2020 at 09:43:42AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> iomap complete routine can deadlock with btrfs_fallocate because of the
> call to generic_write_sync().
> 
> P0                      P1
> inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
> __iomap_dio_rw()        inode_lock()
>                         <block>
> <submits IO>
> <completes IO>
> inode_unlock()
>                         <gets inode_lock()>
>                         inode_dio_wait()
> iomap_dio_complete()
>   generic_write_sync()
>     btrfs_file_fsync()
>       inode_lock()
>       <deadlock>
> 
> inode_dio_end() is used to notify the end of DIO data in order
> to synchronize with truncate. Call inode_dio_end() before calling
> generic_write_sync(), so filesystems can lock i_rwsem during a sync.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

You might want to mention in the commit log that this matches the
sequence in the old fs/direct-io.c implementation.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write
  2020-09-21 14:43 ` [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write Goldwyn Rodrigues
@ 2020-09-22 13:22   ` Christoph Hellwig
  2020-09-22 14:27   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-22 13:22 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues

On Mon, Sep 21, 2020 at 09:43:43AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The read and write DIO don't have anything in common except for the
> call to iomap_dio_rw. Extract the write call into a new function to get
> rid of conditional statements for direct write.
> 
> Originally proposed by Christoph Hellwig <hch@lst.de>
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write()
  2020-09-21 14:43 ` [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write() Goldwyn Rodrigues
@ 2020-09-22 13:22   ` Christoph Hellwig
  2020-09-22 14:30   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-22 13:22 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues

On Mon, Sep 21, 2020 at 09:43:44AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> While we do this, correct the call to pagecache_isize_extended():
>  - pagecache_isisze_extended needs to be called to the starting of the
>    write as opposed to i_size
>  - We don't need to check range before the call, this is done in the
>    function
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/15] btrfs: Introduce btrfs_write_check()
  2020-09-21 14:43 ` [PATCH 08/15] btrfs: Introduce btrfs_write_check() Goldwyn Rodrigues
@ 2020-09-22 13:26   ` Christoph Hellwig
  2020-09-22 14:42   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-22 13:26 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues

> +	if (iocb->ki_flags & IOCB_NOWAIT) {
> +		size_t nocow_bytes = count;
> +
> +		/*
> +		 * We will allocate space in case nodatacow is not set,
> +		 * so bail
> +		 */
> +		if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes)
> +		    <= 0)

This could easily be:

		if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes) <= 0)

as it perfectly fits withing 80 characters.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw()
  2020-09-21 14:43 ` [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw() Goldwyn Rodrigues
@ 2020-09-22 13:26   ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-22 13:26 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef, Goldwyn Rodrigues

On Mon, Sep 21, 2020 at 09:43:53AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> btrfs holds inode_lock_shared() while performing DIO within EOF, so
> lockdep_assert_held() check can be re-instated.
> 
> Revert 3ad99bec6e82 ("iomap: remove lockdep_assert_held()")

It turns out gfs2 also calls without the lock, so while I'd love to see
the assert come back we'd regress gfs2.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-22  9:24     ` Dan Carpenter
  (?)
@ 2020-09-22 14:16     ` Goldwyn Rodrigues
  2020-09-22 14:58         ` Dan Carpenter
  -1 siblings, 1 reply; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-22 14:16 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

Hi Dan,

On 12:24 22/09, Dan Carpenter wrote:
> Hi Goldwyn,
> 
> url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> config: x86_64-randconfig-m001-20200920 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> smatch warnings:
> fs/iomap/direct-io.c:582 __iomap_dio_rw() warn: passing zero to 'ERR_PTR'

This looks like a false warning. The code goes to the out_free_dio only
if ret < 0.


-- 
Goldwyn

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/15] fs: remove dio_end_io()
  2020-09-21 14:43 ` [PATCH 01/15] fs: remove dio_end_io() Goldwyn Rodrigues
@ 2020-09-22 14:17   ` Josef Bacik
  0 siblings, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:17 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues, Nikolay Borisov,
	Johannes Thumshirn

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Since we removed the last user of dio_end_io(), remove the helper
> function dio_end_io().
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  2020-09-21 14:43 ` [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
  2020-09-22 13:18   ` Christoph Hellwig
@ 2020-09-22 14:17   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:17 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues, Nikolay Borisov,
	Johannes Thumshirn

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Since we now perform direct reads using i_rwsem, we can remove this
> inode flag used to co-ordinate unlocked reads.
> 
> The truncate call takes i_rwsem. This means it is correctly synchronized
> with concurrent direct reads.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Johannes Thumshirn <jth@kernel.org>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-21 14:43 ` [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Goldwyn Rodrigues
  2020-09-21 15:09   ` Johannes Thumshirn
  2020-09-22  9:24     ` Dan Carpenter
@ 2020-09-22 14:17   ` Josef Bacik
  2 siblings, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:17 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> This is to avoid the deadlock caused in btrfs because of O_DIRECT |
> O_DSYNC.
> 
> Filesystems such as btrfs require i_rwsem while performing sync on a
> file. iomap_dio_rw() is called under i_rw_sem. This leads to a
> deadlock because of:
> 
> iomap_dio_complete()
>    generic_write_sync()
>      btrfs_sync_file()
> 
> Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems
> can call iomap_dio_complete() after unlocking i_rwsem.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>   fs/iomap/direct-io.c  | 34 +++++++++++++++++++++++++---------
>   include/linux/iomap.h |  5 +++++
>   2 files changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index c1aafb2ab990..d970c6bbbe11 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -76,7 +76,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
>   		dio->submit.cookie = submit_bio(bio);
>   }
>   
> -static ssize_t iomap_dio_complete(struct iomap_dio *dio)
> +ssize_t iomap_dio_complete(struct iomap_dio *dio)
>   {
>   	const struct iomap_dio_ops *dops = dio->dops;
>   	struct kiocb *iocb = dio->iocb;
> @@ -130,6 +130,7 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
>   
>   	return ret;
>   }
> +EXPORT_SYMBOL_GPL(iomap_dio_complete);
>   
>   static void iomap_dio_complete_work(struct work_struct *work)
>   {
> @@ -406,8 +407,8 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
>    * Returns -ENOTBLK In case of a page invalidation invalidation failure for
>    * writes.  The callers needs to fall back to buffered I/O in this case.
>    */
> -ssize_t
> -iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> +struct iomap_dio *
> +__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>   		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>   		bool wait_for_completion)
>   {
> @@ -421,14 +422,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>   	struct iomap_dio *dio;
>   
>   	if (!count)
> -		return 0;
> +		return NULL;
>   
>   	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
> -		return -EIO;
> +		return ERR_PTR(-EIO);
>   
>   	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
>   	if (!dio)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>   
>   	dio->iocb = iocb;
>   	atomic_set(&dio->ref, 1);
> @@ -558,7 +559,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>   	dio->wait_for_completion = wait_for_completion;
>   	if (!atomic_dec_and_test(&dio->ref)) {
>   		if (!wait_for_completion)
> -			return -EIOCBQUEUED;
> +			return ERR_PTR(-EIOCBQUEUED);
>   
>   		for (;;) {
>   			set_current_state(TASK_UNINTERRUPTIBLE);
> @@ -574,10 +575,25 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>   		__set_current_state(TASK_RUNNING);
>   	}
>   
> -	return iomap_dio_complete(dio);
> +	return dio;
>   
>   out_free_dio:
>   	kfree(dio);
> -	return ret;
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(__iomap_dio_rw);
> +
> +ssize_t
> +iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> +		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> +		bool wait_for_completion)
> +{
> +	struct iomap_dio *dio;
> +
> +	dio = __iomap_dio_rw(iocb, iter, ops, dops, wait_for_completion);
> +	if (IS_ERR_OR_NULL(dio))
> +		return PTR_ERR_OR_ZERO(dio);
> +	return iomap_dio_complete(dio);
>   }
>   EXPORT_SYMBOL_GPL(iomap_dio_rw);
> +

Got an extra + here for some reason.  Otherwise

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-21 14:43 ` [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Goldwyn Rodrigues
  2020-09-21 15:11   ` Johannes Thumshirn
  2020-09-22 13:21   ` Christoph Hellwig
@ 2020-09-22 14:20   ` Josef Bacik
  2020-09-22 16:31     ` Darrick J. Wong
  2 siblings, 1 reply; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:20 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> iomap complete routine can deadlock with btrfs_fallocate because of the
> call to generic_write_sync().
> 
> P0                      P1
> inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
> __iomap_dio_rw()        inode_lock()
>                          <block>
> <submits IO>
> <completes IO>
> inode_unlock()
>                          <gets inode_lock()>
>                          inode_dio_wait()
> iomap_dio_complete()
>    generic_write_sync()
>      btrfs_file_fsync()
>        inode_lock()
>        <deadlock>
> 
> inode_dio_end() is used to notify the end of DIO data in order
> to synchronize with truncate. Call inode_dio_end() before calling
> generic_write_sync(), so filesystems can lock i_rwsem during a sync.
> 
> ---
>   fs/iomap/direct-io.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index d970c6bbbe11..e01f81e7b76f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>   			dio_warn_stale_pagecache(iocb->ki_filp);
>   	}
>   
> +	inode_dio_end(file_inode(iocb->ki_filp));
>   	/*
>   	 * If this is a DSYNC write, make sure we push it to stable storage now
>   	 * that we've written data.
> @@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>   	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
>   		ret = generic_write_sync(iocb, ret);
>   
> -	inode_dio_end(file_inode(iocb->ki_filp));
>   	kfree(dio);
>   
>   	return ret;
> 

Did you verify that xfs or ext4 don't rely on the inode_dio_end() happening 
before the generic_write_sync()?  I wouldn't expect that they would, but we've 
already run into problems making those kind of assumptions.  If it's fine you 
can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write
  2020-09-21 14:43 ` [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write Goldwyn Rodrigues
  2020-09-22 13:22   ` Christoph Hellwig
@ 2020-09-22 14:27   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The read and write DIO don't have anything in common except for the
> call to iomap_dio_rw. Extract the write call into a new function to get
> rid of conditional statements for direct write.
> 
> Originally proposed by Christoph Hellwig <hch@lst.de>
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write()
  2020-09-21 14:43 ` [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write() Goldwyn Rodrigues
  2020-09-22 13:22   ` Christoph Hellwig
@ 2020-09-22 14:30   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:30 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> While we do this, correct the call to pagecache_isize_extended():
>   - pagecache_isisze_extended needs to be called to the starting of the
>     write as opposed to i_size
>   - We don't need to check range before the call, this is done in the
>     function
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/15] btrfs: Move FS error state bit early during write
  2020-09-21 14:43 ` [PATCH 07/15] btrfs: Move FS error state bit early during write Goldwyn Rodrigues
@ 2020-09-22 14:38   ` Josef Bacik
  2020-09-23  9:10   ` Nikolay Borisov
  1 sibling, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:38 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> fs_info->fs_state is a filesystem bit check as opposed to inode
> and can be performed before we begin with write checks. This eliminates
> inode lock/unlock in case of error bit is set.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

The check was there because remove_file_privs may have tripped the error 
starting a trans handle.  That being said we could easily have been set before 
now, so this is fine

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/15] btrfs: Introduce btrfs_write_check()
  2020-09-21 14:43 ` [PATCH 08/15] btrfs: Introduce btrfs_write_check() Goldwyn Rodrigues
  2020-09-22 13:26   ` Christoph Hellwig
@ 2020-09-22 14:42   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:42 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> btrfs_write_check() checks for all parameters in one place before
> beginning a write. This does away with inode_unlock() after every check.
> In the later patches, it will help push inode_lock/unlock() in buffered
> and direct write functions respectively.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>   fs/btrfs/file.c | 159 ++++++++++++++++++++++++------------------------
>   1 file changed, 81 insertions(+), 78 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index ca374cb5ffc9..0f961ce1fa98 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1615,6 +1615,85 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
>   	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
>   }
>   
> +static void update_time_for_write(struct inode *inode)
> +{
> +	struct timespec64 now;
> +
> +	if (IS_NOCMTIME(inode))
> +		return;
> +
> +	now = current_time(inode);
> +	if (!timespec64_equal(&inode->i_mtime, &now))
> +		inode->i_mtime = now;
> +
> +	if (!timespec64_equal(&inode->i_ctime, &now))
> +		inode->i_ctime = now;
> +
> +	if (IS_I_VERSION(inode))
> +		inode_inc_iversion(inode);
> +}
> +
> +static size_t btrfs_write_check(struct kiocb *iocb, struct iov_iter *from)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = file_inode(file);
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	loff_t pos = iocb->ki_pos;
> +	size_t count = iov_iter_count(from);
> +	int err;
> +	loff_t oldsize;
> +	loff_t start_pos;
> +
> +	err = generic_write_checks(iocb, from);
> +	if (err <= 0)
> +		return err;
> +
> +	if (iocb->ki_flags & IOCB_NOWAIT) {
> +		size_t nocow_bytes = count;
> +
> +		/*
> +		 * We will allocate space in case nodatacow is not set,
> +		 * so bail
> +		 */
> +		if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes)
> +		    <= 0)
> +			return -EAGAIN;
> +		/*
> +		 * There are holes in the range or parts of the range that must
> +		 * be COWed (shared extents, RO block groups, etc), so just bail
> +		 * out.
> +		 */
> +		if (nocow_bytes < count)
> +			return -EAGAIN;
> +	}
> +
> +	current->backing_dev_info = inode_to_bdi(inode);
> +	err = file_remove_privs(file);
> +	if (err)
> +		return err;
> +
> +	/*
> +	 * We reserve space for updating the inode when we reserve space for the
> +	 * extent we are going to write, so we will enospc out there.  We don't
> +	 * need to start yet another transaction to update the inode as we will
> +	 * update the inode when we finish writing whatever data we write.
> +	 */
> +	update_time_for_write(inode);
> +
> +	start_pos = round_down(pos, fs_info->sectorsize);
> +	oldsize = i_size_read(inode);
> +	if (start_pos > oldsize) {
> +		/* Expand hole size to cover write data, preventing empty gap */
> +		loff_t end_pos = round_up(pos + count,
> +					  fs_info->sectorsize);
> +		err = btrfs_cont_expand(inode, oldsize, end_pos);
> +		if (err)
> +			return err;

In this case we're not re-setting current->backing_dev_info to NULL.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock()
  2020-09-21 14:43 ` [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock() Goldwyn Rodrigues
@ 2020-09-22 14:45   ` Josef Bacik
  0 siblings, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:45 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Helper functions for locking/unlocking i_rwsem.
> btrfs_inode_lock/unlock() acquires the inode->i_rwsem depending on the
> flags passed. ilock_flags determines the type of lock to be taken:
> 
> BTRFS_ILOCK_SHARED - for shared locks, for possible parallel DIO
> BTRFS_ILOCK_TRY - for the RWF_NOWAIT sequence
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>   fs/btrfs/ctree.h |  7 +++++++
>   fs/btrfs/file.c  | 31 ++++++++++++++++---------------
>   fs/btrfs/inode.c | 31 +++++++++++++++++++++++++++++++
>   3 files changed, 54 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b47a8dcff028..ea15771bf3da 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3053,6 +3053,13 @@ extern const struct iomap_ops btrfs_dio_iomap_ops;
>   extern const struct iomap_dio_ops btrfs_dio_ops;
>   extern const struct iomap_dio_ops btrfs_sync_dops;
>   
> +/* ilock flags definition */
> +#define BTRFS_ILOCK_SHARED	(1 << 0)
> +#define BTRFS_ILOCK_TRY 	(1 << 1)
> +
> +int btrfs_inode_lock(struct inode *inode, int ilock_flags);
> +void btrfs_inode_unlock(struct inode *inode, int ilock_flags);
> +
>   /* ioctl.c */
>   long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
>   long btrfs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0f961ce1fa98..7e18334e8121 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1974,7 +1974,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>   	 * not unlock the i_mutex at this case.
>   	 */
>   	if (pos + iov_iter_count(from) <= inode->i_size) {
> -		inode_unlock(inode);
> +		btrfs_inode_unlock(inode, 0);
>   		relock = true;
>   	}
>   	down_read(&BTRFS_I(inode)->dio_sem);
> @@ -1995,7 +1995,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>   
>   	up_read(&BTRFS_I(inode)->dio_sem);
>   	if (relock)
> -		inode_lock(inode);
> +		btrfs_inode_lock(inode, 0);
>   
>   	if (written < 0 || !iov_iter_count(from))
>   		return written;
> @@ -2036,6 +2036,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>   	ssize_t num_written = 0;
>   	const bool sync = iocb->ki_flags & IOCB_DSYNC;
>   	ssize_t err;
> +	int ilock_flags = 0;
>   
>   	/*
>   	 * If BTRFS flips readonly due to some impossible error
> @@ -2050,16 +2051,16 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>   	    (iocb->ki_flags & IOCB_NOWAIT))
>   		return -EOPNOTSUPP;
>   
> -	if (iocb->ki_flags & IOCB_NOWAIT) {
> -		if (!inode_trylock(inode))
> -			return -EAGAIN;
> -	} else {
> -		inode_lock(inode);
> -	}
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		ilock_flags |= BTRFS_ILOCK_TRY;
> +
> +	err = btrfs_inode_lock(inode, ilock_flags);
> +	if (err < 0)
> +		return err;
>   
>   	err = btrfs_write_check(iocb, from);
>   	if (err <= 0) {
> -		inode_unlock(inode);
> +		btrfs_inode_unlock(inode, ilock_flags);
>   		return err;
>   	}
>   
> @@ -2105,7 +2106,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>   		num_written = btrfs_buffered_write(iocb, from);
>   	}
>   
> -	inode_unlock(inode);
> +	btrfs_inode_unlock(inode, ilock_flags);
>   
>   	/*
>   	 * We also have to set last_sub_trans to the current log transid,
> @@ -3405,7 +3406,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>   			return ret;
>   	}
>   
> -	inode_lock(inode);
> +	btrfs_inode_lock(inode, 0);
>   
>   	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size) {
>   		ret = inode_newsize_ok(inode, offset + len);
> @@ -3644,9 +3645,9 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
>   		return generic_file_llseek(file, offset, whence);
>   	case SEEK_DATA:
>   	case SEEK_HOLE:
> -		inode_lock_shared(inode);
> +		btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
>   		offset = find_desired_extent(inode, offset, whence);
> -		inode_unlock_shared(inode);
> +		btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
>   		break;
>   	}
>   
> @@ -3690,10 +3691,10 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to)
>   	if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos))
>   		return 0;
>   
> -	inode_lock_shared(inode);
> +	btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
>   	ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
>   			is_sync_kiocb(iocb));
> -	inode_unlock_shared(inode);
> +	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
>   	return ret;
>   }
>   
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0730131b6590..f305efac75ae 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -96,6 +96,37 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode,
>   					 const u64 offset, const u64 bytes,
>   					 const bool uptodate);
>   
> +int btrfs_inode_lock(struct inode *inode, int ilock_flags)
> +{
> +	if (ilock_flags & BTRFS_ILOCK_SHARED) {
> +		if (ilock_flags & BTRFS_ILOCK_TRY) {
> +			if (!inode_trylock_shared(inode))
> +				return -EAGAIN;
> +			else
> +				return 0;
> +		}
> +		inode_lock_shared(inode);
> +	} else {
> +		if (ilock_flags & BTRFS_ILOCK_TRY) {
> +			if (!inode_trylock(inode))
> +				return -EAGAIN;
> +			else
> +				return 0;
> +		}
> +		inode_lock(inode);
> +	}
> +	return 0;
> +}
> +
> +void btrfs_inode_unlock(struct inode *inode, int ilock_flags)
> +{
> +	if (ilock_flags & BTRFS_ILOCK_SHARED)
> +		inode_unlock_shared(inode);
> +	else
> +		inode_unlock(inode);
> +
> +}
> +

Since you are going to have to respin anyway, function comments for these 
please.  Then you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write
  2020-09-21 14:43 ` [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write Goldwyn Rodrigues
  2020-09-22  9:26     ` Dan Carpenter
@ 2020-09-22 14:48   ` Josef Bacik
  1 sibling, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:48 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Push inode locking and unlocking closer to where we perform the I/O. For
> this we need to move the write checks inside the respective functions as
> well.
> 
> pos is assigned after write checks because generic_write_check can
> modify iocb->ki_pos.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF
  2020-09-21 14:43 ` [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF Goldwyn Rodrigues
@ 2020-09-22 14:52   ` Josef Bacik
  2020-09-22 17:33     ` Goldwyn Rodrigues
  0 siblings, 1 reply; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:52 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Direct writes within EOF are safe to be performed with inode shared lock
> to improve parallelization with other direct writes or reads because EOF
> is not changed and there is no race with truncate().
> 
> Direct reads are already performed under shared inode lock.
> 
> This patch is precursor to removing btrfs_inode->dio_sem.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>   fs/btrfs/file.c | 33 +++++++++++++++++++++------------
>   1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index d9c3be19d7b3..50092d24eee2 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1977,7 +1977,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	loff_t pos;
>   	ssize_t written = 0;
> -	bool relock = false;
>   	ssize_t written_buffered;
>   	loff_t endbyte;
>   	int err;
> @@ -1986,6 +1985,15 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>   	if (iocb->ki_flags & IOCB_NOWAIT)
>   		ilock_flags |= BTRFS_ILOCK_TRY;
>   
> +	/*
> +	 * If the write DIO within EOF,  use a shared lock
> +	 */
> +	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode))
> +		ilock_flags |= BTRFS_ILOCK_SHARED;
> +	else if (iocb->ki_flags & IOCB_NOWAIT)
> +		return -EAGAIN;
> +
> +relock:

Huh?  Why are you making it so EOF extending NOWAIT writes now fail?  We are 
still using ILOCK_TRY here, so we may still not block, am I missing something? 
Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 12/15] btrfs: Remove dio_sem
  2020-09-21 14:43 ` [PATCH 12/15] btrfs: Remove dio_sem Goldwyn Rodrigues
@ 2020-09-22 14:52   ` Josef Bacik
  0 siblings, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 14:52 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> dio_sem can be eliminated because all DIO synchronization is performed
> through inode->i_rwsem now.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-22 14:16     ` Goldwyn Rodrigues
@ 2020-09-22 14:58         ` Dan Carpenter
  0 siblings, 0 replies; 57+ messages in thread
From: Dan Carpenter @ 2020-09-22 14:58 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 1046 bytes --]

On Tue, Sep 22, 2020 at 09:16:49AM -0500, Goldwyn Rodrigues wrote:
> Hi Dan,
> 
> On 12:24 22/09, Dan Carpenter wrote:
> > Hi Goldwyn,
> > 
> > url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649 
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git  for-next
> > config: x86_64-randconfig-m001-20200920 (attached as .config)
> > compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> > 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > 
> > smatch warnings:
> > fs/iomap/direct-io.c:582 __iomap_dio_rw() warn: passing zero to 'ERR_PTR'
> 
> This looks like a false warning. The code goes to the out_free_dio only
> if ret < 0.

The email has a full copy of the function and I noted the place where
"ret" was set to zero.

It's possible that this was a merge error with the kbuild bot?

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
@ 2020-09-22 14:58         ` Dan Carpenter
  0 siblings, 0 replies; 57+ messages in thread
From: Dan Carpenter @ 2020-09-22 14:58 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1046 bytes --]

On Tue, Sep 22, 2020 at 09:16:49AM -0500, Goldwyn Rodrigues wrote:
> Hi Dan,
> 
> On 12:24 22/09, Dan Carpenter wrote:
> > Hi Goldwyn,
> > 
> > url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649 
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git  for-next
> > config: x86_64-randconfig-m001-20200920 (attached as .config)
> > compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> > 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > 
> > smatch warnings:
> > fs/iomap/direct-io.c:582 __iomap_dio_rw() warn: passing zero to 'ERR_PTR'
> 
> This looks like a false warning. The code goes to the out_free_dio only
> if ret < 0.

The email has a full copy of the function and I noted the place where
"ret" was set to zero.

It's possible that this was a merge error with the kbuild bot?

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock
  2020-09-21 14:43 ` [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock Goldwyn Rodrigues
@ 2020-09-22 15:11   ` Josef Bacik
  0 siblings, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 15:11 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> If direct writes are called with O_DIRECT | O_DSYNC, it will result in a
> deadlock because iomap_dio_rw() is called under i_rwsem which calls
> iomap_dio_complete()
>    generic_write_sync()
>      btrfs_sync_file().
> 
> btrfs_sync_file() requires i_rwsem, so call __iomap_dio_rw() with the
> i_rwsem locked, and call iomap_dio_complete() after unlocking i_rwsem.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround")
  2020-09-21 14:43 ` [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround") Goldwyn Rodrigues
@ 2020-09-22 15:12   ` Josef Bacik
  0 siblings, 0 replies; 57+ messages in thread
From: Josef Bacik @ 2020-09-22 15:12 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, Goldwyn Rodrigues

On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> iomap_dio_complete()->generic_write_sync() is not called under i_rwsem
> anymore, revert the workaround.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem
  2020-09-22 14:58         ` Dan Carpenter
  (?)
@ 2020-09-22 16:06         ` Goldwyn Rodrigues
  -1 siblings, 0 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-22 16:06 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]

On 17:58 22/09, Dan Carpenter wrote:
> On Tue, Sep 22, 2020 at 09:16:49AM -0500, Goldwyn Rodrigues wrote:
> > Hi Dan,
> > 
> > On 12:24 22/09, Dan Carpenter wrote:
> > > Hi Goldwyn,
> > > 
> > > url:    https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/BTRFS-DIO-inode-locking-D_SYNC-fix/20200921-224649 
> > > base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git  for-next
> > > config: x86_64-randconfig-m001-20200920 (attached as .config)
> > > compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> > > 
> > > If you fix the issue, kindly add following tag as appropriate
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > 
> > > smatch warnings:
> > > fs/iomap/direct-io.c:582 __iomap_dio_rw() warn: passing zero to 'ERR_PTR'
> > 
> > This looks like a false warning. The code goes to the out_free_dio only
> > if ret < 0.
> 
> The email has a full copy of the function and I noted the place where
> "ret" was set to zero.
> 
> It's possible that this was a merge error with the kbuild bot?
> 

Sorry, I was wrong. Got it. Thanks!

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-22 14:20   ` Josef Bacik
@ 2020-09-22 16:31     ` Darrick J. Wong
  2020-09-22 17:25       ` Goldwyn Rodrigues
  2020-09-22 21:49       ` Dave Chinner
  0 siblings, 2 replies; 57+ messages in thread
From: Darrick J. Wong @ 2020-09-22 16:31 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Goldwyn Rodrigues, linux-fsdevel, linux-btrfs, david, hch,
	johannes.thumshirn, dsterba, Goldwyn Rodrigues

On Tue, Sep 22, 2020 at 10:20:11AM -0400, Josef Bacik wrote:
> On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > iomap complete routine can deadlock with btrfs_fallocate because of the
> > call to generic_write_sync().
> > 
> > P0                      P1
> > inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
> > __iomap_dio_rw()        inode_lock()
> >                          <block>
> > <submits IO>
> > <completes IO>
> > inode_unlock()
> >                          <gets inode_lock()>
> >                          inode_dio_wait()
> > iomap_dio_complete()
> >    generic_write_sync()
> >      btrfs_file_fsync()
> >        inode_lock()
> >        <deadlock>
> > 
> > inode_dio_end() is used to notify the end of DIO data in order
> > to synchronize with truncate. Call inode_dio_end() before calling
> > generic_write_sync(), so filesystems can lock i_rwsem during a sync.
> > 
> > ---
> >   fs/iomap/direct-io.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index d970c6bbbe11..e01f81e7b76f 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> >   			dio_warn_stale_pagecache(iocb->ki_filp);
> >   	}
> > +	inode_dio_end(file_inode(iocb->ki_filp));
> >   	/*
> >   	 * If this is a DSYNC write, make sure we push it to stable storage now
> >   	 * that we've written data.
> > @@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> >   	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
> >   		ret = generic_write_sync(iocb, ret);
> > -	inode_dio_end(file_inode(iocb->ki_filp));
> >   	kfree(dio);
> >   	return ret;
> > 
> 
> Did you verify that xfs or ext4 don't rely on the inode_dio_end() happening
> before the generic_write_sync()?  I wouldn't expect that they would, but
> we've already run into problems making those kind of assumptions.  If it's
> fine you can add

I was gonna ask the same question, but as there's no SoB on this patch I
hadn't really looked at it yet. ;)

Operations that rely on inode_dio_wait to have blocked until all the
directios are complete could get tripped up by iomap not having done the
generic_write_sync to stabilise the metadata, but I /think/ most
operations that do that also themselves flush the file.  But I don't
really know if there's a subtlety there if the inode_dio_wait thread
manages to grab the ILOCK before the generic_write_sync thread does.

--D

> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> 
> Thanks,
> 
> Josef

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-22 16:31     ` Darrick J. Wong
@ 2020-09-22 17:25       ` Goldwyn Rodrigues
  2020-09-22 21:49       ` Dave Chinner
  1 sibling, 0 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-22 17:25 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Josef Bacik, linux-fsdevel, linux-btrfs, david, hch,
	johannes.thumshirn, dsterba

On  9:31 22/09, Darrick J. Wong wrote:
> On Tue, Sep 22, 2020 at 10:20:11AM -0400, Josef Bacik wrote:
> > On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> > > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > > 
> > > iomap complete routine can deadlock with btrfs_fallocate because of the
> > > call to generic_write_sync().
> > > 
> > > P0                      P1
> > > inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
> > > __iomap_dio_rw()        inode_lock()
> > >                          <block>
> > > <submits IO>
> > > <completes IO>
> > > inode_unlock()
> > >                          <gets inode_lock()>
> > >                          inode_dio_wait()
> > > iomap_dio_complete()
> > >    generic_write_sync()
> > >      btrfs_file_fsync()
> > >        inode_lock()
> > >        <deadlock>
> > > 
> > > inode_dio_end() is used to notify the end of DIO data in order
> > > to synchronize with truncate. Call inode_dio_end() before calling
> > > generic_write_sync(), so filesystems can lock i_rwsem during a sync.
> > > 
> > > ---
> > >   fs/iomap/direct-io.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index d970c6bbbe11..e01f81e7b76f 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> > >   			dio_warn_stale_pagecache(iocb->ki_filp);
> > >   	}
> > > +	inode_dio_end(file_inode(iocb->ki_filp));
> > >   	/*
> > >   	 * If this is a DSYNC write, make sure we push it to stable storage now
> > >   	 * that we've written data.
> > > @@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> > >   	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
> > >   		ret = generic_write_sync(iocb, ret);
> > > -	inode_dio_end(file_inode(iocb->ki_filp));
> > >   	kfree(dio);
> > >   	return ret;
> > > 
> > 
> > Did you verify that xfs or ext4 don't rely on the inode_dio_end() happening
> > before the generic_write_sync()?  I wouldn't expect that they would, but
> > we've already run into problems making those kind of assumptions.  If it's
> > fine you can add
> 
> I was gonna ask the same question, but as there's no SoB on this patch I
> hadn't really looked at it yet. ;)
> 
> Operations that rely on inode_dio_wait to have blocked until all the
> directios are complete could get tripped up by iomap not having done the
> generic_write_sync to stabilise the metadata, but I /think/ most
> operations that do that also themselves flush the file.  But I don't
> really know if there's a subtlety there if the inode_dio_wait thread
> manages to grab the ILOCK before the generic_write_sync thread does.
> 

I ran xfstests and it was successful. I am testing ext4 now.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF
  2020-09-22 14:52   ` Josef Bacik
@ 2020-09-22 17:33     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-22 17:33 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong

On 10:52 22/09, Josef Bacik wrote:
> On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > Direct writes within EOF are safe to be performed with inode shared lock
> > to improve parallelization with other direct writes or reads because EOF
> > is not changed and there is no race with truncate().
> > 
> > Direct reads are already performed under shared inode lock.
> > 
> > This patch is precursor to removing btrfs_inode->dio_sem.
> > 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > ---
> >   fs/btrfs/file.c | 33 +++++++++++++++++++++------------
> >   1 file changed, 21 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index d9c3be19d7b3..50092d24eee2 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1977,7 +1977,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
> >   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> >   	loff_t pos;
> >   	ssize_t written = 0;
> > -	bool relock = false;
> >   	ssize_t written_buffered;
> >   	loff_t endbyte;
> >   	int err;
> > @@ -1986,6 +1985,15 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
> >   	if (iocb->ki_flags & IOCB_NOWAIT)
> >   		ilock_flags |= BTRFS_ILOCK_TRY;
> > +	/*
> > +	 * If the write DIO within EOF,  use a shared lock
> > +	 */
> > +	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode))
> > +		ilock_flags |= BTRFS_ILOCK_SHARED;
> > +	else if (iocb->ki_flags & IOCB_NOWAIT)
> > +		return -EAGAIN;
> > +
> > +relock:
> 
> Huh?  Why are you making it so EOF extending NOWAIT writes now fail?  We are
> still using ILOCK_TRY here, so we may still not block, am I missing
> something? Thanks,
> 

Yes, this is incorrect. I had thought of this would block on disk space
allocations. But did not consider the prealloc case.

I am removing this check to match the previous behavior.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-22 16:31     ` Darrick J. Wong
  2020-09-22 17:25       ` Goldwyn Rodrigues
@ 2020-09-22 21:49       ` Dave Chinner
  2020-09-23  5:16         ` Christoph Hellwig
  1 sibling, 1 reply; 57+ messages in thread
From: Dave Chinner @ 2020-09-22 21:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Josef Bacik, Goldwyn Rodrigues, linux-fsdevel, linux-btrfs, hch,
	johannes.thumshirn, dsterba, Goldwyn Rodrigues

On Tue, Sep 22, 2020 at 09:31:56AM -0700, Darrick J. Wong wrote:
> On Tue, Sep 22, 2020 at 10:20:11AM -0400, Josef Bacik wrote:
> > On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote:
> > > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > > 
> > > iomap complete routine can deadlock with btrfs_fallocate because of the
> > > call to generic_write_sync().
> > > 
> > > P0                      P1
> > > inode_lock()            fallocate(FALLOC_FL_ZERO_RANGE)
> > > __iomap_dio_rw()        inode_lock()
> > >                          <block>
> > > <submits IO>
> > > <completes IO>
> > > inode_unlock()
> > >                          <gets inode_lock()>
> > >                          inode_dio_wait()
> > > iomap_dio_complete()
> > >    generic_write_sync()
> > >      btrfs_file_fsync()
> > >        inode_lock()
> > >        <deadlock>
> > > 
> > > inode_dio_end() is used to notify the end of DIO data in order
> > > to synchronize with truncate. Call inode_dio_end() before calling
> > > generic_write_sync(), so filesystems can lock i_rwsem during a sync.
> > > 
> > > ---
> > >   fs/iomap/direct-io.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index d970c6bbbe11..e01f81e7b76f 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> > >   			dio_warn_stale_pagecache(iocb->ki_filp);
> > >   	}
> > > +	inode_dio_end(file_inode(iocb->ki_filp));
> > >   	/*
> > >   	 * If this is a DSYNC write, make sure we push it to stable storage now
> > >   	 * that we've written data.
> > > @@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> > >   	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
> > >   		ret = generic_write_sync(iocb, ret);
> > > -	inode_dio_end(file_inode(iocb->ki_filp));
> > >   	kfree(dio);
> > >   	return ret;
> > > 
> > 
> > Did you verify that xfs or ext4 don't rely on the inode_dio_end() happening
> > before the generic_write_sync()?  I wouldn't expect that they would, but
> > we've already run into problems making those kind of assumptions.  If it's
> > fine you can add
> 
> I was gonna ask the same question, but as there's no SoB on this patch I
> hadn't really looked at it yet. ;)
> 
> Operations that rely on inode_dio_wait to have blocked until all the
> directios are complete could get tripped up by iomap not having done the
> generic_write_sync to stabilise the metadata, but I /think/ most
> operations that do that also themselves flush the file.  But I don't
> really know if there's a subtlety there if the inode_dio_wait thread
> manages to grab the ILOCK before the generic_write_sync thread does.

I did point out in the previous thread that this actually means that
inode_dio_wait() now has inconsistent wait semantics for O_DSYNC
writes. If it's a pure overwrite and we hit the FUA path, the
O_DSYNC write will be complete and guaranteed to be on stable storage
before the IO completes. If the inode is metadata dirty, then the IO
will now be signalled complete *before* the data and metadata are
flushed to stable storage.

Hence, from the perspective of writes to *stable* storage, this
makes the ordering of O_DSYNC DIO against anything waiting for it to
complete to be potentially inconsistent at the stable storage level.

That's an extremely subtle change of behaviour, and something that
would be largely impossible to test or reproduce. And, really, I
don't like having this sort of "oh, it should be fine" handwavy
justification when we are talking about data integrity operations...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-22 21:49       ` Dave Chinner
@ 2020-09-23  5:16         ` Christoph Hellwig
  2020-09-23  5:31           ` Darrick J. Wong
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-23  5:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Josef Bacik, Goldwyn Rodrigues, linux-fsdevel,
	linux-btrfs, hch, johannes.thumshirn, dsterba, Goldwyn Rodrigues

On Wed, Sep 23, 2020 at 07:49:34AM +1000, Dave Chinner wrote:
> I did point out in the previous thread that this actually means that
> inode_dio_wait() now has inconsistent wait semantics for O_DSYNC
> writes. If it's a pure overwrite and we hit the FUA path, the
> O_DSYNC write will be complete and guaranteed to be on stable storage
> before the IO completes. If the inode is metadata dirty, then the IO
> will now be signalled complete *before* the data and metadata are
> flushed to stable storage.
> 
> Hence, from the perspective of writes to *stable* storage, this
> makes the ordering of O_DSYNC DIO against anything waiting for it to
> complete to be potentially inconsistent at the stable storage level.
> 
> That's an extremely subtle change of behaviour, and something that
> would be largely impossible to test or reproduce. And, really, I
> don't like having this sort of "oh, it should be fine" handwavy
> justification when we are talking about data integrity operations...

... and I replied with a detailed analysis of what it is fine, and
how this just restores the behavior we historically had before
switching to the iomap direct I/O code.  Although if we want to go
into the fine details we did not have the REQ_FUA path back then,
but that does not change the analysis.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-23  5:16         ` Christoph Hellwig
@ 2020-09-23  5:31           ` Darrick J. Wong
  2020-09-23  5:49             ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Darrick J. Wong @ 2020-09-23  5:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Josef Bacik, Goldwyn Rodrigues, linux-fsdevel,
	linux-btrfs, johannes.thumshirn, dsterba, Goldwyn Rodrigues

On Wed, Sep 23, 2020 at 07:16:58AM +0200, Christoph Hellwig wrote:
> On Wed, Sep 23, 2020 at 07:49:34AM +1000, Dave Chinner wrote:
> > I did point out in the previous thread that this actually means that
> > inode_dio_wait() now has inconsistent wait semantics for O_DSYNC
> > writes. If it's a pure overwrite and we hit the FUA path, the
> > O_DSYNC write will be complete and guaranteed to be on stable storage
> > before the IO completes. If the inode is metadata dirty, then the IO
> > will now be signalled complete *before* the data and metadata are
> > flushed to stable storage.
> > 
> > Hence, from the perspective of writes to *stable* storage, this
> > makes the ordering of O_DSYNC DIO against anything waiting for it to
> > complete to be potentially inconsistent at the stable storage level.
> > 
> > That's an extremely subtle change of behaviour, and something that
> > would be largely impossible to test or reproduce. And, really, I
> > don't like having this sort of "oh, it should be fine" handwavy
> > justification when we are talking about data integrity operations...
> 
> ... and I replied with a detailed analysis of what it is fine, and
> how this just restores the behavior we historically had before
> switching to the iomap direct I/O code.  Although if we want to go
> into the fine details we did not have the REQ_FUA path back then,
> but that does not change the analysis.

You did?  Got a link?  Not sure if vger/oraclemail are still delaying
messages for me.... :/

--D

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-23  5:31           ` Darrick J. Wong
@ 2020-09-23  5:49             ` Christoph Hellwig
  2020-09-23  5:59               ` Dave Chinner
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2020-09-23  5:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dave Chinner, Josef Bacik, Goldwyn Rodrigues,
	linux-fsdevel, linux-btrfs, johannes.thumshirn, dsterba,
	Goldwyn Rodrigues

On Tue, Sep 22, 2020 at 10:31:49PM -0700, Darrick J. Wong wrote:
> > ... and I replied with a detailed analysis of what it is fine, and
> > how this just restores the behavior we historically had before
> > switching to the iomap direct I/O code.  Although if we want to go
> > into the fine details we did not have the REQ_FUA path back then,
> > but that does not change the analysis.
> 
> You did?  Got a link?  Not sure if vger/oraclemail are still delaying
> messages for me.... :/

Two replies from September 17 to the
"Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context"

thread.

Msg IDs:

20200917055232.GA31646@lst.de

and

20200917064238.GA32441@lst.de

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync()
  2020-09-23  5:49             ` Christoph Hellwig
@ 2020-09-23  5:59               ` Dave Chinner
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Chinner @ 2020-09-23  5:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Josef Bacik, Goldwyn Rodrigues, linux-fsdevel,
	linux-btrfs, johannes.thumshirn, dsterba, Goldwyn Rodrigues

On Wed, Sep 23, 2020 at 07:49:25AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 22, 2020 at 10:31:49PM -0700, Darrick J. Wong wrote:
> > > ... and I replied with a detailed analysis of what it is fine, and
> > > how this just restores the behavior we historically had before
> > > switching to the iomap direct I/O code.  Although if we want to go
> > > into the fine details we did not have the REQ_FUA path back then,
> > > but that does not change the analysis.
> > 
> > You did?  Got a link?  Not sure if vger/oraclemail are still delaying
> > messages for me.... :/
> 
> Two replies from September 17 to the
> "Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context"
> 
> thread.
> 
> Msg IDs:
> 
> 20200917055232.GA31646@lst.de
> 
> and
> 
> 20200917064238.GA32441@lst.de

<sigh>

That last one is not in my local archive - vger has been on the
blink lately, so I guess I'm not really that surprised that mail has
gone missing and not just delayed for a day or two....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/15] btrfs: Move FS error state bit early during write
  2020-09-21 14:43 ` [PATCH 07/15] btrfs: Move FS error state bit early during write Goldwyn Rodrigues
  2020-09-22 14:38   ` Josef Bacik
@ 2020-09-23  9:10   ` Nikolay Borisov
  2020-09-23 14:07     ` Goldwyn Rodrigues
  1 sibling, 1 reply; 57+ messages in thread
From: Nikolay Borisov @ 2020-09-23  9:10 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-fsdevel
  Cc: linux-btrfs, david, hch, johannes.thumshirn, dsterba,
	darrick.wong, josef, Goldwyn Rodrigues



On 21.09.20 г. 17:43 ч., Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> fs_info->fs_state is a filesystem bit check as opposed to inode
> and can be performed before we begin with write checks. This eliminates
> inode lock/unlock in case of error bit is set.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/file.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4c40a2742aab..ca374cb5ffc9 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1981,6 +1981,15 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>  	size_t count;
>  	loff_t oldsize;
>  
> +	/*
> +	 * If BTRFS flips readonly due to some impossible error
> +	 * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR),
> +	 * although we have opened a file as writable, we have
> +	 * to stop this write operation to ensure FS consistency.
> +	 */
> +	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> +		return -EROFS;
> +

nit: Actually can't this check be eliminated altogether or the comment
vastly simplified because BTRFS_SUPER_FLAG_ERROR check is performed only
during mount so the description in the parantheses is invalid i.e the fs
won't flip to RO because BTRFS_SUPER_FLAG_ERROR is now set in the super
block. As a matter of fact how is this flag set - because I don't see it
set in the kernel code nor in btrfs-progs ?

<snip>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/15] btrfs: Move FS error state bit early during write
  2020-09-23  9:10   ` Nikolay Borisov
@ 2020-09-23 14:07     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 57+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-23 14:07 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-fsdevel, linux-btrfs, david, hch, johannes.thumshirn,
	dsterba, darrick.wong, josef

On 12:10 23/09, Nikolay Borisov wrote:
> 
> 
> On 21.09.20 г. 17:43 ч., Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > fs_info->fs_state is a filesystem bit check as opposed to inode
> > and can be performed before we begin with write checks. This eliminates
> > inode lock/unlock in case of error bit is set.
> > 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > ---
> >  fs/btrfs/file.c | 21 +++++++++------------
> >  1 file changed, 9 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 4c40a2742aab..ca374cb5ffc9 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1981,6 +1981,15 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> >  	size_t count;
> >  	loff_t oldsize;
> >  
> > +	/*
> > +	 * If BTRFS flips readonly due to some impossible error
> > +	 * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR),
> > +	 * although we have opened a file as writable, we have
> > +	 * to stop this write operation to ensure FS consistency.
> > +	 */
> > +	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> > +		return -EROFS;
> > +
> 
> nit: Actually can't this check be eliminated altogether or the comment
> vastly simplified because BTRFS_SUPER_FLAG_ERROR check is performed only
> during mount so the description in the parantheses is invalid i.e the fs
> won't flip to RO because BTRFS_SUPER_FLAG_ERROR is now set in the super
> block. As a matter of fact how is this flag set - because I don't see it
> set in the kernel code nor in btrfs-progs ?

You are right. This flag originated from 
acce952b0263 ("Btrfs: forced readonly mounts on errors")

However, the following commit removed writing the super in case of the
error:
68ce9682a4bb ("Btrfs: remove superblock writing after fatal error")

So, it does not land in the super flags anyways.
The flag does not make sense if we use BTRFS_FS_STATE_ERROR.

I will remove BTRFS_SUPER_FLAG_ERROR comment for now.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2020-09-23 14:08 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-21 14:43 [PATCH 0/15 v2] BTRFS DIO inode locking/D_SYNC fix Goldwyn Rodrigues
2020-09-21 14:43 ` [PATCH 01/15] fs: remove dio_end_io() Goldwyn Rodrigues
2020-09-22 14:17   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
2020-09-22 13:18   ` Christoph Hellwig
2020-09-22 14:17   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Goldwyn Rodrigues
2020-09-21 15:09   ` Johannes Thumshirn
2020-09-22 13:19     ` hch
2020-09-22  9:24   ` Dan Carpenter
2020-09-22  9:24     ` Dan Carpenter
2020-09-22 14:16     ` Goldwyn Rodrigues
2020-09-22 14:58       ` Dan Carpenter
2020-09-22 14:58         ` Dan Carpenter
2020-09-22 16:06         ` Goldwyn Rodrigues
2020-09-22 14:17   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Goldwyn Rodrigues
2020-09-21 15:11   ` Johannes Thumshirn
2020-09-22 13:21   ` Christoph Hellwig
2020-09-22 14:20   ` Josef Bacik
2020-09-22 16:31     ` Darrick J. Wong
2020-09-22 17:25       ` Goldwyn Rodrigues
2020-09-22 21:49       ` Dave Chinner
2020-09-23  5:16         ` Christoph Hellwig
2020-09-23  5:31           ` Darrick J. Wong
2020-09-23  5:49             ` Christoph Hellwig
2020-09-23  5:59               ` Dave Chinner
2020-09-21 14:43 ` [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write Goldwyn Rodrigues
2020-09-22 13:22   ` Christoph Hellwig
2020-09-22 14:27   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write() Goldwyn Rodrigues
2020-09-22 13:22   ` Christoph Hellwig
2020-09-22 14:30   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 07/15] btrfs: Move FS error state bit early during write Goldwyn Rodrigues
2020-09-22 14:38   ` Josef Bacik
2020-09-23  9:10   ` Nikolay Borisov
2020-09-23 14:07     ` Goldwyn Rodrigues
2020-09-21 14:43 ` [PATCH 08/15] btrfs: Introduce btrfs_write_check() Goldwyn Rodrigues
2020-09-22 13:26   ` Christoph Hellwig
2020-09-22 14:42   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock() Goldwyn Rodrigues
2020-09-22 14:45   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write Goldwyn Rodrigues
2020-09-22  9:26   ` Dan Carpenter
2020-09-22  9:26     ` Dan Carpenter
2020-09-22 14:48   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF Goldwyn Rodrigues
2020-09-22 14:52   ` Josef Bacik
2020-09-22 17:33     ` Goldwyn Rodrigues
2020-09-21 14:43 ` [PATCH 12/15] btrfs: Remove dio_sem Goldwyn Rodrigues
2020-09-22 14:52   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock Goldwyn Rodrigues
2020-09-22 15:11   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround") Goldwyn Rodrigues
2020-09-22 15:12   ` Josef Bacik
2020-09-21 14:43 ` [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw() Goldwyn Rodrigues
2020-09-22 13:26   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.