* [PATCH 01/21] fs: readahead_begin() to call before locking folio
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-06 16:53   ` Christoph Hellwig
  2023-03-02 22:24 ` [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range Goldwyn Rodrigues
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

The btrfs filesystem needs to lock the extents before locking the
folios to be read from disk. So, introduce a function in
address_space_operations, called readahead_begin(), which is called
before the folios are allocated and locked.
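
For illustration only, an implementation of the new hook would take the
filesystem's range lock before the readahead code allocates and locks
any folios. A minimal sketch (the actual btrfs implementation comes
later in this series):

	static void example_readahead_begin(struct readahead_control *rac)
	{
		struct inode *inode = rac->mapping->host;
		u64 start = readahead_pos(rac);
		u64 end = start + readahead_length(rac) - 1;

		/* taken before any folio in [start, end] is allocated/locked */
		lock_extent(&BTRFS_I(inode)->io_tree, start, end, NULL);
	}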
---
 include/linux/fs.h | 1 +
 mm/readahead.c     | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1769a2c5d70..6b650db57ca3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -363,6 +363,7 @@ struct address_space_operations {
 	/* Mark a folio dirty.  Return true if this dirtied it */
 	bool (*dirty_folio)(struct address_space *, struct folio *);
 
+	void (*readahead_begin)(struct readahead_control *);
 	void (*readahead)(struct readahead_control *);
 
 	int (*write_begin)(struct file *, struct address_space *mapping,
diff --git a/mm/readahead.c b/mm/readahead.c
index b10f0cf81d80..6924d5fed350 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -520,6 +520,9 @@ void page_cache_ra_order(struct readahead_control *ractl,
 			new_order--;
 	}
 
+	if (mapping->a_ops->readahead_begin)
+		mapping->a_ops->readahead_begin(ractl);
+
 	filemap_invalidate_lock_shared(mapping);
 	while (index <= limit) {
 		unsigned int order = new_order;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
  2023-03-02 22:24 ` [PATCH 01/21] fs: readahead_begin() to call before locking folio Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-08 19:28   ` Boris Burkov
  2023-03-02 22:24 ` [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range() Goldwyn Rodrigues
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Add a WARN_ON(start > end) to make sure that the locking happens on the
correct range and no incorrect nodes (with state->start > state->end)
are added to the tree.
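
As an example of how an inverted range can arise, a zero-length write
would produce (a sketch, mirroring the rounding done in the buffered
write path later in this series):

	u64 start = round_down(pos, fs_info->sectorsize);
	u64 end = round_up(pos + write_bytes, fs_info->sectorsize) - 1;

	/* write_bytes == 0 with pos block aligned gives end == start - 1,
	 * which now trips WARN_ON(start > end) instead of silently
	 * inserting a bad node */
	lock_extent(&inode->io_tree, start, end, NULL);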

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent-io-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 29a225836e28..482721dd1eba 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1710,6 +1710,7 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 	int err;
 	u64 failed_start;
 
+	WARN_ON(start > end);
 	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
 			       NULL, cached, NULL, GFP_NOFS);
 	if (err == -EEXIST) {
@@ -1732,6 +1733,7 @@ int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 	int err;
 	u64 failed_start;
 
+	WARN_ON(start > end);
 	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
 			       &failed_state, cached_state, NULL, GFP_NOFS);
 	while (err == -EEXIST) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range()
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
  2023-03-02 22:24 ` [PATCH 01/21] fs: readahead_begin() to call before locking folio Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-08 19:29   ` Boris Burkov
  2023-03-02 22:24 ` [PATCH 04/21] btrfs: make btrfs_qgroup_flush() non-static Goldwyn Rodrigues
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

For issues such as zero-size writes, we can get start > end. Check for
this in btrfs_debug_check_extent_io_range() so it may be caught early.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent-io-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 482721dd1eba..d467c614c84e 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -65,7 +65,8 @@ static inline void __btrfs_debug_check_extent_io_range(const char *caller,
 		return;
 
 	isize = i_size_read(&inode->vfs_inode);
-	if (end >= PAGE_SIZE && (end % 2) == 0 && end != isize - 1) {
+	if ((start > end) ||
+	    (end >= PAGE_SIZE && (end % 2) == 0 && end != isize - 1)) {
 		btrfs_debug_rl(inode->root->fs_info,
 		    "%s: ino %llu isize %llu odd range [%llu,%llu]",
 			caller, btrfs_ino(inode), isize, start, end);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/21] btrfs: make btrfs_qgroup_flush() non-static
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (2 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range() Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 05/21] btrfs: Lock extents before pages for buffered write() Goldwyn Rodrigues
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs_qgroup_flush() is used to flush data when a qgroup reservation
is unsuccessful. This causes a writeback which takes extent locks.
If we have to make reservations while holding extent locks, we call the
reservation function with nowait set to true, so it does not call
btrfs_qgroup_flush().

Calling btrfs_qgroup_flush() then becomes the responsibility of the
function performing reservations under extent locks, once it has
dropped those locks.
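
A sketch of the expected caller pattern (illustrative only; variable
names as used later in this series, error handling trimmed):

	ret = btrfs_delalloc_reserve_metadata(inode, bytes, bytes, true /* nowait */);
	if (ret == -EDQUOT && !flushed) {
		/* drop the extent locks before flushing ... */
		unlock_extent(&inode->io_tree, start, end, &cached_state);
		flushed = true;
		btrfs_qgroup_flush(inode->root);
		/* ... and retry the whole locked sequence */
		goto again;
	}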

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/qgroup.c | 6 +++---
 fs/btrfs/qgroup.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 52a7d2fa2284..235bc78a8418 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3709,7 +3709,7 @@ static int qgroup_unreserve_range(struct btrfs_inode *inode,
  *   In theory this shouldn't provide much space, but any more qgroup space
  *   is needed.
  */
-static int try_flush_qgroup(struct btrfs_root *root)
+int btrfs_qgroup_flush(struct btrfs_root *root)
 {
 	struct btrfs_trans_handle *trans;
 	int ret;
@@ -3821,7 +3821,7 @@ int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
 	if (ret <= 0 && ret != -EDQUOT)
 		return ret;
 
-	ret = try_flush_qgroup(inode->root);
+	ret = btrfs_qgroup_flush(inode->root);
 	if (ret < 0)
 		return ret;
 	return qgroup_reserve_data(inode, reserved_ret, start, len);
@@ -4032,7 +4032,7 @@ int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 	if ((ret <= 0 && ret != -EDQUOT) || noflush)
 		return ret;
 
-	ret = try_flush_qgroup(root);
+	ret = btrfs_qgroup_flush(root);
 	if (ret < 0)
 		return ret;
 	return btrfs_qgroup_reserve_meta(root, num_bytes, type, enforce);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 7bffa10589d6..232bc5ad3dca 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -439,5 +439,6 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
 		struct btrfs_root *root, struct extent_buffer *eb);
 void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
 bool btrfs_check_quota_leak(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_flush(struct btrfs_root *root);
 
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/21] btrfs: Lock extents before pages for buffered write()
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (3 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 04/21] btrfs: make btrfs_qgroup_flush() non-static Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 06/21] btrfs: wait ordered range before locking during truncate Goldwyn Rodrigues
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

While performing buffered writes, lock the extents before locking the
pages.

Ideally, this would be done before the space reservations. However, it
is performed after the space checks because a qgroup reservation can
initiate writeback, which may deadlock on the locked extents.
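
With this change the per-iteration ordering in btrfs_buffered_write()
becomes, roughly (a sketch; arguments and error handling elided):

	balance_dirty_pages_ratelimited_flags();	/* after reservations */
	lock_and_cleanup_extent_if_need();	/* lock the extent range */
	prepare_pages();			/* allocate and lock the pages */
	btrfs_copy_from_user();
	btrfs_dirty_pages();
	btrfs_drop_pages();			/* unlock and release the pages */
	unlock_extent();			/* drop the extent lock last */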

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/file.c | 78 ++++++++++++-------------------------------------
 1 file changed, 19 insertions(+), 59 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5cc5a1faaef5..a2f8f566cfbf 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -973,8 +973,8 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
  * the other < 0 number - Something wrong happens
  */
 static noinline int
-lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
-				size_t num_pages, loff_t pos,
+lock_and_cleanup_extent_if_need(struct btrfs_inode *inode,
+				loff_t pos,
 				size_t write_bytes,
 				u64 *lockstart, u64 *lockend, bool nowait,
 				struct extent_state **cached_state)
@@ -982,7 +982,6 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	u64 start_pos;
 	u64 last_pos;
-	int i;
 	int ret = 0;
 
 	start_pos = round_down(pos, fs_info->sectorsize);
@@ -993,15 +992,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 
 		if (nowait) {
 			if (!try_lock_extent(&inode->io_tree, start_pos, last_pos,
-					     cached_state)) {
-				for (i = 0; i < num_pages; i++) {
-					unlock_page(pages[i]);
-					put_page(pages[i]);
-					pages[i] = NULL;
-				}
-
+					     cached_state))
 				return -EAGAIN;
-			}
 		} else {
 			lock_extent(&inode->io_tree, start_pos, last_pos, cached_state);
 		}
@@ -1013,10 +1005,6 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 		    ordered->file_offset <= last_pos) {
 			unlock_extent(&inode->io_tree, start_pos, last_pos,
 				      cached_state);
-			for (i = 0; i < num_pages; i++) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
 			btrfs_start_ordered_extent(ordered);
 			btrfs_put_ordered_extent(ordered);
 			return -EAGAIN;
@@ -1029,13 +1017,6 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 		ret = 1;
 	}
 
-	/*
-	 * We should be called after prepare_pages() which should have locked
-	 * all pages in the range.
-	 */
-	for (i = 0; i < num_pages; i++)
-		WARN_ON(!PageLocked(pages[i]));
-
 	return ret;
 }
 
@@ -1299,13 +1280,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		}
 
 		release_bytes = reserve_bytes;
-again:
 		ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
 			break;
 		}
 
+		extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode),
+				pos, write_bytes, &lockstart, &lockend,
+				nowait, &cached_state);
+		if (extents_locked < 0) {
+			ret = extents_locked;
+			btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+			break;
+		}
+
+
 		/*
 		 * This is going to setup the pages array with the number of
 		 * pages we want, so we don't really need to worry about the
@@ -1313,25 +1303,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 */
 		ret = prepare_pages(inode, pages, num_pages,
 				    pos, write_bytes, force_page_uptodate, false);
-		if (ret) {
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
-			break;
-		}
-
-		extents_locked = lock_and_cleanup_extent_if_need(
-				BTRFS_I(inode), pages,
-				num_pages, pos, write_bytes, &lockstart,
-				&lockend, nowait, &cached_state);
-		if (extents_locked < 0) {
-			if (!nowait && extents_locked == -EAGAIN)
-				goto again;
-
+		if (ret)
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
-			ret = extents_locked;
-			break;
-		}
 
 		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
 
@@ -1380,33 +1354,19 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
 					dirty_pages, pos, copied,
-					&cached_state, only_release_metadata);
-
-		/*
-		 * If we have not locked the extent range, because the range's
-		 * start offset is >= i_size, we might still have a non-NULL
-		 * cached extent state, acquired while marking the extent range
-		 * as delalloc through btrfs_dirty_pages(). Therefore free any
-		 * possible cached extent state to avoid a memory leak.
-		 */
-		if (extents_locked)
-			unlock_extent(&BTRFS_I(inode)->io_tree, lockstart,
-				      lockend, &cached_state);
-		else
-			free_extent_state(cached_state);
+					NULL, only_release_metadata);
 
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
-		if (ret) {
-			btrfs_drop_pages(fs_info, pages, num_pages, pos, copied);
+		btrfs_drop_pages(fs_info, pages, num_pages, pos, copied);
+		if (extents_locked)
+			unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend, &cached_state);
+		if (ret)
 			break;
-		}
 
 		release_bytes = 0;
 		if (only_release_metadata)
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
 
-		btrfs_drop_pages(fs_info, pages, num_pages, pos, copied);
-
 		cond_resched();
 
 		pos += copied;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/21] btrfs: wait ordered range before locking during truncate
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (4 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 05/21] btrfs: Lock extents before pages for buffered write() Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-07 17:03   ` Christoph Hellwig
  2023-03-02 22:24 ` [PATCH 07/21] btrfs: lock extents while truncating Goldwyn Rodrigues
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Johannes Thumshirn, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Check if truncate needs to wait for an ordered range before calling
btrfs_truncate(). Instead of performing the wait inside
btrfs_truncate(), perform it before the call.

Remove the no longer needed skip_writeback parameter used to perform
writeback in btrfs_truncate().
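
A condensed sketch of the caller side in btrfs_setsize() after this
change:

	if (btrfs_is_zoned(fs_info) || (newsize < oldsize)) {
		/* flush ordered extents past the new size up front, so the
		 * truncate path (which will take extent locks in a later
		 * patch) does not have to wait under those locks */
		ret = btrfs_wait_ordered_range(inode,
				ALIGN(newsize, fs_info->sectorsize), (u64)-1);
		if (ret)
			return ret;
	}
	...
	ret = btrfs_truncate(BTRFS_I(inode));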

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2c96c39975e0..02307789b0a8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -109,7 +109,7 @@ static const struct file_operations btrfs_dir_file_operations;
 static struct kmem_cache *btrfs_inode_cachep;
 
 static int btrfs_setsize(struct inode *inode, struct iattr *attr);
-static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback);
+static int btrfs_truncate(struct btrfs_inode *inode);
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct page *locked_page,
 				   u64 start, u64 end, int *page_started,
@@ -5084,7 +5084,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 	} else {
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 
-		if (btrfs_is_zoned(fs_info)) {
+		if (btrfs_is_zoned(fs_info) || (newsize < oldsize)) {
 			ret = btrfs_wait_ordered_range(inode,
 					ALIGN(newsize, fs_info->sectorsize),
 					(u64)-1);
@@ -5105,7 +5105,8 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
 		inode_dio_wait(inode);
 
-		ret = btrfs_truncate(BTRFS_I(inode), newsize == oldsize);
+		ret = btrfs_truncate(BTRFS_I(inode));
+
 		if (ret && inode->i_nlink) {
 			int err;
 
@@ -8241,7 +8242,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	return ret;
 }
 
-static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback)
+static int btrfs_truncate(struct btrfs_inode *inode)
 {
 	struct btrfs_truncate_control control = {
 		.inode = inode,
@@ -8254,17 +8255,8 @@ static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback)
 	struct btrfs_block_rsv *rsv;
 	int ret;
 	struct btrfs_trans_handle *trans;
-	u64 mask = fs_info->sectorsize - 1;
 	u64 min_size = btrfs_calc_metadata_size(fs_info, 1);
 
-	if (!skip_writeback) {
-		ret = btrfs_wait_ordered_range(&inode->vfs_inode,
-					       inode->vfs_inode.i_size & (~mask),
-					       (u64)-1);
-		if (ret)
-			return ret;
-	}
-
 	/*
 	 * Yes ladies and gentlemen, this is indeed ugly.  We have a couple of
 	 * things going on here:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/21] btrfs: lock extents while truncating
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (5 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 06/21] btrfs: wait ordered range before locking during truncate Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 08/21] btrfs: no need to lock extent while performing invalidate_folio() Goldwyn Rodrigues
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock extents before pages.

Lock extents while performing truncate_setsize(). This ends up calling
btrfs_invalidate_folio(), so remove all extent locking from
invalidate_folio().

Note, extent locks are not required during inode eviction, which calls
invalidate_folio() as well.

Call btrfs_delalloc_reserve_metadata() with nowait set to true to avoid
a qgroup flush while extents are locked. Flush, if required, after
unlocking in btrfs_setsize().

There are cases when the user truncates at the file size (which could
also be the block size). In such a case, end is start - 1, which would
insert an incorrect node into the extent bit tree, so lock only when
start < end.
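
Concretely, in btrfs_setsize() (a condensed sketch of the hunk below):

	u64 start = round_down(newsize, fs_info->sectorsize);
	u64 end = round_up(oldsize, fs_info->sectorsize) - 1;

	/* newsize == oldsize, block aligned, gives end == start - 1: an
	 * inverted range, hence the guard */
	if (start < end)
		lock_extent(&BTRFS_I(inode)->io_tree, start, end, cached);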

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c  |  4 ++--
 fs/btrfs/inode.c | 54 +++++++++++++++++++++++++++---------------------
 2 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a2f8f566cfbf..2e835096e3ce 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2165,10 +2165,10 @@ static void btrfs_punch_hole_lock_range(struct inode *inode,
 	const u64 page_lockend = round_down(lockend + 1, PAGE_SIZE) - 1;
 
 	while (1) {
-		truncate_pagecache_range(inode, lockstart, lockend);
-
 		lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			    cached_state);
+
+		truncate_pagecache_range(inode, lockstart, lockend);
 		/*
 		 * We can't have ordered extents in the range, nor dirty/writeback
 		 * pages, because we have locked the inode's VFS lock in exclusive
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 02307789b0a8..2816629fafe4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4757,7 +4757,6 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct address_space *mapping = inode->vfs_inode.i_mapping;
-	struct extent_io_tree *io_tree = &inode->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
 	struct extent_changeset *data_reserved = NULL;
@@ -4789,7 +4788,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 			goto out;
 		}
 	}
-	ret = btrfs_delalloc_reserve_metadata(inode, blocksize, blocksize, false);
+	ret = btrfs_delalloc_reserve_metadata(inode, blocksize, blocksize, true);
 	if (ret < 0) {
 		if (!only_release_metadata)
 			btrfs_free_reserved_data_space(inode, data_reserved,
@@ -4824,11 +4823,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	}
 	wait_on_page_writeback(page);
 
-	lock_extent(io_tree, block_start, block_end, &cached_state);
-
 	ordered = btrfs_lookup_ordered_extent(inode, block_start);
 	if (ordered) {
-		unlock_extent(io_tree, block_start, block_end, &cached_state);
 		unlock_page(page);
 		put_page(page);
 		btrfs_start_ordered_extent(ordered);
@@ -4842,10 +4838,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 
 	ret = btrfs_set_extent_delalloc(inode, block_start, block_end, 0,
 					&cached_state);
-	if (ret) {
-		unlock_extent(io_tree, block_start, block_end, &cached_state);
+	if (ret)
 		goto out_unlock;
-	}
 
 	if (offset != blocksize) {
 		if (!len)
@@ -4860,7 +4854,6 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	btrfs_page_clear_checked(fs_info, page, block_start,
 				 block_end + 1 - block_start);
 	btrfs_page_set_dirty(fs_info, page, block_start, block_end + 1 - block_start);
-	unlock_extent(io_tree, block_start, block_end, &cached_state);
 
 	if (only_release_metadata)
 		set_extent_bit(&inode->io_tree, block_start, block_end,
@@ -4952,6 +4945,13 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 	u64 hole_size;
 	int err = 0;
 
+	/*
+	 * Check so that no erroneous nodes are created in locking trees
+	 * when hole_start and block_end are equal.
+	 */
+	if (hole_start != block_end)
+		btrfs_lock_and_flush_ordered_range(inode, hole_start, block_end - 1, &cached_state);
+
 	/*
 	 * If our size started in the middle of a block we need to zero out the
 	 * rest of the block before we expand the i_size, otherwise we could
@@ -4959,13 +4959,11 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 	 */
 	err = btrfs_truncate_block(inode, oldsize, 0, 0);
 	if (err)
-		return err;
+		goto out;
 
 	if (size <= hole_start)
-		return 0;
+		goto out;
 
-	btrfs_lock_and_flush_ordered_range(inode, hole_start, block_end - 1,
-					   &cached_state);
 	cur_offset = hole_start;
 	while (1) {
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
@@ -5027,7 +5025,9 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 			break;
 	}
 	free_extent_map(em);
-	unlock_extent(io_tree, hole_start, block_end - 1, &cached_state);
+out:
+	if (hole_start != block_end)
+		unlock_extent(io_tree, hole_start, block_end - 1, &cached_state);
 	return err;
 }
 
@@ -5039,6 +5039,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 	loff_t newsize = attr->ia_size;
 	int mask = attr->ia_valid;
 	int ret;
+	bool flushed = false;
 
 	/*
 	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
@@ -5083,6 +5084,9 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_end_transaction(trans);
 	} else {
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+		u64 start = round_down(newsize, fs_info->sectorsize);
+		u64 end = round_up(oldsize, fs_info->sectorsize) - 1;
+		struct extent_state **cached = NULL;
 
 		if (btrfs_is_zoned(fs_info) || (newsize < oldsize)) {
 			ret = btrfs_wait_ordered_range(inode,
@@ -5100,12 +5104,22 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		if (newsize == 0)
 			set_bit(BTRFS_INODE_FLUSH_ON_CLOSE,
 				&BTRFS_I(inode)->runtime_flags);
-
+again:
+		if (start < end)
+			lock_extent(&BTRFS_I(inode)->io_tree, start, end, cached);
 		truncate_setsize(inode, newsize);
 
 		inode_dio_wait(inode);
 
 		ret = btrfs_truncate(BTRFS_I(inode));
+		if (start < end)
+			unlock_extent(&BTRFS_I(inode)->io_tree, start, end, cached);
+
+		if (ret == -EDQUOT && !flushed) {
+			flushed = true;
+			btrfs_qgroup_flush(BTRFS_I(inode)->root);
+			goto again;
+		}
 
 		if (ret && inode->i_nlink) {
 			int err;
@@ -7956,9 +7970,6 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
 		return;
 	}
 
-	if (!inode_evicting)
-		lock_extent(tree, page_start, page_end, &cached_state);
-
 	cur = page_start;
 	while (cur < page_end) {
 		struct btrfs_ordered_extent *ordered;
@@ -8059,7 +8070,7 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
 		 */
 		btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur);
 		if (!inode_evicting) {
-			clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED |
+			clear_extent_bit(tree, cur, range_end,
 				 EXTENT_DELALLOC | EXTENT_UPTODATE |
 				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG |
 				 extra_flags, &cached_state);
@@ -8309,12 +8320,9 @@ static int btrfs_truncate(struct btrfs_inode *inode)
 	trans->block_rsv = rsv;
 
 	while (1) {
-		struct extent_state *cached_state = NULL;
 		const u64 new_size = inode->vfs_inode.i_size;
-		const u64 lock_start = ALIGN_DOWN(new_size, fs_info->sectorsize);
 
 		control.new_size = new_size;
-		lock_extent(&inode->io_tree, lock_start, (u64)-1, &cached_state);
 		/*
 		 * We want to drop from the next block forward in case this new
 		 * size is not block aligned since we will be keeping the last
@@ -8329,8 +8337,6 @@ static int btrfs_truncate(struct btrfs_inode *inode)
 		inode_sub_bytes(&inode->vfs_inode, control.sub_bytes);
 		btrfs_inode_safe_disk_i_size_write(inode, control.last_size);
 
-		unlock_extent(&inode->io_tree, lock_start, (u64)-1, &cached_state);
-
 		trans->block_rsv = &fs_info->trans_block_rsv;
 		if (ret != -ENOSPC && ret != -EAGAIN)
 			break;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/21] btrfs: no need to lock extent while performing invalidate_folio()
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (6 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 07/21] btrfs: lock extents while truncating Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 09/21] btrfs: lock extents before folio for read()s Goldwyn Rodrigues
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Don't lock extents while performing invalidate_folio because this is
performed by the calling function higher up the call chain.

With this change, extent_invalidate_folio() calls only
folio_wait_writeback(). Remove and cleanup this function.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c   |  4 +---
 fs/btrfs/extent_io.c | 32 --------------------------------
 2 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 48368d4bc331..c2b954134851 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -755,9 +755,7 @@ static bool btree_release_folio(struct folio *folio, gfp_t gfp_flags)
 static void btree_invalidate_folio(struct folio *folio, size_t offset,
 				 size_t length)
 {
-	struct extent_io_tree *tree;
-	tree = &BTRFS_I(folio->mapping->host)->io_tree;
-	extent_invalidate_folio(tree, folio, offset);
+	folio_wait_writeback(folio);
 	btree_release_folio(folio, GFP_NOFS);
 	if (folio_get_private(folio)) {
 		btrfs_warn(BTRFS_I(folio->mapping->host)->root->fs_info,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c25fa74d7615..ed054c2f38d8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2778,38 +2778,6 @@ void extent_readahead(struct readahead_control *rac)
 	submit_one_bio(&bio_ctrl);
 }
 
-/*
- * basic invalidate_folio code, this waits on any locked or writeback
- * ranges corresponding to the folio, and then deletes any extent state
- * records from the tree
- */
-int extent_invalidate_folio(struct extent_io_tree *tree,
-			  struct folio *folio, size_t offset)
-{
-	struct extent_state *cached_state = NULL;
-	u64 start = folio_pos(folio);
-	u64 end = start + folio_size(folio) - 1;
-	size_t blocksize = folio->mapping->host->i_sb->s_blocksize;
-
-	/* This function is only called for the btree inode */
-	ASSERT(tree->owner == IO_TREE_BTREE_INODE_IO);
-
-	start += ALIGN(offset, blocksize);
-	if (start > end)
-		return 0;
-
-	lock_extent(tree, start, end, &cached_state);
-	folio_wait_writeback(folio);
-
-	/*
-	 * Currently for btree io tree, only EXTENT_LOCKED is utilized,
-	 * so here we only need to unlock the extent range to free any
-	 * existing extent state.
-	 */
-	unlock_extent(tree, start, end, &cached_state);
-	return 0;
-}
-
 /*
  * a helper for release_folio, this tests for areas of the page that
  * are locked or under IO and drops the related state bits if it is safe
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/21] btrfs: lock extents before folio for read()s
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (7 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 08/21] btrfs: no need to lock extent while performing invalidate_folio() Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 10/21] btrfs: lock extents before pages in writepages Goldwyn Rodrigues
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock the extents before the folios by locking them in
readahead_begin(). Unlock the extents after the readahead is complete.
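
The resulting readahead lifecycle, as a sketch:

	/*
	 * page_cache_ra_order()
	 *   -> aops->readahead_begin()  btrfs_readahead_begin(): lock_extent()
	 *   -> readahead core allocates and locks the folios
	 *   -> aops->readahead()        btrfs_readahead(): extent_readahead(),
	 *                               then unlock_extent() on the same range
	 */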

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/compression.c |  5 -----
 fs/btrfs/extent_io.c   | 25 -------------------------
 fs/btrfs/inode.c       | 17 +++++++++++++++++
 3 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f42f31f22d13..b0dd01e31078 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -381,11 +381,9 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 	struct extent_map *em;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_map_tree *em_tree;
-	struct extent_io_tree *tree;
 	int sectors_missed = 0;
 
 	em_tree = &BTRFS_I(inode)->extent_tree;
-	tree = &BTRFS_I(inode)->io_tree;
 
 	if (isize == 0)
 		return 0;
@@ -452,7 +450,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		}
 
 		page_end = (pg_index << PAGE_SHIFT) + PAGE_SIZE - 1;
-		lock_extent(tree, cur, page_end, NULL);
 		read_lock(&em_tree->lock);
 		em = lookup_extent_mapping(em_tree, cur, page_end + 1 - cur);
 		read_unlock(&em_tree->lock);
@@ -466,7 +463,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		    (cur + fs_info->sectorsize > extent_map_end(em)) ||
 		    (em->block_start >> 9) != cb->orig_bio->bi_iter.bi_sector) {
 			free_extent_map(em);
-			unlock_extent(tree, cur, page_end, NULL);
 			unlock_page(page);
 			put_page(page);
 			break;
@@ -486,7 +482,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		add_size = min(em->start + em->len, page_end + 1) - cur;
 		ret = bio_add_page(cb->orig_bio, page, add_size, offset_in_page(cur));
 		if (ret != add_size) {
-			unlock_extent(tree, cur, page_end, NULL);
 			unlock_page(page);
 			put_page(page);
 			break;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ed054c2f38d8..e44329a84caf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -644,9 +644,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 			      struct btrfs_inode *inode, u64 start, u64 end,
 			      bool uptodate)
 {
-	struct extent_state *cached = NULL;
-	struct extent_io_tree *tree;
-
 	/* The first extent, initialize @processed */
 	if (!processed->inode)
 		goto update;
@@ -668,13 +665,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 		return;
 	}
 
-	tree = &processed->inode->io_tree;
-	/*
-	 * Now we don't have range contiguous to the processed range, release
-	 * the processed range now.
-	 */
-	unlock_extent(tree, processed->start, processed->end, &cached);
-
 update:
 	/* Update processed to current range */
 	processed->inode = inode;
@@ -1209,11 +1199,9 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	size_t pg_offset = 0;
 	size_t iosize;
 	size_t blocksize = inode->i_sb->s_blocksize;
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
 	ret = set_page_extent_mapped(page);
 	if (ret < 0) {
-		unlock_extent(tree, start, end, NULL);
 		btrfs_page_set_error(fs_info, page, start, PAGE_SIZE);
 		unlock_page(page);
 		goto out;
@@ -1238,14 +1226,12 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		if (cur >= last_byte) {
 			iosize = PAGE_SIZE - pg_offset;
 			memzero_page(page, pg_offset, iosize);
-			unlock_extent(tree, cur, cur + iosize - 1, NULL);
 			end_page_read(page, true, cur, iosize);
 			break;
 		}
 		em = __get_extent_map(inode, page, pg_offset, cur,
 				      end - cur + 1, em_cached);
 		if (IS_ERR(em)) {
-			unlock_extent(tree, cur, end, NULL);
 			end_page_read(page, false, cur, end + 1 - cur);
 			ret = PTR_ERR(em);
 			break;
@@ -1315,8 +1301,6 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		/* we've found a hole, just zero and go on */
 		if (block_start == EXTENT_MAP_HOLE) {
 			memzero_page(page, pg_offset, iosize);
-
-			unlock_extent(tree, cur, cur + iosize - 1, NULL);
 			end_page_read(page, true, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
@@ -1324,7 +1308,6 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		}
 		/* the get_extent function already copied into the page */
 		if (block_start == EXTENT_MAP_INLINE) {
-			unlock_extent(tree, cur, cur + iosize - 1, NULL);
 			end_page_read(page, true, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
@@ -1340,7 +1323,6 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			 * We have to unlock the remaining range, or the page
 			 * will never be unlocked.
 			 */
-			unlock_extent(tree, cur, end, NULL);
 			end_page_read(page, false, cur, end + 1 - cur);
 			goto out;
 		}
@@ -1354,13 +1336,9 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 int btrfs_read_folio(struct file *file, struct folio *folio)
 {
 	struct page *page = &folio->page;
-	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
-	u64 start = page_offset(page);
-	u64 end = start + PAGE_SIZE - 1;
 	struct btrfs_bio_ctrl bio_ctrl = { 0 };
 	int ret;
 
-	btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
 
 	ret = btrfs_do_readpage(page, NULL, &bio_ctrl, 0, NULL);
 	/*
@@ -1377,11 +1355,8 @@ static inline void contiguous_readpages(struct page *pages[], int nr_pages,
 					struct btrfs_bio_ctrl *bio_ctrl,
 					u64 *prev_em_start)
 {
-	struct btrfs_inode *inode = BTRFS_I(pages[0]->mapping->host);
 	int index;
 
-	btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
-
 	for (index = 0; index < nr_pages; index++) {
 		btrfs_do_readpage(pages[index], em_cached, bio_ctrl,
 				  REQ_RAHEAD, prev_em_start);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2816629fafe4..53bd9a64e803 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7848,9 +7848,25 @@ static int btrfs_writepages(struct address_space *mapping,
 	return extent_writepages(mapping, wbc);
 }
 
+static void btrfs_readahead_begin(struct readahead_control *rac)
+{
+	struct inode *inode = rac->mapping->host;
+	int sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
+	u64 start = round_down(readahead_pos(rac), sectorsize);
+	u64 end = round_up(start + readahead_length(rac), sectorsize) - 1;
+
+	lock_extent(&BTRFS_I(inode)->io_tree, start, end, NULL);
+}
+
 static void btrfs_readahead(struct readahead_control *rac)
 {
+	struct inode *inode = rac->mapping->host;
+	int sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
+	u64 start = round_down(readahead_pos(rac), sectorsize);
+	u64 end = round_up(start + readahead_length(rac), sectorsize) - 1;
+
 	extent_readahead(rac);
+	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, NULL);
 }
 
 /*
@@ -10930,6 +10946,7 @@ static const struct file_operations btrfs_dir_file_operations = {
 static const struct address_space_operations btrfs_aops = {
 	.read_folio	= btrfs_read_folio,
 	.writepages	= btrfs_writepages,
+	.readahead_begin = btrfs_readahead_begin,
 	.readahead	= btrfs_readahead,
 	.direct_IO	= noop_direct_IO,
 	.invalidate_folio = btrfs_invalidate_folio,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/21] btrfs: lock extents before pages in writepages
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (8 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 09/21] btrfs: lock extents before folio for read()s Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 11/21] btrfs: locking extents for async writeback Goldwyn Rodrigues
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

writepages() locks the extents in find_lock_delalloc_range() and
unlocks them by clearing the EXTENT_LOCKED bit in the cow/delalloc
operations.

Perform the extent locking/unlocking around the writepages() sequence
instead, as opposed to while performing delayed allocation.

This converts a range_cyclic wbc to a non-range_cyclic wbc, with the
range starting at writeback_index and ending at the inode size. This is
done because the inode size can change while writepages() is in
progress, so the number of pages accounted for writeback in the wbc
stays accurate.
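
Condensed from the hunk below, the locking around extent_writepages()
becomes:

	if (wbc->range_cyclic) {
		start = mapping->writeback_index << PAGE_SHIFT;
		end = round_up(i_size_read(inode), PAGE_SIZE) - 1;
		/* pin the range so a growing i_size cannot extend it
		 * mid-writeback */
		wbc->range_cyclic = 0;
		wbc->range_start = start;
		wbc->range_end = end;
	}

	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
	ret = extent_writepages(mapping, wbc);
	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);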

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c        |  5 ----
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/inode.c            | 53 ++++++++++++++++++++++++++++---------
 3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e44329a84caf..cdce2db82d7e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -483,15 +483,10 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 		}
 	}
 
-	/* step three, lock the state bits for the whole range */
-	lock_extent(tree, delalloc_start, delalloc_end, &cached_state);
-
 	/* then test to make sure it is all still delalloc */
 	ret = test_range_bit(tree, delalloc_start, delalloc_end,
 			     EXTENT_DELALLOC, 1, cached_state);
 	if (!ret) {
-		unlock_extent(tree, delalloc_start, delalloc_end,
-			      &cached_state);
 		__unlock_for_delalloc(inode, locked_page,
 			      delalloc_start, delalloc_end);
 		cond_resched();
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0d250d052487..2373f248d70f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -355,9 +355,9 @@ int btrfs_truncate_free_space_cache(struct btrfs_trans_handle *trans,
 	}
 
 	btrfs_i_size_write(inode, 0);
-	truncate_pagecache(vfs_inode, 0);
 
 	lock_extent(&inode->io_tree, 0, (u64)-1, &cached_state);
+	truncate_pagecache(vfs_inode, 0);
 	btrfs_drop_extent_map_range(inode, 0, (u64)-1, false);
 
 	/*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 53bd9a64e803..eeddd7cdff58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -977,7 +977,6 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 				   struct async_extent *async_extent,
 				   u64 *alloc_hint)
 {
-	struct extent_io_tree *io_tree = &inode->io_tree;
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key ins;
@@ -998,7 +997,6 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 		if (!(start >= locked_page_end || end <= locked_page_start))
 			locked_page = async_chunk->locked_page;
 	}
-	lock_extent(io_tree, start, end, NULL);
 
 	/* We have fall back to uncompressed write */
 	if (!async_extent->pages)
@@ -1052,7 +1050,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 
 	/* Clear dirty, set writeback and unlock the pages. */
 	extent_clear_unlock_delalloc(inode, start, end,
-			NULL, EXTENT_LOCKED | EXTENT_DELALLOC,
+			NULL, EXTENT_DELALLOC,
 			PAGE_UNLOCK | PAGE_START_WRITEBACK);
 	if (btrfs_submit_compressed_write(inode, start,	/* file_offset */
 			    async_extent->ram_size,	/* num_bytes */
@@ -1080,7 +1078,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_free:
 	extent_clear_unlock_delalloc(inode, start, end,
-				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
+				     NULL, EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW |
 				     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
 				     PAGE_UNLOCK | PAGE_START_WRITEBACK |
@@ -1248,7 +1246,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 			 */
 			extent_clear_unlock_delalloc(inode, start, end,
 				     locked_page,
-				     EXTENT_LOCKED | EXTENT_DELALLOC |
+				     EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 				     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
 				     PAGE_START_WRITEBACK | PAGE_END_WRITEBACK);
@@ -1359,7 +1357,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 		extent_clear_unlock_delalloc(inode, start, start + ram_size - 1,
 					     locked_page,
-					     EXTENT_LOCKED | EXTENT_DELALLOC,
+					     EXTENT_DELALLOC,
 					     page_ops);
 		if (num_bytes < cur_alloc_size)
 			num_bytes = 0;
@@ -1410,7 +1408,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	 * We process each region below.
 	 */
 
-	clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+	clear_bits = EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
 		EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV;
 	page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
 
@@ -1560,7 +1558,7 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 	memalloc_nofs_restore(nofs_flag);
 
 	if (!ctx) {
-		unsigned clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC |
+		unsigned clear_bits = EXTENT_DELALLOC |
 			EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 			EXTENT_DO_ACCOUNTING;
 		unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
@@ -1940,7 +1938,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	path = btrfs_alloc_path();
 	if (!path) {
 		extent_clear_unlock_delalloc(inode, start, end, locked_page,
-					     EXTENT_LOCKED | EXTENT_DELALLOC |
+					     EXTENT_DELALLOC |
 					     EXTENT_DO_ACCOUNTING |
 					     EXTENT_DEFRAG, PAGE_UNLOCK |
 					     PAGE_START_WRITEBACK |
@@ -2154,7 +2152,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 						      nocow_args.num_bytes);
 
 		extent_clear_unlock_delalloc(inode, cur_offset, nocow_end,
-					     locked_page, EXTENT_LOCKED |
+					     locked_page,
 					     EXTENT_DELALLOC |
 					     EXTENT_CLEAR_DATA_RESV,
 					     PAGE_UNLOCK | PAGE_SET_ORDERED);
@@ -2190,7 +2188,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 	if (ret && cur_offset < end)
 		extent_clear_unlock_delalloc(inode, cur_offset, end,
-					     locked_page, EXTENT_LOCKED |
+					     locked_page,
 					     EXTENT_DELALLOC | EXTENT_DEFRAG |
 					     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
 					     PAGE_START_WRITEBACK |
@@ -7845,7 +7843,38 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 static int btrfs_writepages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
-	return extent_writepages(mapping, wbc);
+	u64 start = 0, end = LLONG_MAX;
+	struct inode *inode = mapping->host;
+	struct extent_state *cached = NULL;
+	int ret;
+	loff_t isize = i_size_read(inode);
+	struct writeback_control new_wbc = *wbc;
+
+	if (new_wbc.range_cyclic) {
+		start = mapping->writeback_index << PAGE_SHIFT;
+		end = round_up(isize, PAGE_SIZE) - 1;
+		wbc->range_cyclic = 0;
+		wbc->range_start = start;
+		wbc->range_end = end;
+	} else {
+		start = round_down(wbc->range_start, PAGE_SIZE);
+		end = round_up(wbc->range_end, PAGE_SIZE) - 1;
+	}
+
+	if (start >= end)
+		return 0;
+
+	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
+	ret = extent_writepages(mapping, wbc);
+	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
+
+	if (new_wbc.range_cyclic) {
+		wbc->range_start = new_wbc.range_start;
+		wbc->range_end = new_wbc.range_end;
+		wbc->range_cyclic = 1;
+	}
+
+	return ret;
 }
 
 static void btrfs_readahead_begin(struct readahead_control *rac)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/21] btrfs: locking extents for async writeback
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (9 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 10/21] btrfs: lock extents before pages in writepages Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-08 19:13   ` Boris Burkov
  2023-03-02 22:24 ` [PATCH 12/21] btrfs: lock extents before pages - defrag Goldwyn Rodrigues
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

For async writebacks, lock the extents and then perform the async cow
of the file range. Unlock when the async_chunk is freed.

Since writeback is performed on a range, locked_page can be removed
from the structures and function parameters, and likewise page_started
and nr_written.

A writeback could involve a hole, so check whether the locked range
covers the entire extent returned by find_lock_delalloc_range(). If
not, try to lock the entire range, or unlock the pages that were
locked.
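
When find_lock_delalloc_range() returns a delalloc extent extending
past the requested end, the handling condenses to (a sketch from the
hunk below):

	if (cur_end > end) {
		if (try_lock_extent(&inode->io_tree, end + 1, cur_end, NULL)) {
			/* tail locked too: write back the whole extent */
			end = cur_end;
		} else {
			/* tail is contended: release its pages and stop
			 * at the end requested by writeback */
			__unlock_for_delalloc(&inode->vfs_inode, NULL,
					      end + 1, cur_end);
			cur_end = end;
		}
	}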

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/compression.c |   4 +
 fs/btrfs/extent_io.c   |  10 +--
 fs/btrfs/extent_io.h   |   2 +
 fs/btrfs/inode.c       | 184 ++++++++++++++++++-----------------------
 4 files changed, 92 insertions(+), 108 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b0dd01e31078..a8fa7f2049ce 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1424,6 +1424,10 @@ static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end,
 	curr_sample_pos = 0;
 	while (index < index_end) {
 		page = find_get_page(inode->i_mapping, index);
+		if (!page) {
+			index++;
+			continue;
+		}
 		in_data = kmap_local_page(page);
 		/* Handle case where the start is not aligned to PAGE_SIZE */
 		i = start % PAGE_SIZE;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cdce2db82d7e..12aa7eaf12c5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -358,7 +358,7 @@ static int __process_pages_contig(struct address_space *mapping,
 	return err;
 }
 
-static noinline void __unlock_for_delalloc(struct inode *inode,
+noinline void __unlock_for_delalloc(struct inode *inode,
 					   struct page *locked_page,
 					   u64 start, u64 end)
 {
@@ -383,8 +383,7 @@ static noinline int lock_delalloc_pages(struct inode *inode,
 	u64 processed_end = delalloc_start;
 	int ret;
 
-	ASSERT(locked_page);
-	if (index == locked_page->index && index == end_index)
+	if (locked_page && index == locked_page->index && index == end_index)
 		return 0;
 
 	ret = __process_pages_contig(inode->i_mapping, locked_page, delalloc_start,
@@ -432,8 +431,9 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 	ASSERT(orig_end > orig_start);
 
 	/* The range should at least cover part of the page */
-	ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
-		 orig_end <= page_offset(locked_page)));
+	if (locked_page)
+		ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
+			 orig_end <= page_offset(locked_page)));
 again:
 	/* step one, find a bunch of delalloc bytes starting at start */
 	delalloc_start = *start;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4341ad978fb8..ddfa100ab629 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -279,6 +279,8 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
+void __unlock_for_delalloc(struct inode *inode, struct page *locked_page,
+		u64 start, u64 end);
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eeddd7cdff58..fb02b2b3ac2e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -506,7 +506,6 @@ struct async_extent {
 
 struct async_chunk {
 	struct btrfs_inode *inode;
-	struct page *locked_page;
 	u64 start;
 	u64 end;
 	blk_opf_t write_flags;
@@ -887,18 +886,6 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 		}
 	}
 cleanup_and_bail_uncompressed:
-	/*
-	 * No compression, but we still need to write the pages in the file
-	 * we've been given so far.  redirty the locked page if it corresponds
-	 * to our extent and set things up for the async work queue to run
-	 * cow_file_range to do the normal delalloc dance.
-	 */
-	if (async_chunk->locked_page &&
-	    (page_offset(async_chunk->locked_page) >= start &&
-	     page_offset(async_chunk->locked_page)) <= end) {
-		__set_page_dirty_nobuffers(async_chunk->locked_page);
-		/* unlocked later on in the async handlers */
-	}
 
 	if (redirty)
 		extent_range_redirty_for_io(&inode->vfs_inode, start, end);
@@ -926,8 +913,7 @@ static void free_async_extent_pages(struct async_extent *async_extent)
 }
 
 static int submit_uncompressed_range(struct btrfs_inode *inode,
-				     struct async_extent *async_extent,
-				     struct page *locked_page)
+				     struct async_extent *async_extent)
 {
 	u64 start = async_extent->start;
 	u64 end = async_extent->start + async_extent->ram_size - 1;
@@ -942,7 +928,7 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 	 * Also we call cow_file_range() with @unlock_page == 0, so that we
 	 * can directly submit them without interruption.
 	 */
-	ret = cow_file_range(inode, locked_page, start, end, &page_started,
+	ret = cow_file_range(inode, NULL, start, end, &page_started,
 			     &nr_written, 0, NULL);
 	/* Inline extent inserted, page gets unlocked and everything is done */
 	if (page_started) {
@@ -950,23 +936,12 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 		goto out;
 	}
 	if (ret < 0) {
-		btrfs_cleanup_ordered_extents(inode, locked_page, start, end - start + 1);
-		if (locked_page) {
-			const u64 page_start = page_offset(locked_page);
-			const u64 page_end = page_start + PAGE_SIZE - 1;
-
-			btrfs_page_set_error(inode->root->fs_info, locked_page,
-					     page_start, PAGE_SIZE);
-			set_page_writeback(locked_page);
-			end_page_writeback(locked_page);
-			end_extent_writepage(locked_page, ret, page_start, page_end);
-			unlock_page(locked_page);
-		}
+		btrfs_cleanup_ordered_extents(inode, NULL, start, end - start + 1);
 		goto out;
 	}
 
 	ret = extent_write_locked_range(&inode->vfs_inode, start, end);
-	/* All pages will be unlocked, including @locked_page */
+	/* All pages will be unlocked */
 out:
 	kfree(async_extent);
 	return ret;
@@ -980,27 +955,14 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key ins;
-	struct page *locked_page = NULL;
 	struct extent_map *em;
 	int ret = 0;
 	u64 start = async_extent->start;
 	u64 end = async_extent->start + async_extent->ram_size - 1;
 
-	/*
-	 * If async_chunk->locked_page is in the async_extent range, we need to
-	 * handle it.
-	 */
-	if (async_chunk->locked_page) {
-		u64 locked_page_start = page_offset(async_chunk->locked_page);
-		u64 locked_page_end = locked_page_start + PAGE_SIZE - 1;
-
-		if (!(start >= locked_page_end || end <= locked_page_start))
-			locked_page = async_chunk->locked_page;
-	}
-
 	/* We have fall back to uncompressed write */
 	if (!async_extent->pages)
-		return submit_uncompressed_range(inode, async_extent, locked_page);
+		return submit_uncompressed_range(inode, async_extent);
 
 	ret = btrfs_reserve_extent(root, async_extent->ram_size,
 				   async_extent->compressed_size,
@@ -1476,6 +1438,8 @@ static noinline void async_cow_start(struct btrfs_work *work)
 
 	compressed_extents = compress_file_range(async_chunk);
 	if (compressed_extents == 0) {
+		unlock_extent(&async_chunk->inode->io_tree,
+				async_chunk->start, async_chunk->end, NULL);
 		btrfs_add_delayed_iput(async_chunk->inode);
 		async_chunk->inode = NULL;
 	}
@@ -1515,11 +1479,15 @@ static noinline void async_cow_free(struct btrfs_work *work)
 	struct async_cow *async_cow;
 
 	async_chunk = container_of(work, struct async_chunk, work);
-	if (async_chunk->inode)
+	if (async_chunk->inode) {
+		unlock_extent(&async_chunk->inode->io_tree,
+				async_chunk->start, async_chunk->end, NULL);
 		btrfs_add_delayed_iput(async_chunk->inode);
+	}
 	if (async_chunk->blkcg_css)
 		css_put(async_chunk->blkcg_css);
 
+
 	async_cow = async_chunk->async_cow;
 	if (atomic_dec_and_test(&async_cow->num_chunks))
 		kvfree(async_cow);
@@ -1527,9 +1495,7 @@ static noinline void async_cow_free(struct btrfs_work *work)
 
 static int cow_file_range_async(struct btrfs_inode *inode,
 				struct writeback_control *wbc,
-				struct page *locked_page,
-				u64 start, u64 end, int *page_started,
-				unsigned long *nr_written)
+				u64 start, u64 end)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
@@ -1539,20 +1505,9 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 	u64 cur_end;
 	u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
 	int i;
-	bool should_compress;
 	unsigned nofs_flag;
 	const blk_opf_t write_flags = wbc_to_write_flags(wbc);
 
-	unlock_extent(&inode->io_tree, start, end, NULL);
-
-	if (inode->flags & BTRFS_INODE_NOCOMPRESS &&
-	    !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
-		num_chunks = 1;
-		should_compress = false;
-	} else {
-		should_compress = true;
-	}
-
 	nofs_flag = memalloc_nofs_save();
 	ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
 	memalloc_nofs_restore(nofs_flag);
@@ -1564,19 +1519,17 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
 					 PAGE_END_WRITEBACK | PAGE_SET_ERROR;
 
-		extent_clear_unlock_delalloc(inode, start, end, locked_page,
+		extent_clear_unlock_delalloc(inode, start, end, NULL,
 					     clear_bits, page_ops);
 		return -ENOMEM;
 	}
 
+	set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 	async_chunk = ctx->chunks;
 	atomic_set(&ctx->num_chunks, num_chunks);
 
 	for (i = 0; i < num_chunks; i++) {
-		if (should_compress)
-			cur_end = min(end, start + SZ_512K - 1);
-		else
-			cur_end = end;
+		cur_end = min(end, start + SZ_512K - 1);
 
 		/*
 		 * igrab is called higher up in the call chain, take only the
@@ -1590,33 +1543,6 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		async_chunk[i].write_flags = write_flags;
 		INIT_LIST_HEAD(&async_chunk[i].extents);
 
-		/*
-		 * The locked_page comes all the way from writepage and its
-		 * the original page we were actually given.  As we spread
-		 * this large delalloc region across multiple async_chunk
-		 * structs, only the first struct needs a pointer to locked_page
-		 *
-		 * This way we don't need racey decisions about who is supposed
-		 * to unlock it.
-		 */
-		if (locked_page) {
-			/*
-			 * Depending on the compressibility, the pages might or
-			 * might not go through async.  We want all of them to
-			 * be accounted against wbc once.  Let's do it here
-			 * before the paths diverge.  wbc accounting is used
-			 * only for foreign writeback detection and doesn't
-			 * need full accuracy.  Just account the whole thing
-			 * against the first page.
-			 */
-			wbc_account_cgroup_owner(wbc, locked_page,
-						 cur_end - start);
-			async_chunk[i].locked_page = locked_page;
-			locked_page = NULL;
-		} else {
-			async_chunk[i].locked_page = NULL;
-		}
-
 		if (blkcg_css != blkcg_root_css) {
 			css_get(blkcg_css);
 			async_chunk[i].blkcg_css = blkcg_css;
@@ -1632,10 +1558,8 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 
 		btrfs_queue_work(fs_info->delalloc_workers, &async_chunk[i].work);
 
-		*nr_written += nr_pages;
 		start = cur_end + 1;
 	}
-	*page_started = 1;
 	return 0;
 }
 
@@ -2238,18 +2162,13 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 		ASSERT(!zoned || btrfs_is_data_reloc_root(inode->root));
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, nr_written);
-	} else if (!btrfs_inode_can_compress(inode) ||
-		   !inode_need_compress(inode, start, end)) {
+	} else {
 		if (zoned)
 			ret = run_delalloc_zoned(inode, locked_page, start, end,
 						 page_started, nr_written);
 		else
 			ret = cow_file_range(inode, locked_page, start, end,
 					     page_started, nr_written, 1, NULL);
-	} else {
-		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
-		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
-					   page_started, nr_written);
 	}
 	ASSERT(ret <= 0);
 	if (ret)
@@ -7840,14 +7759,68 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
 }
 
+static int btrfs_writepages_async(struct btrfs_inode *inode, struct writeback_control *wbc, u64 start, u64 end)
+{
+	u64 last_start, cur_start = start;
+	u64 cur_end;
+	int ret = 0;
+
+	lock_extent(&inode->io_tree, start, end, NULL);
+
+	while (cur_start < end) {
+		bool found;
+		last_start = cur_start;
+		cur_end = end;
+
+		found = find_lock_delalloc_range(&inode->vfs_inode, NULL, &cur_start, &cur_end);
+		/* Nothing to writeback */
+		if (!found) {
+			unlock_extent(&inode->io_tree, cur_start, cur_end, NULL);
+			cur_start = cur_end + 1;
+			continue;
+		}
+
+		/* A hole with no pages, unlock that part */
+		if (cur_start > last_start)
+			unlock_extent(&inode->io_tree, last_start, cur_start - 1, NULL);
+
+		/* Got more than we requested */
+		if (cur_end > end) {
+			if (try_lock_extent(&inode->io_tree, end + 1, cur_end, NULL)) {
+				/* Try writing the whole extent */
+				end = cur_end;
+			} else {
+				/*
+				 * Someone is holding the extent lock.
+				 * Unlock pages from last part of extent, and
+				 * Unlock pages in the tail of the extent
+				 * and write only as much as requested.
+				__unlock_for_delalloc(&inode->vfs_inode, NULL, end + 1, cur_end);
+				cur_end = end;
+			}
+		}
+
+		ret = cow_file_range_async(inode, wbc, cur_start, cur_end);
+		if (ret < 0) {
+			unlock_extent(&inode->io_tree, cur_start, end, NULL);
+			break;
+		}
+
+		cur_start = cur_end + 1;
+	}
+
+	return ret;
+}
+
 static int btrfs_writepages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
 	u64 start = 0, end = LLONG_MAX;
-	struct inode *inode = mapping->host;
+	struct btrfs_inode *inode = BTRFS_I(mapping->host);
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
 	struct extent_state *cached = NULL;
 	int ret;
-	loff_t isize = i_size_read(inode);
+	loff_t isize = i_size_read(&inode->vfs_inode);
 	struct writeback_control new_wbc = *wbc;
 
 	if (new_wbc.range_cyclic) {
@@ -7864,9 +7837,14 @@ static int btrfs_writepages(struct address_space *mapping,
 	if (start >= end)
 		return 0;
 
-	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
-	ret = extent_writepages(mapping, wbc);
-	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
+	if (btrfs_test_opt(fs_info, COMPRESS) &&
+			btrfs_inode_can_compress(inode)) {
+		ret = btrfs_writepages_async(inode, wbc, start, end);
+	} else {
+		lock_extent(&inode->io_tree, start, end, &cached);
+		ret = extent_writepages(mapping, wbc);
+		unlock_extent(&inode->io_tree, start, end, &cached);
+	}
 
 	if (new_wbc.range_cyclic) {
 		wbc->range_start = new_wbc.range_start;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 12/21] btrfs: lock extents before pages - defrag
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (10 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 11/21] btrfs: locking extents for async writeback Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 13/21] btrfs: Perform memory faults under locked extent Goldwyn Rodrigues
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock and flush the range before performing defrag.
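
The resulting order in defrag_one_range() is roughly the following
(a sketch matching the hunks below, details elided):

	btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
	/* prepare and read pages, wait for writeback -- under the extent lock */
	/* defrag the collected targets */
	/* release the pages */
	unlock_extent(&inode->io_tree, start, end, &cached_state);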

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/defrag.c | 48 ++++++++++-------------------------------------
 1 file changed, 10 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 8065341d831a..dff53458bf52 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -721,9 +721,6 @@ static struct page *defrag_prepare_one_page(struct btrfs_inode *inode, pgoff_t i
 {
 	struct address_space *mapping = inode->vfs_inode.i_mapping;
 	gfp_t mask = btrfs_alloc_write_mask(mapping);
-	u64 page_start = (u64)index << PAGE_SHIFT;
-	u64 page_end = page_start + PAGE_SIZE - 1;
-	struct extent_state *cached_state = NULL;
 	struct page *page;
 	int ret;
 
@@ -753,32 +750,6 @@ static struct page *defrag_prepare_one_page(struct btrfs_inode *inode, pgoff_t i
 		return ERR_PTR(ret);
 	}
 
-	/* Wait for any existing ordered extent in the range */
-	while (1) {
-		struct btrfs_ordered_extent *ordered;
-
-		lock_extent(&inode->io_tree, page_start, page_end, &cached_state);
-		ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_SIZE);
-		unlock_extent(&inode->io_tree, page_start, page_end,
-			      &cached_state);
-		if (!ordered)
-			break;
-
-		unlock_page(page);
-		btrfs_start_ordered_extent(ordered);
-		btrfs_put_ordered_extent(ordered);
-		lock_page(page);
-		/*
-		 * We unlocked the page above, so we need check if it was
-		 * released or not.
-		 */
-		if (page->mapping != mapping || !PagePrivate(page)) {
-			unlock_page(page);
-			put_page(page);
-			goto again;
-		}
-	}
-
 	/*
 	 * Now the page range has no ordered extent any more.  Read the page to
 	 * make it uptodate.
@@ -1076,6 +1047,11 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 	if (!pages)
 		return -ENOMEM;
 
+	/* Lock the pages range */
+	btrfs_lock_and_flush_ordered_range(inode, start_index << PAGE_SHIFT,
+		    (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
+		    &cached_state);
+
 	/* Prepare all pages */
 	for (i = 0; i < nr_pages; i++) {
 		pages[i] = defrag_prepare_one_page(inode, start_index + i);
@@ -1088,10 +1064,6 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 	for (i = 0; i < nr_pages; i++)
 		wait_on_page_writeback(pages[i]);
 
-	/* Lock the pages range */
-	lock_extent(&inode->io_tree, start_index << PAGE_SHIFT,
-		    (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
-		    &cached_state);
 	/*
 	 * Now we have a consistent view about the extent map, re-check
 	 * which range really needs to be defragged.
@@ -1103,7 +1075,7 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 				     newer_than, do_compress, true,
 				     &target_list, last_scanned_ret);
 	if (ret < 0)
-		goto unlock_extent;
+		goto free_pages;
 
 	list_for_each_entry(entry, &target_list, list) {
 		ret = defrag_one_locked_target(inode, entry, pages, nr_pages,
@@ -1116,10 +1088,6 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 		list_del_init(&entry->list);
 		kfree(entry);
 	}
-unlock_extent:
-	unlock_extent(&inode->io_tree, start_index << PAGE_SHIFT,
-		      (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
-		      &cached_state);
 free_pages:
 	for (i = 0; i < nr_pages; i++) {
 		if (pages[i]) {
@@ -1128,6 +1096,10 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 		}
 	}
 	kfree(pages);
+
+	unlock_extent(&inode->io_tree, start_index << PAGE_SHIFT,
+		      (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
+		      &cached_state);
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 13/21] btrfs: Perform memory faults under locked extent
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (11 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 12/21] btrfs: lock extents before pages - defrag Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:24 ` [PATCH 14/21] btrfs: writepage fixup lock rearrangement Goldwyn Rodrigues
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

As part of locking extents before pages, lock the entire memory fault
region while servicing faults.

Remove extent locking from page_mkwrite(), since it is part of the
fault.
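
The resulting ordering in the fault path is roughly (a sketch, not a
formal lockdep annotation):

	btrfs_fault()
	  lock_extent(io_tree, page_start, page_end)
	    filemap_fault()	/* takes the folio lock under the extent lock */
	  unlock_extent(io_tree, page_start, page_end)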

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c  | 18 +++++++++++++++++-
 fs/btrfs/inode.c |  6 ------
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2e835096e3ce..fe1f63456142 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1973,8 +1973,24 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	goto out;
 }
 
+static vm_fault_t btrfs_fault(struct vm_fault *vmf)
+{
+	struct extent_state *cached_state = NULL;
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	u64 page_start = (u64)vmf->pgoff << PAGE_SHIFT;
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	vm_fault_t ret;
+
+	lock_extent(io_tree, page_start, page_end, &cached_state);
+	ret = filemap_fault(vmf);
+	unlock_extent(io_tree, page_start, page_end, &cached_state);
+
+	return ret;
+}
+
 static const struct vm_operations_struct btrfs_file_vm_ops = {
-	.fault		= filemap_fault,
+	.fault		= btrfs_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= btrfs_page_mkwrite,
 };
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fb02b2b3ac2e..ed3553ff2c31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8132,7 +8132,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	struct page *page = vmf->page;
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
 	struct extent_changeset *data_reserved = NULL;
@@ -8187,11 +8186,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	}
 	wait_on_page_writeback(page);
 
-	lock_extent(io_tree, page_start, page_end, &cached_state);
 	ret2 = set_page_extent_mapped(page);
 	if (ret2 < 0) {
 		ret = vmf_error(ret2);
-		unlock_extent(io_tree, page_start, page_end, &cached_state);
 		goto out_unlock;
 	}
 
@@ -8202,7 +8199,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
 			PAGE_SIZE);
 	if (ordered) {
-		unlock_extent(io_tree, page_start, page_end, &cached_state);
 		unlock_page(page);
 		up_read(&BTRFS_I(inode)->i_mmap_lock);
 		btrfs_start_ordered_extent(ordered);
@@ -8235,7 +8231,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
 					&cached_state);
 	if (ret2) {
-		unlock_extent(io_tree, page_start, page_end, &cached_state);
 		ret = VM_FAULT_SIGBUS;
 		goto out_unlock;
 	}
@@ -8255,7 +8250,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 
 	btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
 
-	unlock_extent(io_tree, page_start, page_end, &cached_state);
 	up_read(&BTRFS_I(inode)->i_mmap_lock);
 
 	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 14/21] btrfs: writepage fixup lock rearrangement
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (12 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 13/21] btrfs: Perform memory faults under locked extent Goldwyn Rodrigues
@ 2023-03-02 22:24 ` Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 15/21] btrfs: lock extent before pages for encoded read ioctls Goldwyn Rodrigues
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Take the extent lock before the page lock while performing
writepage_fixup_worker(). Also retry the delalloc reservation after
flushing qgroups when it fails with -EDQUOT.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ed3553ff2c31..f879c65ee8cc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2722,6 +2722,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	u64 page_end;
 	int ret = 0;
 	bool free_delalloc_space = true;
+	bool flushed = false;
 
 	fixup = container_of(work, struct btrfs_writepage_fixup, work);
 	page = fixup->page;
@@ -2733,9 +2734,16 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	 * This is similar to page_mkwrite, we need to reserve the space before
 	 * we take the page lock.
 	 */
+reserve:
 	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					   PAGE_SIZE);
+	if (ret == -EDQUOT && !flushed) {
+		btrfs_qgroup_flush(inode->root);
+		flushed = true;
+		goto reserve;
+	}
 again:
+	lock_extent(&inode->io_tree, page_start, page_end, NULL);
 	lock_page(page);
 
 	/*
@@ -2778,19 +2786,18 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	if (ret)
 		goto out_page;
 
-	lock_extent(&inode->io_tree, page_start, page_end, &cached_state);
-
 	/* already ordered? We're done */
 	if (PageOrdered(page))
 		goto out_reserved;
 
 	ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_SIZE);
 	if (ordered) {
-		unlock_extent(&inode->io_tree, page_start, page_end,
-			      &cached_state);
 		unlock_page(page);
+		unlock_extent(&inode->io_tree, page_start, page_end,
+			      NULL);
 		btrfs_start_ordered_extent(ordered);
 		btrfs_put_ordered_extent(ordered);
+
 		goto again;
 	}
 
@@ -2813,7 +2820,6 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	if (free_delalloc_space)
 		btrfs_delalloc_release_space(inode, data_reserved, page_start,
 					     PAGE_SIZE, true);
-	unlock_extent(&inode->io_tree, page_start, page_end, &cached_state);
 out_page:
 	if (ret) {
 		/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 15/21] btrfs: lock extent before pages for encoded read ioctls
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (13 preceding siblings ...)
  2023-03-02 22:24 ` [PATCH 14/21] btrfs: writepage fixup lock rearrangement Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 16/21] btrfs: lock extent before pages in encoded write Goldwyn Rodrigues
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock the extent before the pages while performing encoded read ioctls,
and keep it locked until the read completes.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f879c65ee8cc..729def5969d8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9839,13 +9839,11 @@ static ssize_t btrfs_encoded_read_inline(
 				u64 lockend,
 				struct extent_state **cached_state,
 				u64 extent_start, size_t count,
-				struct btrfs_ioctl_encoded_io_args *encoded,
-				bool *unlocked)
+				struct btrfs_ioctl_encoded_io_args *encoded)
 {
 	struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp));
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct extent_io_tree *io_tree = &inode->io_tree;
 	struct btrfs_path *path;
 	struct extent_buffer *leaf;
 	struct btrfs_file_extent_item *item;
@@ -9907,9 +9905,6 @@ static ssize_t btrfs_encoded_read_inline(
 	}
 	read_extent_buffer(leaf, tmp, ptr, count);
 	btrfs_release_path(path);
-	unlock_extent(io_tree, start, lockend, cached_state);
-	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
-	*unlocked = true;
 
 	ret = copy_to_iter(tmp, count, iter);
 	if (ret != count)
@@ -10003,11 +9998,9 @@ static ssize_t btrfs_encoded_read_regular(struct kiocb *iocb,
 					  u64 start, u64 lockend,
 					  struct extent_state **cached_state,
 					  u64 disk_bytenr, u64 disk_io_size,
-					  size_t count, bool compressed,
-					  bool *unlocked)
+					  size_t count, bool compressed)
 {
 	struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp));
-	struct extent_io_tree *io_tree = &inode->io_tree;
 	struct page **pages;
 	unsigned long nr_pages, i;
 	u64 cur;
@@ -10029,10 +10022,6 @@ static ssize_t btrfs_encoded_read_regular(struct kiocb *iocb,
 	if (ret)
 		goto out;
 
-	unlock_extent(io_tree, start, lockend, cached_state);
-	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
-	*unlocked = true;
-
 	if (compressed) {
 		i = 0;
 		page_offset = 0;
@@ -10075,7 +10064,6 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 	u64 start, lockend, disk_bytenr, disk_io_size;
 	struct extent_state *cached_state = NULL;
 	struct extent_map *em;
-	bool unlocked = false;
 
 	file_accessed(iocb->ki_filp);
 
@@ -10126,7 +10114,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 		em = NULL;
 		ret = btrfs_encoded_read_inline(iocb, iter, start, lockend,
 						&cached_state, extent_start,
-						count, encoded, &unlocked);
+						count, encoded);
 		goto out;
 	}
 
@@ -10179,9 +10167,6 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 	em = NULL;
 
 	if (disk_bytenr == EXTENT_MAP_HOLE) {
-		unlock_extent(io_tree, start, lockend, &cached_state);
-		btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
-		unlocked = true;
 		ret = iov_iter_zero(count, iter);
 		if (ret != count)
 			ret = -EFAULT;
@@ -10189,8 +10174,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 		ret = btrfs_encoded_read_regular(iocb, iter, start, lockend,
 						 &cached_state, disk_bytenr,
 						 disk_io_size, count,
-						 encoded->compression,
-						 &unlocked);
+						 encoded->compression);
 	}
 
 out:
@@ -10199,11 +10183,9 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 out_em:
 	free_extent_map(em);
 out_unlock_extent:
-	if (!unlocked)
-		unlock_extent(io_tree, start, lockend, &cached_state);
+	unlock_extent(io_tree, start, lockend, &cached_state);
 out_unlock_inode:
-	if (!unlocked)
-		btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
+	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 16/21] btrfs: lock extent before pages in encoded write
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (14 preceding siblings ...)
  2023-03-02 22:25 ` [PATCH 15/21] btrfs: lock extent before pages for encoded read ioctls Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 17/21] btrfs: btree_writepages lock extents before pages Goldwyn Rodrigues
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Josef Bacik

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock the extent range while performing direct encoded writes, as opposed
to individual pages.
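
The resulting order of operations is roughly (a sketch matching the
hunks below):

	/* wait for ordered extents and invalidate the page cache */
	lock_extent(io_tree, start, end, &cached_state);
	/* copy user data into freshly allocated pages, now under the lock */
	/* reserve space and create the ordered extent */
	btrfs_submit_compressed_write(...);
	unlock_extent(io_tree, start, end, &cached_state);	/* after submission */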

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 52 +++++++++++++++++++++++++-----------------------
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 729def5969d8..70cf852a3efd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10289,37 +10289,18 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	pages = kvcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL_ACCOUNT);
 	if (!pages)
 		return -ENOMEM;
-	for (i = 0; i < nr_pages; i++) {
-		size_t bytes = min_t(size_t, PAGE_SIZE, iov_iter_count(from));
-		char *kaddr;
-
-		pages[i] = alloc_page(GFP_KERNEL_ACCOUNT);
-		if (!pages[i]) {
-			ret = -ENOMEM;
-			goto out_pages;
-		}
-		kaddr = kmap_local_page(pages[i]);
-		if (copy_from_iter(kaddr, bytes, from) != bytes) {
-			kunmap_local(kaddr);
-			ret = -EFAULT;
-			goto out_pages;
-		}
-		if (bytes < PAGE_SIZE)
-			memset(kaddr + bytes, 0, PAGE_SIZE - bytes);
-		kunmap_local(kaddr);
-	}
 
 	for (;;) {
 		struct btrfs_ordered_extent *ordered;
 
 		ret = btrfs_wait_ordered_range(&inode->vfs_inode, start, num_bytes);
 		if (ret)
-			goto out_pages;
+			goto out;
 		ret = invalidate_inode_pages2_range(inode->vfs_inode.i_mapping,
 						    start >> PAGE_SHIFT,
 						    end >> PAGE_SHIFT);
 		if (ret)
-			goto out_pages;
+			goto out;
 		lock_extent(io_tree, start, end, &cached_state);
 		ordered = btrfs_lookup_ordered_range(inode, start, num_bytes);
 		if (!ordered &&
@@ -10331,6 +10312,26 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 		cond_resched();
 	}
 
+	for (i = 0; i < nr_pages; i++) {
+		size_t bytes = min_t(size_t, PAGE_SIZE, iov_iter_count(from));
+		char *kaddr;
+
+		pages[i] = alloc_page(GFP_KERNEL_ACCOUNT);
+		if (!pages[i]) {
+			ret = -ENOMEM;
+			goto out_pages;
+		}
+		kaddr = kmap_local_page(pages[i]);
+		if (copy_from_iter(kaddr, bytes, from) != bytes) {
+			kunmap_local(kaddr);
+			ret = -EFAULT;
+			goto out_pages;
+		}
+		if (bytes < PAGE_SIZE)
+			memset(kaddr + bytes, 0, PAGE_SIZE - bytes);
+		kunmap_local(kaddr);
+	}
+
 	/*
 	 * We don't use the higher-level delalloc space functions because our
 	 * num_bytes and disk_num_bytes are different.
@@ -10389,8 +10390,6 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	if (start + encoded->len > inode->vfs_inode.i_size)
 		i_size_write(&inode->vfs_inode, start + encoded->len);
 
-	unlock_extent(io_tree, start, end, &cached_state);
-
 	btrfs_delalloc_release_extents(inode, num_bytes);
 
 	if (btrfs_submit_compressed_write(inode, start, num_bytes, ins.objectid,
@@ -10400,6 +10399,9 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 		ret = -EIO;
 		goto out_pages;
 	}
+
+	unlock_extent(io_tree, start, end, &cached_state);
+
 	ret = orig_count;
 	goto out;
 
@@ -10419,14 +10421,14 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	 */
 	if (!extent_reserved)
 		btrfs_free_reserved_data_space_noquota(fs_info, disk_num_bytes);
-out_unlock:
-	unlock_extent(io_tree, start, end, &cached_state);
 out_pages:
 	for (i = 0; i < nr_pages; i++) {
 		if (pages[i])
 			__free_page(pages[i]);
 	}
 	kvfree(pages);
+out_unlock:
+	unlock_extent(io_tree, start, end, &cached_state);
 out:
 	if (ret >= 0)
 		iocb->ki_pos += encoded->len;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 17/21] btrfs: btree_writepages lock extents before pages
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (15 preceding siblings ...)
  2023-03-02 22:25 ` [PATCH 16/21] btrfs: lock extent before pages in encoded write Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 18/21] btrfs: check if writeback pages exist before starting writeback Goldwyn Rodrigues
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock extents before pages while performing btree_writepages().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c2b954134851..5164bb9f6e2d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -725,8 +725,25 @@ static int btree_migrate_folio(struct address_space *mapping,
 static int btree_writepages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
+	u64 start, end;
+	struct btrfs_inode *inode = BTRFS_I(mapping->host);
+	struct extent_state *cached = NULL;
 	struct btrfs_fs_info *fs_info;
 	int ret;
+	u64 isize = round_up(i_size_read(&inode->vfs_inode), PAGE_SIZE) - 1;
+
+	if (wbc->range_cyclic) {
+		start = mapping->writeback_index << PAGE_SHIFT;
+		end = isize;
+	} else {
+		start = round_down(wbc->range_start, PAGE_SIZE);
+		end = round_up(wbc->range_end, PAGE_SIZE) - 1;
+		end = min(isize, end);
+	}
+
+	if (start >= end)
+		return 0;
+
 
 	if (wbc->sync_mode == WB_SYNC_NONE) {
 
@@ -741,7 +758,12 @@ static int btree_writepages(struct address_space *mapping,
 		if (ret < 0)
 			return 0;
 	}
-	return btree_write_cache_pages(mapping, wbc);
+
+	lock_extent(&inode->io_tree, start, end, &cached);
+	ret = btree_write_cache_pages(mapping, wbc);
+	unlock_extent(&inode->io_tree, start, end, &cached);
+
+	return ret;
 }
 
 static bool btree_release_folio(struct folio *folio, gfp_t gfp_flags)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 18/21] btrfs: check if writeback pages exist before starting writeback
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (16 preceding siblings ...)
  2023-03-02 22:25 ` [PATCH 17/21] btrfs: btree_writepages lock extents before pages Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 19/21] btrfs: lock extents before pages in relocation Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Check whether the range still has dirty or writeback pages after
locking. If there are none, return early and avoid searching for
delalloc extents.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 70cf852a3efd..c4e5eb5d9ee4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7773,6 +7773,11 @@ static int btrfs_writepages_async(struct btrfs_inode *inode, struct writeback_co
 
 	lock_extent(&inode->io_tree, start, end, NULL);
 
+	if (!filemap_range_has_writeback(inode->vfs_inode.i_mapping, start, end)) {
+		unlock_extent(&inode->io_tree, start, end, NULL);
+		return 0;
+	}
+
 	while (cur_start < end) {
 		bool found;
 		last_start = cur_start;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 19/21] btrfs: lock extents before pages in relocation
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (17 preceding siblings ...)
  2023-03-02 22:25 ` [PATCH 18/21] btrfs: check if writeback pages exist before starting writeback Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold() Goldwyn Rodrigues
  2023-03-02 22:25 ` [PATCH 21/21] btrfs: debug extent locking Goldwyn Rodrigues
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

While relocating extents, lock the extents first. The range is locked
before setup_relocation_extent_mapping() and unlocked after all pages
have been dirtied.

All metadata reservation is consolidated into one call covering the
whole cluster. balance_dirty_pages_ratelimited() is called outside the
locks.

Q: This rearranges the sequence of calls. Not sure if this is correct.
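
For reviewers, the rearranged sequence in relocate_file_extent_cluster()
is roughly (a sketch of the intended order, matching the hunks below):

	page_cache_sync_readahead(...);		/* readahead, outside any locks */
	btrfs_delalloc_reserve_metadata(...);	/* one reservation for the cluster */
	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
	setup_relocation_extent_mapping(...);
	/* relocate_one_page() for each page, under the extent lock */
	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
	btrfs_delalloc_release_extents(BTRFS_I(inode), len);
	balance_dirty_pages_ratelimited(inode->i_mapping);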

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/relocation.c | 44 +++++++++++++++++++------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ef13a9d4e370..f15e9b1bfc45 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2911,7 +2911,6 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 				u64 start, u64 end, u64 block_start)
 {
 	struct extent_map *em;
-	struct extent_state *cached_state = NULL;
 	int ret = 0;
 
 	em = alloc_extent_map();
@@ -2924,9 +2923,7 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 	em->block_start = block_start;
 	set_bit(EXTENT_FLAG_PINNED, &em->flags);
 
-	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
 	ret = btrfs_replace_extent_map_range(BTRFS_I(inode), em, false);
-	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
 	free_extent_map(em);
 
 	return ret;
@@ -2971,8 +2968,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
 	ASSERT(page_index <= last_index);
 	page = find_lock_page(inode->i_mapping, page_index);
 	if (!page) {
-		page_cache_sync_readahead(inode->i_mapping, ra, NULL,
-				page_index, last_index + 1 - page_index);
 		page = find_or_create_page(inode->i_mapping, page_index, mask);
 		if (!page)
 			return -ENOMEM;
@@ -2981,11 +2976,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
 	if (ret < 0)
 		goto release_page;
 
-	if (PageReadahead(page))
-		page_cache_async_readahead(inode->i_mapping, ra, NULL,
-				page_folio(page), page_index,
-				last_index + 1 - page_index);
-
 	if (!PageUptodate(page)) {
 		btrfs_read_folio(NULL, page_folio(page));
 		lock_page(page);
@@ -3012,16 +3002,7 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
 		u64 clamped_end = min(page_end, extent_end);
 		u32 clamped_len = clamped_end + 1 - clamped_start;
 
-		/* Reserve metadata for this range */
-		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
-						      clamped_len, clamped_len,
-						      false);
-		if (ret)
-			goto release_page;
-
 		/* Mark the range delalloc and dirty for later writeback */
-		lock_extent(&BTRFS_I(inode)->io_tree, clamped_start, clamped_end,
-			    &cached_state);
 		ret = btrfs_set_extent_delalloc(BTRFS_I(inode), clamped_start,
 						clamped_end, 0, &cached_state);
 		if (ret) {
@@ -3055,9 +3036,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
 					boundary_start, boundary_end,
 					EXTENT_BOUNDARY);
 		}
-		unlock_extent(&BTRFS_I(inode)->io_tree, clamped_start, clamped_end,
-			      &cached_state);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), clamped_len);
 		cur += clamped_len;
 
 		/* Crossed extent end, go to next extent */
@@ -3071,7 +3049,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
 	unlock_page(page);
 	put_page(page);
 
-	balance_dirty_pages_ratelimited(inode->i_mapping);
 	btrfs_throttle(fs_info);
 	if (btrfs_should_cancel_balance(fs_info))
 		ret = -ECANCELED;
@@ -3092,6 +3069,10 @@ static int relocate_file_extent_cluster(struct inode *inode,
 	struct file_ra_state *ra;
 	int cluster_nr = 0;
 	int ret = 0;
+	u64 start = cluster->start - offset;
+	u64 end = cluster->end - offset;
+	loff_t len = end + 1 - start;
+	struct extent_state *cached_state = NULL;
 
 	if (!cluster->nr)
 		return 0;
@@ -3106,17 +3087,30 @@ static int relocate_file_extent_cluster(struct inode *inode,
 
 	file_ra_state_init(ra, inode->i_mapping);
 
-	ret = setup_relocation_extent_mapping(inode, cluster->start - offset,
-				   cluster->end - offset, cluster->start);
+	page_cache_sync_readahead(inode->i_mapping, ra, NULL,
+			start >> PAGE_SHIFT, len >> PAGE_SHIFT);
+
+	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len, len, false);
 	if (ret)
 		goto out;
 
+	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
+
+	ret = setup_relocation_extent_mapping(inode, start, end, cluster->start);
+	if (ret)
+		goto unlock;
+
 	last_index = (cluster->end - offset) >> PAGE_SHIFT;
 	for (index = (cluster->start - offset) >> PAGE_SHIFT;
 	     index <= last_index && !ret; index++)
 		ret = relocate_one_page(inode, ra, cluster, &cluster_nr, index);
 	if (ret == 0)
 		WARN_ON(cluster_nr != cluster->nr);
+unlock:
+	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), len);
+
+	balance_dirty_pages_ratelimited(inode->i_mapping);
 out:
 	kfree(ra);
 	return ret;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold()
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (18 preceding siblings ...)
  2023-03-02 22:25 ` [PATCH 19/21] btrfs: lock extents before pages in relocation Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  2023-03-07 17:06   ` Christoph Hellwig
  2023-03-02 22:25 ` [PATCH 21/21] btrfs: debug extent locking Goldwyn Rodrigues
  20 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

I am not sure about this patch, but included it to avoid the WARN_ON()
in ihold(). I am not sure why i_count would drop below one at this
point, since this is still called within writepages context.

Perhaps there is a better way to solve this?
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c4e5eb5d9ee4..b5f5c1896dbb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1535,7 +1535,7 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		 * igrab is called higher up in the call chain, take only the
 		 * lightweight reference for the callback lifetime
 		 */
-		ihold(&inode->vfs_inode);
+		atomic_inc(&inode->vfs_inode.i_count);
 		async_chunk[i].async_cow = ctx;
 		async_chunk[i].inode = inode;
 		async_chunk[i].start = start;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 21/21] btrfs: debug extent locking
       [not found] <cover.1677793433.git.rgoldwyn@suse.com>
                   ` (19 preceding siblings ...)
  2023-03-02 22:25 ` [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold() Goldwyn Rodrigues
@ 2023-03-02 22:25 ` Goldwyn Rodrigues
  20 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-02 22:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

This is the patch I used to figure out who locked the extent before the
deadlock. While this patch is not required, it may be helpful for
debugging extent-based deadlocks.
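
With the wrappers below in place, the caller's name is captured
automatically. For example (illustrative only):

	/* a call site such as */
	lock_extent(&inode->io_tree, start, end, &cached);
	/* now expands to */
	__lock_extent(&inode->io_tree, __func__, start, end, &cached);

so the btrfs_set_extent_bit tracepoint prints func=<caller>, identifying
who took the extent lock leading up to a deadlock.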

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent-io-tree.c    | 27 +++++++++++----------
 fs/btrfs/extent-io-tree.h    | 46 +++++++++++++++++++++---------------
 fs/btrfs/extent_io.c         |  2 +-
 fs/btrfs/extent_map.c        |  2 +-
 fs/btrfs/inode.c             |  4 ++--
 fs/btrfs/ordered-data.c      |  8 +++----
 fs/btrfs/ordered-data.h      |  3 ++-
 include/trace/events/btrfs.h | 18 +++++++++-----
 8 files changed, 63 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index d467c614c84e..25faf587f050 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -545,7 +545,7 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
  *
  * This takes the tree lock, and returns 0 on success and < 0 on error.
  */
-int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+int __clear_extent_bit(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		       u32 bits, struct extent_state **cached_state,
 		       gfp_t mask, struct extent_changeset *changeset)
 {
@@ -559,7 +559,7 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	int delete = (bits & EXTENT_CLEAR_ALL_BITS);
 
 	btrfs_debug_check_extent_io_range(tree, start, end);
-	trace_btrfs_clear_extent_bit(tree, start, end - start + 1, bits);
+	trace_btrfs_clear_extent_bit(tree, func, start, end - start + 1, bits);
 
 	if (delete)
 		bits |= ~EXTENT_CTLBITS;
@@ -965,7 +965,7 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
  *
  * [start, end] is inclusive This takes the tree lock.
  */
-static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+static int __set_extent_bit(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 			    u32 bits, u64 *failed_start,
 			    struct extent_state **failed_state,
 			    struct extent_state **cached_state,
@@ -981,7 +981,7 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	u32 exclusive_bits = (bits & EXTENT_LOCKED);
 
 	btrfs_debug_check_extent_io_range(tree, start, end);
-	trace_btrfs_set_extent_bit(tree, start, end - start + 1, bits);
+	trace_btrfs_set_extent_bit(tree, func, start, end - start + 1, bits);
 
 	if (exclusive_bits)
 		ASSERT(failed_start);
@@ -1188,10 +1188,10 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 }
 
-int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+int set_extent_bit(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		   u32 bits, struct extent_state **cached_state, gfp_t mask)
 {
-	return __set_extent_bit(tree, start, end, bits, NULL, NULL,
+	return __set_extent_bit(tree, func, start, end, bits, NULL, NULL,
 				cached_state, NULL, mask);
 }
 
@@ -1688,7 +1688,7 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 	 */
 	ASSERT(!(bits & EXTENT_LOCKED));
 
-	return __set_extent_bit(tree, start, end, bits, NULL, NULL, NULL,
+	return __set_extent_bit(tree, NULL, start, end, bits, NULL, NULL, NULL,
 				changeset, GFP_NOFS);
 }
 
@@ -1701,19 +1701,20 @@ int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 	 */
 	ASSERT(!(bits & EXTENT_LOCKED));
 
-	return __clear_extent_bit(tree, start, end, bits, NULL, GFP_NOFS,
+	return __clear_extent_bit(tree, __func__, start, end, bits, NULL, GFP_NOFS,
 				  changeset);
 }
 
-int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
+int __try_lock_extent(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		    struct extent_state **cached)
 {
 	int err;
 	u64 failed_start;
 
 	WARN_ON(start > end);
-	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
+	err = __set_extent_bit(tree, func, start, end, EXTENT_LOCKED, &failed_start,
 			       NULL, cached, NULL, GFP_NOFS);
+
 	if (err == -EEXIST) {
 		if (failed_start > start)
 			clear_extent_bit(tree, start, failed_start - 1,
@@ -1727,7 +1728,7 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
  * Either insert or lock state struct between start and end use mask to tell
  * us if waiting is desired.
  */
-int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
+int __lock_extent(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		struct extent_state **cached_state)
 {
 	struct extent_state *failed_state = NULL;
@@ -1735,7 +1736,7 @@ int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 	u64 failed_start;
 
 	WARN_ON(start > end);
-	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
+	err = __set_extent_bit(tree, func, start, end, EXTENT_LOCKED, &failed_start,
 			       &failed_state, cached_state, NULL, GFP_NOFS);
 	while (err == -EEXIST) {
 		if (failed_start != start)
@@ -1744,7 +1745,7 @@ int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 
 		wait_extent_bit(tree, failed_start, end, EXTENT_LOCKED,
 				&failed_state);
-		err = __set_extent_bit(tree, start, end, EXTENT_LOCKED,
+		err = __set_extent_bit(tree, func, start, end, EXTENT_LOCKED,
 				       &failed_start, &failed_state,
 				       cached_state, NULL, GFP_NOFS);
 	}
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 21766e49ec02..2ad38b43baaf 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -107,12 +107,15 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 			 struct extent_io_tree *tree, unsigned int owner);
 void extent_io_tree_release(struct extent_io_tree *tree);
 
-int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
+int __lock_extent(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		struct extent_state **cached);
 
-int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
+int __try_lock_extent(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		    struct extent_state **cached);
 
+#define lock_extent(t, s, e, c) __lock_extent(t, __func__, s, e, c)
+#define try_lock_extent(t, s, e, c) __try_lock_extent(t, __func__, s, e, c)
+
 int __init extent_state_init_cachep(void);
 void __cold extent_state_free_cachep(void);
 
@@ -126,25 +129,30 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   u32 bits, int filled, struct extent_state *cached_state);
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 			     u32 bits, struct extent_changeset *changeset);
-int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+int __clear_extent_bit(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		       u32 bits, struct extent_state **cached, gfp_t mask,
 		       struct extent_changeset *changeset);
 
-static inline int clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				   u64 end, u32 bits,
-				   struct extent_state **cached)
+#define clear_extent_bit(t, s, e, b, c) __clear_extent_bit(t, __func__, s, e, b, c, GFP_NOFS, NULL)
+
+static inline int __unlock_extent(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
+				struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, bits, cached,
+	return __clear_extent_bit(tree, func, start, end, EXTENT_LOCKED, cached,
 				  GFP_NOFS, NULL);
 }
 
-static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end,
-				struct extent_state **cached)
+static inline int __unlock_extent_atomic(struct extent_io_tree *tree, const char *func, u64 start,
+				       u64 end, struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
-				  GFP_NOFS, NULL);
+	return __clear_extent_bit(tree, func, start, end, EXTENT_LOCKED, cached,
+				  GFP_ATOMIC, NULL);
 }
 
+#define unlock_extent(t, s, e, c) __unlock_extent(t, __func__, s, e, c)
+#define unlock_extent_atomic(t, s, e, c) __unlock_extent_atomic(t, __func__, s, e, c)
+
+
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 				    u64 end, u32 bits)
 {
@@ -153,32 +161,32 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 			   u32 bits, struct extent_changeset *changeset);
-int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+int set_extent_bit(struct extent_io_tree *tree, const char *func, u64 start, u64 end,
 		   u32 bits, struct extent_state **cached_state, gfp_t mask);
 
 static inline int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start,
 					 u64 end, u32 bits)
 {
-	return set_extent_bit(tree, start, end, bits, NULL, GFP_NOWAIT);
+	return set_extent_bit(tree, __func__, start, end, bits, NULL, GFP_NOWAIT);
 }
 
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
 		u64 end, u32 bits)
 {
-	return set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS);
+	return set_extent_bit(tree, __func__, start, end, bits, NULL, GFP_NOFS);
 }
 
 static inline int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached_state)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_UPTODATE,
+	return __clear_extent_bit(tree, __func__, start, end, EXTENT_UPTODATE,
 				  cached_state, GFP_NOFS, NULL);
 }
 
 static inline int set_extent_dirty(struct extent_io_tree *tree, u64 start,
 		u64 end, gfp_t mask)
 {
-	return set_extent_bit(tree, start, end, EXTENT_DIRTY, NULL, mask);
+	return set_extent_bit(tree, __func__, start, end, EXTENT_DIRTY, NULL, mask);
 }
 
 static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start,
@@ -197,7 +205,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
 				      u64 end, u32 extra_bits,
 				      struct extent_state **cached_state)
 {
-	return set_extent_bit(tree, start, end,
+	return set_extent_bit(tree, __func__, start, end,
 			      EXTENT_DELALLOC | extra_bits,
 			      cached_state, GFP_NOFS);
 }
@@ -205,7 +213,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
 static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached_state)
 {
-	return set_extent_bit(tree, start, end,
+	return set_extent_bit(tree, __func__, start, end,
 			      EXTENT_DELALLOC | EXTENT_DEFRAG,
 			      cached_state, GFP_NOFS);
 }
@@ -213,7 +221,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
 static inline int set_extent_new(struct extent_io_tree *tree, u64 start,
 		u64 end)
 {
-	return set_extent_bit(tree, start, end, EXTENT_NEW, NULL, GFP_NOFS);
+	return set_extent_bit(tree, __func__, start, end, EXTENT_NEW, NULL, GFP_NOFS);
 }
 
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 12aa7eaf12c5..fa2fedc9577f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2772,7 +2772,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
 		 * The delalloc new bit will be cleared by ordered extent
 		 * completion.
 		 */
-		ret = __clear_extent_bit(tree, start, end, clear_bits, NULL,
+		ret = __clear_extent_bit(tree, __func__, start, end, clear_bits, NULL,
 					 mask, NULL);
 
 		/* if clear_extent_bit failed for enomem reasons,
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index be94030e1dfb..d255d31130db 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -379,7 +379,7 @@ static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits)
 		struct btrfs_io_stripe *stripe = &map->stripes[i];
 		struct btrfs_device *device = stripe->dev;
 
-		__clear_extent_bit(&device->alloc_state, stripe->physical,
+		__clear_extent_bit(&device->alloc_state, __func__, stripe->physical,
 				   stripe->physical + stripe_size - 1, bits,
 				   NULL, GFP_NOWAIT, NULL);
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b5f5c1896dbb..eebced86bd70 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2663,7 +2663,7 @@ static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
 		if (em_len > search_len)
 			em_len = search_len;
 
-		ret = set_extent_bit(&inode->io_tree, search_start,
+		ret = set_extent_bit(&inode->io_tree, __func__, search_start,
 				     search_start + em_len - 1,
 				     EXTENT_DELALLOC_NEW, cached_state,
 				     GFP_NOFS);
@@ -4779,7 +4779,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	btrfs_page_set_dirty(fs_info, page, block_start, block_end + 1 - block_start);
 
 	if (only_release_metadata)
-		set_extent_bit(&inode->io_tree, block_start, block_end,
+		set_extent_bit(&inode->io_tree, __func__, block_start, block_end,
 			       EXTENT_NORESERVE, NULL, GFP_NOFS);
 
 out_unlock:
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 6c24b69e2d0a..9854494d3bcf 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -231,8 +231,8 @@ int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
 			   &entry->rb_node);
 	if (node)
 		btrfs_panic(fs_info, -EEXIST,
-				"inconsistency in ordered tree at offset %llu",
-				file_offset);
+				"inconsistency in ordered tree ino %lu at offset %llu",
+				inode->vfs_inode.i_ino, file_offset);
 	spin_unlock_irq(&tree->lock);
 
 	spin_lock(&root->ordered_extent_lock);
@@ -1032,7 +1032,7 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
  * Always return with the given range locked, ensuring after it's called no
  * order extent can be pending.
  */
-void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
+void __btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, const char *func, u64 start,
 					u64 end,
 					struct extent_state **cached_state)
 {
@@ -1044,7 +1044,7 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 		cachedp = cached_state;
 
 	while (1) {
-		lock_extent(&inode->io_tree, start, end, cachedp);
+		__lock_extent(&inode->io_tree, func, start, end, cachedp);
 		ordered = btrfs_lookup_ordered_range(inode, start,
 						     end - start + 1);
 		if (!ordered) {
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index eb40cb39f842..e426aeda71d5 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -202,9 +202,10 @@ u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
 			       const u64 range_start, const u64 range_len);
 void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 			      const u64 range_start, const u64 range_len);
-void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
+void __btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, const char *func, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+#define btrfs_lock_and_flush_ordered_range(i, s, e, c) __btrfs_lock_and_flush_ordered_range(i, __func__, s, e, c)
 bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct extent_state **cached_state);
 int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 8ea9cea9bfeb..d4f4a415d085 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2059,12 +2059,14 @@ DEFINE_EVENT(btrfs__block_group, btrfs_skip_unused_block_group,
 
 TRACE_EVENT(btrfs_set_extent_bit,
 	TP_PROTO(const struct extent_io_tree *tree,
+		const char *func,
 		 u64 start, u64 len, unsigned set_bits),
 
-	TP_ARGS(tree, start, len, set_bits),
+	TP_ARGS(tree, func, start, len, set_bits),
 
 	TP_STRUCT__entry_btrfs(
 		__field(	unsigned,	owner	)
+		__string(	func,		func	)
 		__field(	u64,		ino	)
 		__field(	u64,		rootid	)
 		__field(	u64,		start	)
@@ -2083,26 +2085,29 @@ TRACE_EVENT(btrfs_set_extent_bit,
 			__entry->ino	= 0;
 			__entry->rootid	= 0;
 		}
+		__assign_str(func, func);
 		__entry->start		= start;
 		__entry->len		= len;
 		__entry->set_bits	= set_bits;
 	),
 
 	TP_printk_btrfs(
-		"io_tree=%s ino=%llu root=%llu start=%llu len=%llu set_bits=%s",
-		__print_symbolic(__entry->owner, IO_TREE_OWNER), __entry->ino,
+		"io_tree=%s func=%s ino=%llu root=%llu start=%llu len=%llu set_bits=%s",
+		__print_symbolic(__entry->owner, IO_TREE_OWNER), __get_str(func), __entry->ino,
 		__entry->rootid, __entry->start, __entry->len,
 		__print_flags(__entry->set_bits, "|", EXTENT_FLAGS))
 );
 
 TRACE_EVENT(btrfs_clear_extent_bit,
 	TP_PROTO(const struct extent_io_tree *tree,
+		const char *func,
 		 u64 start, u64 len, unsigned clear_bits),
 
-	TP_ARGS(tree, start, len, clear_bits),
+	TP_ARGS(tree, func, start, len, clear_bits),
 
 	TP_STRUCT__entry_btrfs(
 		__field(	unsigned,	owner	)
+		__string(	func,		func	)
 		__field(	u64,		ino	)
 		__field(	u64,		rootid	)
 		__field(	u64,		start	)
@@ -2121,14 +2126,15 @@ TRACE_EVENT(btrfs_clear_extent_bit,
 			__entry->ino	= 0;
 			__entry->rootid	= 0;
 		}
+		__assign_str(func, func);
 		__entry->start		= start;
 		__entry->len		= len;
 		__entry->clear_bits	= clear_bits;
 	),
 
 	TP_printk_btrfs(
-		"io_tree=%s ino=%llu root=%llu start=%llu len=%llu clear_bits=%s",
-		__print_symbolic(__entry->owner, IO_TREE_OWNER), __entry->ino,
+		"io_tree=%s func=%s ino=%llu root=%llu start=%llu len=%llu clear_bits=%s",
+		__print_symbolic(__entry->owner, IO_TREE_OWNER), __get_str(func), __entry->ino,
 		__entry->rootid, __entry->start, __entry->len,
 		__print_flags(__entry->clear_bits, "|", EXTENT_FLAGS))
 );
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/21] fs: readahead_begin() to call before locking folio
  2023-03-02 22:24 ` [PATCH 01/21] fs: readahead_begin() to call before locking folio Goldwyn Rodrigues
@ 2023-03-06 16:53   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-03-06 16:53 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, Goldwyn Rodrigues, linux-mm, linux-fsdevel, Matthew Wilcox

On Thu, Mar 02, 2023 at 04:24:46PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The btrfs filesystem needs to lock the extents before locking folios
> to be read from disk. So, introduce a function in
> address_space_operaitons, called btrfs_readahead_begin() which is called
> before the folio are allocateed and locked.

Please Cc the mm and fsdevel and willy on these kinds of changes.

But I'd also like to take this opportunity to ask what the rationale
behind the extent locking for reads in btrfs is to start with.

All other file systems rely on filemap_invalidate_lock_shared for
locking page reads vs invalidates and it seems to work great.  btrfs
creates a lot of overhead with the extent locking, and introduces
a lot of additional trouble like the readahead code here, or the problem
with O_DIRECT writes that read from the same region that Boris recently
fixed.

Maybe we can think really hard and find a way to normalize the locking
and simplify both btrfs and common infrastructure?
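
For reference, the generic pattern looks roughly like this (a sketch of
the common mm/VFS convention, not btrfs code):

	/* read/readahead side */
	filemap_invalidate_lock_shared(mapping);
	/* ... allocate, lock and read folios ... */
	filemap_invalidate_unlock_shared(mapping);

	/* invalidation side, e.g. a punch-hole path */
	filemap_invalidate_lock(mapping);
	truncate_pagecache_range(inode, start, end);
	filemap_invalidate_unlock(mapping);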

> ---
>  include/linux/fs.h | 1 +
>  mm/readahead.c     | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c1769a2c5d70..6b650db57ca3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -363,6 +363,7 @@ struct address_space_operations {
>  	/* Mark a folio dirty.  Return true if this dirtied it */
>  	bool (*dirty_folio)(struct address_space *, struct folio *);
>  
> +	void (*readahead_begin)(struct readahead_control *);
>  	void (*readahead)(struct readahead_control *);
>  
>  	int (*write_begin)(struct file *, struct address_space *mapping,
> diff --git a/mm/readahead.c b/mm/readahead.c
> index b10f0cf81d80..6924d5fed350 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -520,6 +520,9 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  			new_order--;
>  	}
>  
> +	if (mapping->a_ops->readahead_begin)
> +		mapping->a_ops->readahead_begin(ractl);
> +
>  	filemap_invalidate_lock_shared(mapping);
>  	while (index <= limit) {
>  		unsigned int order = new_order;
> -- 
> 2.39.2
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/21] btrfs: wait ordered range before locking during truncate
  2023-03-02 22:24 ` [PATCH 06/21] btrfs: wait ordered range before locking during truncate Goldwyn Rodrigues
@ 2023-03-07 17:03   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-03-07 17:03 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: linux-btrfs, Goldwyn Rodrigues, Johannes Thumshirn, Josef Bacik

So, one thing I've been wondering about for a while is why btrfs even
does all these explicit waits for ordered extents.  The ordered_extent
is effectively a mechanism to describe a range of I/O.

So why can't we use the normal mechanisms to wait for I/O, that is the
completion of writeback for buffered I/O (i.e. filemap_fdatawait*)
and inode_dio_wait for direct I/O?  I've been wanting to look deeper
into this for a while, so this might be a good time to bring it up.
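
As a sketch of what those normal mechanisms look like (the two waiters
are the mainline interfaces; the wrapper function is hypothetical):

	static int wait_for_io_example(struct inode *inode, loff_t start,
				       loff_t end)
	{
		/* wait for in-flight buffered writeback on the byte range */
		int ret = filemap_fdatawait_range(inode->i_mapping, start, end);

		/* drain all outstanding direct I/O against the inode */
		inode_dio_wait(inode);
		return ret;
	}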

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold()
  2023-03-02 22:25 ` [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold() Goldwyn Rodrigues
@ 2023-03-07 17:06   ` Christoph Hellwig
  2023-03-08 23:03     ` Goldwyn Rodrigues
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-03-07 17:06 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues

On Thu, Mar 02, 2023 at 04:25:05PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> I am not sure about this patch, but added it to avoid the WARN_ON() in
> ihold().  I am not sure why i_count would drop below one at this
> point, since this is still called within writepages context.
> 
> Perhaps there is a better way to solve this?

How do you trigger the warning?  Basically i_count could only be
0 when doing writeback from inode eviction, and just incrementing
i_count blindly will do the wrong thing there.
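
For context, ihold() in fs/inode.c is simply:

	void ihold(struct inode *inode)
	{
		WARN_ON(atomic_inc_return(&inode->i_count) < 2);
	}

so it insists a reference already exists; replacing it with a bare
increment would silently resurrect an inode that eviction is about to
free.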

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/21] btrfs: locking extents for async writeback
  2023-03-02 22:24 ` [PATCH 11/21] btrfs: locking extents for async writeback Goldwyn Rodrigues
@ 2023-03-08 19:13   ` Boris Burkov
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Burkov @ 2023-03-08 19:13 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues

On Thu, Mar 02, 2023 at 04:24:56PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> For async writebacks, lock the extents and then perform the cow file
> range asynchronously. Unlock when the async_chunk is freed.
> 
> Since writeback is performed on a locked range, locked_page can be
> removed from the structures and function parameters, and likewise
> page_started and nr_written.
> 
> A writeback could involve a hole, so check whether the locked range
> covers the entire extent returned by find_lock_delalloc_range().
> If not, try to lock the entire range, or unlock the pages already locked.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/compression.c |   4 +
>  fs/btrfs/extent_io.c   |  10 +--
>  fs/btrfs/extent_io.h   |   2 +
>  fs/btrfs/inode.c       | 184 ++++++++++++++++++-----------------------
>  4 files changed, 92 insertions(+), 108 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index b0dd01e31078..a8fa7f2049ce 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1424,6 +1424,10 @@ static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end,
>  	curr_sample_pos = 0;
>  	while (index < index_end) {
>  		page = find_get_page(inode->i_mapping, index);
> +		if (!page) {
> +			index++;
> +			continue;
> +		}
>  		in_data = kmap_local_page(page);
>  		/* Handle case where the start is not aligned to PAGE_SIZE */
>  		i = start % PAGE_SIZE;
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index cdce2db82d7e..12aa7eaf12c5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -358,7 +358,7 @@ static int __process_pages_contig(struct address_space *mapping,
>  	return err;
>  }
>  
> -static noinline void __unlock_for_delalloc(struct inode *inode,
> +noinline void __unlock_for_delalloc(struct inode *inode,
>  					   struct page *locked_page,
>  					   u64 start, u64 end)
>  {
> @@ -383,8 +383,7 @@ static noinline int lock_delalloc_pages(struct inode *inode,
>  	u64 processed_end = delalloc_start;
>  	int ret;
>  
> -	ASSERT(locked_page);
> -	if (index == locked_page->index && index == end_index)
> +	if (locked_page && index == locked_page->index && index == end_index)
>  		return 0;
>  
>  	ret = __process_pages_contig(inode->i_mapping, locked_page, delalloc_start,
> @@ -432,8 +431,9 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
>  	ASSERT(orig_end > orig_start);
>  
>  	/* The range should at least cover part of the page */
> -	ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
> -		 orig_end <= page_offset(locked_page)));
> +	if (locked_page)
> +		ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
> +			 orig_end <= page_offset(locked_page)));
>  again:
>  	/* step one, find a bunch of delalloc bytes starting at start */
>  	delalloc_start = *start;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 4341ad978fb8..ddfa100ab629 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -279,6 +279,8 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
>  int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
>  
>  void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
> +void __unlock_for_delalloc(struct inode *inode, struct page *locked_page,
> +		u64 start, u64 end);
>  
>  #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>  bool find_lock_delalloc_range(struct inode *inode,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index eeddd7cdff58..fb02b2b3ac2e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -506,7 +506,6 @@ struct async_extent {
>  
>  struct async_chunk {
>  	struct btrfs_inode *inode;
> -	struct page *locked_page;
>  	u64 start;
>  	u64 end;
>  	blk_opf_t write_flags;
> @@ -887,18 +886,6 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
>  		}
>  	}
>  cleanup_and_bail_uncompressed:
> -	/*
> -	 * No compression, but we still need to write the pages in the file
> -	 * we've been given so far.  redirty the locked page if it corresponds
> -	 * to our extent and set things up for the async work queue to run
> -	 * cow_file_range to do the normal delalloc dance.
> -	 */
> -	if (async_chunk->locked_page &&
> -	    (page_offset(async_chunk->locked_page) >= start &&
> -	     page_offset(async_chunk->locked_page)) <= end) {
> -		__set_page_dirty_nobuffers(async_chunk->locked_page);
> -		/* unlocked later on in the async handlers */
> -	}
>  
>  	if (redirty)
>  		extent_range_redirty_for_io(&inode->vfs_inode, start, end);
> @@ -926,8 +913,7 @@ static void free_async_extent_pages(struct async_extent *async_extent)
>  }
>  
>  static int submit_uncompressed_range(struct btrfs_inode *inode,
> -				     struct async_extent *async_extent,
> -				     struct page *locked_page)
> +				     struct async_extent *async_extent)
>  {
>  	u64 start = async_extent->start;
>  	u64 end = async_extent->start + async_extent->ram_size - 1;
> @@ -942,7 +928,7 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
>  	 * Also we call cow_file_range() with @unlock_page == 0, so that we
>  	 * can directly submit them without interruption.
>  	 */
> -	ret = cow_file_range(inode, locked_page, start, end, &page_started,
> +	ret = cow_file_range(inode, NULL, start, end, &page_started,
>  			     &nr_written, 0, NULL);
>  	/* Inline extent inserted, page gets unlocked and everything is done */
>  	if (page_started) {
> @@ -950,23 +936,12 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
>  		goto out;
>  	}
>  	if (ret < 0) {
> -		btrfs_cleanup_ordered_extents(inode, locked_page, start, end - start + 1);
> -		if (locked_page) {
> -			const u64 page_start = page_offset(locked_page);
> -			const u64 page_end = page_start + PAGE_SIZE - 1;
> -
> -			btrfs_page_set_error(inode->root->fs_info, locked_page,
> -					     page_start, PAGE_SIZE);
> -			set_page_writeback(locked_page);
> -			end_page_writeback(locked_page);
> -			end_extent_writepage(locked_page, ret, page_start, page_end);
> -			unlock_page(locked_page);
> -		}
> +		btrfs_cleanup_ordered_extents(inode, NULL, start, end - start + 1);
>  		goto out;
>  	}
>  
>  	ret = extent_write_locked_range(&inode->vfs_inode, start, end);
> -	/* All pages will be unlocked, including @locked_page */
> +	/* All pages will be unlocked */
>  out:
>  	kfree(async_extent);
>  	return ret;
> @@ -980,27 +955,14 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
>  	struct btrfs_root *root = inode->root;
>  	struct btrfs_fs_info *fs_info = root->fs_info;
>  	struct btrfs_key ins;
> -	struct page *locked_page = NULL;
>  	struct extent_map *em;
>  	int ret = 0;
>  	u64 start = async_extent->start;
>  	u64 end = async_extent->start + async_extent->ram_size - 1;
>  
> -	/*
> -	 * If async_chunk->locked_page is in the async_extent range, we need to
> -	 * handle it.
> -	 */
> -	if (async_chunk->locked_page) {
> -		u64 locked_page_start = page_offset(async_chunk->locked_page);
> -		u64 locked_page_end = locked_page_start + PAGE_SIZE - 1;
> -
> -		if (!(start >= locked_page_end || end <= locked_page_start))
> -			locked_page = async_chunk->locked_page;
> -	}
> -
>  	/* We have fall back to uncompressed write */
>  	if (!async_extent->pages)
> -		return submit_uncompressed_range(inode, async_extent, locked_page);
> +		return submit_uncompressed_range(inode, async_extent);
>  
>  	ret = btrfs_reserve_extent(root, async_extent->ram_size,
>  				   async_extent->compressed_size,
> @@ -1476,6 +1438,8 @@ static noinline void async_cow_start(struct btrfs_work *work)
>  
>  	compressed_extents = compress_file_range(async_chunk);
>  	if (compressed_extents == 0) {
> +		unlock_extent(&async_chunk->inode->io_tree,
> +				async_chunk->start, async_chunk->end, NULL);
>  		btrfs_add_delayed_iput(async_chunk->inode);
>  		async_chunk->inode = NULL;
>  	}
> @@ -1515,11 +1479,15 @@ static noinline void async_cow_free(struct btrfs_work *work)
>  	struct async_cow *async_cow;
>  
>  	async_chunk = container_of(work, struct async_chunk, work);
> -	if (async_chunk->inode)
> +	if (async_chunk->inode) {
> +		unlock_extent(&async_chunk->inode->io_tree,
> +				async_chunk->start, async_chunk->end, NULL);
>  		btrfs_add_delayed_iput(async_chunk->inode);
> +	}
>  	if (async_chunk->blkcg_css)
>  		css_put(async_chunk->blkcg_css);
>  
> +
>  	async_cow = async_chunk->async_cow;
>  	if (atomic_dec_and_test(&async_cow->num_chunks))
>  		kvfree(async_cow);
> @@ -1527,9 +1495,7 @@ static noinline void async_cow_free(struct btrfs_work *work)
>  
>  static int cow_file_range_async(struct btrfs_inode *inode,
>  				struct writeback_control *wbc,
> -				struct page *locked_page,
> -				u64 start, u64 end, int *page_started,
> -				unsigned long *nr_written)
> +				u64 start, u64 end)
>  {
>  	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>  	struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
> @@ -1539,20 +1505,9 @@ static int cow_file_range_async(struct btrfs_inode *inode,
>  	u64 cur_end;
>  	u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
>  	int i;
> -	bool should_compress;
>  	unsigned nofs_flag;
>  	const blk_opf_t write_flags = wbc_to_write_flags(wbc);
>  
> -	unlock_extent(&inode->io_tree, start, end, NULL);
> -
> -	if (inode->flags & BTRFS_INODE_NOCOMPRESS &&
> -	    !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
> -		num_chunks = 1;
> -		should_compress = false;
> -	} else {
> -		should_compress = true;
> -	}
> -
>  	nofs_flag = memalloc_nofs_save();
>  	ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
>  	memalloc_nofs_restore(nofs_flag);
> @@ -1564,19 +1519,17 @@ static int cow_file_range_async(struct btrfs_inode *inode,
>  		unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
>  					 PAGE_END_WRITEBACK | PAGE_SET_ERROR;
>  
> -		extent_clear_unlock_delalloc(inode, start, end, locked_page,
> +		extent_clear_unlock_delalloc(inode, start, end, NULL,
>  					     clear_bits, page_ops);
>  		return -ENOMEM;
>  	}
>  
> +	set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
>  	async_chunk = ctx->chunks;
>  	atomic_set(&ctx->num_chunks, num_chunks);
>  
>  	for (i = 0; i < num_chunks; i++) {
> -		if (should_compress)
> -			cur_end = min(end, start + SZ_512K - 1);
> -		else
> -			cur_end = end;
> +		cur_end = min(end, start + SZ_512K - 1);
>  
>  		/*
>  		 * igrab is called higher up in the call chain, take only the
> @@ -1590,33 +1543,6 @@ static int cow_file_range_async(struct btrfs_inode *inode,
>  		async_chunk[i].write_flags = write_flags;
>  		INIT_LIST_HEAD(&async_chunk[i].extents);
>  
> -		/*
> -		 * The locked_page comes all the way from writepage and its
> -		 * the original page we were actually given.  As we spread
> -		 * this large delalloc region across multiple async_chunk
> -		 * structs, only the first struct needs a pointer to locked_page
> -		 *
> -		 * This way we don't need racey decisions about who is supposed
> -		 * to unlock it.
> -		 */
> -		if (locked_page) {
> -			/*
> -			 * Depending on the compressibility, the pages might or
> -			 * might not go through async.  We want all of them to
> -			 * be accounted against wbc once.  Let's do it here
> -			 * before the paths diverge.  wbc accounting is used
> -			 * only for foreign writeback detection and doesn't
> -			 * need full accuracy.  Just account the whole thing
> -			 * against the first page.
> -			 */
> -			wbc_account_cgroup_owner(wbc, locked_page,
> -						 cur_end - start);
> -			async_chunk[i].locked_page = locked_page;
> -			locked_page = NULL;
> -		} else {
> -			async_chunk[i].locked_page = NULL;
> -		}
> -
>  		if (blkcg_css != blkcg_root_css) {
>  			css_get(blkcg_css);
>  			async_chunk[i].blkcg_css = blkcg_css;
> @@ -1632,10 +1558,8 @@ static int cow_file_range_async(struct btrfs_inode *inode,
>  
>  		btrfs_queue_work(fs_info->delalloc_workers, &async_chunk[i].work);
>  
> -		*nr_written += nr_pages;
>  		start = cur_end + 1;
>  	}
> -	*page_started = 1;
>  	return 0;
>  }
>  
> @@ -2238,18 +2162,13 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
>  		ASSERT(!zoned || btrfs_is_data_reloc_root(inode->root));
>  		ret = run_delalloc_nocow(inode, locked_page, start, end,
>  					 page_started, nr_written);
> -	} else if (!btrfs_inode_can_compress(inode) ||
> -		   !inode_need_compress(inode, start, end)) {
> +	} else {
>  		if (zoned)
>  			ret = run_delalloc_zoned(inode, locked_page, start, end,
>  						 page_started, nr_written);
>  		else
>  			ret = cow_file_range(inode, locked_page, start, end,
>  					     page_started, nr_written, 1, NULL);
> -	} else {
> -		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
> -		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
> -					   page_started, nr_written);
>  	}
>  	ASSERT(ret <= 0);
>  	if (ret)
> @@ -7840,14 +7759,68 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  	return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
>  }
>  
> +static int btrfs_writepages_async(struct btrfs_inode *inode, struct writeback_control *wbc, u64 start, u64 end)
> +{
> +	u64 last_start, cur_start = start;
> +	u64 cur_end;
> +	int ret = 0;
> +
> +	lock_extent(&inode->io_tree, start, end, NULL);
> +
> +	while (cur_start < end) {
> +		bool found;
> +		last_start = cur_start;
> +		cur_end = end;
> +
> +		found = find_lock_delalloc_range(&inode->vfs_inode, NULL, &cur_start, &cur_end);

This call to find_lock_delalloc_range makes it so inode.c doesn't
compile unless CONFIG_BTRFS_FS_RUN_SANITY_TESTS is set.
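
A hedged sketch of one way to fix it: if I read extent_io.c right, the
definition is built unconditionally, so the declaration could simply
move out of the CONFIG_BTRFS_FS_RUN_SANITY_TESTS guard in extent_io.h:

	/* declared unconditionally, no longer under the sanity-tests #ifdef */
	bool find_lock_delalloc_range(struct inode *inode,
				      struct page *locked_page,
				      u64 *start, u64 *end);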

> +		/* Nothing to write back */
> +		if (!found) {
> +			unlock_extent(&inode->io_tree, cur_start, cur_end, NULL);
> +			cur_start = cur_end + 1;
> +			continue;
> +		}
> +
> +		/* A hole with no pages, unlock part thereof */
> +		if (cur_start > last_start)
> +			unlock_extent(&inode->io_tree, last_start, cur_start - 1, NULL);
> +
> +		/* Got more than we asked for */
> +		if (cur_end > end) {
> +			if (try_lock_extent(&inode->io_tree, end + 1, cur_end, NULL)) {
> +				/* Try writing the whole extent */
> +				end = cur_end;
> +			} else {
> +				/*
> +				 * Someone is holding the extent lock.
> +				 * Unlock pages in the tail of the extent, and
> +				 * write only as much as writepages requested.
> +				 */
> +				__unlock_for_delalloc(&inode->vfs_inode, NULL, end + 1, cur_end);
> +				cur_end = end;
> +			}
> +		}
> +
> +		ret = cow_file_range_async(inode, wbc, cur_start, cur_end);
> +		if (ret < 0) {
> +			unlock_extent(&inode->io_tree, cur_start, end, NULL);
> +			break;
> +		}
> +
> +		cur_start = cur_end + 1;
> +	}
> +
> +	return ret;
> +}
> +
>  static int btrfs_writepages(struct address_space *mapping,
>  			    struct writeback_control *wbc)
>  {
>  	u64 start = 0, end = LLONG_MAX;
> -	struct inode *inode = mapping->host;
> +	struct btrfs_inode *inode = BTRFS_I(mapping->host);
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
>  	struct extent_state *cached = NULL;
>  	int ret;
> -	loff_t isize = i_size_read(inode);
> +	loff_t isize = i_size_read(&inode->vfs_inode);
>  	struct writeback_control new_wbc = *wbc;
>  
>  	if (new_wbc.range_cyclic) {
> @@ -7864,9 +7837,14 @@ static int btrfs_writepages(struct address_space *mapping,
>  	if (start >= end)
>  		return 0;
>  
> -	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
> -	ret = extent_writepages(mapping, wbc);
> -	unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
> +	if (btrfs_test_opt(fs_info, COMPRESS) &&
> +			btrfs_inode_can_compress(inode)) {
> +		ret = btrfs_writepages_async(inode, wbc, start, end);
> +	} else {
> +		lock_extent(&inode->io_tree, start, end, &cached);
> +		ret = extent_writepages(mapping, wbc);
> +		unlock_extent(&inode->io_tree, start, end, &cached);
> +	}
>  
>  	if (new_wbc.range_cyclic) {
>  		wbc->range_start = new_wbc.range_start;
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range
  2023-03-02 22:24 ` [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range Goldwyn Rodrigues
@ 2023-03-08 19:28   ` Boris Burkov
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Burkov @ 2023-03-08 19:28 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues

On Thu, Mar 02, 2023 at 04:24:47PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Add a WARN_ON(start > end) to make sure that the locking happens on the
> correct range and no incorrect nodes (with state->start > state->end)
> are added to the tree.

Looks good, naturally. Quick question about it: do you think that
checking this invariant applies to other extent bit setting operations?

Perhaps it could also make sense to refactor
btrfs_debug_check_extent_io_range so that it compiles regardless of
CONFIG_BTRFS_DEBUG, and so that a caller can opt in to the check even
when the debug setting isn't set. (Perhaps with a _checked() variant
of the set_extent_bit function or something.)
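
A minimal sketch of such an opt-in variant (the _checked() name and
the -EINVAL policy here are hypothetical):

	static inline int lock_extent_checked(struct extent_io_tree *tree,
					      u64 start, u64 end,
					      struct extent_state **cached)
	{
		if (WARN_ON_ONCE(start > end))
			return -EINVAL;
		return lock_extent(tree, start, end, cached);
	}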

> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/extent-io-tree.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
> index 29a225836e28..482721dd1eba 100644
> --- a/fs/btrfs/extent-io-tree.c
> +++ b/fs/btrfs/extent-io-tree.c
> @@ -1710,6 +1710,7 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
>  	int err;
>  	u64 failed_start;
>  
> +	WARN_ON(start > end);
>  	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
>  			       NULL, cached, NULL, GFP_NOFS);
>  	if (err == -EEXIST) {
> @@ -1732,6 +1733,7 @@ int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
>  	int err;
>  	u64 failed_start;
>  
> +	WARN_ON(start > end);
>  	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
>  			       &failed_state, cached_state, NULL, GFP_NOFS);
>  	while (err == -EEXIST) {
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range()
  2023-03-02 22:24 ` [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range() Goldwyn Rodrigues
@ 2023-03-08 19:29   ` Boris Burkov
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Burkov @ 2023-03-08 19:29 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues

On Thu, Mar 02, 2023 at 04:24:48PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> For issues such as zero size writes, we can get start > end. Check them
> in btrfs_debug_check_extent_io_range() so this may be caught early.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reviewed-by: Boris Burkov <boris@bur.io>

> ---
>  fs/btrfs/extent-io-tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
> index 482721dd1eba..d467c614c84e 100644
> --- a/fs/btrfs/extent-io-tree.c
> +++ b/fs/btrfs/extent-io-tree.c
> @@ -65,7 +65,8 @@ static inline void __btrfs_debug_check_extent_io_range(const char *caller,
>  		return;
>  
>  	isize = i_size_read(&inode->vfs_inode);
> -	if (end >= PAGE_SIZE && (end % 2) == 0 && end != isize - 1) {
> +	if ((start > end) ||
> +	    (end >= PAGE_SIZE && (end % 2) == 0 && end != isize - 1)) {
>  		btrfs_debug_rl(inode->root->fs_info,
>  		    "%s: ino %llu isize %llu odd range [%llu,%llu]",
>  			caller, btrfs_ino(inode), isize, start, end);
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold()
  2023-03-07 17:06   ` Christoph Hellwig
@ 2023-03-08 23:03     ` Goldwyn Rodrigues
  2023-03-09  9:14       ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-08 23:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

On  9:06 07/03, Christoph Hellwig wrote:
> On Thu, Mar 02, 2023 at 04:25:05PM -0600, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > I am not sure about this patch, but added it to avoid the WARN_ON() in
> > ihold().  I am not sure why i_count would drop below one at this
> > point, since this is still called within writepages context.
> > 
> > Perhaps there is a better way to solve this?
> 
> How do you trigger the warning?  Basically i_count could only be
> 0 when doing writeback from inode evication, and just incrementing
> i_count blindly will do the wrong thing there.

Without this patch, performing writeback with async writeback
(mount option compress) triggers this warning.

Yes, this patch is incorrect. However, we have to hold on to an inode
reference in order to complete the asynchronous writeback.
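
One hedged alternative (a sketch, not tested): take the reference with
igrab(), which returns NULL instead of warning when the inode is
already being evicted, and bail out of the async path in that case:

	struct inode *grabbed = igrab(&inode->vfs_inode);

	if (!grabbed)
		return -EAGAIN;	/* inode being evicted, skip async path */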

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold()
  2023-03-08 23:03     ` Goldwyn Rodrigues
@ 2023-03-09  9:14       ` Christoph Hellwig
  2023-03-11  3:52         ` Goldwyn Rodrigues
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:14 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: Christoph Hellwig, linux-btrfs

On Wed, Mar 08, 2023 at 05:03:57PM -0600, Goldwyn Rodrigues wrote:
> Without this patch, performing a writeback with async writeback
> (mount option compress) will trigger this warning.

What is the trace in the warning?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold()
  2023-03-09  9:14       ` Christoph Hellwig
@ 2023-03-11  3:52         ` Goldwyn Rodrigues
  0 siblings, 0 replies; 30+ messages in thread
From: Goldwyn Rodrigues @ 2023-03-11  3:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

On  1:14 09/03, Christoph Hellwig wrote:
> On Wed, Mar 08, 2023 at 05:03:57PM -0600, Goldwyn Rodrigues wrote:
> > Without this patch, performing a writeback with async writeback
> > (mount option compress) will trigger this warning.
> 
> What is the trace in the warning?
[   57.105512] ------------[ cut here ]------------
[   57.108857] WARNING: CPU: 3 PID: 1631 at fs/inode.c:451 ihold+0x23/0x30
[   57.111887] Modules linked in:
[   57.113984] CPU: 3 PID: 1631 Comm: kworker/u8:9 Not tainted 6.3.0-rc1-dave+ #22 a352fb29779d7031315b84505284616e0ef1983c
[   57.117994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[   57.122344] Workqueue: writeback wb_workfn (flush-btrfs-5)
[   57.125073] RIP: 0010:ihold+0x23/0x30
[   57.127342] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 b8 01 00 00 00 f0 0f c1 87 08 02 00 00 83 c0 01 83 f8 01 7e 05 c3 cc cc cc cc <0f> 0b c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90
[   57.134365] RSP: 0018:ffffb0f28128faa0 EFLAGS: 00010246
[   57.137069] RAX: 0000000000000001 RBX: ffff9cae8a320e18 RCX: 0000000000000000
[   57.140224] RDX: ffff9cae8a320e18 RSI: ffff9cae849c5300 RDI: ffff9cae96d430a8
[   57.143447] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   57.146644] R10: ffff9cae849c5308 R11: ffff9cae97640e38 R12: 000000000019a000
[   57.149808] R13: 000000000019afff R14: ffffffffa948e340 R15: 0000000000000000
[   57.153383] FS:  0000000000000000(0000) GS:ffff9caefbd80000(0000) knlGS:0000000000000000
[   57.157007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   57.160028] CR2: 0000563ecc4d0230 CR3: 000000010aafa006 CR4: 0000000000370ee0
[   57.163259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   57.166647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   57.169588] Call Trace:
[   57.171486]  <TASK>
[   57.173625]  btrfs_writepages+0x3ea/0x820
[   57.175995]  do_writepages+0xd5/0x1a0
[   57.178061]  ? lock_is_held_type+0xad/0x120
[   57.180605]  __writeback_single_inode+0x54/0x630
[   57.182981]  writeback_sb_inodes+0x1fc/0x560
[   57.185249]  wb_writeback+0xc5/0x480
[   57.187273]  wb_workfn+0x84/0x650
[   57.189207]  ? lock_acquire+0xc8/0x310
[   57.191033]  ? process_one_work+0x23c/0x630
[   57.192998]  ? lock_is_held_type+0xad/0x120
[   57.194800]  process_one_work+0x2c0/0x630
[   57.196619]  worker_thread+0x50/0x3d0
[   57.198129]  ? __pfx_worker_thread+0x10/0x10
[   57.200005]  kthread+0xea/0x110
[   57.201397]  ? __pfx_kthread+0x10/0x10
[   57.202897]  ret_from_fork+0x2c/0x50
[   57.204490]  </TASK>


-- 
Goldwyn

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-03-11  3:54 UTC | newest]

Thread overview: 30+ messages
-- links below jump to the message on this page --
     [not found] <cover.1677793433.git.rgoldwyn@suse.com>
2023-03-02 22:24 ` [PATCH 01/21] fs: readahead_begin() to call before locking folio Goldwyn Rodrigues
2023-03-06 16:53   ` Christoph Hellwig
2023-03-02 22:24 ` [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range Goldwyn Rodrigues
2023-03-08 19:28   ` Boris Burkov
2023-03-02 22:24 ` [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range() Goldwyn Rodrigues
2023-03-08 19:29   ` Boris Burkov
2023-03-02 22:24 ` [PATCH 04/21] btrfs: make btrfs_qgroup_flush() non-static Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 05/21] btrfs: Lock extents before pages for buffered write() Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 06/21] btrfs: wait ordered range before locking during truncate Goldwyn Rodrigues
2023-03-07 17:03   ` Christoph Hellwig
2023-03-02 22:24 ` [PATCH 07/21] btrfs: lock extents while truncating Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 08/21] btrfs: no need to lock extent while performing invalidate_folio() Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 09/21] btrfs: lock extents before folio for read()s Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 10/21] btrfs: lock extents before pages in writepages Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 11/21] btrfs: locking extents for async writeback Goldwyn Rodrigues
2023-03-08 19:13   ` Boris Burkov
2023-03-02 22:24 ` [PATCH 12/21] btrfs: lock extents before pages - defrag Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 13/21] btrfs: Perform memory faults under locked extent Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 14/21] btrfs: writepage fixup lock rearrangement Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 15/21] btrfs: lock extent before pages for encoded read ioctls Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 16/21] btrfs: lock extent before pages in encoded write Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 17/21] btrfs: btree_writepages lock extents before pages Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 18/21] btrfs: check if writeback pages exist before starting writeback Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 19/21] btrfs: lock extents before pages in relocation Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold() Goldwyn Rodrigues
2023-03-07 17:06   ` Christoph Hellwig
2023-03-08 23:03     ` Goldwyn Rodrigues
2023-03-09  9:14       ` Christoph Hellwig
2023-03-11  3:52         ` Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 21/21] btrfs: debug extent locking Goldwyn Rodrigues
