All of lore.kernel.org
 help / color / mirror / Atom feed
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 5/6] btrfs: remove ordered extent check and wait during hole punching and zero range
Date: Tue, 15 Mar 2022 10:47:10 +0000	[thread overview]
Message-ID: <bc5144c1420e6d9d11163867e8ba5612ebb8eae5.1647340917.git.fdmanana@suse.com> (raw)
In-Reply-To: <cover.1647340917.git.fdmanana@suse.com>

From: Filipe Manana <fdmanana@suse.com>

For hole punching and zero range we have this loop that checks if we have
ordered extents after locking the file range, and if so unlock the range,
wait for ordered extents, and retry until we don't find more ordered
extents.

This logic was needed in the past because:

1) Direct IO writes within the i_size boundary did not take the inode's
   VFS lock. This was because that lock used to be a mutex, then some
   years ago it was switched to a rw semaphore (commit 9902af79c01a8e
   ("parallel lookups: actual switch to rwsem")), and then btrfs was
   changed to take the VFS inode's lock in shared mode for writes that
   don't cross the i_size boundary (commit e9adabb9712ef9 ("btrfs: use
   shared lock for direct writes within EOF"));

2) We could race with memory mapped writes, because memory mapped writes
   don't acquire the inode's VFS lock. We don't have that race anymore,
   as we have a rw semaphore to synchronize memory mapped writes with
   fallocate (and reflinking too). That change happened with commit
   8d9b4a162a37ce ("btrfs: exclude mmap from happening during all
   fallocate operations").

So stop looking for ordered extents after locking the file range when
doing hole punching and zero range operations.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c | 54 +++++++++++++++++--------------------------------
 1 file changed, 18 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a7fd05c1d52f..9f8db8a12a2f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2570,10 +2570,10 @@ static int find_first_non_hole(struct btrfs_inode *inode, u64 *start, u64 *len)
 	return ret;
 }
 
-static int btrfs_punch_hole_lock_range(struct inode *inode,
-				       const u64 lockstart,
-				       const u64 lockend,
-				       struct extent_state **cached_state)
+static void btrfs_punch_hole_lock_range(struct inode *inode,
+					const u64 lockstart,
+					const u64 lockend,
+					struct extent_state **cached_state)
 {
 	/*
 	 * For subpage case, if the range is not at page boundary, we could
@@ -2587,40 +2587,27 @@ static int btrfs_punch_hole_lock_range(struct inode *inode,
 	const u64 page_lockend = round_down(lockend + 1, PAGE_SIZE) - 1;
 
 	while (1) {
-		struct btrfs_ordered_extent *ordered;
-		int ret;
-
 		truncate_pagecache_range(inode, lockstart, lockend);
 
 		lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 				 cached_state);
-		ordered = btrfs_lookup_first_ordered_extent(BTRFS_I(inode),
-							    lockend);
-
 		/*
-		 * We need to make sure we have no ordered extents in this range
-		 * and nobody raced in and read a page in this range, if we did
-		 * we need to try again.
+		 * We can't have ordered extents in the range, nor dirty/writeback
+		 * pages, because we have locked the inode's VFS lock in exclusive
+		 * mode, we have locked the inode's i_mmap_lock in exclusive mode,
+		 * we have flushed all delalloc in the range and we have waited
+		 * for any ordered extents in the range to complete.
+		 * We can race with anyone reading pages from this range, so after
+		 * locking the range check if we have pages in the range, and if
+		 * we do, unlock the range and retry.
 		 */
-		if ((!ordered ||
-		    (ordered->file_offset + ordered->num_bytes <= lockstart ||
-		     ordered->file_offset > lockend)) &&
-		     !filemap_range_has_page(inode->i_mapping,
-					     page_lockstart, page_lockend)) {
-			if (ordered)
-				btrfs_put_ordered_extent(ordered);
+		if (!filemap_range_has_page(inode->i_mapping, page_lockstart,
+					    page_lockend))
 			break;
-		}
-		if (ordered)
-			btrfs_put_ordered_extent(ordered);
+
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
 				     lockend, cached_state);
-		ret = btrfs_wait_ordered_range(inode, lockstart,
-					       lockend - lockstart + 1);
-		if (ret)
-			return ret;
 	}
-	return 0;
 }
 
 static int btrfs_insert_replace_extent(struct btrfs_trans_handle *trans,
@@ -3072,10 +3059,7 @@ static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len)
 		goto out_only_mutex;
 	}
 
-	ret = btrfs_punch_hole_lock_range(inode, lockstart, lockend,
-					  &cached_state);
-	if (ret)
-		goto out_only_mutex;
+	btrfs_punch_hole_lock_range(inode, lockstart, lockend, &cached_state);
 
 	path = btrfs_alloc_path();
 	if (!path) {
@@ -3366,10 +3350,8 @@ static int btrfs_zero_range(struct inode *inode,
 		if (ret < 0)
 			goto out;
 		space_reserved = true;
-		ret = btrfs_punch_hole_lock_range(inode, lockstart, lockend,
-						  &cached_state);
-		if (ret)
-			goto out;
+		btrfs_punch_hole_lock_range(inode, lockstart, lockend,
+					    &cached_state);
 		ret = btrfs_qgroup_reserve_data(BTRFS_I(inode), &data_reserved,
 						alloc_start, bytes_to_reserve);
 		if (ret) {
-- 
2.33.0


  parent reply	other threads:[~2022-03-15 10:49 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-15 10:47 [PATCH 0/6] btrfs: one fallocate fix and removing outdated code for fallocate and reflinks fdmanana
2022-03-15 10:47 ` [PATCH 1/6] btrfs: only reserve the needed data space amount during fallocate fdmanana
2022-03-15 10:47 ` [PATCH 2/6] btrfs: remove useless dio wait call when doing fallocate zero range fdmanana
2022-03-15 10:47 ` [PATCH 3/6] btrfs: remove inode_dio_wait() calls when starting reflink operations fdmanana
2022-03-15 10:47 ` [PATCH 4/6] btrfs: remove ordered extent check and wait during fallocate fdmanana
2022-03-15 10:47 ` fdmanana [this message]
2022-03-15 10:47 ` [PATCH 6/6] btrfs: add and use helper to assert an inode range is clean fdmanana
2022-03-15 12:18 ` [PATCH v2 0/7] btrfs: one fallocate fix and removing outdated code for fallocate and reflinks fdmanana
2022-03-15 12:18   ` [PATCH v2 1/7] btrfs: only reserve the needed data space amount during fallocate fdmanana
2022-03-15 13:16     ` Wang Yugui
2022-03-15 13:45       ` Filipe Manana
2022-03-15 12:18   ` [PATCH v2 2/7] btrfs: remove useless dio wait call when doing fallocate zero range fdmanana
2022-03-15 12:18   ` [PATCH v2 3/7] btrfs: remove inode_dio_wait() calls when starting reflink operations fdmanana
2022-03-15 12:18   ` [PATCH v2 4/7] btrfs: remove ordered extent check and wait during fallocate fdmanana
2022-03-15 12:18   ` [PATCH v2 5/7] btrfs: lock the inode first before flushing range when punching hole fdmanana
2022-03-15 12:18   ` [PATCH v2 6/7] btrfs: remove ordered extent check and wait during hole punching and zero range fdmanana
2022-03-15 12:18   ` [PATCH v2 7/7] btrfs: add and use helper to assert an inode range is clean fdmanana
2022-03-15 15:22 ` [PATCH v3 0/7] btrfs: one fallocate fix and removing outdated code for fallocate and reflinks fdmanana
2022-03-15 15:22   ` [PATCH v3 1/7] btrfs: only reserve the needed data space amount during fallocate fdmanana
2022-03-15 15:22   ` [PATCH v3 2/7] btrfs: remove useless dio wait call when doing fallocate zero range fdmanana
2022-03-15 15:22   ` [PATCH v3 3/7] btrfs: remove inode_dio_wait() calls when starting reflink operations fdmanana
2022-03-15 15:22   ` [PATCH v3 4/7] btrfs: remove ordered extent check and wait during fallocate fdmanana
2022-03-15 15:22   ` [PATCH v3 5/7] btrfs: lock the inode first before flushing range when punching hole fdmanana
2022-03-15 15:22   ` [PATCH v3 6/7] btrfs: remove ordered extent check and wait during hole punching and zero range fdmanana
2022-03-15 15:22   ` [PATCH v3 7/7] btrfs: add and use helper to assert an inode range is clean fdmanana
2022-03-16 16:29   ` [PATCH v3 0/7] btrfs: one fallocate fix and removing outdated code for fallocate and reflinks David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc5144c1420e6d9d11163867e8ba5612ebb8eae5.1647340917.git.fdmanana@suse.com \
    --to=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.