* [PATCH 0/3] btrfs: basic refactor of btrfs_buffered_write()
@ 2020-08-24  6:31 Qu Wenruo
  2020-08-24  6:31 ` [PATCH 1/3] btrfs: change how we calculate the nrptrs for btrfs_buffered_write() Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Qu Wenruo @ 2020-08-24  6:31 UTC (permalink / raw)
  To: linux-btrfs

This patchset will refactor btrfs_buffered_write() by:
- Extract the nrptrs calculation into one function
  The main concern here is the batch number of pages.
  Originally there is a max(nrptrs, 8), which looks so wrong that my
  eyes automatically read it as min(nrptrs, 8).

  This time we kill that max(nrptrs, 8) and replace it with a fixed
  upper limit (64K), which should bring the same performance across
  different page sizes (see the small example after this list).

  The limit can be changed if 64K is not large enough for certain
  buffered writes.

- Refactor the main loop into process_one_batch()
  No functional change, just a plain refactor.

- Remove the again: tag by integrating page and extent locking
  The new function lock_pages_and_extent() will handle the retry so we
  don't need the tag any more in the main loop.
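
For reference, a tiny userspace sketch (illustrative arithmetic only,
not kernel code) of what the fixed 64K cap means for the page pointer
array on different page sizes:

  #include <stdio.h>

  #define SZ_64K	(64 * 1024L)

  int main(void)
  {
  	long page_sizes[] = { 4096, 16384, 65536 };
  	int i;

  	/* The batch is capped at 64K worth of pages, whatever the arch uses */
  	for (i = 0; i < 3; i++)
  		printf("PAGE_SIZE %6ld -> at most %2ld page pointers\n",
  		       page_sizes[i], SZ_64K / page_sizes[i]);
  	return 0;
  }

So a 4K page machine batches up to 16 pages per round while a 64K page
machine batches a single page, but both cover the same 64K of file
range per batch.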

Qu Wenruo (3):
  btrfs: change how we calculate the nrptrs for btrfs_buffered_write()
  btrfs: refactor btrfs_buffered_write() into process_one_batch()
  btrfs: remove the again: tag in process_one_batch()

 fs/btrfs/file.c | 568 +++++++++++++++++++++++++-----------------------
 1 file changed, 301 insertions(+), 267 deletions(-)

-- 
2.28.0


* [PATCH 1/3] btrfs: change how we calculate the nrptrs for btrfs_buffered_write()
  2020-08-24  6:31 [PATCH 0/3] btrfs: basic refactor of btrfs_buffered_write() Qu Wenruo
@ 2020-08-24  6:31 ` Qu Wenruo
  2020-08-24  6:31 ` [PATCH 2/3] btrfs: refactor btrfs_buffered_write() into process_one_batch() Qu Wenruo
  2020-08-24  6:31 ` [PATCH 3/3] btrfs: remove the again: tag in process_one_batch() Qu Wenruo
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2020-08-24  6:31 UTC (permalink / raw)
  To: linux-btrfs

@nrptrs of btrfs_buffered_write() determines the upper limit of pages we
can process in one batch.

Normally we want it to be as large as possible, since btrfs metadata/data
reservation and release can cost quite a lot of time.

Commit 142349f541d0 ("btrfs: lower the dirty balance poll interval")
introduced two extra limitations which look suspicious now:
- Limit the page number to nr_dirtied_pause - nr_dirtied
  However I can't find any mainline fs nor iomap utilizing these two
  members.
  Although page writeback still uses them, since no other fs utilizes
  them at all, I doubt their usefulness here.

- Ensure we always have at least 8 page pointers allocated
  The lower limit of 8 looks pretty strange; it means even if we're just
  writing 4K, we will allocate a page * array for 8 pages no matter what.
  To me, this 8 looks more like an upper limit.

This patch will change it by:
- Extract the calculation into another function
  This allows us to add more comments explaining each calculation.

- Do proper page alignment calculation
  The old calculation, DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
  doesn't take @pos into consideration.
  In fact we can easily have iov_iter_count(i) == 2 but still cross two
  pages (pos == page_offset() + PAGE_SIZE - 1), see the worked example
  below.

- Remove the useless max(8)

- Use a PAGE_SIZE independent upper limit
  Now we use 64K as the nr_pages limit, so we should get similar
  performance between different arches.
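
A minimal userspace sketch of the alignment case above (not kernel
code, 4K pages assumed): a 2 byte write starting at the last byte of a
page spans two pages, but the old calculation only accounts for one.

  #include <stdio.h>

  #define PAGE_SIZE		4096UL
  #define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
  #define round_up(x, y)	((((x) + (y) - 1) / (y)) * (y))
  #define round_down(x, y)	(((x) / (y)) * (y))

  int main(void)
  {
  	unsigned long pos = PAGE_SIZE - 1;	/* last byte of the first page */
  	unsigned long count = 2;		/* crosses into the second page */

  	/* Old: only looks at the length, misses the page crossing */
  	printf("old nr_pages = %lu\n", DIV_ROUND_UP(count, PAGE_SIZE));

  	/* New: align both ends of [pos, pos + count) to page boundaries */
  	printf("new nr_pages = %lu\n",
  	       (round_up(pos + count, PAGE_SIZE) -
  		round_down(pos, PAGE_SIZE)) / PAGE_SIZE);
  	return 0;
  }

This prints old nr_pages = 1 and new nr_pages = 2.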

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a818ebcb01f..c592350a5a82 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1620,6 +1620,29 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
 }
 
+/* Helper to get how many pages we should alloc for the batch */
+static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos,
+			struct iov_iter *iov)
+{
+	int nr_pages;
+
+	/*
+	 * Try to cover the full iov range, as btrfs metadata/data reserve
+	 * and release can be pretty slow, thus the more pages we process in
+	 * one batch the better.
+	 */
+	nr_pages = (round_up(pos + iov_iter_count(iov), PAGE_SIZE) -
+		    round_down(pos, PAGE_SIZE)) / PAGE_SIZE;
+
+	/*
+	 * But still limit it to 64KiB, so we can still get a similar
+	 * buffered write performance between different page sizes
+	 */
+	nr_pages = min_t(int, nr_pages, SZ_64K / PAGE_SIZE);
+
+	return nr_pages;
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
@@ -1638,10 +1661,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	bool only_release_metadata = false;
 	bool force_page_uptodate = false;
 
-	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
-			PAGE_SIZE / (sizeof(struct page *)));
-	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
-	nrptrs = max(nrptrs, 8);
+	nrptrs = get_nr_pages(fs_info, pos, i);
 	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
 	if (!pages)
 		return -ENOMEM;
-- 
2.28.0


* [PATCH 2/3] btrfs: refactor btrfs_buffered_write() into process_one_batch()
  2020-08-24  6:31 [PATCH 0/3] btrfs: basic refactor of btrfs_buffered_write() Qu Wenruo
  2020-08-24  6:31 ` [PATCH 1/3] btrfs: change how we calculate the nrptrs for btrfs_buffered_write() Qu Wenruo
@ 2020-08-24  6:31 ` Qu Wenruo
  2020-08-24  6:31 ` [PATCH 3/3] btrfs: remove the again: tag in process_one_batch() Qu Wenruo
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2020-08-24  6:31 UTC (permalink / raw)
  To: linux-btrfs

Inside btrfs_buffered_write() we had a big chunk of code in the while()
loop.

Refactor this big chunk into process_one_batch(), which will do the main
job.

This refactor doesn't touch the functionality at all, it just makes life
easier with more headroom for the code.
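
With the helper in place, the main loop of btrfs_buffered_write()
collapses to roughly the following (condensed from the diff below):

  	while (iov_iter_count(i) > 0) {
  		ssize_t copied;

  		copied = process_one_batch(inode, pages, nrptrs, pos, i,
  					   force_page_uptodate);
  		if (copied < 0)
  			break;

  		if (copied == 0) {
  			/* Short copy: force next page uptodate, go page by page */
  			nrptrs = 1;
  			force_page_uptodate = true;
  		} else {
  			/* Copy went fine, back to the regular multi-page pace */
  			nrptrs = orig_nrptrs;
  			force_page_uptodate = false;
  		}

  		pos += copied;
  		num_written += copied;
  	}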

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 407 +++++++++++++++++++++++++-----------------------
 1 file changed, 216 insertions(+), 191 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c592350a5a82..c9516c079cc6 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1402,7 +1402,10 @@ static int prepare_uptodate_page(struct inode *inode,
 }
 
 /*
- * this just gets pages into the page cache and locks them down.
+ * This just gets pages into the page cache and locks them down.
+ *
+ * @force_uptodate: Whether to force the first page uptodate.
+ *		    This happens after a short copy on non-uptodate page.
  */
 static noinline int prepare_pages(struct inode *inode, struct page **pages,
 				  size_t num_pages, loff_t pos,
@@ -1620,7 +1623,7 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
 }
 
-/* Helper to get how many pages we should alloc for the batch */
+/* The helper to get how many pages we should alloc for the batch */
 static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos,
 			struct iov_iter *iov)
 {
@@ -1643,226 +1646,193 @@ static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos,
 	return nr_pages;
 }
 
-static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
-					       struct iov_iter *i)
+/*
+ * The helper to copy one batch of data from iov to pages.
+ *
+ * @pages:	The page pointer array we're going to use.
+ * @nrptrs:	The number of page pointers we can utilize
+ * @pos:	The position of this write, in bytes
+ * @iov:	The source of data.
+ * @force_uptodate: Do we need to force the first page uptodate.
+ * 		    true if previous run we hit a short copy.
+ *
+ * Return >0 as the bytes we copied, with iov advanced.
+ * Return 0 if we hit a short copy, and should force the next run to have the
+ * first page uptodate.
+ * Return <0 for error.
+ */
+static ssize_t process_one_batch(struct inode *inode, struct page **pages,
+			int nrptrs, loff_t pos, struct iov_iter *iov,
+			bool force_uptodate)
 {
-	struct file *file = iocb->ki_filp;
-	loff_t pos = iocb->ki_pos;
-	struct inode *inode = file_inode(file);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct page **pages = NULL;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct extent_state *cached_state = NULL;
 	struct extent_changeset *data_reserved = NULL;
+	size_t offset = offset_in_page(pos);
+	size_t sector_offset;
+	size_t write_bytes = min(iov_iter_count(iov),
+				 nrptrs * (size_t)PAGE_SIZE - offset);
+	size_t num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
+	size_t reserve_bytes;
+	size_t dirty_pages;
+	size_t copied = 0;
+	size_t dirty_sectors;
+	size_t num_sectors;
+	bool only_release_metadata = false;
 	u64 release_bytes = 0;
 	u64 lockstart;
 	u64 lockend;
-	size_t num_written = 0;
-	int nrptrs;
-	int ret = 0;
-	bool only_release_metadata = false;
-	bool force_page_uptodate = false;
-
-	nrptrs = get_nr_pages(fs_info, pos, i);
-	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
-
-	while (iov_iter_count(i) > 0) {
-		struct extent_state *cached_state = NULL;
-		size_t offset = offset_in_page(pos);
-		size_t sector_offset;
-		size_t write_bytes = min(iov_iter_count(i),
-					 nrptrs * (size_t)PAGE_SIZE -
-					 offset);
-		size_t num_pages = DIV_ROUND_UP(write_bytes + offset,
-						PAGE_SIZE);
-		size_t reserve_bytes;
-		size_t dirty_pages;
-		size_t copied;
-		size_t dirty_sectors;
-		size_t num_sectors;
-		int extents_locked;
-
-		WARN_ON(num_pages > nrptrs);
+	int extents_locked;
+	int ret;
 
-		/*
-		 * Fault pages before locking them in prepare_pages
-		 * to avoid recursive lock
-		 */
-		if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) {
-			ret = -EFAULT;
-			break;
-		}
+	WARN_ON(num_pages > nrptrs);
 
-		only_release_metadata = false;
-		sector_offset = pos & (fs_info->sectorsize - 1);
-		reserve_bytes = round_up(write_bytes + sector_offset,
-				fs_info->sectorsize);
+	/*
+	 * Fault pages before locking them in prepare_pages to avoid recursive
+	 * lock.
+	 */
+	if (unlikely(iov_iter_fault_in_readable(iov, write_bytes))) {
+		ret = -EFAULT;
+		goto out;
+	}
 
-		extent_changeset_release(data_reserved);
-		ret = btrfs_check_data_free_space(BTRFS_I(inode),
-						  &data_reserved, pos,
-						  write_bytes);
-		if (ret < 0) {
-			if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
-						   &write_bytes) > 0) {
-				/*
-				 * For nodata cow case, no need to reserve
-				 * data space.
-				 */
-				only_release_metadata = true;
-				/*
-				 * our prealloc extent may be smaller than
-				 * write_bytes, so scale down.
-				 */
-				num_pages = DIV_ROUND_UP(write_bytes + offset,
-							 PAGE_SIZE);
-				reserve_bytes = round_up(write_bytes +
-							 sector_offset,
-							 fs_info->sectorsize);
-			} else {
-				break;
-			}
-		}
+	sector_offset = pos & (fs_info->sectorsize - 1);
+	reserve_bytes = round_up(write_bytes + sector_offset,
+				 fs_info->sectorsize);
 
-		WARN_ON(reserve_bytes == 0);
-		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
-				reserve_bytes);
-		if (ret) {
-			if (!only_release_metadata)
-				btrfs_free_reserved_data_space(BTRFS_I(inode),
-						data_reserved, pos,
-						write_bytes);
-			else
-				btrfs_check_nocow_unlock(BTRFS_I(inode));
-			break;
+	/* Reserve space for data and metadata */
+	ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, pos,
+					  write_bytes);
+	if (ret < 0) {
+		if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
+					   &write_bytes) > 0) {
+			/* Nodata cow case, no need to reserve data space. */
+			only_release_metadata = true;
+			/*
+			 * our prealloc extent may be smaller than
+			 * write_bytes, so scale down.
+			 */
+			num_pages = DIV_ROUND_UP(write_bytes + offset,
+						 PAGE_SIZE);
+			reserve_bytes = round_up(write_bytes + sector_offset,
+						 fs_info->sectorsize);
+		} else {
+			goto out;
 		}
+	}
 
-		release_bytes = reserve_bytes;
-again:
-		/*
-		 * This is going to setup the pages array with the number of
-		 * pages we want, so we don't really need to worry about the
-		 * contents of pages from loop to loop
-		 */
-		ret = prepare_pages(inode, pages, num_pages,
-				    pos, write_bytes,
-				    force_page_uptodate);
-		if (ret) {
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
-			break;
-		}
+	WARN_ON(reserve_bytes == 0);
+	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), reserve_bytes);
+	if (ret) {
+		if (!only_release_metadata)
+			btrfs_free_reserved_data_space(BTRFS_I(inode),
+					data_reserved, pos, write_bytes);
+		else
+			btrfs_check_nocow_unlock(BTRFS_I(inode));
+		goto out;
+	}
 
-		extents_locked = lock_and_cleanup_extent_if_need(
-				BTRFS_I(inode), pages,
-				num_pages, pos, write_bytes, &lockstart,
-				&lockend, &cached_state);
-		if (extents_locked < 0) {
-			if (extents_locked == -EAGAIN)
-				goto again;
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
-			ret = extents_locked;
-			break;
-		}
+	release_bytes = reserve_bytes;
 
-		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
+	/* Lock the pages and extent range */
+again:
+	/*
+	 * This is going to setup the pages array with the number of pages we
+	 * want, so we don't really need to worry about the contents of pages
+	 * from loop to loop.
+	 */
+	ret = prepare_pages(inode, pages, num_pages, pos, write_bytes,
+			    force_uptodate);
+	if (ret) {
+		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+		goto out;
+	}
 
-		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
-		dirty_sectors = round_up(copied + sector_offset,
-					fs_info->sectorsize);
-		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
+	extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages,
+			num_pages, pos, write_bytes, &lockstart,
+			&lockend, &cached_state);
+	if (extents_locked < 0) {
+		if (extents_locked == -EAGAIN)
+			goto again;
+		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+		ret = extents_locked;
+		goto out;
+	}
 
-		/*
-		 * if we have trouble faulting in the pages, fall
-		 * back to one page at a time
-		 */
-		if (copied < write_bytes)
-			nrptrs = 1;
+	/* Copy the data from iov to pages */
+	copied = btrfs_copy_from_user(pos, write_bytes, pages, iov);
 
-		if (copied == 0) {
-			force_page_uptodate = true;
-			dirty_sectors = 0;
-			dirty_pages = 0;
-		} else {
-			force_page_uptodate = false;
-			dirty_pages = DIV_ROUND_UP(copied + offset,
-						   PAGE_SIZE);
-		}
+	num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
+	dirty_sectors = round_up(copied + sector_offset, fs_info->sectorsize);
+	dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
 
-		if (num_sectors > dirty_sectors) {
-			/* release everything except the sectors we dirtied */
-			release_bytes -= dirty_sectors <<
-						fs_info->sb->s_blocksize_bits;
-			if (only_release_metadata) {
-				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							release_bytes, true);
-			} else {
-				u64 __pos;
+	if (copied == 0) {
+		dirty_sectors = 0;
+		dirty_pages = 0;
+	} else {
+		dirty_pages = DIV_ROUND_UP(copied + offset, PAGE_SIZE);
+	}
 
-				__pos = round_down(pos,
-						   fs_info->sectorsize) +
-					(dirty_pages << PAGE_SHIFT);
-				btrfs_delalloc_release_space(BTRFS_I(inode),
-						data_reserved, __pos,
+	if (num_sectors > dirty_sectors) {
+		/* release everything except the sectors we dirtied */
+		release_bytes -= dirty_sectors << fs_info->sb->s_blocksize_bits;
+		if (only_release_metadata) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
 						release_bytes, true);
-			}
-		}
-
-		release_bytes = round_up(copied + sector_offset,
-					fs_info->sectorsize);
-
-		if (copied > 0)
-			ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
-						dirty_pages, pos, copied,
-						&cached_state);
-
-		/*
-		 * If we have not locked the extent range, because the range's
-		 * start offset is >= i_size, we might still have a non-NULL
-		 * cached extent state, acquired while marking the extent range
-		 * as delalloc through btrfs_dirty_pages(). Therefore free any
-		 * possible cached extent state to avoid a memory leak.
-		 */
-		if (extents_locked)
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     lockstart, lockend, &cached_state);
-		else
-			free_extent_state(cached_state);
+		} else {
+			u64 __pos;
 
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
-		if (ret) {
-			btrfs_drop_pages(pages, num_pages);
-			break;
+			__pos = round_down(pos, fs_info->sectorsize) +
+				(dirty_pages << PAGE_SHIFT);
+			btrfs_delalloc_release_space(BTRFS_I(inode),
+					data_reserved, __pos,
+					release_bytes, true);
 		}
+	}
 
-		release_bytes = 0;
-		if (only_release_metadata)
-			btrfs_check_nocow_unlock(BTRFS_I(inode));
+	release_bytes = round_up(copied + sector_offset, fs_info->sectorsize);
 
-		if (only_release_metadata && copied > 0) {
-			lockstart = round_down(pos,
-					       fs_info->sectorsize);
-			lockend = round_up(pos + copied,
-					   fs_info->sectorsize) - 1;
+	if (copied > 0)
+		ret = btrfs_dirty_pages(BTRFS_I(inode), pages, dirty_pages, pos,
+				copied, &cached_state);
 
-			set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
-				       lockend, EXTENT_NORESERVE, NULL,
-				       NULL, GFP_NOFS);
-		}
+	/*
+	 * If we have not locked the extent range, because the range's
+	 * start offset is >= i_size, we might still have a non-NULL
+	 * cached extent state, acquired while marking the extent range
+	 * as delalloc through btrfs_dirty_pages(). Therefore free any
+	 * possible cached extent state to avoid a memory leak.
+	 */
+	if (extents_locked)
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				     lockstart, lockend, &cached_state);
+	else
+		free_extent_state(cached_state);
 
+	btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+	if (ret) {
 		btrfs_drop_pages(pages, num_pages);
+		goto out;
+	}
 
-		cond_resched();
+	release_bytes = 0;
+	if (only_release_metadata)
+		btrfs_check_nocow_unlock(BTRFS_I(inode));
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+	if (only_release_metadata && copied > 0) {
+		lockstart = round_down(pos, fs_info->sectorsize);
+		lockend = round_up(pos + copied, fs_info->sectorsize) - 1;
 
-		pos += copied;
-		num_written += copied;
+		set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
+				EXTENT_NORESERVE, NULL, NULL, GFP_NOFS);
 	}
 
-	kfree(pages);
+	btrfs_drop_pages(pages, num_pages);
 
+	cond_resched();
+
+	balance_dirty_pages_ratelimited(inode->i_mapping);
+out:
 	if (release_bytes) {
 		if (only_release_metadata) {
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
@@ -1876,7 +1846,62 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		}
 	}
 
-	extent_changeset_free(data_reserved);
+	if (!copied && ret < 0)
+		return ret;
+	return copied;
+}
+
+static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
+					       struct iov_iter *i)
+{
+	struct file *file = iocb->ki_filp;
+	loff_t pos = iocb->ki_pos;
+	struct inode *inode = file_inode(file);
+	struct page **pages = NULL;
+	size_t num_written = 0;
+	int nrptrs;
+	int orig_nrptrs;
+	int ret = 0;
+	bool force_page_uptodate = false;
+
+	orig_nrptrs = get_nr_pages(BTRFS_I(inode)->root->fs_info, pos, i);
+	nrptrs = orig_nrptrs;
+	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	while (iov_iter_count(i) > 0) {
+		ssize_t copied;
+
+		copied = process_one_batch(inode, pages, nrptrs, pos, i,
+					   force_page_uptodate);
+		if (copied < 0)
+			break;
+
+		if (copied == 0) {
+			/*
+			 * We had a short copy on even the first page, need to
+			 * force the next page uptodate and fall back to page
+			 * by page pace.
+			 */
+			nrptrs = 1;
+			force_page_uptodate = true;
+		} else {
+			/*
+			 * Copy finished without problem. No longer need to
+			 * force next page uptodate, and revert to regular
+			 * multi-page pace.
+			 */
+			nrptrs = orig_nrptrs;
+			force_page_uptodate = false;
+		}
+
+		pos += copied;
+		num_written += copied;
+	}
+
+	kfree(pages);
+
 	return num_written ? num_written : ret;
 }
 
-- 
2.28.0


* [PATCH 3/3] btrfs: remove the again: tag in process_one_batch()
  2020-08-24  6:31 [PATCH 0/3] btrfs: basic refactor of btrfs_buffered_write() Qu Wenruo
  2020-08-24  6:31 ` [PATCH 1/3] btrfs: change how we calculate the nrptrs for btrfs_buffered_write() Qu Wenruo
  2020-08-24  6:31 ` [PATCH 2/3] btrfs: refactor btrfs_buffered_write() into process_one_batch() Qu Wenruo
@ 2020-08-24  6:31 ` Qu Wenruo
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2020-08-24  6:31 UTC (permalink / raw)
  To: linux-btrfs

The again: tag here is used to retry when we fail to lock the extent
range, and it is only used once.

Instead of keeping the open-coded tag, integrate prepare_pages() and
lock_and_cleanup_extent_if_need() into lock_pages_and_extent(), and do
the retry inside that function.
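
As a generic illustration of the change (a userspace sketch with
hypothetical names, not the btrfs code itself): the retry lives inside
the helper, so the caller no longer carries an again: label.

  #include <stdbool.h>
  #include <stdio.h>

  static int attempts;

  /* Stand-in for the "ordered extent overlaps the range" conflict check */
  static bool range_conflicts(void)
  {
  	return attempts++ == 0;	/* pretend only the first attempt races */
  }

  /* Lock pages, then the range; retry internally on a conflict */
  static int lock_pages_and_range(void)
  {
  again:
  	printf("lock pages\n");
  	if (range_conflicts()) {
  		printf("conflict: unlock everything, wait, retry\n");
  		goto again;
  	}
  	printf("range locked\n");
  	return 1;
  }

  int main(void)
  {
  	/* The caller just consumes the result, no retry logic of its own */
  	return lock_pages_and_range() < 0;
  }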

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 183 +++++++++++++++++++++++-------------------------
 1 file changed, 86 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c9516c079cc6..6d6c04d28dec 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1456,83 +1456,6 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 
 }
 
-/*
- * This function locks the extent and properly waits for data=ordered extents
- * to finish before allowing the pages to be modified if need.
- *
- * The return value:
- * 1 - the extent is locked
- * 0 - the extent is not locked, and everything is OK
- * -EAGAIN - need re-prepare the pages
- * the other < 0 number - Something wrong happens
- */
-static noinline int
-lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
-				size_t num_pages, loff_t pos,
-				size_t write_bytes,
-				u64 *lockstart, u64 *lockend,
-				struct extent_state **cached_state)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u64 start_pos;
-	u64 last_pos;
-	int i;
-	int ret = 0;
-
-	start_pos = round_down(pos, fs_info->sectorsize);
-	last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
-
-	if (start_pos < inode->vfs_inode.i_size) {
-		struct btrfs_ordered_extent *ordered;
-
-		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
-				cached_state);
-		ordered = btrfs_lookup_ordered_range(inode, start_pos,
-						     last_pos - start_pos + 1);
-		if (ordered &&
-		    ordered->file_offset + ordered->num_bytes > start_pos &&
-		    ordered->file_offset <= last_pos) {
-			unlock_extent_cached(&inode->io_tree, start_pos,
-					last_pos, cached_state);
-			for (i = 0; i < num_pages; i++) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
-			btrfs_start_ordered_extent(&inode->vfs_inode,
-					ordered, 1);
-			btrfs_put_ordered_extent(ordered);
-			return -EAGAIN;
-		}
-		if (ordered)
-			btrfs_put_ordered_extent(ordered);
-
-		*lockstart = start_pos;
-		*lockend = last_pos;
-		ret = 1;
-	}
-
-	/*
-	 * It's possible the pages are dirty right now, but we don't want
-	 * to clean them yet because copy_from_user may catch a page fault
-	 * and we might have to fall back to one page at a time.  If that
-	 * happens, we'll unlock these pages and we'd have a window where
-	 * reclaim could sneak in and drop the once-dirty page on the floor
-	 * without writing it.
-	 *
-	 * We have the pages locked and the extent range locked, so there's
-	 * no way someone can start IO on any dirty pages in this range.
-	 *
-	 * We'll call btrfs_dirty_pages() later on, and that will flip around
-	 * delalloc bits and dirty the pages as required.
-	 */
-	for (i = 0; i < num_pages; i++) {
-		set_page_extent_mapped(pages[i]);
-		WARN_ON(!PageLocked(pages[i]));
-	}
-
-	return ret;
-}
-
 static int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 			   size_t *write_bytes, bool nowait)
 {
@@ -1646,6 +1569,87 @@ static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos,
 	return nr_pages;
 }
 
+/*
+ * The helper to lock both pages and extent bits
+ *
+ * Return 0 if the extent is not locked and everything is OK.
+ * Return 1 if the extent is locked and everything is OK.
+ * Return <0 for error.
+ */
+static int lock_pages_and_extent(struct btrfs_inode *inode, struct page **pages,
+				 int num_pages, loff_t pos, size_t write_bytes,
+				 u64 *lockstart, u64 *lockend,
+				 struct extent_state **cached_state,
+				 bool force_uptodate)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	u64 start_pos;
+	u64 last_pos;
+	int ret;
+	int i;
+
+again:
+	/* Lock the pages */
+	ret = prepare_pages(&inode->vfs_inode, pages, num_pages, pos,
+			    write_bytes, force_uptodate);
+	if (ret < 0)
+		return ret;
+
+	start_pos = round_down(pos, fs_info->sectorsize);
+	last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
+
+	if (start_pos < inode->vfs_inode.i_size) {
+		struct btrfs_ordered_extent *ordered;
+
+		/* Lock the extent range */
+		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
+				cached_state);
+		ordered = btrfs_lookup_ordered_range(inode, start_pos,
+						     last_pos - start_pos + 1);
+		if (ordered &&
+		    ordered->file_offset + ordered->num_bytes > start_pos &&
+		    ordered->file_offset <= last_pos) {
+			unlock_extent_cached(&inode->io_tree, start_pos,
+					last_pos, cached_state);
+			for (i = 0; i < num_pages; i++) {
+				unlock_page(pages[i]);
+				put_page(pages[i]);
+			}
+			btrfs_start_ordered_extent(&inode->vfs_inode,
+					ordered, 1);
+			btrfs_put_ordered_extent(ordered);
+			goto again;
+		}
+		if (ordered)
+			btrfs_put_ordered_extent(ordered);
+
+		*lockstart = start_pos;
+		*lockend = last_pos;
+		ret = 1;
+	}
+
+	/*
+	 * It's possible the pages are dirty right now, but we don't want
+	 * to clean them yet because copy_from_user may catch a page fault
+	 * and we might have to fall back to one page at a time.  If that
+	 * happens, we'll unlock these pages and we'd have a window where
+	 * reclaim could sneak in and drop the once-dirty page on the floor
+	 * without writing it.
+	 *
+	 * We have the pages locked and the extent range locked, so there's
+	 * no way someone can start IO on any dirty pages in this range.
+	 *
+	 * We'll call btrfs_dirty_pages() later on, and that will flip around
+	 * delalloc bits and dirty the pages as required.
+	 */
+	for (i = 0; i < num_pages; i++) {
+		set_page_extent_mapped(pages[i]);
+		WARN_ON(!PageLocked(pages[i]));
+	}
+
+	return ret;
+}
+
 /*
  * The helper to copy one batch of data from iov to pages.
  *
@@ -1735,29 +1739,14 @@ static ssize_t process_one_batch(struct inode *inode, struct page **pages,
 	release_bytes = reserve_bytes;
 
 	/* Lock the pages and extent range */
-again:
-	/*
-	 * This is going to setup the pages array with the number of pages we
-	 * want, so we don't really need to worry about the contents of pages
-	 * from loop to loop.
-	 */
-	ret = prepare_pages(inode, pages, num_pages, pos, write_bytes,
-			    force_uptodate);
-	if (ret) {
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
-		goto out;
-	}
-
-	extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages,
-			num_pages, pos, write_bytes, &lockstart,
-			&lockend, &cached_state);
-	if (extents_locked < 0) {
-		if (extents_locked == -EAGAIN)
-			goto again;
+	ret = lock_pages_and_extent(BTRFS_I(inode), pages, num_pages, pos,
+			write_bytes, &lockstart, &lockend, &cached_state,
+			force_uptodate);
+	if (ret < 0) {
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
-		ret = extents_locked;
 		goto out;
 	}
+	extents_locked = ret;
 
 	/* Copy the data from iov to pages */
 	copied = btrfs_copy_from_user(pos, write_bytes, pages, iov);
-- 
2.28.0

