* [RFC PATCH 00/31] btrfs buffered iomap support
@ 2021-06-13 13:39 Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page() Goldwyn Rodrigues
                   ` (30 more replies)
  0 siblings, 31 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

This is an attempt to perform buffered reads and writes on btrfs using iomap.
I am proposing these changes as an RFC because I would like your comments on
the feasibility of going ahead with this design. It is not feature complete;
the most important missing pieces are multi-device and compression support.
I am sending this to the btrfs mailing list only, to get the perspective of
btrfs developers first.

Locking sequence change
One of the biggest architectural changes is locking extents before pages.
writepage(), called during memory pressure, provides a page which is already
locked, so perform a best-effort extent lock and work with whatever range is
feasible. For truncate->invalidatepage(), lock the extents before calling
truncate_setsize(). Are there any other areas which need to be covered?
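As a rough userspace model of the best-effort idea above (hypothetical helper
name; the series implements the real thing as best_effort_lock_extent() in
patch 05). lock_upto stands in for the first offset at which taking the
extent lock failed:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a best-effort range lock: we asked to lock [start, end],
 * but locking only succeeded for [start, lock_upto).  min_end is the
 * offset the range must reach to still cover the already-locked page.
 * Returns the (possibly reduced) end of the usable range, or 0 if the
 * locked part does not even cover the page.  Hypothetical sketch, not
 * kernel code.
 */
static uint64_t best_effort_end(uint64_t end, uint64_t min_end,
				uint64_t lock_upto)
{
	if (lock_upto > end)
		return end;		/* whole range locked */
	if (lock_upto < min_end)
		return 0;		/* cannot cover the locked page: give up */
	return lock_upto - 1;		/* work with the reduced range */
}
```

In the kernel the failure case also has to unlock whatever part was already
locked and verify the delalloc bits, which this sketch leaves out.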

TODO
====

Bio submission for multi-device
  Multi-device submission would require some work with respect to asynchronous
  completions. iomap can merge bios, making one larger than the extent_map's
  map_length; there is a check in btrfs_map_bio() which WARN()s on this.
  Perhaps a more modular approach with callbacks would be easier to deal
  with.
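To make the mismatch concrete: a merged bio may span several map_length
chunks, so a btrfs submit hook would have to split it at every chunk
boundary. A userspace sketch of the split arithmetic (hypothetical helper,
not part of this series):

```c
#include <assert.h>
#include <stdint.h>

/*
 * How many bios a merged bio of 'len' bytes starting at logical
 * 'offset' must be split into, if each device mapping spans at most
 * 'map_length' bytes and boundaries fall at multiples of map_length.
 * Hypothetical model of the btrfs_map_bio() constraint.
 */
static uint64_t nr_splits(uint64_t offset, uint64_t len, uint64_t map_length)
{
	uint64_t first = map_length - (offset % map_length);

	if (len <= first)
		return 1;		/* fits in the first chunk */
	len -= first;
	return 1 + (len + map_length - 1) / map_length;
}
```

With 64KiB chunks, a 192KiB bio starting at offset 0 needs three
submissions; any bio that needs more than one is what currently trips the
WARN() in btrfs_map_bio().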

Compression
  This would require an extra iomap type, say IOMAP_ENCODED, which would
  require iomap to read/write complete extents as opposed to performing I/O
  on only the requested range. An additional field, io_size, would be needed
  to know how much of the compressed size maps to the uncompressed size.
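A rough sketch of what such an interface could look like. Every name here
(IOMAP_ENCODED, io_size, the struct itself) is a hypothetical illustration
of the proposal, not existing iomap code:

```c
#include <assert.h>
#include <stdint.h>

#define IOMAP_ENCODED	5	/* hypothetical: next free iomap type value */

/* Cut-down model of a struct iomap extended for compressed extents. */
struct iomap_sketch {
	uint64_t addr;		/* disk offset of the (compressed) extent */
	uint64_t offset;	/* file offset of the mapping */
	uint64_t length;	/* uncompressed length seen by the page cache */
	uint64_t io_size;	/* compressed bytes actually on disk */
	uint16_t type;		/* IOMAP_ENCODED for compressed extents */
};

/*
 * An encoded extent must be transferred whole: the I/O size is io_size,
 * regardless of how little of the uncompressed range was requested.
 */
static uint64_t bytes_to_transfer(const struct iomap_sketch *im)
{
	return im->type == IOMAP_ENCODED ? im->io_size : im->length;
}
```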


Known issues
============
 BIO submission: described above.
 A random WARN_ON at kernel/locking/lockdep.c:895 points to possible memory
 corruption.

Finally, I have added some questions in the patch headers as well.


-- 
Goldwyn


Goldwyn Rodrigues (31):
  iomap: Check if blocksize == PAGE_SIZE in to_iomap_page()
  iomap: Add submit_io to writepage_ops
  iomap: Export iomap_writepage_end_bio()
  iomap: Introduce iomap_readpage_ops
  btrfs: writepage() lock extent before pages
  btrfs: lock extents while truncating
  btrfs: write() perform extent locks before locking page
  btrfs: btrfs_em_to_iomap() to convert em to iomap
  btrfs: Don't process pages if locked_page is NULL
  btrfs: Add btrfs_map_blocks for iomap_writeback_ops
  btrfs: Use submit_io to submit btrfs writeback bios
  btrfs: endio sequence for writepage
  btrfs: do not checksum for free space inode
  btrfs: end EXTENT_MAP_INLINE writeback early
  btrfs: Switch to iomap_writepages()
  btrfs: remove force_page_uptodate
  btrfs: Introduce btrfs_iomap
  btrfs: Add btrfs_iomap_release()
  btrfs: Add reservation information to btrfs_iomap
  btrfs: Carve out btrfs_buffered_iomap_begin() from write path
  btrfs: Carve out btrfs_buffered_iomap_end from the write path
  btrfs: Set extents delalloc in iomap_end
  btrfs: define iomap_page_ops
  btrfs: Switch to iomap_file_buffered_write()
  btrfs: remove all page related functions
  btrfs: use srcmap for read-before-write cases
  btrfs: Rename end_bio_extent_readpage to btrfs_readpage_endio
  btrfs: iomap_begin() for buffered read
  btrfs: Use iomap_readpage_ops to allocate and submit bio
  btrfs: Do not call btrfs_io_bio_free_csum() if BTRFS_INODE_NODATASUM
    is not set
  btrfs: Switch to iomap_readpage() and iomap_readahead()

 fs/btrfs/ctree.h       |   3 +
 fs/btrfs/disk-io.c     |   9 +-
 fs/btrfs/extent_io.c   | 103 +++++--
 fs/btrfs/extent_io.h   |   4 +
 fs/btrfs/file.c        | 642 ++++++++++++++---------------------------
 fs/btrfs/inode.c       | 219 ++++++++++++--
 fs/gfs2/aops.c         |   4 +-
 fs/iomap/buffered-io.c |  80 +++--
 fs/xfs/xfs_aops.c      |   4 +-
 fs/zonefs/super.c      |   4 +-
 include/linux/iomap.h  |  26 +-
 11 files changed, 601 insertions(+), 497 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page()
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-15 14:17   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 02/31] iomap: Add submit_io to writepage_ops Goldwyn Rodrigues
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs requires page->private to be set in order to receive releasepage()
callbacks, so in the normal case of blocksize == PAGE_SIZE, btrfs sets
page->private to 1. To avoid misinterpreting that value as a struct
iomap_page pointer and crashing, check for blocksize == PAGE_SIZE in
to_iomap_page().
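The failure mode can be modeled in userspace: btrfs stores the literal value
1 in page->private, which iomap would otherwise dereference as a struct
iomap_page pointer. A hypothetical sketch of the guarded lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct iomap_page_model {	/* stand-in for struct iomap_page */
	unsigned int uptodate;
};

/*
 * Model of the patched to_iomap_page(): when blocksize == PAGE_SIZE,
 * btrfs may have set page->private to the literal 1, so never treat
 * it as a pointer in that case.  Hypothetical userspace model.
 */
static struct iomap_page_model *to_iop_model(uintptr_t page_private,
					     size_t blocksize, size_t page_size)
{
	if (blocksize == page_size)
		return NULL;	/* private may be btrfs's magic 1 */
	if (!page_private)
		return NULL;
	return (struct iomap_page_model *)page_private;
}
```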
---
 fs/iomap/buffered-io.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9023717c5188..d30683734d01 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -41,9 +41,11 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
 	 */
 	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 
-	if (page_has_private(page))
-		return (struct iomap_page *)page_private(page);
-	return NULL;
+	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
+		return NULL;
+	if (!page_has_private(page))
+		return NULL;
+	return (struct iomap_page *)page_private(page);
 }
 
 static struct bio_set iomap_ioend_bioset;
@@ -163,7 +165,7 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	if (PageError(page))
 		return;
 
-	if (page_has_private(page))
+	if (i_blocksize(page->mapping->host) != PAGE_SIZE)
 		iomap_iop_set_range_uptodate(page, off, len);
 	else
 		SetPageUptodate(page);
-- 
2.31.1



* [RFC PATCH 02/31] iomap: Add submit_io to writepage_ops
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page() Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-15 14:15   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 03/31] iomap: Export iomap_writepage_end_bio() Goldwyn Rodrigues
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Filesystems such as btrfs need to perform pre- and post-submission bio
operations, such as csum calculation and device mapping.

Add a submit_io() callback to iomap_writeback_ops so filesystems can
control how the bio is submitted.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap/buffered-io.c | 11 ++++++++++-
 include/linux/iomap.h  |  6 ++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d30683734d01..b6fd6d6118a6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1209,7 +1209,11 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
 		return error;
 	}
 
-	submit_bio(ioend->io_bio);
+	if (wpc->ops->submit_io)
+		wpc->ops->submit_io(ioend->io_inode, ioend->io_bio);
+	else
+		submit_bio(ioend->io_bio);
+
 	return 0;
 }
 
@@ -1305,8 +1309,13 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 
 	if (!merged) {
 		if (bio_full(wpc->ioend->io_bio, len)) {
+			struct bio *bio = wpc->ioend->io_bio;
 			wpc->ioend->io_bio =
 				iomap_chain_bio(wpc->ioend->io_bio);
+			if (wpc->ops->submit_io)
+				wpc->ops->submit_io(inode, bio);
+			else
+				submit_bio(bio);
 		}
 		bio_add_page(wpc->ioend->io_bio, page, len, poff);
 	}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c87d0cb0de6d..689d799b1915 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -223,6 +223,12 @@ struct iomap_writeback_ops {
 	 * we failed to submit any I/O.
 	 */
 	void (*discard_page)(struct page *page, loff_t fileoff);
+
+	/*
+	 * Optional, allows the filesystem to perform a custom submission of
+	 * bio, such as csum calculations or multi-device bio split
+	 */
+	void (*submit_io)(struct inode *inode, struct bio *bio);
 };
 
 struct iomap_writepage_ctx {
-- 
2.31.1



* [RFC PATCH 03/31] iomap: Export iomap_writepage_end_bio()
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page() Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 02/31] iomap: Add submit_io to writepage_ops Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 04/31] iomap: Introduce iomap_readpage_ops Goldwyn Rodrigues
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs marks extents uptodate and clears bits such as delalloc after the bio
is deemed complete. After marking the extents, btrfs needs to call
iomap_writepage_end_bio() to perform the page housekeeping: ending
writeback.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap/buffered-io.c | 3 ++-
 include/linux/iomap.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b6fd6d6118a6..f88f058cdefb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1173,12 +1173,13 @@ iomap_sort_ioends(struct list_head *ioend_list)
 }
 EXPORT_SYMBOL_GPL(iomap_sort_ioends);
 
-static void iomap_writepage_end_bio(struct bio *bio)
+void iomap_writepage_end_bio(struct bio *bio)
 {
 	struct iomap_ioend *ioend = bio->bi_private;
 
 	iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status));
 }
+EXPORT_SYMBOL_GPL(iomap_writepage_end_bio);
 
 /*
  * Submit the final bio for an ioend.
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 689d799b1915..8944711aa92e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -238,6 +238,7 @@ struct iomap_writepage_ctx {
 };
 
 void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
+void iomap_writepage_end_bio(struct bio *bio);
 void iomap_ioend_try_merge(struct iomap_ioend *ioend,
 		struct list_head *more_ioends);
 void iomap_sort_ioends(struct list_head *ioend_list);
-- 
2.31.1



* [RFC PATCH 04/31] iomap: Introduce iomap_readpage_ops
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (2 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 03/31] iomap: Export iomap_writepage_end_bio() Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-15 14:44   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 05/31] btrfs: writepage() lock extent before pages Goldwyn Rodrigues
                   ` (26 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

iomap_readpage_ops provides additional hooks to allocate or submit the
read bio.

alloc_bio() is used to allocate the bio from the filesystem; in the case of
btrfs, to allocate a btrfs_io_bio.

submit_io(), similar to the hook introduced for direct I/O, submits the
bio.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/gfs2/aops.c         |  4 +--
 fs/iomap/buffered-io.c | 56 +++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_aops.c      |  4 +--
 fs/zonefs/super.c      |  4 +--
 include/linux/iomap.h  | 19 ++++++++++++--
 5 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 23b5be3db044..f5bd7d0dab90 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -474,7 +474,7 @@ static int __gfs2_readpage(void *file, struct page *page)
 
 	if (!gfs2_is_jdata(ip) ||
 	    (i_blocksize(inode) == PAGE_SIZE && !page_has_buffers(page))) {
-		error = iomap_readpage(page, &gfs2_iomap_ops);
+		error = iomap_readpage(page, &gfs2_iomap_ops, NULL);
 	} else if (gfs2_is_stuffed(ip)) {
 		error = stuffed_readpage(ip, page);
 		unlock_page(page);
@@ -563,7 +563,7 @@ static void gfs2_readahead(struct readahead_control *rac)
 	else if (gfs2_is_jdata(ip))
 		mpage_readahead(rac, gfs2_block_map);
 	else
-		iomap_readahead(rac, &gfs2_iomap_ops);
+		iomap_readahead(rac, &gfs2_iomap_ops, NULL);
 }
 
 /**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f88f058cdefb..ec2304eea51a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -201,10 +201,11 @@ iomap_read_end_io(struct bio *bio)
 }
 
 struct iomap_readpage_ctx {
-	struct page		*cur_page;
-	bool			cur_page_in_bio;
-	struct bio		*bio;
-	struct readahead_control *rac;
+	struct page			*cur_page;
+	bool				cur_page_in_bio;
+	struct bio			*bio;
+	struct readahead_control	*rac;
+	const struct iomap_readpage_ops	*ops;
 };
 
 static void
@@ -282,19 +283,31 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		gfp_t orig_gfp = gfp;
 		unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
 
-		if (ctx->bio)
-			submit_bio(ctx->bio);
+		if (ctx->bio) {
+			if (ctx->ops && ctx->ops->submit_io)
+				ctx->ops->submit_io(inode, ctx->bio);
+			else
+				submit_bio(ctx->bio);
+		}
 
 		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		ctx->bio = bio_alloc(gfp, bio_max_segs(nr_vecs));
+		if (ctx->ops && ctx->ops->alloc_bio)
+			ctx->bio = ctx->ops->alloc_bio(gfp,
+					bio_max_segs(nr_vecs));
+		else
+			ctx->bio = bio_alloc(gfp, bio_max_segs(nr_vecs));
 		/*
 		 * If the bio_alloc fails, try it again for a single page to
 		 * avoid having to deal with partial page reads.  This emulates
 		 * what do_mpage_readpage does.
 		 */
-		if (!ctx->bio)
-			ctx->bio = bio_alloc(orig_gfp, 1);
+		if (!ctx->bio) {
+			if (ctx->ops && ctx->ops->alloc_bio)
+				ctx->bio = ctx->ops->alloc_bio(orig_gfp, 1);
+			else
+				ctx->bio = bio_alloc(orig_gfp, 1);
+		}
 		ctx->bio->bi_opf = REQ_OP_READ;
 		if (ctx->rac)
 			ctx->bio->bi_opf |= REQ_RAHEAD;
@@ -315,9 +328,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 }
 
 int
-iomap_readpage(struct page *page, const struct iomap_ops *ops)
+iomap_readpage(struct page *page, const struct iomap_ops *ops,
+		const struct iomap_readpage_ops *readpage_ops)
 {
-	struct iomap_readpage_ctx ctx = { .cur_page = page };
+	struct iomap_readpage_ctx ctx = {
+		.cur_page	= page,
+		.ops		= readpage_ops,
+	};
 	struct inode *inode = page->mapping->host;
 	unsigned poff;
 	loff_t ret;
@@ -336,7 +353,10 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 	}
 
 	if (ctx.bio) {
-		submit_bio(ctx.bio);
+		if (ctx.ops && ctx.ops->submit_io)
+			ctx.ops->submit_io(inode, ctx.bio);
+		else
+			submit_bio(ctx.bio);
 		WARN_ON_ONCE(!ctx.cur_page_in_bio);
 	} else {
 		WARN_ON_ONCE(ctx.cur_page_in_bio);
@@ -392,13 +412,15 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
  * function is called with memalloc_nofs set, so allocations will not cause
  * the filesystem to be reentered.
  */
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
+		const struct iomap_readpage_ops *readpage_ops)
 {
 	struct inode *inode = rac->mapping->host;
 	loff_t pos = readahead_pos(rac);
 	size_t length = readahead_length(rac);
 	struct iomap_readpage_ctx ctx = {
 		.rac	= rac,
+		.ops	= readpage_ops,
 	};
 
 	trace_iomap_readahead(inode, readahead_count(rac));
@@ -414,8 +436,12 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 		length -= ret;
 	}
 
-	if (ctx.bio)
-		submit_bio(ctx.bio);
+	if (ctx.bio) {
+		if (ctx.ops && ctx.ops->submit_io)
+			ctx.ops->submit_io(inode, ctx.bio);
+		else
+			submit_bio(ctx.bio);
+	}
 	if (ctx.cur_page) {
 		if (!ctx.cur_page_in_bio)
 			unlock_page(ctx.cur_page);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 826caa6b4a5a..7f49bbf325a1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -535,14 +535,14 @@ xfs_vm_readpage(
 	struct file		*unused,
 	struct page		*page)
 {
-	return iomap_readpage(page, &xfs_read_iomap_ops);
+	return iomap_readpage(page, &xfs_read_iomap_ops, NULL);
 }
 
 STATIC void
 xfs_vm_readahead(
 	struct readahead_control	*rac)
 {
-	iomap_readahead(rac, &xfs_read_iomap_ops);
+	iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
 }
 
 static int
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index cd145d318b17..f8157f91b8b1 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -115,12 +115,12 @@ static const struct iomap_ops zonefs_iomap_ops = {
 
 static int zonefs_readpage(struct file *unused, struct page *page)
 {
-	return iomap_readpage(page, &zonefs_iomap_ops);
+	return iomap_readpage(page, &zonefs_iomap_ops, NULL);
 }
 
 static void zonefs_readahead(struct readahead_control *rac)
 {
-	iomap_readahead(rac, &zonefs_iomap_ops);
+	iomap_readahead(rac, &zonefs_iomap_ops, NULL);
 }
 
 /*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8944711aa92e..cc8c00026edb 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -157,8 +157,6 @@ loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
-int iomap_readpage(struct page *page, const struct iomap_ops *ops);
-void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 int iomap_set_page_dirty(struct page *page);
 int iomap_is_partially_uptodate(struct page *page, unsigned long from,
 		unsigned long count);
@@ -188,6 +186,23 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset,
 sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
 		const struct iomap_ops *ops);
 
+struct iomap_readpage_ops {
+
+	/* allow the filesystem to allocate custom struct bio */
+	struct bio *(*alloc_bio)(gfp_t gfp_mask, unsigned short nr_iovecs);
+
+	/*
+	 * Optional, allows the filesystem to perform a custom submission of
+	 * bio, such as csum calculations or multi-device bio split
+	 */
+	void (*submit_io)(struct inode *inode, struct bio *bio);
+};
+
+int iomap_readpage(struct page *page, const struct iomap_ops *ops,
+		const struct iomap_readpage_ops *readpage_ops);
+void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
+		const struct iomap_readpage_ops *readpage_ops);
+
 /*
  * Structure for writeback I/O completions.
  */
-- 
2.31.1



* [RFC PATCH 05/31] btrfs: writepage() lock extent before pages
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (3 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 04/31] iomap: Introduce iomap_readpage_ops Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-15 15:47   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 06/31] btrfs: lock extents while truncating Goldwyn Rodrigues
                   ` (25 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock order change: extent locks before page locks.

writepage() is called with the page locked, so make an attempt to lock the
extents and proceed only if successful.

The new function best_effort_lock_extent() attempts to lock the range
provided. If the entire range could not be locked but the locked portion
still covers the locked page, work with the reduced range by shrinking
delalloc_end.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 66 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9e81d25dea70..75ad809e8c86 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1488,6 +1488,47 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 	return 1;
 }
 
+/*
+ * best_effort_lock_extent() - lock the extent range as best we can,
+ * making sure at least the minimum range is locked and the delalloc
+ * bits are set. If the full requested range cannot be locked,
+ * delalloc_end is shifted down to the end of the lockable range.
+ */
+static bool best_effort_lock_extent(struct extent_io_tree *tree, u64 start,
+		u64 *delalloc_end, u64 min_end, struct extent_state **cached_state)
+{
+	u64 failed_start;
+	u64 end = *delalloc_end;
+	int ret;
+
+	ret = set_extent_bit(tree, start, end, EXTENT_LOCKED, EXTENT_LOCKED,
+			     &failed_start, cached_state, GFP_NOFS, NULL);
+
+	if (!ret)
+		return false;
+
+	if (failed_start < min_end) {
+		/* Could not lock the whole range */
+		if (failed_start > start)
+			unlock_extent_cached(tree, start,
+					failed_start - 1, cached_state);
+		return false;
+	} else if (end > failed_start) {
+		/* Work with what could be locked */
+		end = failed_start - 1;
+	}
+
+	/* Check if delalloc bits are set */
+	ret = test_range_bit(tree, start, end,
+			     EXTENT_DELALLOC, 1, *cached_state);
+	if (!ret) {
+		unlock_extent_cached(tree, start, end, cached_state);
+		return false;
+	}
+	*delalloc_end = end;
+	return true;
+}
+
 void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end)
 {
 	unsigned long index = start >> PAGE_SHIFT;
@@ -2018,7 +2059,16 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 	if (delalloc_end + 1 - delalloc_start > max_bytes)
 		delalloc_end = delalloc_start + max_bytes - 1;
 
-	/* step two, lock all the pages after the page that has start */
+	/* step two, lock the state bits for the range */
+	found = best_effort_lock_extent(tree, delalloc_start, &delalloc_end,
+			((page_index(locked_page) + 1) << PAGE_SHIFT),
+			&cached_state);
+	if (!found) {
+		free_extent_state(cached_state);
+		goto out_failed;
+	}
+
+	/* step three, lock all the pages after the page that has start */
 	ret = lock_delalloc_pages(inode, locked_page,
 				  delalloc_start, delalloc_end);
 	ASSERT(!ret || ret == -EAGAIN);
@@ -2038,20 +2088,6 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 		}
 	}
 
-	/* step three, lock the state bits for the whole range */
-	lock_extent_bits(tree, delalloc_start, delalloc_end, &cached_state);
-
-	/* then test to make sure it is all still delalloc */
-	ret = test_range_bit(tree, delalloc_start, delalloc_end,
-			     EXTENT_DELALLOC, 1, cached_state);
-	if (!ret) {
-		unlock_extent_cached(tree, delalloc_start, delalloc_end,
-				     &cached_state);
-		__unlock_for_delalloc(inode, locked_page,
-			      delalloc_start, delalloc_end);
-		cond_resched();
-		goto again;
-	}
 	free_extent_state(cached_state);
 	*start = delalloc_start;
 	*end = delalloc_end;
-- 
2.31.1



* [RFC PATCH 06/31] btrfs: lock extents while truncating
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (4 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 05/31] btrfs: writepage() lock extent before pages Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16  6:42   ` Nikolay Borisov
  2021-06-16 13:13   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 07/31] btrfs: write() perform extent locks before locking page Goldwyn Rodrigues
                   ` (24 subsequent siblings)
  30 siblings, 2 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock order change: lock extents before pages.

This removes the extent locking in btrfs_invalidatepage().
invalidatepage() is called either during truncation or while evicting an
inode; inode eviction does not require extent locking.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 794d906cba6c..7761a60788d0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5201,6 +5201,9 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_end_transaction(trans);
 	} else {
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+		u64 start = round_down(oldsize, fs_info->sectorsize);
+		u64 end = round_up(newsize, fs_info->sectorsize) - 1;
+		struct extent_state **cached = NULL;
 
 		if (btrfs_is_zoned(fs_info)) {
 			ret = btrfs_wait_ordered_range(inode,
@@ -5219,7 +5222,10 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 			set_bit(BTRFS_INODE_FLUSH_ON_CLOSE,
 				&BTRFS_I(inode)->runtime_flags);
 
+		lock_extent_bits(&BTRFS_I(inode)->io_tree, start, end, cached);
 		truncate_setsize(inode, newsize);
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree, start, end,
+				cached);
 
 		inode_dio_wait(inode);
 
@@ -8322,9 +8328,23 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 
 static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct extent_map *em;
+	int ret;
+
 	if (PageWriteback(page) || PageDirty(page))
 		return 0;
-	return __btrfs_releasepage(page, gfp_flags);
+
+	em = lookup_extent_mapping(&inode->extent_tree, page_offset(page),
+			PAGE_SIZE - 1);
+	if (!em)
+		return 0;
+	if (!try_lock_extent(&inode->io_tree, em->start, extent_map_end(em) - 1))
+		return 0;
+	ret = __btrfs_releasepage(page, gfp_flags);
+	unlock_extent(&inode->io_tree, em->start, extent_map_end(em) - 1);
+	free_extent_map(em);
+	return ret;
 }
 
 #ifdef CONFIG_MIGRATION
@@ -8398,9 +8418,6 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		return;
 	}
 
-	if (!inode_evicting)
-		lock_extent_bits(tree, page_start, page_end, &cached_state);
-
 	cur = page_start;
 	while (cur < page_end) {
 		struct btrfs_ordered_extent *ordered;
@@ -8458,7 +8475,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		if (!inode_evicting)
 			clear_extent_bit(tree, cur, range_end,
 					 EXTENT_DELALLOC |
-					 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
+					 EXTENT_DO_ACCOUNTING |
 					 EXTENT_DEFRAG, 1, 0, &cached_state);
 
 		spin_lock_irq(&inode->ordered_tree.lock);
@@ -8503,7 +8520,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 */
 		btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur);
 		if (!inode_evicting) {
-			clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED |
+			clear_extent_bit(tree, cur, range_end,
 				 EXTENT_DELALLOC | EXTENT_UPTODATE |
 				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 1,
 				 delete_states, &cached_state);
-- 
2.31.1



* [RFC PATCH 07/31] btrfs: write() perform extent locks before locking page
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (5 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 06/31] btrfs: lock extents while truncating Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap() to convert em to iomap Goldwyn Rodrigues
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock order change: extent locks before page locks.

While performing writes, lock the extents before locking the pages.

Since page locks are no longer taken first, lock_and_cleanup_extent_if_need()
can be deleted and btrfs_lock_and_flush_ordered_range() is used instead.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 95 +++++++++----------------------------------------
 1 file changed, 16 insertions(+), 79 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 28a05ba47060..e7d33c8177a0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1407,70 +1407,6 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 
 }
 
-/*
- * This function locks the extent and properly waits for data=ordered extents
- * to finish before allowing the pages to be modified if need.
- *
- * The return value:
- * 1 - the extent is locked
- * 0 - the extent is not locked, and everything is OK
- * -EAGAIN - need re-prepare the pages
- * the other < 0 number - Something wrong happens
- */
-static noinline int
-lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
-				size_t num_pages, loff_t pos,
-				size_t write_bytes,
-				u64 *lockstart, u64 *lockend,
-				struct extent_state **cached_state)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u64 start_pos;
-	u64 last_pos;
-	int i;
-	int ret = 0;
-
-	start_pos = round_down(pos, fs_info->sectorsize);
-	last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
-
-	if (start_pos < inode->vfs_inode.i_size) {
-		struct btrfs_ordered_extent *ordered;
-
-		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
-				cached_state);
-		ordered = btrfs_lookup_ordered_range(inode, start_pos,
-						     last_pos - start_pos + 1);
-		if (ordered &&
-		    ordered->file_offset + ordered->num_bytes > start_pos &&
-		    ordered->file_offset <= last_pos) {
-			unlock_extent_cached(&inode->io_tree, start_pos,
-					last_pos, cached_state);
-			for (i = 0; i < num_pages; i++) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
-			btrfs_start_ordered_extent(ordered, 1);
-			btrfs_put_ordered_extent(ordered);
-			return -EAGAIN;
-		}
-		if (ordered)
-			btrfs_put_ordered_extent(ordered);
-
-		*lockstart = start_pos;
-		*lockend = last_pos;
-		ret = 1;
-	}
-
-	/*
-	 * We should be called after prepare_pages() which should have locked
-	 * all pages in the range.
-	 */
-	for (i = 0; i < num_pages; i++)
-		WARN_ON(!PageLocked(pages[i]));
-
-	return ret;
-}
-
 static int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 			   size_t *write_bytes, bool nowait)
 {
@@ -1693,7 +1629,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		size_t copied;
 		size_t dirty_sectors;
 		size_t num_sectors;
-		int extents_locked;
+		int extents_locked = false;
 
 		/*
 		 * Fault pages before locking them in prepare_pages
@@ -1742,7 +1678,15 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		}
 
 		release_bytes = reserve_bytes;
-again:
+
+		if (pos < inode->i_size) {
+			lockstart = round_down(pos, fs_info->sectorsize);
+			lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
+			btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
+					lockstart, lockend, &cached_state);
+			extents_locked = true;
+		}
+
 		/*
 		 * This is going to setup the pages array with the number of
 		 * pages we want, so we don't really need to worry about the
@@ -1754,19 +1698,11 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
-			break;
-		}
-
-		extents_locked = lock_and_cleanup_extent_if_need(
-				BTRFS_I(inode), pages,
-				num_pages, pos, write_bytes, &lockstart,
-				&lockend, &cached_state);
-		if (extents_locked < 0) {
-			if (extents_locked == -EAGAIN)
-				goto again;
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
-			ret = extents_locked;
+			if (extents_locked)
+				unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+						lockstart, lockend, &cached_state);
+			else
+				free_extent_state(cached_state);
 			break;
 		}
 
@@ -1831,6 +1767,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					     lockstart, lockend, &cached_state);
 		else
 			free_extent_state(cached_state);
+		extents_locked = false;
 
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
 		if (ret) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap() to convert em to iomap
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (6 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 07/31] btrfs: write() perform extent locks before locking page Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16  6:49   ` Nikolay Borisov
  2021-06-16  6:51   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 09/31] btrfs: Don't process pages if locked_page is NULL Goldwyn Rodrigues
                   ` (22 subsequent siblings)
  30 siblings, 2 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs_em_to_iomap() converts an extent map into the struct iomap passed
as an argument. It makes sure the returned mapping is aligned to the
sectorsize boundary.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/file.c  | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c9256f862085..d4567c7a93cb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3267,6 +3267,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 			   size_t *write_bytes);
 void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
+void btrfs_em_to_iomap(struct inode *inode, struct extent_map *em,
+		struct iomap *iomap, loff_t pos);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e7d33c8177a0..cb9471e7224a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1570,6 +1570,28 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
 	return 0;
 }
 
+void btrfs_em_to_iomap(struct inode *inode,
+		struct extent_map *em, struct iomap *iomap,
+		loff_t sector_pos)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	loff_t diff = sector_pos - em->start;
+
+	if (em->block_start == EXTENT_MAP_HOLE) {
+		iomap->addr = IOMAP_NULL_ADDR;
+		iomap->type = IOMAP_HOLE;
+	} else if (em->block_start == EXTENT_MAP_INLINE) {
+		iomap->addr = IOMAP_NULL_ADDR;
+		iomap->type = IOMAP_INLINE;
+	} else {
+		iomap->addr = em->block_start + diff;
+		iomap->type = IOMAP_MAPPED;
+	}
+	iomap->offset = em->start + diff;
+	iomap->bdev = fs_info->fs_devices->latest_bdev;
+	iomap->length = em->len - diff;
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
-- 
2.31.1



* [RFC PATCH 09/31] btrfs: Don't process pages if locked_page is NULL
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (7 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap() to convert em to iomap Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 10/31] btrfs: Add btrfs_map_blocks to iomap_writeback_ops Goldwyn Rodrigues
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

This is an ugly hack to skip page processing here, since all page
processing is now performed in iomap.

Ideally, we would remove the locked_page argument from the
run_delalloc_range()->cow_file_range() call chain.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 3 +++
 fs/btrfs/inode.c     | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 75ad809e8c86..71a59fbeffe1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2101,6 +2101,9 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 {
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
+	if (!locked_page)
+		return;
+
 	__process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
 			       start, end, page_ops, NULL);
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7761a60788d0..5491d0e5600f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1112,7 +1112,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 			 * can't use page_started to determine if it's an
 			 * inline extent or a compressed extent.
 			 */
-			unlock_page(locked_page);
+			if (locked_page)
+				unlock_page(locked_page);
 			goto out;
 		} else if (ret < 0) {
 			goto out_unlock;
-- 
2.31.1



* [RFC PATCH 10/31] btrfs: Add btrfs_map_blocks to iomap_writeback_ops
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (8 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 09/31] btrfs: Don't process pages if locked_page is NULL Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16 14:15   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 11/31] btrfs: Use submit_io to submit btrfs writeback bios Goldwyn Rodrigues
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

btrfs_map_blocks() runs the delayed allocation range to allocate new
CoW space if required, and returns the iomap associated with the
location to write.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5491d0e5600f..eff4ec44fecd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8283,6 +8283,77 @@ int btrfs_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
+static int find_delalloc_range(struct inode *inode, u64 *start, u64 *end)
+{
+	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
+	bool found;
+	int ret;
+	struct extent_state *cached = NULL;
+
+	/* Find the range */
+	found = btrfs_find_delalloc_range(tree, start, end,
+			BTRFS_MAX_EXTENT_SIZE, &cached);
+	if (!found)
+		return -EIO;
+
+	lock_extent(tree, *start, *end);
+
+	/* Verify it is set to delalloc */
+	ret = test_range_bit(tree, *start, *end,
+			EXTENT_DELALLOC, 1, cached);
+	return ret;
+}
+
+static int btrfs_set_iomap(struct inode *inode, loff_t pos,
+			   loff_t length, struct iomap *srcmap)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	loff_t sector_start = round_down(pos, fs_info->sectorsize);
+	struct extent_map *em;
+
+	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, sector_start, length);
+	if (IS_ERR(em))
+		return PTR_ERR(em);
+
+	btrfs_em_to_iomap(inode, em, srcmap, sector_start);
+
+	free_extent_map(em);
+	return 0;
+}
+
+static int btrfs_map_blocks(struct iomap_writepage_ctx *wpc,
+		struct inode *inode, loff_t offset)
+{
+	int ret;
+	unsigned long nr_written; /* unused */
+	int page_started; /* unused */
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	u64 start = round_down(offset, fs_info->sectorsize);
+	u64 end = 0;
+
+	/* Check if iomap is valid */
+	if (offset >= wpc->iomap.offset &&
+			offset < wpc->iomap.offset + wpc->iomap.length)
+		return 0;
+
+	ret = find_delalloc_range(inode, &start, &end);
+	if (ret < 0)
+		return ret;
+
+	ret = btrfs_run_delalloc_range(BTRFS_I(inode), NULL,
+			start, end, &page_started, &nr_written, NULL);
+	if (ret < 0)
+		return ret;
+
+	/* TODO: handle compressed extents */
+	return btrfs_set_iomap(inode, offset, end - offset, &wpc->iomap);
+}
+
+static const struct iomap_writeback_ops btrfs_writeback_ops = {
+	.map_blocks             = btrfs_map_blocks,
+};
+
+
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct inode *inode = page->mapping->host;
-- 
2.31.1



* [RFC PATCH 11/31] btrfs: Use submit_io to submit btrfs writeback bios
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (9 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 10/31] btrfs: Add btrfs_map_blocks to iomap_writeback_ops Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16 14:26   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 12/31] btrfs: endio sequence for writepage Goldwyn Rodrigues
                   ` (19 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

The submit_io sequence calls set_page_extent_mapped() for data inodes
to make sure that page->private is set. Depending on the type of inode,
metadata or data, the respective submit_bio function is called.

This will also be used for the readpage() sequence.
---
 fs/btrfs/inode.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eff4ec44fecd..87d0730f59d8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8349,11 +8349,26 @@ static int btrfs_map_blocks(struct iomap_writepage_ctx *wpc,
 	return btrfs_set_iomap(inode, offset, end - offset, &wpc->iomap);
 }
 
+static void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
+{
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	if (is_data_inode(inode))
+		bio_for_each_segment_all(bvec, bio, iter_all)
+			set_page_extent_mapped(bvec->bv_page);
+
+	if (is_data_inode(inode))
+		btrfs_submit_data_bio(inode, bio, 0, 0);
+	else
+		btrfs_submit_metadata_bio(inode, bio, 0, 0);
+}
+
 static const struct iomap_writeback_ops btrfs_writeback_ops = {
 	.map_blocks             = btrfs_map_blocks,
+	.submit_io		= btrfs_buffered_submit_io,
 };
 
-
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct inode *inode = page->mapping->host;
-- 
2.31.1



* [RFC PATCH 12/31] btrfs: endio sequence for writepage
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (10 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 11/31] btrfs: Use submit_io to submit btrfs writeback bios Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 13/31] btrfs: do not checksum for free space inode Goldwyn Rodrigues
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Perform extent housekeeping after completion of bio. This calls
iomap_writepage_end_bio() to perform end of write sequence for
the pages.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 87d0730f59d8..b8fcf9102eb2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8349,6 +8349,23 @@ static int btrfs_map_blocks(struct iomap_writepage_ctx *wpc,
 	return btrfs_set_iomap(inode, offset, end - offset, &wpc->iomap);
 }
 
+static void btrfs_writepage_endio(struct bio *bio)
+{
+	int error = blk_status_to_errno(bio->bi_status);
+	struct iomap_ioend *ioend = bio->bi_private;
+	struct btrfs_inode *inode = BTRFS_I(ioend->io_inode);
+	struct btrfs_fs_info *fs_info = btrfs_sb(ioend->io_inode->i_sb);
+	struct btrfs_ordered_extent *ordered = NULL;
+
+	if (!btrfs_dec_test_ordered_pending(inode, &ordered, ioend->io_offset,
+					    ioend->io_size, !error))
+		return;
+
+	btrfs_init_work(&ordered->work, finish_ordered_fn, NULL, NULL);
+	btrfs_queue_work(fs_info->endio_write_workers, &ordered->work);
+	iomap_writepage_end_bio(bio);
+}
+
 static void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
 {
 	struct bio_vec *bvec;
@@ -8358,6 +8375,7 @@ static void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
 		bio_for_each_segment_all(bvec, bio, iter_all)
 			set_page_extent_mapped(bvec->bv_page);
 
+	bio->bi_end_io = btrfs_writepage_endio;
 	if (is_data_inode(inode))
 		btrfs_submit_data_bio(inode, bio, 0, 0);
 	else
-- 
2.31.1



* [RFC PATCH 13/31] btrfs: do not checksum for free space inode
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (11 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 12/31] btrfs: endio sequence for writepage Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16 14:46   ` Nikolay Borisov
  2021-06-13 13:39 ` [RFC PATCH 14/31] btrfs: end EXTENT_MAP_INLINE writeback early Goldwyn Rodrigues
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Free space inode writes should not be checksummed.

Q: Not sure why this is required now (as opposed to earlier), but
writes to the free space inode would fail without it. How is this
avoided in the existing code?

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/disk-io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d1d5091a8385..3af505ec22dc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -946,9 +946,12 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 			goto out_w_error;
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
-		ret = btree_csum_one_bio(bio);
-		if (ret)
-			goto out_w_error;
+		/* Do not checksum free space inode */
+		if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
+			ret = btree_csum_one_bio(bio);
+			if (ret)
+				goto out_w_error;
+		}
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else {
 		/*
-- 
2.31.1



* [RFC PATCH 14/31] btrfs: end EXTENT_MAP_INLINE writeback early
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (12 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 13/31] btrfs: do not checksum for free space inode Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 15/31] btrfs: Switch to iomap_writepages() Goldwyn Rodrigues
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Writes to inline extent maps are finished immediately and do not go
through the entire writeback cycle of bio creation and submission.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b8fcf9102eb2..0601cf375b9c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8346,7 +8346,22 @@ static int btrfs_map_blocks(struct iomap_writepage_ctx *wpc,
 		return ret;
 
 	/* TODO: handle compressed extents */
-	return btrfs_set_iomap(inode, offset, end - offset, &wpc->iomap);
+	ret = btrfs_set_iomap(inode, offset, end - offset, &wpc->iomap);
+	if (ret < 0) {
+		__endio_write_update_ordered(BTRFS_I(inode), start, end, false);
+		return ret;
+	}
+
+	if (wpc->iomap.type == IOMAP_INLINE) {
+		/*
+		 * In case of EXTENT_MAP_INLINE, call endio function
+		 * and reset type to IOMAP_HOLE, so iomap code does not
+		 * perform any action
+		 */
+		__endio_write_update_ordered(BTRFS_I(inode), start, end, true);
+		wpc->iomap.type = IOMAP_HOLE;
+	}
+	return 0;
 }
 
 static void btrfs_writepage_endio(struct bio *bio)
-- 
2.31.1



* [RFC PATCH 15/31] btrfs: Switch to iomap_writepages()
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (13 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 14/31] btrfs: end EXTENT_MAP_INLINE writeback early Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 16/31] btrfs: remove force_page_uptodate Goldwyn Rodrigues
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Switch to iomap_writepages()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0601cf375b9c..be0caf3a11f9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8430,7 +8430,9 @@ static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
 static int btrfs_writepages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
-	return extent_writepages(mapping, wbc);
+	struct iomap_writepage_ctx wpc = {0};
+
+	return iomap_writepages(mapping, wbc, &wpc, &btrfs_writeback_ops);
 }
 
 static void btrfs_readahead(struct readahead_control *rac)
-- 
2.31.1



* [RFC PATCH 16/31] btrfs: remove force_page_uptodate
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (14 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 15/31] btrfs: Switch to iomap_writepages() Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 17/31] btrfs: Introduce btrfs_iomap Goldwyn Rodrigues
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

iomap handles short writes well, so removing force_page_uptodate eases
the porting.

Note that this change is not critical on its own, because these
functions are deleted later in the series; it is done here only to make
the switch easier.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cb9471e7224a..a94ab3c8da1d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1326,13 +1326,11 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  * on success we return a locked page and 0
  */
 static int prepare_uptodate_page(struct inode *inode,
-				 struct page *page, u64 pos,
-				 bool force_uptodate)
+				 struct page *page, u64 pos)
 {
 	int ret = 0;
 
-	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
-	    !PageUptodate(page)) {
+	if ((pos & (PAGE_SIZE - 1)) && !PageUptodate(page)) {
 		ret = btrfs_readpage(NULL, page);
 		if (ret)
 			return ret;
@@ -1354,7 +1352,7 @@ static int prepare_uptodate_page(struct inode *inode,
  */
 static noinline int prepare_pages(struct inode *inode, struct page **pages,
 				  size_t num_pages, loff_t pos,
-				  size_t write_bytes, bool force_uptodate)
+				  size_t write_bytes)
 {
 	int i;
 	unsigned long index = pos >> PAGE_SHIFT;
@@ -1379,11 +1377,10 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 		}
 
 		if (i == 0)
-			err = prepare_uptodate_page(inode, pages[i], pos,
-						    force_uptodate);
+			err = prepare_uptodate_page(inode, pages[i], pos);
 		if (!err && i == num_pages - 1)
 			err = prepare_uptodate_page(inode, pages[i],
-						    pos + write_bytes, false);
+						    pos + write_bytes);
 		if (err) {
 			put_page(pages[i]);
 			if (err == -EAGAIN) {
@@ -1608,7 +1605,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	int nrptrs;
 	ssize_t ret;
 	bool only_release_metadata = false;
-	bool force_page_uptodate = false;
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
 
@@ -1715,8 +1711,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 * contents of pages from loop to loop
 		 */
 		ret = prepare_pages(inode, pages, num_pages,
-				    pos, write_bytes,
-				    force_page_uptodate);
+				    pos, write_bytes);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
@@ -1743,11 +1738,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			nrptrs = 1;
 
 		if (copied == 0) {
-			force_page_uptodate = true;
 			dirty_sectors = 0;
 			dirty_pages = 0;
 		} else {
-			force_page_uptodate = false;
 			dirty_pages = DIV_ROUND_UP(copied + offset,
 						   PAGE_SIZE);
 		}
-- 
2.31.1



* [RFC PATCH 17/31] btrfs: Introduce btrfs_iomap
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (15 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 16/31] btrfs: remove force_page_uptodate Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16 20:02   ` David Sterba
  2021-06-13 13:39 ` [RFC PATCH 18/31] btrfs: Add btrfs_iomap_release() Goldwyn Rodrigues
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

A structure which will be used to transfer information from
iomap_begin() to iomap_end().

Move all locking information into btrfs_iomap.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a94ab3c8da1d..a28435a6bb7e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -56,6 +56,15 @@ struct inode_defrag {
 	int cycled;
 };
 
+struct btrfs_iomap {
+
+	/* Locking */
+	u64 lockstart;
+	u64 lockend;
+	struct extent_state *cached_state;
+	int extents_locked;
+};
+
 static int __compare_inode_defrag(struct inode_defrag *defrag1,
 				  struct inode_defrag *defrag2)
 {
@@ -1599,14 +1608,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	struct page **pages = NULL;
 	struct extent_changeset *data_reserved = NULL;
 	u64 release_bytes = 0;
-	u64 lockstart;
-	u64 lockend;
 	size_t num_written = 0;
 	int nrptrs;
 	ssize_t ret;
 	bool only_release_metadata = false;
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
+	struct btrfs_iomap *bi = NULL;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		ilock_flags |= BTRFS_ILOCK_TRY;
@@ -1634,6 +1642,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		goto out;
 	}
 
+	bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS);
+	if (!bi) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	while (iov_iter_count(i) > 0) {
 		struct extent_state *cached_state = NULL;
 		size_t offset = offset_in_page(pos);
@@ -1647,7 +1661,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		size_t copied;
 		size_t dirty_sectors;
 		size_t num_sectors;
-		int extents_locked = false;
 
 		/*
 		 * Fault pages before locking them in prepare_pages
@@ -1658,6 +1671,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			break;
 		}
 
+		bi->extents_locked = false;
 		only_release_metadata = false;
 		sector_offset = pos & (fs_info->sectorsize - 1);
 
@@ -1698,11 +1712,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		release_bytes = reserve_bytes;
 
 		if (pos < inode->i_size) {
-			lockstart = round_down(pos, fs_info->sectorsize);
-			lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
+			bi->lockstart = round_down(pos, fs_info->sectorsize);
+			bi->lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
 			btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
-					lockstart, lockend, &cached_state);
-			extents_locked = true;
+					bi->lockstart, bi->lockend,
+					&cached_state);
+			bi->extents_locked = true;
 		}
 
 		/*
@@ -1715,11 +1730,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
-			if (extents_locked)
-				unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-						lockstart, lockend, &cached_state);
-			else
-				free_extent_state(cached_state);
 			break;
 		}
 
@@ -1777,12 +1787,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 * as delalloc through btrfs_dirty_pages(). Therefore free any
 		 * possible cached extent state to avoid a memory leak.
 		 */
-		if (extents_locked)
+		if (bi->extents_locked)
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     lockstart, lockend, &cached_state);
+					     bi->lockstart, bi->lockend,
+					     &bi->cached_state);
 		else
-			free_extent_state(cached_state);
-		extents_locked = false;
+			free_extent_state(bi->cached_state);
+		bi->extents_locked = false;
 
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
 		if (ret) {
@@ -1825,6 +1836,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		iocb->ki_pos += num_written;
 	}
 out:
+	kfree(bi);
 	btrfs_inode_unlock(inode, ilock_flags);
 	return num_written ? num_written : ret;
 }
-- 
2.31.1



* [RFC PATCH 18/31] btrfs: Add btrfs_iomap_release()
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (16 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 17/31] btrfs: Introduce btrfs_iomap Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 19/31] btrfs: Add reservation information to btrfs_iomap Goldwyn Rodrigues
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Function to release allocated btrfs_iomap resources.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a28435a6bb7e..3d4b8fde47f4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1598,6 +1598,30 @@ void btrfs_em_to_iomap(struct inode *inode,
 	iomap->length = em->len - diff;
 }
 
+/*
+ * btrfs_iomap_release - release reservation passed as length and free
+ * the btrfs_iomap structure
+ */
+static void btrfs_iomap_release(struct inode *inode,
+		loff_t pos, size_t length, struct btrfs_iomap *bi)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+	if (length) {
+		if (bi->metadata_only) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+					length, true);
+		} else {
+			btrfs_delalloc_release_space(BTRFS_I(inode),
+					bi->data_reserved,
+					round_down(pos, fs_info->sectorsize),
+					length, true);
+		}
+	}
+	extent_changeset_free(bi->data_reserved);
+	kfree(bi);
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
@@ -1817,24 +1841,14 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 	kfree(pages);
 
-	if (release_bytes) {
-		if (only_release_metadata) {
-			btrfs_check_nocow_unlock(BTRFS_I(inode));
-			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-					release_bytes, true);
-		} else {
-			btrfs_delalloc_release_space(BTRFS_I(inode),
-					data_reserved,
-					round_down(pos, fs_info->sectorsize),
-					release_bytes, true);
-		}
-	}
-
-	extent_changeset_free(data_reserved);
 	if (num_written > 0) {
 		pagecache_isize_extended(inode, old_isize, iocb->ki_pos);
 		iocb->ki_pos += num_written;
 	}
+
+	if (release_bytes && bi->metadata_only)
+		btrfs_check_nocow_unlock(BTRFS_I(inode));
+	btrfs_iomap_release(inode, pos, release_bytes, bi);
 out:
 	kfree(bi);
 	btrfs_inode_unlock(inode, ilock_flags);
-- 
2.31.1



* [RFC PATCH 19/31] btrfs: Add reservation information to btrfs_iomap
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (17 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 18/31] btrfs: Add btrfs_iomap_release() Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-16 20:03   ` David Sterba
  2021-06-13 13:39 ` [RFC PATCH 20/31] btrfs: Carve out btrfs_buffered_iomap_begin() from write path Goldwyn Rodrigues
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Move reservation information into btrfs_iomap.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 3d4b8fde47f4..ccae8ce7ec4f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -63,6 +63,11 @@ struct btrfs_iomap {
 	u64 lockend;
 	struct extent_state *cached_state;
 	int extents_locked;
+
+	/* Reservation */
+	bool metadata_only;
+	struct extent_changeset *data_reserved;
+	size_t reserved_bytes;
 };
 
 static int __compare_inode_defrag(struct inode_defrag *defrag1,
@@ -1630,12 +1635,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct page **pages = NULL;
-	struct extent_changeset *data_reserved = NULL;
 	u64 release_bytes = 0;
 	size_t num_written = 0;
 	int nrptrs;
-	ssize_t ret;
-	bool only_release_metadata = false;
+	int ret;
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
 	struct btrfs_iomap *bi = NULL;
@@ -1673,14 +1676,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	}
 
 	while (iov_iter_count(i) > 0) {
-		struct extent_state *cached_state = NULL;
 		size_t offset = offset_in_page(pos);
 		size_t sector_offset;
 		size_t write_bytes = min(iov_iter_count(i),
 					 nrptrs * (size_t)PAGE_SIZE -
 					 offset);
 		size_t num_pages;
-		size_t reserve_bytes;
 		size_t dirty_pages;
 		size_t copied;
 		size_t dirty_sectors;
@@ -1696,12 +1697,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		}
 
 		bi->extents_locked = false;
-		only_release_metadata = false;
+		bi->metadata_only = false;
 		sector_offset = pos & (fs_info->sectorsize - 1);
 
-		extent_changeset_release(data_reserved);
+		extent_changeset_release(bi->data_reserved);
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
-						  &data_reserved, pos,
+						  &bi->data_reserved, pos,
 						  write_bytes);
 		if (ret < 0) {
 			/*
@@ -1711,36 +1712,36 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			 */
 			if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
 						   &write_bytes) > 0)
-				only_release_metadata = true;
+				bi->metadata_only = true;
 			else
 				break;
 		}
 
 		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
 		WARN_ON(num_pages > nrptrs);
-		reserve_bytes = round_up(write_bytes + sector_offset,
+		bi->reserved_bytes = round_up(write_bytes + sector_offset,
 					 fs_info->sectorsize);
-		WARN_ON(reserve_bytes == 0);
+		WARN_ON(bi->reserved_bytes == 0);
 		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
-				reserve_bytes);
+				bi->reserved_bytes);
 		if (ret) {
-			if (!only_release_metadata)
+			if (!bi->metadata_only)
 				btrfs_free_reserved_data_space(BTRFS_I(inode),
-						data_reserved, pos,
+						bi->data_reserved, pos,
 						write_bytes);
 			else
 				btrfs_check_nocow_unlock(BTRFS_I(inode));
 			break;
 		}
 
-		release_bytes = reserve_bytes;
+		release_bytes = bi->reserved_bytes;
 
 		if (pos < inode->i_size) {
 			bi->lockstart = round_down(pos, fs_info->sectorsize);
 			bi->lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
 			btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
 					bi->lockstart, bi->lockend,
-					&cached_state);
+					&bi->cached_state);
 			bi->extents_locked = true;
 		}
 
@@ -1753,13 +1754,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 				    pos, write_bytes);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
+						       bi->reserved_bytes);
 			break;
 		}
 
 		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
 
-		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
+		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bi->reserved_bytes);
 		dirty_sectors = round_up(copied + sector_offset,
 					fs_info->sectorsize);
 		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
@@ -1782,7 +1783,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (num_sectors > dirty_sectors) {
 			/* release everything except the sectors we dirtied */
 			release_bytes -= dirty_sectors << fs_info->sectorsize_bits;
-			if (only_release_metadata) {
+			if (bi->metadata_only) {
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
 							release_bytes, true);
 			} else {
@@ -1792,7 +1793,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 						   fs_info->sectorsize) +
 					(dirty_pages << PAGE_SHIFT);
 				btrfs_delalloc_release_space(BTRFS_I(inode),
-						data_reserved, __pos,
+						bi->data_reserved, __pos,
 						release_bytes, true);
 			}
 		}
@@ -1802,7 +1803,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
 					dirty_pages, pos, copied,
-					&cached_state, only_release_metadata);
+					&bi->cached_state,
+					bi->metadata_only);
 
 		/*
 		 * If we have not locked the extent range, because the range's
@@ -1819,14 +1821,14 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			free_extent_state(bi->cached_state);
 		bi->extents_locked = false;
 
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), bi->reserved_bytes);
 		if (ret) {
 			btrfs_drop_pages(pages, num_pages);
 			break;
 		}
 
 		release_bytes = 0;
-		if (only_release_metadata)
+		if (bi->metadata_only)
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
 
 		btrfs_drop_pages(pages, num_pages);
@@ -1850,7 +1852,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		btrfs_check_nocow_unlock(BTRFS_I(inode));
 	btrfs_iomap_release(inode, pos, release_bytes, bi);
 out:
-	kfree(bi);
 	btrfs_inode_unlock(inode, ilock_flags);
 	return num_written ? num_written : ret;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 20/31] btrfs: Carve out btrfs_buffered_iomap_begin() from write path
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (18 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 19/31] btrfs: Add reservation information to btrfs_iomap Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 21/31] btrfs: Carve out btrfs_buffered_iomap_end from the " Goldwyn Rodrigues
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Carve out the reservation and locking steps of the buffered write
sequence into btrfs_buffered_iomap_begin().

Shortcomings: page handling with respect to nrptrs is messed up.
However, since the responsibility for pages will be passed to iomap,
this will not be a problem.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 100 ++++++++++++++++++++++++++++--------------------
 1 file changed, 59 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ccae8ce7ec4f..5751bb5e0656 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1627,6 +1627,57 @@ static void btrfs_iomap_release(struct inode *inode,
 	kfree(bi);
 }
 
+static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
+		size_t length, struct btrfs_iomap *bi)
+{
+	int ret;
+	size_t write_bytes = length;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	size_t sector_offset = pos & (fs_info->sectorsize - 1);
+
+	ret = btrfs_check_data_free_space(BTRFS_I(inode),
+			&bi->data_reserved, pos, write_bytes);
+	if (ret < 0) {
+		/*
+		 * If we don't have to COW at the offset, reserve
+		 * metadata only. write_bytes may get smaller than
+		 * requested here.
+		 */
+		if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
+					&write_bytes) > 0)
+			bi->metadata_only = true;
+		else
+			return ret;
+	}
+
+	bi->reserved_bytes = round_up(write_bytes + sector_offset,
+			fs_info->sectorsize);
+	WARN_ON(bi->reserved_bytes == 0);
+	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
+			bi->reserved_bytes);
+	if (ret) {
+		if (!bi->metadata_only)
+			btrfs_free_reserved_data_space(BTRFS_I(inode),
+					bi->data_reserved, pos,
+					write_bytes);
+		else
+			btrfs_check_nocow_unlock(BTRFS_I(inode));
+		return ret;
+	}
+
+	if (pos < inode->i_size) {
+		bi->lockstart = round_down(pos, fs_info->sectorsize);
+		bi->lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
+		btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
+				bi->lockstart, bi->lockend,
+				&bi->cached_state);
+		bi->extents_locked = true;
+	}
+
+	return 0;
+
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
@@ -1701,50 +1752,15 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		sector_offset = pos & (fs_info->sectorsize - 1);
 
 		extent_changeset_release(bi->data_reserved);
-		ret = btrfs_check_data_free_space(BTRFS_I(inode),
-						  &bi->data_reserved, pos,
-						  write_bytes);
-		if (ret < 0) {
-			/*
-			 * If we don't have to COW at the offset, reserve
-			 * metadata only. write_bytes may get smaller than
-			 * requested here.
-			 */
-			if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
-						   &write_bytes) > 0)
-				bi->metadata_only = true;
-			else
-				break;
-		}
 
-		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
-		WARN_ON(num_pages > nrptrs);
-		bi->reserved_bytes = round_up(write_bytes + sector_offset,
-					 fs_info->sectorsize);
-		WARN_ON(bi->reserved_bytes == 0);
-		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
-				bi->reserved_bytes);
-		if (ret) {
-			if (!bi->metadata_only)
-				btrfs_free_reserved_data_space(BTRFS_I(inode),
-						bi->data_reserved, pos,
-						write_bytes);
-			else
-				btrfs_check_nocow_unlock(BTRFS_I(inode));
-			break;
-		}
-
-		release_bytes = bi->reserved_bytes;
+		ret = btrfs_buffered_iomap_begin(inode, pos, write_bytes,
+				bi);
 
-		if (pos < inode->i_size) {
-			bi->lockstart = round_down(pos, fs_info->sectorsize);
-			bi->lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
-			btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
-					bi->lockstart, bi->lockend,
-					&bi->cached_state);
-			bi->extents_locked = true;
-		}
+		if (ret < 0)
+			goto out;
 
+		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
+		WARN_ON(num_pages > nrptrs);
 		/*
 		 * This is going to setup the pages array with the number of
 		 * pages we want, so we don't really need to worry about the
@@ -1780,6 +1796,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 						   PAGE_SIZE);
 		}
 
+		release_bytes = bi->reserved_bytes;
+
 		if (num_sectors > dirty_sectors) {
 			/* release everything except the sectors we dirtied */
 			release_bytes -= dirty_sectors << fs_info->sectorsize_bits;
-- 
2.31.1



* [RFC PATCH 21/31] btrfs: Carve out btrfs_buffered_iomap_end from the write path
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (19 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 20/31] btrfs: Carve out btrfs_buffered_iomap_begin() from write path Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 22/31] btrfs: Set extents delalloc in iomap_end Goldwyn Rodrigues
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Carve out btrfs_buffered_iomap_end() from the write path to form the
iomap_end() function for writes.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 97 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5751bb5e0656..ab2b1790e0bb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1678,6 +1678,63 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
 
 }
 
+static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos,
+		loff_t length, ssize_t written, struct btrfs_iomap *bi)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	size_t num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bi->reserved_bytes);
+	size_t dirty_sectors = 0;
+	int dirty_pages = 0;
+	int sector_offset = pos & (fs_info->sectorsize - 1);
+
+	if (written) {
+		dirty_sectors = round_up(written + sector_offset,
+				fs_info->sectorsize);
+		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
+		dirty_pages = DIV_ROUND_UP(written + offset_in_page(pos),
+				PAGE_SIZE);
+	}
+
+	/* Release excess reservations */
+	if (num_sectors > dirty_sectors) {
+		size_t release_bytes = bi->reserved_bytes -
+			(dirty_sectors << fs_info->sb->s_blocksize_bits);
+		if (bi->metadata_only) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+					release_bytes, true);
+		} else {
+			u64 p;
+
+			p = round_down(pos,
+					fs_info->sectorsize) +
+				(dirty_pages << PAGE_SHIFT);
+			btrfs_delalloc_release_space(BTRFS_I(inode),
+					bi->data_reserved, p,
+					release_bytes, true);
+		}
+	}
+
+	/*
+	 * If we have not locked the extent range, because the range's
+	 * start offset is >= i_size, we might still have a non-NULL
+	 * cached extent state, acquired while marking the extent range
+	 * as delalloc through btrfs_dirty_pages(). Therefore free any
+	 * possible cached extent state to avoid a memory leak.
+	 */
+	if (bi->extents_locked)
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				bi->lockstart, bi->lockend,
+				&bi->cached_state);
+	else
+		free_extent_state(bi->cached_state);
+
+	btrfs_delalloc_release_extents(BTRFS_I(inode), bi->reserved_bytes);
+	if (bi->metadata_only)
+		btrfs_check_nocow_unlock(BTRFS_I(inode));
+
+	return 0;
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
@@ -1736,7 +1793,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		size_t dirty_pages;
 		size_t copied;
 		size_t dirty_sectors;
-		size_t num_sectors;
 
 		/*
 		 * Fault pages before locking them in prepare_pages
@@ -1776,7 +1832,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
 
-		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bi->reserved_bytes);
 		dirty_sectors = round_up(copied + sector_offset,
 					fs_info->sectorsize);
 		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
@@ -1796,26 +1851,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 						   PAGE_SIZE);
 		}
 
-		release_bytes = bi->reserved_bytes;
-
-		if (num_sectors > dirty_sectors) {
-			/* release everything except the sectors we dirtied */
-			release_bytes -= dirty_sectors << fs_info->sectorsize_bits;
-			if (bi->metadata_only) {
-				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							release_bytes, true);
-			} else {
-				u64 __pos;
-
-				__pos = round_down(pos,
-						   fs_info->sectorsize) +
-					(dirty_pages << PAGE_SHIFT);
-				btrfs_delalloc_release_space(BTRFS_I(inode),
-						bi->data_reserved, __pos,
-						release_bytes, true);
-			}
-		}
-
 		release_bytes = round_up(copied + sector_offset,
 					fs_info->sectorsize);
 
@@ -1824,27 +1859,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					&bi->cached_state,
 					bi->metadata_only);
 
-		/*
-		 * If we have not locked the extent range, because the range's
-		 * start offset is >= i_size, we might still have a non-NULL
-		 * cached extent state, acquired while marking the extent range
-		 * as delalloc through btrfs_dirty_pages(). Therefore free any
-		 * possible cached extent state to avoid a memory leak.
-		 */
-		if (bi->extents_locked)
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     bi->lockstart, bi->lockend,
-					     &bi->cached_state);
-		else
-			free_extent_state(bi->cached_state);
-		bi->extents_locked = false;
-
-		btrfs_delalloc_release_extents(BTRFS_I(inode), bi->reserved_bytes);
 		if (ret) {
 			btrfs_drop_pages(pages, num_pages);
 			break;
 		}
 
+		btrfs_buffered_iomap_end(inode, pos, write_bytes, copied, bi);
+
 		release_bytes = 0;
 		if (bi->metadata_only)
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
-- 
2.31.1



* [RFC PATCH 22/31] btrfs: Set extents delalloc in iomap_end
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (20 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 21/31] btrfs: Carve out btrfs_buffered_iomap_end from the " Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 23/31] btrfs: define iomap_page_ops Goldwyn Rodrigues
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Since the new code path will not call btrfs_dirty_pages(), set the
delalloc bits for the extent just written.
To simplify the flow, modify btrfs_buffered_iomap_end() to use
written_block_end and block_end as the sectorsize-aligned boundaries of
the written and requested ranges, respectively.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 58 +++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ab2b1790e0bb..d311b01a2b71 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1682,44 +1682,38 @@ static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos,
 		loff_t length, ssize_t written, struct btrfs_iomap *bi)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	size_t num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bi->reserved_bytes);
-	size_t dirty_sectors = 0;
-	int dirty_pages = 0;
-	int sector_offset = pos & (fs_info->sectorsize - 1);
-
-	if (written) {
-		dirty_sectors = round_up(written + sector_offset,
-				fs_info->sectorsize);
-		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
-		dirty_pages = DIV_ROUND_UP(written + offset_in_page(pos),
-				PAGE_SIZE);
-	}
+	int ret = 0;
+	size_t release_bytes = 0;
+	u64 start = round_down(pos, fs_info->sectorsize);
+	u64 written_block_end = round_up(pos + written, fs_info->sectorsize) - 1;
+	u64 block_end = round_up(pos + length, fs_info->sectorsize) - 1;
+	int extra_bits = 0;
 
-	/* Release excess reservations */
-	if (num_sectors > dirty_sectors) {
-		size_t release_bytes = bi->reserved_bytes -
-			(dirty_sectors << fs_info->sb->s_blocksize_bits);
-		if (bi->metadata_only) {
-			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-					release_bytes, true);
-		} else {
-			u64 p;
+	if (written == 0)
+		release_bytes = bi->reserved_bytes;
+	else if (written < length)
+		release_bytes = block_end - written_block_end + 1;
 
-			p = round_down(pos,
-					fs_info->sectorsize) +
-				(dirty_pages << PAGE_SHIFT);
-			btrfs_delalloc_release_space(BTRFS_I(inode),
-					bi->data_reserved, p,
-					release_bytes, true);
-		}
-	}
+	if (bi->metadata_only)
+		extra_bits |= EXTENT_NORESERVE;
+
+	clear_extent_bit(&BTRFS_I(inode)->io_tree, start, written_block_end,
+			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
+			 0, 0, &bi->cached_state);
+
+	ret = btrfs_set_extent_delalloc(BTRFS_I(inode), start,
+			written_block_end, extra_bits, &bi->cached_state);
+
+	/* In case of error, release everything in btrfs_iomap_release() */
+	if (ret < 0)
+		release_bytes = bi->reserved_bytes;
 
 	/*
 	 * If we have not locked the extent range, because the range's
 	 * start offset is >= i_size, we might still have a non-NULL
 	 * cached extent state, acquired while marking the extent range
-	 * as delalloc through btrfs_dirty_pages(). Therefore free any
-	 * possible cached extent state to avoid a memory leak.
+	 * as delalloc. Therefore free any possible cached extent state
+	 * to avoid a memory leak.
 	 */
 	if (bi->extents_locked)
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
@@ -1732,6 +1726,8 @@ static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos,
 	if (bi->metadata_only)
 		btrfs_check_nocow_unlock(BTRFS_I(inode));
 
+	btrfs_iomap_release(inode, pos, release_bytes, bi);
+
 	return 0;
 }
 
-- 
2.31.1



* [RFC PATCH 23/31] btrfs: define iomap_page_ops
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (21 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 22/31] btrfs: Set extents delalloc in iomap_end Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 24/31] btrfs: Switch to iomap_file_buffered_write() Goldwyn Rodrigues
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

The page_done() hook sets page->private to EXTENT_PAGE_PRIVATE if it is
not already set; the check is performed in set_page_extent_mapped().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d311b01a2b71..fe6d24c6f7bf 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -467,6 +467,17 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 	}
 }
 
+static void btrfs_page_done(struct inode *inode, loff_t pos,
+		unsigned int copied, struct page *page,
+		struct iomap *iomap)
+{
+	set_page_extent_mapped(page);
+}
+
+static const struct iomap_page_ops btrfs_iomap_page_ops = {
+	.page_done = btrfs_page_done,
+};
+
 /*
  * After btrfs_copy_from_user(), update the following things for delalloc:
  * - Mark newly dirtied pages as DELALLOC in the io tree.
-- 
2.31.1



* [RFC PATCH 24/31] btrfs: Switch to iomap_file_buffered_write()
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (22 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 23/31] btrfs: define iomap_page_ops Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 25/31] btrfs: remove all page related functions Goldwyn Rodrigues
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Move the allocation of btrfs_iomap into iomap_begin(), and change the
begin()/end() functions to fill in the iomap data structure.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 157 +++++++++---------------------------------------
 1 file changed, 27 insertions(+), 130 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fe6d24c6f7bf..6c6e3343bf37 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1639,12 +1639,18 @@ static void btrfs_iomap_release(struct inode *inode,
 }
 
 static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
-		size_t length, struct btrfs_iomap *bi)
+		loff_t length, unsigned int flags, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	int ret;
 	size_t write_bytes = length;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	size_t sector_offset = pos & (fs_info->sectorsize - 1);
+	struct btrfs_iomap *bi;
+
+	bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS);
+	if (!bi)
+		return -ENOMEM;
 
 	ret = btrfs_check_data_free_space(BTRFS_I(inode),
 			&bi->data_reserved, pos, write_bytes);
@@ -1685,14 +1691,25 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
 		bi->extents_locked = true;
 	}
 
+	iomap->private = bi;
+	iomap->length = round_up(write_bytes + sector_offset,
+				 fs_info->sectorsize);
+	iomap->offset = round_down(pos, fs_info->sectorsize);
+	iomap->addr = IOMAP_NULL_ADDR;
+	iomap->type = IOMAP_DELALLOC;
+	iomap->bdev = fs_info->fs_devices->latest_bdev;
+	iomap->page_ops = &btrfs_iomap_page_ops;
+
 	return 0;
 
 }
 
 static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos,
-		loff_t length, ssize_t written, struct btrfs_iomap *bi)
+		loff_t length, ssize_t written, unsigned flags,
+		struct iomap *iomap)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_iomap *bi = iomap->private;
 	int ret = 0;
 	size_t release_bytes = 0;
 	u64 start = round_down(pos, fs_info->sectorsize);
@@ -1742,21 +1759,19 @@ static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos,
 	return 0;
 }
 
+const struct iomap_ops btrfs_buffered_iomap_ops = {
+	.iomap_begin = btrfs_buffered_iomap_begin,
+	.iomap_end = btrfs_buffered_iomap_end,
+};
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					       struct iov_iter *i)
 {
 	struct file *file = iocb->ki_filp;
-	loff_t pos;
 	struct inode *inode = file_inode(file);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct page **pages = NULL;
-	u64 release_bytes = 0;
 	size_t num_written = 0;
-	int nrptrs;
 	int ret;
-	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
-	struct btrfs_iomap *bi = NULL;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		ilock_flags |= BTRFS_ILOCK_TRY;
@@ -1773,130 +1788,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	if (ret < 0)
 		goto out;
 
-	pos = iocb->ki_pos;
-	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
-			PAGE_SIZE / (sizeof(struct page *)));
-	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
-	nrptrs = max(nrptrs, 8);
-	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
-	if (!pages) {
-		ret = -ENOMEM;
-		goto out;
-	}
+	num_written = iomap_file_buffered_write(iocb, i,
+						&btrfs_buffered_iomap_ops);
 
-	bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS);
-	if (!bi) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	while (iov_iter_count(i) > 0) {
-		size_t offset = offset_in_page(pos);
-		size_t sector_offset;
-		size_t write_bytes = min(iov_iter_count(i),
-					 nrptrs * (size_t)PAGE_SIZE -
-					 offset);
-		size_t num_pages;
-		size_t dirty_pages;
-		size_t copied;
-		size_t dirty_sectors;
-
-		/*
-		 * Fault pages before locking them in prepare_pages
-		 * to avoid recursive lock
-		 */
-		if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) {
-			ret = -EFAULT;
-			break;
-		}
-
-		bi->extents_locked = false;
-		bi->metadata_only = false;
-		sector_offset = pos & (fs_info->sectorsize - 1);
-
-		extent_changeset_release(bi->data_reserved);
-
-		ret = btrfs_buffered_iomap_begin(inode, pos, write_bytes,
-				bi);
-
-		if (ret < 0)
-			goto out;
-
-		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
-		WARN_ON(num_pages > nrptrs);
-		/*
-		 * This is going to setup the pages array with the number of
-		 * pages we want, so we don't really need to worry about the
-		 * contents of pages from loop to loop
-		 */
-		ret = prepare_pages(inode, pages, num_pages,
-				    pos, write_bytes);
-		if (ret) {
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       bi->reserved_bytes);
-			break;
-		}
-
-		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
-
-		dirty_sectors = round_up(copied + sector_offset,
-					fs_info->sectorsize);
-		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
-
-		/*
-		 * if we have trouble faulting in the pages, fall
-		 * back to one page at a time
-		 */
-		if (copied < write_bytes)
-			nrptrs = 1;
-
-		if (copied == 0) {
-			dirty_sectors = 0;
-			dirty_pages = 0;
-		} else {
-			dirty_pages = DIV_ROUND_UP(copied + offset,
-						   PAGE_SIZE);
-		}
-
-		release_bytes = round_up(copied + sector_offset,
-					fs_info->sectorsize);
-
-		ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
-					dirty_pages, pos, copied,
-					&bi->cached_state,
-					bi->metadata_only);
-
-		if (ret) {
-			btrfs_drop_pages(pages, num_pages);
-			break;
-		}
-
-		btrfs_buffered_iomap_end(inode, pos, write_bytes, copied, bi);
-
-		release_bytes = 0;
-		if (bi->metadata_only)
-			btrfs_check_nocow_unlock(BTRFS_I(inode));
-
-		btrfs_drop_pages(pages, num_pages);
-
-		cond_resched();
-
-		balance_dirty_pages_ratelimited(inode->i_mapping);
-
-		pos += copied;
-		num_written += copied;
-	}
-
-	kfree(pages);
-
-	if (num_written > 0) {
-		pagecache_isize_extended(inode, old_isize, iocb->ki_pos);
+	if (num_written > 0)
 		iocb->ki_pos += num_written;
-	}
 
-	if (release_bytes && bi->metadata_only)
-		btrfs_check_nocow_unlock(BTRFS_I(inode));
-	btrfs_iomap_release(inode, pos, release_bytes, bi);
 out:
 	btrfs_inode_unlock(inode, ilock_flags);
 	return num_written ? num_written : ret;
-- 
2.31.1



* [RFC PATCH 25/31] btrfs: remove all page related functions
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (23 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 24/31] btrfs: Switch to iomap_file_buffered_write() Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 26/31] btrfs: use srcmap for read-before-write cases Goldwyn Rodrigues
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Clean up after switching to iomap writes: remove all the page handling
functions.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 156 ------------------------------------------------
 1 file changed, 156 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 6c6e3343bf37..b3e48bfd75df 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -394,79 +394,6 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
-/* simple helper to fault in pages and copy.  This should go away
- * and be replaced with calls into generic code.
- */
-static noinline int btrfs_copy_from_user(loff_t pos, size_t write_bytes,
-					 struct page **prepared_pages,
-					 struct iov_iter *i)
-{
-	size_t copied = 0;
-	size_t total_copied = 0;
-	int pg = 0;
-	int offset = offset_in_page(pos);
-
-	while (write_bytes > 0) {
-		size_t count = min_t(size_t,
-				     PAGE_SIZE - offset, write_bytes);
-		struct page *page = prepared_pages[pg];
-		/*
-		 * Copy data from userspace to the current page
-		 */
-		copied = iov_iter_copy_from_user_atomic(page, i, offset, count);
-
-		/* Flush processor's dcache for this page */
-		flush_dcache_page(page);
-
-		/*
-		 * if we get a partial write, we can end up with
-		 * partially up to date pages.  These add
-		 * a lot of complexity, so make sure they don't
-		 * happen by forcing this copy to be retried.
-		 *
-		 * The rest of the btrfs_file_write code will fall
-		 * back to page at a time copies after we return 0.
-		 */
-		if (!PageUptodate(page) && copied < count)
-			copied = 0;
-
-		iov_iter_advance(i, copied);
-		write_bytes -= copied;
-		total_copied += copied;
-
-		/* Return to btrfs_file_write_iter to fault page */
-		if (unlikely(copied == 0))
-			break;
-
-		if (copied < PAGE_SIZE - offset) {
-			offset += copied;
-		} else {
-			pg++;
-			offset = 0;
-		}
-	}
-	return total_copied;
-}
-
-/*
- * unlocks pages after btrfs_file_write is done with them
- */
-static void btrfs_drop_pages(struct page **pages, size_t num_pages)
-{
-	size_t i;
-	for (i = 0; i < num_pages; i++) {
-		/* page checked is some magic around finding pages that
-		 * have been modified without going through btrfs_set_page_dirty
-		 * clear it here. There should be no need to mark the pages
-		 * accessed as prepare_pages should have marked them accessed
-		 * in prepare_pages via find_or_create_page()
-		 */
-		ClearPageChecked(pages[i]);
-		unlock_page(pages[i]);
-		put_page(pages[i]);
-	}
-}
-
 static void btrfs_page_done(struct inode *inode, loff_t pos,
 		unsigned int copied, struct page *page,
 		struct iomap *iomap)
@@ -1346,89 +1273,6 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-/*
- * on error we return an unlocked page and the error value
- * on success we return a locked page and 0
- */
-static int prepare_uptodate_page(struct inode *inode,
-				 struct page *page, u64 pos)
-{
-	int ret = 0;
-
-	if ((pos & (PAGE_SIZE - 1)) && !PageUptodate(page)) {
-		ret = btrfs_readpage(NULL, page);
-		if (ret)
-			return ret;
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			unlock_page(page);
-			return -EIO;
-		}
-		if (page->mapping != inode->i_mapping) {
-			unlock_page(page);
-			return -EAGAIN;
-		}
-	}
-	return 0;
-}
-
-/*
- * this just gets pages into the page cache and locks them down.
- */
-static noinline int prepare_pages(struct inode *inode, struct page **pages,
-				  size_t num_pages, loff_t pos,
-				  size_t write_bytes)
-{
-	int i;
-	unsigned long index = pos >> PAGE_SHIFT;
-	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
-	int err = 0;
-	int faili;
-
-	for (i = 0; i < num_pages; i++) {
-again:
-		pages[i] = find_or_create_page(inode->i_mapping, index + i,
-					       mask | __GFP_WRITE);
-		if (!pages[i]) {
-			faili = i - 1;
-			err = -ENOMEM;
-			goto fail;
-		}
-
-		err = set_page_extent_mapped(pages[i]);
-		if (err < 0) {
-			faili = i;
-			goto fail;
-		}
-
-		if (i == 0)
-			err = prepare_uptodate_page(inode, pages[i], pos);
-		if (!err && i == num_pages - 1)
-			err = prepare_uptodate_page(inode, pages[i],
-						    pos + write_bytes);
-		if (err) {
-			put_page(pages[i]);
-			if (err == -EAGAIN) {
-				err = 0;
-				goto again;
-			}
-			faili = i - 1;
-			goto fail;
-		}
-		wait_on_page_writeback(pages[i]);
-	}
-
-	return 0;
-fail:
-	while (faili >= 0) {
-		unlock_page(pages[faili]);
-		put_page(pages[faili]);
-		faili--;
-	}
-	return err;
-
-}
-
 static int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 			   size_t *write_bytes, bool nowait)
 {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 26/31] btrfs: use srcmap for read-before-write cases
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (24 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 25/31] btrfs: remove all page related functions Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 27/31] btrfs: Rename end_bio_extent_readpage to btrfs_readpage_endio Goldwyn Rodrigues
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

srcmap supplies the iomap structure to read from when the write is
not aligned to blocksize boundaries.

Note, range is already locked, so no extent locking is required.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/file.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b3e48bfd75df..b10252d93462 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -64,6 +64,9 @@ struct btrfs_iomap {
 	struct extent_state *cached_state;
 	int extents_locked;
 
+	/* Source extent-map in order to read from in case not sector aligned */
+	struct extent_map *em;
+
 	/* Reservation */
 	bool metadata_only;
 	struct extent_changeset *data_reserved;
@@ -1479,6 +1482,7 @@ static void btrfs_iomap_release(struct inode *inode,
 		}
 	}
 	extent_changeset_free(bi->data_reserved);
+	free_extent_map(bi->em);
 	kfree(bi);
 }
 
@@ -1491,11 +1495,37 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	size_t sector_offset = pos & (fs_info->sectorsize - 1);
 	struct btrfs_iomap *bi;
+	loff_t end = pos + length;
 
 	bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS);
 	if (!bi)
 		return -ENOMEM;
 
+	if ((pos & (PAGE_SIZE - 1) || end & (PAGE_SIZE - 1))) {
+		loff_t isize = i_size_read(inode);
+
+		if (pos >= isize) {
+			srcmap->addr = IOMAP_NULL_ADDR;
+			srcmap->type = IOMAP_HOLE;
+			srcmap->offset = isize;
+			srcmap->length = end - isize;
+		} else {
+			bi->em = btrfs_get_extent(BTRFS_I(inode), NULL, 0,
+					pos - sector_offset, length);
+			if (IS_ERR(bi->em)) {
+				kfree(bi);
+				return PTR_ERR(bi->em);
+			}
+			btrfs_em_to_iomap(inode, bi->em, srcmap,
+					pos - sector_offset);
+		}
+	}
+
+	if ((srcmap->type != IOMAP_HOLE) &&
+	    (end > srcmap->offset + srcmap->length))
+			write_bytes = srcmap->offset + srcmap->length - pos;
+
+
 	ret = btrfs_check_data_free_space(BTRFS_I(inode),
 			&bi->data_reserved, pos, write_bytes);
 	if (ret < 0) {
-- 
2.31.1


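The read-before-write decision in btrfs_buffered_iomap_begin() above hinges on simple alignment arithmetic: a srcmap is only needed when either end of the write is not page aligned, and write_bytes is clamped so the write never extends past the source mapping. A minimal userspace model of that arithmetic (the helper names here are illustrative, not the kernel functions):

```c
#include <assert.h>
#include <stdint.h>

/* A srcmap is needed only when either end of the write is not page
 * aligned; mirrors the (pos | end) & (PAGE_SIZE - 1) test in the patch. */
static int needs_srcmap(uint64_t pos, uint64_t length, uint64_t page_size)
{
	uint64_t end = pos + length;

	return (pos & (page_size - 1)) || (end & (page_size - 1));
}

/* Mirror of: write_bytes = srcmap->offset + srcmap->length - pos,
 * applied only when the write would extend past the source mapping. */
static uint64_t clamp_write_bytes(uint64_t pos, uint64_t write_bytes,
				  uint64_t src_offset, uint64_t src_length)
{
	uint64_t src_end = src_offset + src_length;

	if (pos + write_bytes > src_end)
		return src_end - pos;
	return write_bytes;
}
```

A page-aligned write needs no srcmap at all, which is why the whole block in the patch is skipped in that case.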

* [RFC PATCH 27/31] btrfs: Rename end_bio_extent_readpage to btrfs_readpage_endio
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (25 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 26/31] btrfs: use srcmap for read-before-write cases Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 28/31] btrfs: iomap_begin() for buffered read Goldwyn Rodrigues
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Also, declare it in extent_io.h so it may be used by the bio submission
function.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 12 ++++++------
 fs/btrfs/extent_io.h |  1 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 71a59fbeffe1..b0a01e3e4e7f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2970,7 +2970,7 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 /*
  * Find extent buffer for a givne bytenr.
  *
- * This is for end_bio_extent_readpage(), thus we can't do any unsafe locking
+ * This is for btrfs_readpage_endio(), thus we can't do any unsafe locking
  * in endio context.
  */
 static struct extent_buffer *find_extent_buffer_readpage(
@@ -3007,7 +3007,7 @@ static struct extent_buffer *find_extent_buffer_readpage(
  * Scheduling is not allowed, so the extent state tree is expected
  * to have one and only one object corresponding to this IO.
  */
-static void end_bio_extent_readpage(struct bio *bio)
+void btrfs_readpage_endio(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
@@ -3035,7 +3035,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		u32 len;
 
 		btrfs_debug(fs_info,
-			"end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u",
+			"btrfs_readpage_endio: bi_sector=%llu, err=%d, mirror=%u",
 			bio->bi_iter.bi_sector, bio->bi_status,
 			io_bio->mirror_num);
 		tree = &BTRFS_I(inode)->io_tree;
@@ -3687,7 +3687,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		ret = submit_extent_page(REQ_OP_READ | read_flags, NULL,
 					 bio_ctrl, page, disk_bytenr, iosize,
 					 pg_offset,
-					 end_bio_extent_readpage, 0,
+					 btrfs_readpage_endio, 0,
 					 this_bio_flag,
 					 force_bio_submit);
 		if (!ret) {
@@ -6411,7 +6411,7 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, &bio_ctrl,
 				 page, eb->start, eb->len,
 				 eb->start - page_offset(page),
-				 end_bio_extent_readpage, mirror_num, 0,
+				 btrfs_readpage_endio, mirror_num, 0,
 				 true);
 	if (ret) {
 		/*
@@ -6513,7 +6513,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 			ClearPageError(page);
 			err = submit_extent_page(REQ_OP_READ | REQ_META, NULL,
 					 &bio_ctrl, page, page_offset(page),
-					 PAGE_SIZE, 0, end_bio_extent_readpage,
+					 PAGE_SIZE, 0, btrfs_readpage_endio,
 					 mirror_num, 0, false);
 			if (err) {
 				/*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 62027f551b44..e3685b071eba 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -287,6 +287,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 		      unsigned int pg_offset, int mirror_num);
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
+void btrfs_readpage_endio(struct bio *bio);
 
 /*
  * When IO fails, either with EIO or csum verification fails, we
-- 
2.31.1



* [RFC PATCH 28/31] btrfs: iomap_begin() for buffered read
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (26 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 27/31] btrfs: Rename end_bio_extent_readpage to btrfs_readpage_endio Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 29/31] btrfs: Use iomap_readpage_ops to allocate and submit bio Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock the extents before performing a read.
Convert the offset/length to an iomap by calling btrfs_em_to_iomap().
If reading from a hole, unlock the extent because iomap provides the
page with zeros already set.

Lock only the range on which iomap I/O can be performed. If the iomap is
smaller than the requested range, unlock the extents beyond the iomap
range.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index be0caf3a11f9..6b9b238837b8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8267,6 +8267,56 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
 }
 
+
+static int btrfs_read_iomap_begin(struct inode *inode, loff_t pos,
+		loff_t length, unsigned int flags, struct iomap *iomap,
+		struct iomap *srcmap)
+{
+	struct extent_state *cached_state = NULL;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct extent_map *em;
+	u64 start = round_down(pos, fs_info->sectorsize);
+	u64 end = round_up(pos + length, fs_info->sectorsize) - 1;
+
+	/* Lock the extent */
+	btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
+			start, end, &cached_state);
+
+	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, end - start + 1);
+	if (IS_ERR(em))
+		return PTR_ERR(em);
+
+	btrfs_em_to_iomap(inode, em, iomap, start);
+	iomap->private = em;
+
+	if (iomap->type == IOMAP_HOLE) {
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				start, end, &cached_state);
+	} else if (end > iomap->offset + iomap->length) {
+		/* Unlock part beyond iomap */
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+				iomap->offset + iomap->length,
+				end, &cached_state);
+	}
+
+	return 0;
+}
+
+static int btrfs_read_iomap_end(struct inode *inode, loff_t pos,
+		loff_t length, ssize_t written, unsigned int flags,
+		struct iomap *iomap)
+{
+	struct extent_map *em = iomap->private;
+
+	free_extent_map(em);
+	return 0;
+}
+
+const struct iomap_ops btrfs_buffered_read_iomap_ops = {
+	.iomap_begin = btrfs_read_iomap_begin,
+	.iomap_end = btrfs_read_iomap_end,
+};
+
 int btrfs_readpage(struct file *file, struct page *page)
 {
 	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
-- 
2.31.1


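The locking range computed in btrfs_read_iomap_begin() rounds the request out to sector boundaries before taking the extent lock. The rounding can be sketched with plain userspace helpers standing in for the kernel's round_down()/round_up() macros (illustrative code, not the kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel round_down()/round_up() macros;
 * 'align' must be a power of two, as fs_info->sectorsize always is. */
static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Extent-lock range as computed in btrfs_read_iomap_begin():
 *   start = round_down(pos, sectorsize);
 *   end   = round_up(pos + length, sectorsize) - 1;  (inclusive) */
static void read_lock_range(uint64_t pos, uint64_t length,
			    uint64_t sectorsize,
			    uint64_t *start, uint64_t *end)
{
	*start = round_down_u64(pos, sectorsize);
	*end = round_up_u64(pos + length, sectorsize) - 1;
}
```

The inclusive end offset matters because the extent io tree locks inclusive ranges, hence the `- 1`.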

* [RFC PATCH 29/31] btrfs: Use iomap_readpage_ops to allocate and submit bio
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (27 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 28/31] btrfs: iomap_begin() for buffered read Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 30/31] btrfs: Do not call btrfs_io_bio_free_csum() if BTRFS_INODE_NODATASUM is not set Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 31/31] btrfs: Switch to iomap_readpage() and iomap_readahead() Goldwyn Rodrigues
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Set the alloc_bio() and submit_io() functions of iomap_readpage_ops.
While performing readpage, a btrfs_io_bio is allocated so that data
csums can be filled in and verified.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/ctree.h     |  1 +
 fs/btrfs/extent_io.c | 10 ++++++++++
 fs/btrfs/extent_io.h |  3 +++
 fs/btrfs/inode.c     |  9 +++++++--
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d4567c7a93cb..5826933ba4d2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3214,6 +3214,7 @@ void btrfs_inode_unlock(struct inode *inode, unsigned int ilock_flags);
 void btrfs_update_inode_bytes(struct btrfs_inode *inode,
 			      const u64 add_bytes,
 			      const u64 del_bytes);
+void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio);
 
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b0a01e3e4e7f..88a8fbf467b0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -7284,3 +7284,13 @@ void btrfs_readahead_node_child(struct extent_buffer *node, int slot)
 				   btrfs_node_ptr_generation(node, slot),
 				   btrfs_header_level(node) - 1);
 }
+
+static struct bio *btrfs_readpage_alloc_bio(gfp_t gfp_mask, unsigned short nr)
+{
+	return btrfs_io_bio_alloc(nr);
+}
+
+const struct iomap_readpage_ops btrfs_iomap_readpage_ops = {
+	.alloc_bio = btrfs_readpage_alloc_bio,
+	.submit_io = btrfs_buffered_submit_io,
+};
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index e3685b071eba..8cb79f8d1af8 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -7,6 +7,7 @@
 #include <linux/refcount.h>
 #include <linux/fiemap.h>
 #include <linux/btrfs_tree.h>
+#include <linux/iomap.h>
 #include "ulist.h"
 
 /*
@@ -313,6 +314,8 @@ int btrfs_repair_one_sector(struct inode *inode,
 			    u64 start, int failed_mirror,
 			    submit_bio_hook_t *submit_bio_hook);
 
+extern const struct iomap_readpage_ops btrfs_iomap_readpage_ops;
+
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
 			     struct page *locked_page, u64 *start,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6b9b238837b8..1ca83c11e8b9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -35,6 +35,7 @@
 #include "misc.h"
 #include "ctree.h"
 #include "disk-io.h"
+#include "extent_io.h"
 #include "transaction.h"
 #include "btrfs_inode.h"
 #include "print-tree.h"
@@ -8431,7 +8432,7 @@ static void btrfs_writepage_endio(struct bio *bio)
 	iomap_writepage_end_bio(bio);
 }
 
-static void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
+void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
 {
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
@@ -8440,7 +8441,11 @@ static void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
 		bio_for_each_segment_all(bvec, bio, iter_all)
 			set_page_extent_mapped(bvec->bv_page);
 
-	bio->bi_end_io = btrfs_writepage_endio;
+	if (bio_op(bio) == REQ_OP_WRITE)
+		bio->bi_end_io = btrfs_writepage_endio;
+	else
+		bio->bi_end_io = btrfs_readpage_endio;
+
 	if (is_data_inode(inode))
 		btrfs_submit_data_bio(inode, bio, 0, 0);
 	else
-- 
2.31.1



* [RFC PATCH 30/31] btrfs: Do not call btrfs_io_bio_free_csum() if BTRFS_INODE_NODATASUM is not set
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (28 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 29/31] btrfs: Use iomap_readpage_ops to allocate and submit bio Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  2021-06-13 13:39 ` [RFC PATCH 31/31] btrfs: Switch to iomap_readpage() and iomap_readahead() Goldwyn Rodrigues
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Q: Not sure why this patch is required, but there are cases where the
code would fail at kfree(io_bio->csum): io_bio->csum_inline would be set
to some random value. What would cause this?

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 88a8fbf467b0..a9e8217c0935 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3013,6 +3013,7 @@ void btrfs_readpage_endio(struct bio *bio)
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
+	struct inode *inode;
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
 	 * larger than UINT_MAX, u32 here is enough.
@@ -3026,14 +3027,16 @@ void btrfs_readpage_endio(struct bio *bio)
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
-		struct inode *inode = page->mapping->host;
-		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-		const u32 sectorsize = fs_info->sectorsize;
+		struct btrfs_fs_info *fs_info;
+		u32 sectorsize;
 		unsigned int error_bitmap = (unsigned int)-1;
 		u64 start;
 		u64 end;
 		u32 len;
 
+		inode = page->mapping->host;
+		fs_info = btrfs_sb(inode->i_sb);
+		sectorsize = fs_info->sectorsize;
 		btrfs_debug(fs_info,
 			"btrfs_readpage_endio: bi_sector=%llu, err=%d, mirror=%u",
 			bio->bi_iter.bi_sector, bio->bi_status,
@@ -3140,7 +3143,8 @@ void btrfs_readpage_endio(struct bio *bio)
 	}
 	/* Release the last extent */
 	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
-	btrfs_io_bio_free_csum(io_bio);
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
+		btrfs_io_bio_free_csum(io_bio);
 	bio_put(bio);
 }
 
-- 
2.31.1



* [RFC PATCH 31/31] btrfs: Switch to iomap_readpage() and iomap_readahead()
  2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
                   ` (29 preceding siblings ...)
  2021-06-13 13:39 ` [RFC PATCH 30/31] btrfs: Do not call btrfs_io_bio_free_csum() if BTRFS_INODE_NODATASUM is not set Goldwyn Rodrigues
@ 2021-06-13 13:39 ` Goldwyn Rodrigues
  30 siblings, 0 replies; 45+ messages in thread
From: Goldwyn Rodrigues @ 2021-06-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Use iomap readpage and readahead sequences.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/inode.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1ca83c11e8b9..56d9cbeb151d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8320,18 +8320,8 @@ const struct iomap_ops btrfs_buffered_read_iomap_ops = {
 
 int btrfs_readpage(struct file *file, struct page *page)
 {
-	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
-	u64 start = page_offset(page);
-	u64 end = start + PAGE_SIZE - 1;
-	struct btrfs_bio_ctrl bio_ctrl = { 0 };
-	int ret;
-
-	btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
-
-	ret = btrfs_do_readpage(page, NULL, &bio_ctrl, 0, NULL);
-	if (bio_ctrl.bio)
-		ret = submit_one_bio(bio_ctrl.bio, 0, bio_ctrl.bio_flags);
-	return ret;
+	return iomap_readpage(page, &btrfs_buffered_read_iomap_ops,
+				&btrfs_iomap_readpage_ops);
 }
 
 static int find_delalloc_range(struct inode *inode, u64 *start, u64 *end)
@@ -8492,7 +8482,8 @@ static int btrfs_writepages(struct address_space *mapping,
 
 static void btrfs_readahead(struct readahead_control *rac)
 {
-	extent_readahead(rac);
+	iomap_readahead(rac, &btrfs_buffered_read_iomap_ops,
+			&btrfs_iomap_readpage_ops);
 }
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
-- 
2.31.1



* Re: [RFC PATCH 02/31] iomap: Add submit_io to writepage_ops
  2021-06-13 13:39 ` [RFC PATCH 02/31] iomap: Add submit_io to writepage_ops Goldwyn Rodrigues
@ 2021-06-15 14:15   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-15 14:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Filesystems such as btrfs need to perform pre and post bio operations
> such as csum calculations, device mapping etc.
> 
> Add a submit_io() function to writepage_ops so filesystems can control
> how the bio is submitted.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/iomap/buffered-io.c | 11 ++++++++++-
>  include/linux/iomap.h  |  6 ++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d30683734d01..b6fd6d6118a6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1209,7 +1209,11 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
>  		return error;
>  	}
>  
> -	submit_bio(ioend->io_bio);
> +	if (wpc->ops->submit_io)
> +		wpc->ops->submit_io(ioend->io_inode, ioend->io_bio);
> +	else
> +		submit_bio(ioend->io_bio);
> +
>  	return 0;
>  }
>  
> @@ -1305,8 +1309,13 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
>  
>  	if (!merged) {
>  		if (bio_full(wpc->ioend->io_bio, len)) {
> +			struct bio *bio = wpc->ioend->io_bio;
>  			wpc->ioend->io_bio =
>  				iomap_chain_bio(wpc->ioend->io_bio);
> +			if (wpc->ops->submit_io)
> +				wpc->ops->submit_io(inode, bio);
> +			else
> +				submit_bio(bio);

Why do you submit the bio here? It seems iomap_add_to_ioend is used to
build a chain of ioend/bio structures which are then submitted by the
loop in iomap_writepage_map. At the very least this change should go
into a separate patch whose commit message explains why this new
submission point is introduced.


>  		}
>  		bio_add_page(wpc->ioend->io_bio, page, len, poff);
>  	}
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index c87d0cb0de6d..689d799b1915 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -223,6 +223,12 @@ struct iomap_writeback_ops {
>  	 * we failed to submit any I/O.
>  	 */
>  	void (*discard_page)(struct page *page, loff_t fileoff);
> +
> +	/*
> +	 * Optional, allows the filesystem to perform a custom submission of
> +	 * bio, such as csum calculations or multi-device bio split
> +	 */
> +	void (*submit_io)(struct inode *inode, struct bio *bio);

nit: I'm wondering whether, once we have the ->submit_io callback, it
doesn't really subsume prepare_ioend.

>  };
>  
>  struct iomap_writepage_ctx {
> 


* Re: [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page()
  2021-06-13 13:39 ` [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page() Goldwyn Rodrigues
@ 2021-06-15 14:17   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-15 14:17 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> btrfs requires page->private set to get a callback to releasepage(). 

nit: This is a somewhat terse explanation, and someone not familiar
with btrfs might wonder what's going on. In fact, btrfs uses the
PagePrivate flag to indicate that the page has undergone CoW and has
ordered extents created. And this is in fact problematic because iomap
assumes page_private is not in use by the underlying filesystem.

So,
> for normal circumstances of blocksize==PAGE_SIZE, btrfs will have
> page->private set to 1. In order to avoid a crash, check for the
> blocksize==PAGE_SIZE in to_iomap_page().
> ---
>  fs/iomap/buffered-io.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9023717c5188..d30683734d01 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -41,9 +41,11 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>  	 */
>  	VM_BUG_ON_PGFLAGS(PageTail(page), page);
>  
> -	if (page_has_private(page))
> -		return (struct iomap_page *)page_private(page);
> -	return NULL;
> +	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
> +		return NULL;
> +	if (!page_has_private(page))
> +		return NULL;
> +	return (struct iomap_page *)page_private(page);
>  }
>  
>  static struct bio_set iomap_ioend_bioset;
> @@ -163,7 +165,7 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	if (PageError(page))
>  		return;
>  
> -	if (page_has_private(page))
> +	if (i_blocksize(page->mapping->host) != PAGE_SIZE)
>  		iomap_iop_set_range_uptodate(page, off, len);
>  	else
>  		SetPageUptodate(page);
> 


* Re: [RFC PATCH 04/31] iomap: Introduce iomap_readpage_ops
  2021-06-13 13:39 ` [RFC PATCH 04/31] iomap: Introduce iomap_readpage_ops Goldwyn Rodrigues
@ 2021-06-15 14:44   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-15 14:44 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> iomap_readpage_ops provide additional functions to allocate or submit
> the bio.
> 
> alloc_bio() is used to allocate bio from the filesystem, in case of
> btrfs: to allocate btrfs_io_bio.
> 
> submit_io() similar to the one introduced with direct I/O, submits the
> bio.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/gfs2/aops.c         |  4 +--
>  fs/iomap/buffered-io.c | 56 +++++++++++++++++++++++++++++++-----------
>  fs/xfs/xfs_aops.c      |  4 +--
>  fs/zonefs/super.c      |  4 +--
>  include/linux/iomap.h  | 19 ++++++++++++--
>  5 files changed, 64 insertions(+), 23 deletions(-)
> 

<snip>

> @@ -282,19 +283,31 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		gfp_t orig_gfp = gfp;
>  		unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
>  
> -		if (ctx->bio)
> -			submit_bio(ctx->bio);
> +		if (ctx->bio) {
> +			if (ctx->ops && ctx->ops->submit_io)

here you check for the presence of ctx->ops && ctx->ops->some_op

> +				ctx->ops->submit_io(inode, ctx->bio);
> +			else
> +				submit_bio(ctx->bio);
> +		}
>  
>  		if (ctx->rac) /* same as readahead_gfp_mask */
>  			gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -		ctx->bio = bio_alloc(gfp, bio_max_segs(nr_vecs));
> +		if (ctx->ops->alloc_bio)

but here you directly check for the presence of ctx->ops->some_op. The
correct approach would be to also check for ctx->ops, since other
filesystems don't necessarily implement specific readpage ops.

> +			ctx->bio = ctx->ops->alloc_bio(gfp,
> +					bio_max_segs(nr_vecs));
> +		else
> +			ctx->bio = bio_alloc(gfp, bio_max_segs(nr_vecs));
>  		/*
>  		 * If the bio_alloc fails, try it again for a single page to
>  		 * avoid having to deal with partial page reads.  This emulates
>  		 * what do_mpage_readpage does.
>  		 */
> -		if (!ctx->bio)
> -			ctx->bio = bio_alloc(orig_gfp, 1);
> +		if (!ctx->bio) {
> +			if (ctx->ops->alloc_bio)

ditto about checking for presence of ops.

> +				ctx->bio = ctx->ops->alloc_bio(orig_gfp, 1);
> +			else
> +				ctx->bio = bio_alloc(orig_gfp, 1);
> +		}
>  		ctx->bio->bi_opf = REQ_OP_READ;
>  		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;

<snip>

> @@ -336,7 +353,10 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  	}
>  
>  	if (ctx.bio) {
> -		submit_bio(ctx.bio);
> +		if (ctx.ops->submit_io)

readpage_ops can be NULL so you should check for it as well.

> +			ctx.ops->submit_io(inode, ctx.bio);
> +		else
> +			submit_bio(ctx.bio);
>  		WARN_ON_ONCE(!ctx.cur_page_in_bio);
>  	} else {
>  		WARN_ON_ONCE(ctx.cur_page_in_bio);

<snip>

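The NULL-safe callback dispatch the review asks for can be sketched in isolation. This is an illustrative userspace model of the pattern (the struct and function names stand in for iomap_readpage_ops and the readpage context; they are not the real iomap types):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for iomap_readpage_ops and the readpage ctx. */
struct readpage_ops {
	void (*submit_io)(int bio);
};

struct readpage_ctx {
	const struct readpage_ops *ops;
};

static int last_submit_path; /* 0 = default path, 1 = filesystem callback */

static void default_submit(int bio)
{
	(void)bio;
	last_submit_path = 0;
}

static void fs_submit(int bio)
{
	(void)bio;
	last_submit_path = 1;
}

/* Guarded dispatch: check ctx->ops before dereferencing its members,
 * since filesystems that don't implement readpage ops pass NULL. */
static void do_submit(struct readpage_ctx *ctx, int bio)
{
	if (ctx->ops && ctx->ops->submit_io)
		ctx->ops->submit_io(bio);
	else
		default_submit(bio);
}
```

Every dispatch site in the patch would need this two-part check, which is the inconsistency the reviewer points out.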

* Re: [RFC PATCH 05/31] btrfs: writepage() lock extent before pages
  2021-06-13 13:39 ` [RFC PATCH 05/31] btrfs: writepage() lock extent before pages Goldwyn Rodrigues
@ 2021-06-15 15:47   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-15 15:47 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Lock Order change: Extent locks before page locks
> 
> writepage() is called with the page locked. So make an attempt
> to lock the extents and proceed only if successful.
> 
> The new function best_effort_lock_extent() attempts to lock the range
> provided.
> 
> If the entire range was not locked, but it still covers the locked
> page, work with the reduced range by resetting delalloc_end()
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/extent_io.c | 66 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 51 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 9e81d25dea70..75ad809e8c86 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1488,6 +1488,47 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end)
>  	return 1;
>  }
>  
> +/*
> + * best_effort_lock_extent() - locks the extent to the best of effort
> + * making sure the minimum range is locked and the
> + * delalloc bits are set. If the full requested range is not locked,
> + * delalloc_end shifts to until the range that can be locked.
> + */
> +static bool best_effort_lock_extent(struct extent_io_tree *tree, u64 start,
> +		u64 *delalloc_end, u64 min_end, struct extent_state **cached_state)
> +{
> +	u64 failed_start;
> +	u64 end = *delalloc_end;
> +	int ret;
> +
> +	ret = set_extent_bit(tree, start, end, EXTENT_LOCKED, EXTENT_LOCKED,
> +			     &failed_start, cached_state, GFP_NOFS, NULL);
> +
> +	if (!ret)
> +		return false;

The locking sequence here seems broken. Based on the behavior of
lock_extent_bits:

1. ret = 0 means the range was successfully locked, yet you return
false.

2. ret can be a number of errors, including EEXIST; the original code
basically waits for the extents to be unlocked and retries.

> +
> +	if (failed_start < min_end) {
> +		/* Could not lock the whole range */
> +		if (failed_start > start)
> +			unlock_extent_cached(tree, start,
> +					failed_start - 1, cached_state);
> +		return false;
> +	} else if (end > failed_start) {
> +		/* Work with what could be locked */
> +		end = failed_start - 1;

nit: Here, instead of assigning *delalloc_end to end and doing the
check, I think it's more straightforward to perform the check with
failed_start and *delalloc_end, and only then assign failed_start - 1
to end.


> +	}
> +
> +	/* Check if delalloc bits are set */
> +	ret = test_range_bit(tree, start, end,
> +			     EXTENT_DELALLOC, 1, *cached_state);
> +	if (!ret) {
> +		unlock_extent_cached(tree, start, end - 1, cached_state);
> +		return false;
> +	}
> +	*delalloc_end = end;
> +	return true;
> +}
> +
>  void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end)
>  {
>  	unsigned long index = start >> PAGE_SHIFT;
> @@ -2018,7 +2059,16 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
>  	if (delalloc_end + 1 - delalloc_start > max_bytes)
>  		delalloc_end = delalloc_start + max_bytes - 1;
>  
> -	/* step two, lock all the pages after the page that has start */
> +	/* step two, lock the state bits for the range */
> +	found = best_effort_lock_extent(tree, delalloc_start, &delalloc_end,
> +			((page_index(locked_page) + 1) << PAGE_SHIFT),

page_index(locked_page) + 1 means you'd like to ensure you have locked
at least one page beyond the locked page, correct? This looks a bit
arbitrary, i.e. why one page and not two, for example?

> +			&cached_state);
> +	if (!found) {
> +		free_extent_state(cached_state);
> +		goto out_failed;
> +	}
> +
> +	/* step three, lock all the pages after the page that has start */
>  	ret = lock_delalloc_pages(inode, locked_page,
>  				  delalloc_start, delalloc_end);

So now that this has become step three and the extents are locked, you
should implement the relevant error handling to unlock the extents in
case of an error.
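
The invariant is easy to state as a model: whatever step two locks,
every exit path of step three must unlock. The counter below stands in
for the real lock/unlock helpers, just to make the balance checkable:

```c
#include <assert.h>

/* Counter standing in for the extent-lock state: it must be back to
 * zero on every return path, success or failure. */
static int extent_locks_held;

static int find_lock_delalloc_model(int step_three_fails)
{
	extent_locks_held++;		/* step two: lock extent state */

	if (step_three_fails) {
		/* e.g. lock_delalloc_pages() returned -EAGAIN */
		extent_locks_held--;	/* must drop the extent lock */
		return -1;
	}

	extent_locks_held--;		/* normal unlock on the success path */
	return 0;
}
```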

>  	ASSERT(!ret || ret == -EAGAIN);
> @@ -2038,20 +2088,6 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
>  		}
>  	}
>  
> -	/* step three, lock the state bits for the whole range */
> -	lock_extent_bits(tree, delalloc_start, delalloc_end, &cached_state);
> -
> -	/* then test to make sure it is all still delalloc */
> -	ret = test_range_bit(tree, delalloc_start, delalloc_end,
> -			     EXTENT_DELALLOC, 1, cached_state);
> -	if (!ret) {
> -		unlock_extent_cached(tree, delalloc_start, delalloc_end,
> -				     &cached_state);
> -		__unlock_for_delalloc(inode, locked_page,
> -			      delalloc_start, delalloc_end);
> -		cond_resched();
> -		goto again;
> -	}
>  	free_extent_state(cached_state);
>  	*start = delalloc_start;
>  	*end = delalloc_end;
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 06/31] btrfs: lock extents while truncating
  2021-06-13 13:39 ` [RFC PATCH 06/31] btrfs: lock extents while truncating Goldwyn Rodrigues
@ 2021-06-16  6:42   ` Nikolay Borisov
  2021-06-16 13:13   ` Nikolay Borisov
  1 sibling, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16  6:42 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Lock order change: Lock extents before pages.
> 
> This removes the extent locking in invalidatepage().
> invalidatepage() is called either during truncation or while evicting
> an inode. Inode eviction does not require locking.

IMO the changelog of this patch warrants an explicit mention of why it's
correct, namely that it lifts the locking from btrfs_invalidatepage()
and puts it in btrfs_setsize(), wrapping truncate_setsize(), which
eventually boils down to calling invalidatepage().

> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/inode.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 794d906cba6c..7761a60788d0 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5201,6 +5201,9 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>  		btrfs_end_transaction(trans);
>  	} else {
>  		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +		u64 start = round_down(oldsize, fs_info->sectorsize);
> +		u64 end = round_up(newsize, fs_info->sectorsize) - 1;
> +		struct extent_state **cached = NULL;
>  
>  		if (btrfs_is_zoned(fs_info)) {
>  			ret = btrfs_wait_ordered_range(inode,
> @@ -5219,7 +5222,10 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>  			set_bit(BTRFS_INODE_FLUSH_ON_CLOSE,
>  				&BTRFS_I(inode)->runtime_flags);
>  
> +		lock_extent_bits(&BTRFS_I(inode)->io_tree, start, end, cached);
>  		truncate_setsize(inode, newsize);
> +		unlock_extent_cached(&BTRFS_I(inode)->io_tree, start, end,
> +				cached);

This effectively lifts the granular locking, which is performed on a
per-page basis, and turns it into one coarse lock covering the whole
region. This might have performance repercussions which can go either
way:

1. On the one hand you are keeping the whole range locked from the
start until the page cache is truncated.

2. On the other you reduce the overall number of extent lock/unlocks.

In either case we should think about quantifying the impact, if any.

>  
>  		inode_dio_wait(inode);
>  
> @@ -8322,9 +8328,23 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
>  
>  static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
>  {
> +	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
> +	struct extent_map *em;
> +	int ret;
> +
>  	if (PageWriteback(page) || PageDirty(page))
>  		return 0;
> -	return __btrfs_releasepage(page, gfp_flags);
> +
> +	em = lookup_extent_mapping(&inode->extent_tree, page_offset(page),
> +			PAGE_SIZE - 1);
> +	if (!em)
> +		return 0;
> +	if (!try_lock_extent(&inode->io_tree, em->start, extent_map_end(em) - 1))
> +		return 0;
> +	ret = __btrfs_releasepage(page, gfp_flags);
> +	unlock_extent(&inode->io_tree, em->start, extent_map_end(em));
> +	free_extent_map(em);
> +	return ret;
>  }
>  
>  #ifdef CONFIG_MIGRATION
> @@ -8398,9 +8418,6 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  		return;
>  	}
>  
> -	if (!inode_evicting)
> -		lock_extent_bits(tree, page_start, page_end, &cached_state);
> -
>  	cur = page_start;
>  	while (cur < page_end) {
>  		struct btrfs_ordered_extent *ordered;
> @@ -8458,7 +8475,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  		if (!inode_evicting)
>  			clear_extent_bit(tree, cur, range_end,
>  					 EXTENT_DELALLOC |
> -					 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
> +					 EXTENT_DO_ACCOUNTING |
>  					 EXTENT_DEFRAG, 1, 0, &cached_state);
>  
>  		spin_lock_irq(&inode->ordered_tree.lock);
> @@ -8503,7 +8520,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  		 */
>  		btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur);
>  		if (!inode_evicting) {
> -			clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED |
> +			clear_extent_bit(tree, cur, range_end,
>  				 EXTENT_DELALLOC | EXTENT_UPTODATE |
>  				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 1,
>  				 delete_states, &cached_state);
> 


* Re: [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap () to convert em to iomap
  2021-06-13 13:39 ` [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap () to convert em to iomap Goldwyn Rodrigues
@ 2021-06-16  6:49   ` Nikolay Borisov
  2021-06-16  6:51   ` Nikolay Borisov
  1 sibling, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16  6:49 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> btrfs_em_to_iomap() converts an extent map into the passed struct
> iomap argument. It makes sure the information is aligned to the
> sectorsize block.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/ctree.h |  2 ++
>  fs/btrfs/file.c  | 22 ++++++++++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index c9256f862085..d4567c7a93cb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3267,6 +3267,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>  int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>  			   size_t *write_bytes);
>  void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
> +void btrfs_em_to_iomap(struct inode *inode, struct extent_map *em,
> +		struct iomap *iomap, loff_t pos);
>  
>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e7d33c8177a0..cb9471e7224a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1570,6 +1570,28 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
>  	return 0;
>  }
>  
> +void btrfs_em_to_iomap(struct inode *inode,
> +		struct extent_map *em, struct iomap *iomap,
> +		loff_t sector_pos)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	loff_t diff = sector_pos - em->start;

Rename this variable to offset or em_offset or something like that;
"diff" is too generic and somewhat non-descriptive.
> +
> +	if (em->block_start == EXTENT_MAP_HOLE) {
> +		iomap->addr = IOMAP_NULL_ADDR;
> +		iomap->type = IOMAP_HOLE;
> +	} else if (em->block_start == EXTENT_MAP_INLINE) {
> +		iomap->addr = IOMAP_NULL_ADDR;
> +		iomap->type = IOMAP_INLINE;
> +	} else {
> +		iomap->addr = em->block_start + diff;
> +		iomap->type = IOMAP_MAPPED;
> +	}
> +	iomap->offset = em->start + diff;
> +	iomap->bdev = fs_info->fs_devices->latest_bdev;
> +	iomap->length = em->len - diff;
> +}
> +
>  static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  					       struct iov_iter *i)
>  {
> 


* Re: [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap () to convert em to iomap
  2021-06-13 13:39 ` [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap () to convert em to iomap Goldwyn Rodrigues
  2021-06-16  6:49   ` Nikolay Borisov
@ 2021-06-16  6:51   ` Nikolay Borisov
  1 sibling, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16  6:51 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> btrfs_em_to_iomap() converts an extent map into the passed struct
> iomap argument. It makes sure the information is aligned to the
> sectorsize block.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/ctree.h |  2 ++
>  fs/btrfs/file.c  | 22 ++++++++++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index c9256f862085..d4567c7a93cb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3267,6 +3267,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>  int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>  			   size_t *write_bytes);
>  void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
> +void btrfs_em_to_iomap(struct inode *inode, struct extent_map *em,
> +		struct iomap *iomap, loff_t pos);
>  
>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e7d33c8177a0..cb9471e7224a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1570,6 +1570,28 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
>  	return 0;
>  }
>  
> +void btrfs_em_to_iomap(struct inode *inode,
> +		struct extent_map *em, struct iomap *iomap,
> +		loff_t sector_pos)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	loff_t diff = sector_pos - em->start;
> +
> +	if (em->block_start == EXTENT_MAP_HOLE) {
> +		iomap->addr = IOMAP_NULL_ADDR;
> +		iomap->type = IOMAP_HOLE;
> +	} else if (em->block_start == EXTENT_MAP_INLINE) {
> +		iomap->addr = IOMAP_NULL_ADDR;
> +		iomap->type = IOMAP_INLINE;
> +	} else {
> +		iomap->addr = em->block_start + diff;
> +		iomap->type = IOMAP_MAPPED;
> +	}
> +	iomap->offset = em->start + diff;

This is really just iomap->offset = sector_pos, i.e. there is no need to
perform the addition.

> +	iomap->bdev = fs_info->fs_devices->latest_bdev;
> +	iomap->length = em->len - diff;
> +}
> +
>  static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  					       struct iov_iter *i)
>  {
> 


* Re: [RFC PATCH 06/31] btrfs: lock extents while truncating
  2021-06-13 13:39 ` [RFC PATCH 06/31] btrfs: lock extents while truncating Goldwyn Rodrigues
  2021-06-16  6:42   ` Nikolay Borisov
@ 2021-06-16 13:13   ` Nikolay Borisov
  1 sibling, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16 13:13 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Lock order change: Lock extents before pages.
> 
> This removes the extent locking in invalidatepage().
> invalidatepage() is called either during truncation or while evicting
> an inode. Inode eviction does not require locking.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/inode.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 794d906cba6c..7761a60788d0 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5201,6 +5201,9 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>  		btrfs_end_transaction(trans);
>  	} else {
>  		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +		u64 start = round_down(oldsize, fs_info->sectorsize);
> +		u64 end = round_up(newsize, fs_info->sectorsize) - 1;
> +		struct extent_state **cached = NULL;
>  
>  		if (btrfs_is_zoned(fs_info)) {
>  			ret = btrfs_wait_ordered_range(inode,
> @@ -5219,7 +5222,10 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>  			set_bit(BTRFS_INODE_FLUSH_ON_CLOSE,
>  				&BTRFS_I(inode)->runtime_flags);
>  
> +		lock_extent_bits(&BTRFS_I(inode)->io_tree, start, end, cached);
>  		truncate_setsize(inode, newsize);
> +		unlock_extent_cached(&BTRFS_I(inode)->io_tree, start, end,
> +				cached);
>  
>  		inode_dio_wait(inode);
>  
> @@ -8322,9 +8328,23 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
>  
>  static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
>  {
> +	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
> +	struct extent_map *em;
> +	int ret;
> +
>  	if (PageWriteback(page) || PageDirty(page))
>  		return 0;
> -	return __btrfs_releasepage(page, gfp_flags);
> +
> +	em = lookup_extent_mapping(&inode->extent_tree, page_offset(page),
> +			PAGE_SIZE - 1);
> +	if (!em)
> +		return 0;
> +	if (!try_lock_extent(&inode->io_tree, em->start, extent_map_end(em) - 1))
> +		return 0;
> +	ret = __btrfs_releasepage(page, gfp_flags);
> +	unlock_extent(&inode->io_tree, em->start, extent_map_end(em));
> +	free_extent_map(em);
> +	return ret;
>  }

Care to elaborate on why this is needed? Looking at the code, I couldn't
find the answer.

>  
>  #ifdef CONFIG_MIGRATION
> @@ -8398,9 +8418,6 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  		return;
>  	}
>  
> -	if (!inode_evicting)
> -		lock_extent_bits(tree, page_start, page_end, &cached_state);
> -
>  	cur = page_start;
>  	while (cur < page_end) {
>  		struct btrfs_ordered_extent *ordered;
> @@ -8458,7 +8475,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  		if (!inode_evicting)
>  			clear_extent_bit(tree, cur, range_end,
>  					 EXTENT_DELALLOC |
> -					 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
> +					 EXTENT_DO_ACCOUNTING |
>  					 EXTENT_DEFRAG, 1, 0, &cached_state);
>  
>  		spin_lock_irq(&inode->ordered_tree.lock);
> @@ -8503,7 +8520,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  		 */
>  		btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur);
>  		if (!inode_evicting) {
> -			clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED |
> +			clear_extent_bit(tree, cur, range_end,
>  				 EXTENT_DELALLOC | EXTENT_UPTODATE |
>  				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 1,
>  				 delete_states, &cached_state);
> 


* Re: [RFC PATCH 10/31] btrfs: Add btrfs_map_blocks to for iomap_writeback_ops
  2021-06-13 13:39 ` [RFC PATCH 10/31] btrfs: Add btrfs_map_blocks to for iomap_writeback_ops Goldwyn Rodrigues
@ 2021-06-16 14:15   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16 14:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> btrfs_map_blocks() runs the delayed allocation range to allocate new
> CoW space if required and returns the iomap associated with the
> location to write.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/inode.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5491d0e5600f..eff4ec44fecd 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8283,6 +8283,77 @@ int btrfs_readpage(struct file *file, struct page *page)
>  	return ret;
>  }
>  

<snip>

> +static int btrfs_map_blocks(struct iomap_writepage_ctx *wpc,
> +		struct inode *inode, loff_t offset)
> +{
> +	int ret;
> +	unsigned long nr_written; /* unused */
> +	int page_started; /* unused */
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	u64 start = round_down(offset, fs_info->sectorsize);
> +	u64 end = 0;
> +
> +	/* Check if iomap is valid */
> +	if (offset >= wpc->iomap.offset &&
> +			offset < wpc->iomap.offset + wpc->iomap.length)

Use the in_range() macro from btrfs/misc.h:

if (in_range(offset, wpc->iomap.offset, wpc->iomap.length))

<snip>
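
For reference, in_range() in fs/btrfs/misc.h is (to the best of my
reading) a plain half-open bounds check, equivalent to the two
comparisons in the patch. A self-contained model:

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the btrfs macro; reproduced here so the example
 * compiles standalone. */
#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))

/* Open-coded form from the patch, for comparison: the [start, start+len)
 * interval excludes its upper bound. */
static int open_coded(uint64_t offset, uint64_t start, uint64_t len)
{
	return offset >= start && offset < start + len;
}
```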


* Re: [RFC PATCH 11/31] btrfs: Use submit_io to submit btrfs writeback bios
  2021-06-13 13:39 ` [RFC PATCH 11/31] btrfs: Use submit_io to submit btrfs writeback bios Goldwyn Rodrigues
@ 2021-06-16 14:26   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16 14:26 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The submit_io sequence calls set_page_extent_mapped() for data
> inodes to make sure that page->private is set. Depending on
> the type of inode: metadata or data, the respective submit_bio
> function is called.
> 
> This will also be used for readpage() sequence.
> ---
>  fs/btrfs/inode.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index eff4ec44fecd..87d0730f59d8 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8349,11 +8349,26 @@ static int btrfs_map_blocks(struct iomap_writepage_ctx *wpc,
>  	return btrfs_set_iomap(inode, offset, end - offset, &wpc->iomap);
>  }
>  
> +static void btrfs_buffered_submit_io(struct inode *inode, struct bio *bio)
> +{
> +	struct bio_vec *bvec;
> +	struct bvec_iter_all iter_all;
> +
> +	if (is_data_inode(inode))
> +		bio_for_each_segment_all(bvec, bio, iter_all)
> +			set_page_extent_mapped(bvec->bv_page);
> +
> +	if (is_data_inode(inode))
> +		btrfs_submit_data_bio(inode, bio, 0, 0);
> +	else
> +		btrfs_submit_metadata_bio(inode, bio, 0, 0);

Move the btrfs_submit_data_bio() call into the first if() and
btrfs_submit_metadata_bio() into an else clause. That way you'll have
just a single is_data_inode() check in this function.
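
A toy model of the restructured function (the counters stand in for
set_page_extent_mapped() and the two submit paths, just so the
single-branch shape can be checked; none of this is the real kernel
code):

```c
#include <assert.h>
#include <stdbool.h>

static int pages_mapped, data_submits, metadata_submits;

/* Restructured shape: one is_data_inode()-style check instead of two. */
static void submit_io_model(bool is_data_inode)
{
	if (is_data_inode) {
		pages_mapped++;		/* set_page_extent_mapped() per bvec */
		data_submits++;		/* btrfs_submit_data_bio() */
	} else {
		metadata_submits++;	/* btrfs_submit_metadata_bio() */
	}
}
```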

> +}
> +
>  static const struct iomap_writeback_ops btrfs_writeback_ops = {
>  	.map_blocks             = btrfs_map_blocks,
> +	.submit_io		= btrfs_buffered_submit_io,
>  };
>  
> -
>  static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
>  {
>  	struct inode *inode = page->mapping->host;
> 


* Re: [RFC PATCH 13/31] btrfs: do not checksum for free space inode
  2021-06-13 13:39 ` [RFC PATCH 13/31] btrfs: do not checksum for free space inode Goldwyn Rodrigues
@ 2021-06-16 14:46   ` Nikolay Borisov
  0 siblings, 0 replies; 45+ messages in thread
From: Nikolay Borisov @ 2021-06-16 14:46 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-btrfs



On 13.06.21 г. 16:39, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Free space inode writes should not be checksummed.
> 
> Q: Not sure why this is required now (as opposed to earlier), but it
> would fail when writing the free_space_inode. How is this avoided in
> the existing code?
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/disk-io.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d1d5091a8385..3af505ec22dc 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -946,9 +946,12 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
>  			goto out_w_error;
>  		ret = btrfs_map_bio(fs_info, bio, mirror_num);
>  	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
> -		ret = btree_csum_one_bio(bio);
> -		if (ret)
> -			goto out_w_error;
> +		/* Do not checksum free space inode */
> +		if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
> +			ret = btree_csum_one_bio(bio);
> +			if (ret)
> +				goto out_w_error;
> +		}

Something is broken here: the free space inode is considered a data
inode, which means this check should always be true, i.e. this patch is
a no-op? The only inode you should be writing through
btrfs_submit_metadata_bio() is the btree_inode.

>  		ret = btrfs_map_bio(fs_info, bio, mirror_num);
>  	} else {
>  		/*
> 


* Re: [RFC PATCH 17/31] btrfs: Introduce btrfs_iomap
  2021-06-13 13:39 ` [RFC PATCH 17/31] btrfs: Introduce btrfs_iomap Goldwyn Rodrigues
@ 2021-06-16 20:02   ` David Sterba
  0 siblings, 0 replies; 45+ messages in thread
From: David Sterba @ 2021-06-16 20:02 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues

On Sun, Jun 13, 2021 at 08:39:45AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> A structure which will be used to transfer information from
> iomap_begin() to iomap_end().
> 
> Move all locking information into btrfs_iomap.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/file.c | 44 ++++++++++++++++++++++++++++----------------
>  1 file changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a94ab3c8da1d..a28435a6bb7e 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -56,6 +56,15 @@ struct inode_defrag {
>  	int cycled;
>  };
>  
> +struct btrfs_iomap {
> +
> +	/* Locking */
> +	u64 lockstart;
> +	u64 lockend;
> +	struct extent_state *cached_state;
> +	int extents_locked;
> +};
> +
>  static int __compare_inode_defrag(struct inode_defrag *defrag1,
>  				  struct inode_defrag *defrag2)
>  {
> @@ -1599,14 +1608,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  	struct page **pages = NULL;
>  	struct extent_changeset *data_reserved = NULL;
>  	u64 release_bytes = 0;
> -	u64 lockstart;
> -	u64 lockend;
>  	size_t num_written = 0;
>  	int nrptrs;
>  	ssize_t ret;
>  	bool only_release_metadata = false;
>  	loff_t old_isize = i_size_read(inode);
>  	unsigned int ilock_flags = 0;
> +	struct btrfs_iomap *bi = NULL;
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT)
>  		ilock_flags |= BTRFS_ILOCK_TRY;
> @@ -1634,6 +1642,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		goto out;
>  	}
>  
> +	bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS);

This is adding an allocation per write. The struct btrfs_iomap is not
that big, so it could live on the stack, but as more members are added
to it in further patches the size could become unsuitable for the stack.
OTOH some optimization of the member layout could reduce the size; I
haven't looked closer.
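
To put a rough number on it, here is a model of the struct as it stands
in this patch (field types approximated with fixed-width/pointer
equivalents; the real layout may differ slightly):

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of struct btrfs_iomap as introduced in this patch. */
struct btrfs_iomap_model {
	uint64_t lockstart;
	uint64_t lockend;
	void *cached_state;	/* struct extent_state * */
	int extents_locked;
};
```

On a typical 64-bit build this comes to about 32 bytes including
padding, comfortably within the kernel stack budget; the concern is only
the members that later patches add.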

> +	if (!bi) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
>  	while (iov_iter_count(i) > 0) {
>  		struct extent_state *cached_state = NULL;
>  		size_t offset = offset_in_page(pos);
> @@ -1647,7 +1661,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		size_t copied;
>  		size_t dirty_sectors;
>  		size_t num_sectors;
> -		int extents_locked = false;
>  
>  		/*
>  		 * Fault pages before locking them in prepare_pages
> @@ -1658,6 +1671,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  			break;
>  		}
>  
> +		bi->extents_locked = false;
>  		only_release_metadata = false;
>  		sector_offset = pos & (fs_info->sectorsize - 1);
>  
> @@ -1698,11 +1712,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		release_bytes = reserve_bytes;
>  
>  		if (pos < inode->i_size) {
> -			lockstart = round_down(pos, fs_info->sectorsize);
> -			lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
> +			bi->lockstart = round_down(pos, fs_info->sectorsize);
> +			bi->lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
>  			btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
> -					lockstart, lockend, &cached_state);
> -			extents_locked = true;
> +					bi->lockstart, bi->lockend,
> +					&cached_state);
> +			bi->extents_locked = true;
>  		}
>  
>  		/*
> @@ -1715,11 +1730,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		if (ret) {
>  			btrfs_delalloc_release_extents(BTRFS_I(inode),
>  						       reserve_bytes);
> -			if (extents_locked)
> -				unlock_extent_cached(&BTRFS_I(inode)->io_tree,
> -						lockstart, lockend, &cached_state);
> -			else
> -				free_extent_state(cached_state);
>  			break;
>  		}
>  
> @@ -1777,12 +1787,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		 * as delalloc through btrfs_dirty_pages(). Therefore free any
>  		 * possible cached extent state to avoid a memory leak.
>  		 */
> -		if (extents_locked)
> +		if (bi->extents_locked)
>  			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
> -					     lockstart, lockend, &cached_state);
> +					     bi->lockstart, bi->lockend,
> +					     &bi->cached_state);
>  		else
> -			free_extent_state(cached_state);
> -		extents_locked = false;
> +			free_extent_state(bi->cached_state);
> +		bi->extents_locked = false;
>  
>  		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
>  		if (ret) {
> @@ -1825,6 +1836,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		iocb->ki_pos += num_written;
>  	}
>  out:
> +	kfree(bi);

It's released in the same function, so that's the lifetime.

>  	btrfs_inode_unlock(inode, ilock_flags);
>  	return num_written ? num_written : ret;
>  }
> -- 
> 2.31.1


* Re: [RFC PATCH 19/31] btrfs: Add reservation information to btrfs_iomap
  2021-06-13 13:39 ` [RFC PATCH 19/31] btrfs: Add reservation information to btrfs_iomap Goldwyn Rodrigues
@ 2021-06-16 20:03   ` David Sterba
  0 siblings, 0 replies; 45+ messages in thread
From: David Sterba @ 2021-06-16 20:03 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues

On Sun, Jun 13, 2021 at 08:39:47AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Move reservation information into btrfs_iomap.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
>  fs/btrfs/file.c | 49 +++++++++++++++++++++++++------------------------
>  1 file changed, 25 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 3d4b8fde47f4..ccae8ce7ec4f 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -63,6 +63,11 @@ struct btrfs_iomap {
>  	u64 lockend;
>  	struct extent_state *cached_state;
>  	int extents_locked;
> +
> +	/* Reservation */
> +	bool metadata_only;
> +	struct extent_changeset *data_reserved;
> +	size_t reserved_bytes;
>  };
>  
>  static int __compare_inode_defrag(struct inode_defrag *defrag1,
> @@ -1630,12 +1635,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  	struct inode *inode = file_inode(file);
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct page **pages = NULL;
> -	struct extent_changeset *data_reserved = NULL;
>  	u64 release_bytes = 0;
>  	size_t num_written = 0;
>  	int nrptrs;
> -	ssize_t ret;
> -	bool only_release_metadata = false;
> +	int ret;
>  	loff_t old_isize = i_size_read(inode);
>  	unsigned int ilock_flags = 0;
>  	struct btrfs_iomap *bi = NULL;
> @@ -1673,14 +1676,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  	}
>  
>  	while (iov_iter_count(i) > 0) {
> -		struct extent_state *cached_state = NULL;
>  		size_t offset = offset_in_page(pos);
>  		size_t sector_offset;
>  		size_t write_bytes = min(iov_iter_count(i),
>  					 nrptrs * (size_t)PAGE_SIZE -
>  					 offset);
>  		size_t num_pages;
> -		size_t reserve_bytes;
>  		size_t dirty_pages;
>  		size_t copied;
>  		size_t dirty_sectors;
> @@ -1696,12 +1697,12 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		}
>  
>  		bi->extents_locked = false;
> -		only_release_metadata = false;
> +		bi->metadata_only = false;
>  		sector_offset = pos & (fs_info->sectorsize - 1);
>  
> -		extent_changeset_release(data_reserved);
> +		extent_changeset_release(bi->data_reserved);
>  		ret = btrfs_check_data_free_space(BTRFS_I(inode),
> -						  &data_reserved, pos,
> +						  &bi->data_reserved, pos,
>  						  write_bytes);
>  		if (ret < 0) {
>  			/*
> @@ -1711,36 +1712,36 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  			 */
>  			if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
>  						   &write_bytes) > 0)
> -				only_release_metadata = true;
> +				bi->metadata_only = true;
>  			else
>  				break;
>  		}
>  
>  		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
>  		WARN_ON(num_pages > nrptrs);
> -		reserve_bytes = round_up(write_bytes + sector_offset,
> +		bi->reserved_bytes = round_up(write_bytes + sector_offset,
>  					 fs_info->sectorsize);
> -		WARN_ON(reserve_bytes == 0);
> +		WARN_ON(bi->reserved_bytes == 0);
>  		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
> -				reserve_bytes);
> +				bi->reserved_bytes);
>  		if (ret) {
> -			if (!only_release_metadata)
> +			if (!bi->metadata_only)
>  				btrfs_free_reserved_data_space(BTRFS_I(inode),
> -						data_reserved, pos,
> +						bi->data_reserved, pos,
>  						write_bytes);
>  			else
>  				btrfs_check_nocow_unlock(BTRFS_I(inode));
>  			break;
>  		}
>  
> -		release_bytes = reserve_bytes;
> +		release_bytes = bi->reserved_bytes;
>  
>  		if (pos < inode->i_size) {
>  			bi->lockstart = round_down(pos, fs_info->sectorsize);
>  			bi->lockend = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
>  			btrfs_lock_and_flush_ordered_range(BTRFS_I(inode),
>  					bi->lockstart, bi->lockend,
> -					&cached_state);
> +					&bi->cached_state);
>  			bi->extents_locked = true;
>  		}
>  
> @@ -1753,13 +1754,13 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  				    pos, write_bytes);
>  		if (ret) {
>  			btrfs_delalloc_release_extents(BTRFS_I(inode),
> -						       reserve_bytes);
> +						       bi->reserved_bytes);
>  			break;
>  		}
>  
>  		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
>  
> -		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
> +		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bi->reserved_bytes);
>  		dirty_sectors = round_up(copied + sector_offset,
>  					fs_info->sectorsize);
>  		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
> @@ -1782,7 +1783,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		if (num_sectors > dirty_sectors) {
>  			/* release everything except the sectors we dirtied */
>  			release_bytes -= dirty_sectors << fs_info->sectorsize_bits;
> -			if (only_release_metadata) {
> +			if (bi->metadata_only) {
>  				btrfs_delalloc_release_metadata(BTRFS_I(inode),
>  							release_bytes, true);
>  			} else {
> @@ -1792,7 +1793,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  						   fs_info->sectorsize) +
>  					(dirty_pages << PAGE_SHIFT);
>  				btrfs_delalloc_release_space(BTRFS_I(inode),
> -						data_reserved, __pos,
> +						bi->data_reserved, __pos,
>  						release_bytes, true);
>  			}
>  		}
> @@ -1802,7 +1803,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  
>  		ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
>  					dirty_pages, pos, copied,
> -					&cached_state, only_release_metadata);
> +					&bi->cached_state,
> +					bi->metadata_only);
>  
>  		/*
>  		 * If we have not locked the extent range, because the range's
> @@ -1819,14 +1821,14 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  			free_extent_state(bi->cached_state);
>  		bi->extents_locked = false;
>  
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), bi->reserved_bytes);
>  		if (ret) {
>  			btrfs_drop_pages(pages, num_pages);
>  			break;
>  		}
>  
>  		release_bytes = 0;
> -		if (only_release_metadata)
> +		if (bi->metadata_only)
>  			btrfs_check_nocow_unlock(BTRFS_I(inode));
>  
>  		btrfs_drop_pages(pages, num_pages);
> @@ -1850,7 +1852,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>  		btrfs_check_nocow_unlock(BTRFS_I(inode));
>  	btrfs_iomap_release(inode, pos, release_bytes, bi);
>  out:
> -	kfree(bi);

Is this removal intentional? I don't see a pairing kfree() anywhere
else, so it looks like bi would be leaked on this exit path.
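To make the concern concrete, here is a minimal userspace sketch of the alloc/free pairing being asked about. All names (`btrfs_iomap_sketch`, `buffered_write_sketch`) are hypothetical stand-ins, not the actual kernel code; the point is only that bi is allocated once at the top of the write path, so every exit through the `out:` label must release it exactly once:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_iomap */
struct btrfs_iomap_sketch {
	int extents_locked;
};

/* Hypothetical stand-in for btrfs_buffered_write(): bi is allocated
 * up front, and both the success and the error path fall through to
 * the "out" label. Dropping the free() there leaks bi on every call. */
static long buffered_write_sketch(int simulate_error, int *freed)
{
	struct btrfs_iomap_sketch *bi = calloc(1, sizeof(*bi));
	long ret = 0;

	if (!bi)
		return -12; /* -ENOMEM */

	if (simulate_error) {
		ret = -5; /* -EIO, simulated failure in the copy loop */
		goto out;
	}
	/* ... the copy/dirty loop would run here ... */
out:
	free(bi); /* the pairing free under discussion */
	(*freed)++;
	return ret;
}
```

If the patch instead frees bi inside btrfs_iomap_release() or similar, stating that in the changelog would answer the question.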

>  	btrfs_inode_unlock(inode, ilock_flags);
>  	return num_written ? num_written : ret;
>  }
> -- 
> 2.31.1

Thread overview: 45+ messages
2021-06-13 13:39 [RFC PATCH 00/31] btrfs buffered iomap support Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 01/31] iomap: Check if blocksize == PAGE_SIZE in to_iomap_page() Goldwyn Rodrigues
2021-06-15 14:17   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 02/31] iomap: Add submit_io to writepage_ops Goldwyn Rodrigues
2021-06-15 14:15   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 03/31] iomap: Export iomap_writepage_end_bio() Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 04/31] iomap: Introduce iomap_readpage_ops Goldwyn Rodrigues
2021-06-15 14:44   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 05/31] btrfs: writepage() lock extent before pages Goldwyn Rodrigues
2021-06-15 15:47   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 06/31] btrfs: lock extents while truncating Goldwyn Rodrigues
2021-06-16  6:42   ` Nikolay Borisov
2021-06-16 13:13   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 07/31] btrfs: write() perform extent locks before locking page Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 08/31] btrfs: btrfs_em_to_iomap () to convert em to iomap Goldwyn Rodrigues
2021-06-16  6:49   ` Nikolay Borisov
2021-06-16  6:51   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 09/31] btrfs: Don't process pages if locked_page is NULL Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 10/31] btrfs: Add btrfs_map_blocks to for iomap_writeback_ops Goldwyn Rodrigues
2021-06-16 14:15   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 11/31] btrfs: Use submit_io to submit btrfs writeback bios Goldwyn Rodrigues
2021-06-16 14:26   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 12/31] btrfs: endio sequence for writepage Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 13/31] btrfs: do not checksum for free space inode Goldwyn Rodrigues
2021-06-16 14:46   ` Nikolay Borisov
2021-06-13 13:39 ` [RFC PATCH 14/31] btrfs: end EXTENT_MAP_INLINE writeback early Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 15/31] btrfs: Switch to iomap_writepages() Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 16/31] btrfs: remove force_page_uptodate Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 17/31] btrfs: Introduce btrfs_iomap Goldwyn Rodrigues
2021-06-16 20:02   ` David Sterba
2021-06-13 13:39 ` [RFC PATCH 18/31] btrfs: Add btrfs_iomap_release() Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 19/31] btrfs: Add reservation information to btrfs_iomap Goldwyn Rodrigues
2021-06-16 20:03   ` David Sterba
2021-06-13 13:39 ` [RFC PATCH 20/31] btrfs: Carve out btrfs_buffered_iomap_begin() from write path Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 21/31] btrfs: Carve out btrfs_buffered_iomap_end from the " Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 22/31] btrfs: Set extents delalloc in iomap_end Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 23/31] btrfs: define iomap_page_ops Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 24/31] btrfs: Switch to iomap_file_buffered_write() Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 25/31] btrfs: remove all page related functions Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 26/31] btrfs: use srcmap for read-before-write cases Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 27/31] btrfs: Rename end_bio_extent_readpage to btrfs_readpage_endio Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 28/31] btrfs: iomap_begin() for buffered read Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 29/31] btrfs: Use iomap_readpage_ops to allocate and submit bio Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 30/31] btrfs: Do not call btrfs_io_bio_free_csum() if BTRFS_INODE_NODATASUM is not set Goldwyn Rodrigues
2021-06-13 13:39 ` [RFC PATCH 31/31] btrfs: Switch to iomap_readpage() and iomap_readahead() Goldwyn Rodrigues
