linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] gfs2/buffer folio changes
@ 2023-05-17  3:24 Matthew Wilcox (Oracle)
  2023-05-17  3:24 ` [PATCH 1/6] gfs2: Use a folio inside gfs2_jdata_writepage() Matthew Wilcox (Oracle)
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

This kind of started off as a gfs2 patch series, then became entwined
with buffer heads once I realised that gfs2 was the only remaining
caller of __block_write_full_page().  For those not in the gfs2 world,
the big point of this series is that block_write_full_page() should now
handle large folios correctly.

It probably makes most sense to take this through Andrew's tree, once
enough people have signed off on it?

Matthew Wilcox (Oracle) (6):
  gfs2: Use a folio inside gfs2_jdata_writepage()
  gfs2: Pass a folio to __gfs2_jdata_write_folio()
  gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
  buffer: Convert __block_write_full_page() to
    __block_write_full_folio()
  gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
  buffer: Make block_write_full_page() handle large folios correctly

 fs/buffer.c                 | 75 ++++++++++++++++++-------------------
 fs/gfs2/aops.c              | 66 ++++++++++++++++----------------
 fs/gfs2/aops.h              |  2 +-
 fs/ntfs/aops.c              |  2 +-
 fs/reiserfs/inode.c         |  2 +-
 include/linux/buffer_head.h |  2 +-
 6 files changed, 75 insertions(+), 74 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] gfs2: Use a folio inside gfs2_jdata_writepage()
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
@ 2023-05-17  3:24 ` Matthew Wilcox (Oracle)
  2023-05-17  3:24 ` [PATCH 2/6] gfs2: Pass a folio to __gfs2_jdata_write_folio() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

Replace a few implicit calls to compound_head() with one explicit one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/gfs2/aops.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a5f4be6b9213..0518861df783 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -150,20 +150,21 @@ static int __gfs2_jdata_writepage(struct page *page, struct writeback_control *w
 
 static int gfs2_jdata_writepage(struct page *page, struct writeback_control *wbc)
 {
+	struct folio *folio = page_folio(page);
 	struct inode *inode = page->mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 
 	if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl)))
 		goto out;
-	if (PageChecked(page) || current->journal_info)
+	if (folio_test_checked(folio) || current->journal_info)
 		goto out_ignore;
-	return __gfs2_jdata_writepage(page, wbc);
+	return __gfs2_jdata_writepage(&folio->page, wbc);
 
 out_ignore:
-	redirty_page_for_writepage(wbc, page);
+	folio_redirty_for_writepage(wbc, folio);
 out:
-	unlock_page(page);
+	folio_unlock(folio);
 	return 0;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] gfs2: Pass a folio to __gfs2_jdata_write_folio()
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
  2023-05-17  3:24 ` [PATCH 1/6] gfs2: Use a folio inside gfs2_jdata_writepage() Matthew Wilcox (Oracle)
@ 2023-05-17  3:24 ` Matthew Wilcox (Oracle)
  2023-05-17  3:24 ` [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio() Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

Remove a couple of folio->page conversions in the callers, and two
calls to compound_head() in the function itself.  Rename it from
__gfs2_jdata_writepage() to __gfs2_jdata_write_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/gfs2/aops.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 0518861df783..749135252d52 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -113,30 +113,31 @@ static int gfs2_write_jdata_page(struct page *page,
 }
 
 /**
- * __gfs2_jdata_writepage - The core of jdata writepage
- * @page: The page to write
+ * __gfs2_jdata_write_folio - The core of jdata writepage
+ * @folio: The folio to write
  * @wbc: The writeback control
  *
  * This is shared between writepage and writepages and implements the
  * core of the writepage operation. If a transaction is required then
- * PageChecked will have been set and the transaction will have
+ * the checked flag will have been set and the transaction will have
  * already been started before this is called.
  */
-
-static int __gfs2_jdata_writepage(struct page *page, struct writeback_control *wbc)
+static int __gfs2_jdata_write_folio(struct folio *folio,
+		struct writeback_control *wbc)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = folio->mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	if (PageChecked(page)) {
-		ClearPageChecked(page);
-		if (!page_has_buffers(page)) {
-			create_empty_buffers(page, inode->i_sb->s_blocksize,
-					     BIT(BH_Dirty)|BIT(BH_Uptodate));
+	if (folio_test_checked(folio)) {
+		folio_clear_checked(folio);
+		if (!folio_buffers(folio)) {
+			folio_create_empty_buffers(folio,
+					inode->i_sb->s_blocksize,
+					BIT(BH_Dirty)|BIT(BH_Uptodate));
 		}
-		gfs2_trans_add_databufs(ip, page_folio(page), 0, PAGE_SIZE);
+		gfs2_trans_add_databufs(ip, folio, 0, folio_size(folio));
 	}
-	return gfs2_write_jdata_page(page, wbc);
+	return gfs2_write_jdata_page(&folio->page, wbc);
 }
 
 /**
@@ -159,7 +160,7 @@ static int gfs2_jdata_writepage(struct page *page, struct writeback_control *wbc
 		goto out;
 	if (folio_test_checked(folio) || current->journal_info)
 		goto out_ignore;
-	return __gfs2_jdata_writepage(&folio->page, wbc);
+	return __gfs2_jdata_write_folio(folio, wbc);
 
 out_ignore:
 	folio_redirty_for_writepage(wbc, folio);
@@ -256,7 +257,7 @@ static int gfs2_write_jdata_batch(struct address_space *mapping,
 
 		trace_wbc_writepage(wbc, inode_to_bdi(inode));
 
-		ret = __gfs2_jdata_writepage(&folio->page, wbc);
+		ret = __gfs2_jdata_write_folio(folio, wbc);
 		if (unlikely(ret)) {
 			if (ret == AOP_WRITEPAGE_ACTIVATE) {
 				folio_unlock(folio);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
  2023-05-17  3:24 ` [PATCH 1/6] gfs2: Use a folio inside gfs2_jdata_writepage() Matthew Wilcox (Oracle)
  2023-05-17  3:24 ` [PATCH 2/6] gfs2: Pass a folio to __gfs2_jdata_write_folio() Matthew Wilcox (Oracle)
@ 2023-05-17  3:24 ` Matthew Wilcox (Oracle)
  2023-06-03  9:34   ` Andreas Gruenbacher
  2023-05-17  3:24 ` [PATCH 4/6] buffer: Convert __block_write_full_page() to __block_write_full_folio() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

This function now supports large folios, even if nothing around it does.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/gfs2/aops.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 749135252d52..0f92e3e117da 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -82,33 +82,34 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
 }
 
 /**
- * gfs2_write_jdata_page - gfs2 jdata-specific version of block_write_full_page
- * @page: The page to write
+ * gfs2_write_jdata_folio - gfs2 jdata-specific version of block_write_full_page
+ * @folio: The folio to write
  * @wbc: The writeback control
  *
  * This is the same as calling block_write_full_page, but it also
  * writes pages outside of i_size
  */
-static int gfs2_write_jdata_page(struct page *page,
+static int gfs2_write_jdata_folio(struct folio *folio,
 				 struct writeback_control *wbc)
 {
-	struct inode * const inode = page->mapping->host;
+	struct inode * const inode = folio->mapping->host;
 	loff_t i_size = i_size_read(inode);
-	const pgoff_t end_index = i_size >> PAGE_SHIFT;
-	unsigned offset;
 
+	if (folio_pos(folio) >= i_size)
+		return 0;
 	/*
-	 * The page straddles i_size.  It must be zeroed out on each and every
+	 * The folio straddles i_size.  It must be zeroed out on each and every
 	 * writepage invocation because it may be mmapped.  "A file is mapped
 	 * in multiples of the page size.  For a file that is not a multiple of
-	 * the  page size, the remaining memory is zeroed when mapped, and
+	 * the page size, the remaining memory is zeroed when mapped, and
 	 * writes to that region are not written out to the file."
 	 */
-	offset = i_size & (PAGE_SIZE - 1);
-	if (page->index == end_index && offset)
-		zero_user_segment(page, offset, PAGE_SIZE);
+	if (i_size < folio_pos(folio) + folio_size(folio))
+		folio_zero_segment(folio, offset_in_folio(folio, i_size),
+				folio_size(folio));
 
-	return __block_write_full_page(inode, page, gfs2_get_block_noalloc, wbc,
+	return __block_write_full_page(inode, &folio->page,
+				       gfs2_get_block_noalloc, wbc,
 				       end_buffer_async_write);
 }
 
@@ -137,7 +138,7 @@ static int __gfs2_jdata_write_folio(struct folio *folio,
 		}
 		gfs2_trans_add_databufs(ip, folio, 0, folio_size(folio));
 	}
-	return gfs2_write_jdata_page(&folio->page, wbc);
+	return gfs2_write_jdata_folio(folio, wbc);
 }
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] buffer: Convert __block_write_full_page() to __block_write_full_folio()
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-05-17  3:24 ` [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio() Matthew Wilcox (Oracle)
@ 2023-05-17  3:24 ` Matthew Wilcox (Oracle)
       [not found]   ` <CGME20230517144703eucas1p1550db888e29fc5b182c202f24adddb87@eucas1p1.samsung.com>
  2023-05-17  3:24 ` [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

Remove nine hidden calls to compound_head() by using a folio instead
of a page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/buffer.c                 | 53 +++++++++++++++++++------------------
 fs/gfs2/aops.c              |  5 ++--
 fs/ntfs/aops.c              |  2 +-
 fs/reiserfs/inode.c         |  2 +-
 include/linux/buffer_head.h |  2 +-
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index a7fc561758b1..4d518df50fab 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1764,7 +1764,7 @@ static struct buffer_head *folio_create_buffers(struct folio *folio,
  * WB_SYNC_ALL, the writes are posted using REQ_SYNC; this
  * causes the writes to be flagged as synchronous writes.
  */
-int __block_write_full_page(struct inode *inode, struct page *page,
+int __block_write_full_folio(struct inode *inode, struct folio *folio,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler)
 {
@@ -1776,14 +1776,14 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	int nr_underway = 0;
 	blk_opf_t write_flags = wbc_to_write_flags(wbc);
 
-	head = folio_create_buffers(page_folio(page), inode,
+	head = folio_create_buffers(folio, inode,
 				    (1 << BH_Dirty) | (1 << BH_Uptodate));
 
 	/*
 	 * Be very careful.  We have no exclusion from block_dirty_folio
 	 * here, and the (potentially unmapped) buffers may become dirty at
 	 * any time.  If a buffer becomes dirty here after we've inspected it
-	 * then we just miss that fact, and the page stays dirty.
+	 * then we just miss that fact, and the folio stays dirty.
 	 *
 	 * Buffers outside i_size may be dirtied by block_dirty_folio;
 	 * handle that here by just cleaning them.
@@ -1793,7 +1793,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	blocksize = bh->b_size;
 	bbits = block_size_bits(blocksize);
 
-	block = (sector_t)page->index << (PAGE_SHIFT - bbits);
+	block = (sector_t)folio->index << (PAGE_SHIFT - bbits);
 	last_block = (i_size_read(inode) - 1) >> bbits;
 
 	/*
@@ -1804,7 +1804,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 		if (block > last_block) {
 			/*
 			 * mapped buffers outside i_size will occur, because
-			 * this page can be outside i_size when there is a
+			 * this folio can be outside i_size when there is a
 			 * truncate in progress.
 			 */
 			/*
@@ -1834,7 +1834,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 			continue;
 		/*
 		 * If it's a fully non-blocking write attempt and we cannot
-		 * lock the buffer then redirty the page.  Note that this can
+		 * lock the buffer then redirty the folio.  Note that this can
 		 * potentially cause a busy-wait loop from writeback threads
 		 * and kswapd activity, but those code paths have their own
 		 * higher-level throttling.
@@ -1842,7 +1842,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 		if (wbc->sync_mode != WB_SYNC_NONE) {
 			lock_buffer(bh);
 		} else if (!trylock_buffer(bh)) {
-			redirty_page_for_writepage(wbc, page);
+			folio_redirty_for_writepage(wbc, folio);
 			continue;
 		}
 		if (test_clear_buffer_dirty(bh)) {
@@ -1853,11 +1853,11 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	} while ((bh = bh->b_this_page) != head);
 
 	/*
-	 * The page and its buffers are protected by PageWriteback(), so we can
-	 * drop the bh refcounts early.
+	 * The folio and its buffers are protected by the writeback flag,
+	 * so we can drop the bh refcounts early.
 	 */
-	BUG_ON(PageWriteback(page));
-	set_page_writeback(page);
+	BUG_ON(folio_test_writeback(folio));
+	folio_start_writeback(folio);
 
 	do {
 		struct buffer_head *next = bh->b_this_page;
@@ -1867,20 +1867,20 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 		}
 		bh = next;
 	} while (bh != head);
-	unlock_page(page);
+	folio_unlock(folio);
 
 	err = 0;
 done:
 	if (nr_underway == 0) {
 		/*
-		 * The page was marked dirty, but the buffers were
+		 * The folio was marked dirty, but the buffers were
 		 * clean.  Someone wrote them back by hand with
 		 * write_dirty_buffer/submit_bh.  A rare case.
 		 */
-		end_page_writeback(page);
+		folio_end_writeback(folio);
 
 		/*
-		 * The page and buffer_heads can be released at any time from
+		 * The folio and buffer_heads can be released at any time from
 		 * here on.
 		 */
 	}
@@ -1891,7 +1891,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	 * ENOSPC, or some other error.  We may already have added some
 	 * blocks to the file, so we need to write these out to avoid
 	 * exposing stale data.
-	 * The page is currently locked and not marked for writeback
+	 * The folio is currently locked and not marked for writeback
 	 */
 	bh = head;
 	/* Recovery: lock and submit the mapped buffers */
@@ -1903,15 +1903,15 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 		} else {
 			/*
 			 * The buffer may have been set dirty during
-			 * attachment to a dirty page.
+			 * attachment to a dirty folio.
 			 */
 			clear_buffer_dirty(bh);
 		}
 	} while ((bh = bh->b_this_page) != head);
-	SetPageError(page);
-	BUG_ON(PageWriteback(page));
-	mapping_set_error(page->mapping, err);
-	set_page_writeback(page);
+	folio_set_error(folio);
+	BUG_ON(folio_test_writeback(folio));
+	mapping_set_error(folio->mapping, err);
+	folio_start_writeback(folio);
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
@@ -1921,10 +1921,10 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 		}
 		bh = next;
 	} while (bh != head);
-	unlock_page(page);
+	folio_unlock(folio);
 	goto done;
 }
-EXPORT_SYMBOL(__block_write_full_page);
+EXPORT_SYMBOL(__block_write_full_folio);
 
 /*
  * If a page has any new buffers, zero them out here, and mark them uptodate
@@ -2677,6 +2677,7 @@ EXPORT_SYMBOL(block_truncate_page);
 int block_write_full_page(struct page *page, get_block_t *get_block,
 			struct writeback_control *wbc)
 {
+	struct folio *folio = page_folio(page);
 	struct inode * const inode = page->mapping->host;
 	loff_t i_size = i_size_read(inode);
 	const pgoff_t end_index = i_size >> PAGE_SHIFT;
@@ -2684,13 +2685,13 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 
 	/* Is the page fully inside i_size? */
 	if (page->index < end_index)
-		return __block_write_full_page(inode, page, get_block, wbc,
+		return __block_write_full_folio(inode, folio, get_block, wbc,
 					       end_buffer_async_write);
 
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_SIZE-1);
 	if (page->index >= end_index+1 || !offset) {
-		unlock_page(page);
+		folio_unlock(folio);
 		return 0; /* don't care */
 	}
 
@@ -2702,7 +2703,7 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 	 * writes to that region are not written out to the file."
 	 */
 	zero_user_segment(page, offset, PAGE_SIZE);
-	return __block_write_full_page(inode, page, get_block, wbc,
+	return __block_write_full_folio(inode, folio, get_block, wbc,
 							end_buffer_async_write);
 }
 EXPORT_SYMBOL(block_write_full_page);
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 0f92e3e117da..e97462a5302e 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -108,9 +108,8 @@ static int gfs2_write_jdata_folio(struct folio *folio,
 		folio_zero_segment(folio, offset_in_folio(folio, i_size),
 				folio_size(folio));
 
-	return __block_write_full_page(inode, &folio->page,
-				       gfs2_get_block_noalloc, wbc,
-				       end_buffer_async_write);
+	return __block_write_full_folio(inode, folio, gfs2_get_block_noalloc,
+			wbc, end_buffer_async_write);
 }
 
 /**
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index e8aeba124a95..4e158bce4192 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -526,7 +526,7 @@ static int ntfs_read_folio(struct file *file, struct folio *folio)
  *
  * Return 0 on success and -errno on error.
  *
- * Based on ntfs_read_block() and __block_write_full_page().
+ * Based on ntfs_read_block() and __block_write_full_folio().
  */
 static int ntfs_write_block(struct page *page, struct writeback_control *wbc)
 {
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index d8debbb6105f..ff34ee49106f 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2506,7 +2506,7 @@ static int map_block_for_writepage(struct inode *inode,
 
 /*
  * mason@suse.com: updated in 2.5.54 to follow the same general io
- * start/recovery path as __block_write_full_page, along with special
+ * start/recovery path as __block_write_full_folio, along with special
  * code to handle reiserfs tails.
  */
 static int reiserfs_write_full_page(struct page *page,
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 1520793c72da..a366e01f8bd4 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -263,7 +263,7 @@ extern int buffer_heads_over_limit;
 void block_invalidate_folio(struct folio *folio, size_t offset, size_t length);
 int block_write_full_page(struct page *page, get_block_t *get_block,
 				struct writeback_control *wbc);
-int __block_write_full_page(struct inode *inode, struct page *page,
+int __block_write_full_folio(struct inode *inode, struct folio *folio,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler);
 int block_read_full_folio(struct folio *, get_block_t *);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-05-17  3:24 ` [PATCH 4/6] buffer: Convert __block_write_full_page() to __block_write_full_folio() Matthew Wilcox (Oracle)
@ 2023-05-17  3:24 ` Matthew Wilcox (Oracle)
  2023-05-23 12:46   ` Andreas Gruenbacher
  2023-05-17  3:24 ` [PATCH 6/6] buffer: Make block_write_full_page() handle large folios correctly Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

We may someday support folios larger than 4GB, so use a size_t for
the byte count within a folio to prevent unpleasant truncations.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/gfs2/aops.c | 2 +-
 fs/gfs2/aops.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index e97462a5302e..8da4397aafc6 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -38,7 +38,7 @@
 
 
 void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
-			     unsigned int from, unsigned int len)
+			     size_t from, size_t len)
 {
 	struct buffer_head *head = folio_buffers(folio);
 	unsigned int bsize = head->b_size;
diff --git a/fs/gfs2/aops.h b/fs/gfs2/aops.h
index 09db1914425e..f08322ef41cf 100644
--- a/fs/gfs2/aops.h
+++ b/fs/gfs2/aops.h
@@ -10,6 +10,6 @@
 
 extern void adjust_fs_space(struct inode *inode);
 extern void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
-				    unsigned int from, unsigned int len);
+				    size_t from, size_t len);
 
 #endif /* __AOPS_DOT_H__ */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] buffer: Make block_write_full_page() handle large folios correctly
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2023-05-17  3:24 ` [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs() Matthew Wilcox (Oracle)
@ 2023-05-17  3:24 ` Matthew Wilcox (Oracle)
  2023-05-17 18:06 ` [PATCH 0/6] gfs2/buffer folio changes Bob Peterson
  2023-06-02 12:30 ` Bob Peterson
  7 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-05-17  3:24 UTC (permalink / raw)
  To: linux-fsdevel, Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	Bob Peterson, Andreas Gruenbacher, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

Keep the interface as struct page, but work entirely on the folio
internally.  Removes several PAGE_SIZE assumptions and removes
some references to page->index and page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/buffer.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d518df50fab..d8c2c000676b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2678,33 +2678,31 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 			struct writeback_control *wbc)
 {
 	struct folio *folio = page_folio(page);
-	struct inode * const inode = page->mapping->host;
+	struct inode * const inode = folio->mapping->host;
 	loff_t i_size = i_size_read(inode);
-	const pgoff_t end_index = i_size >> PAGE_SHIFT;
-	unsigned offset;
 
-	/* Is the page fully inside i_size? */
-	if (page->index < end_index)
+	/* Is the folio fully inside i_size? */
+	if (folio_pos(folio) + folio_size(folio) <= i_size)
 		return __block_write_full_folio(inode, folio, get_block, wbc,
 					       end_buffer_async_write);
 
-	/* Is the page fully outside i_size? (truncate in progress) */
-	offset = i_size & (PAGE_SIZE-1);
-	if (page->index >= end_index+1 || !offset) {
+	/* Is the folio fully outside i_size? (truncate in progress) */
+	if (folio_pos(folio) > i_size) {
 		folio_unlock(folio);
 		return 0; /* don't care */
 	}
 
 	/*
-	 * The page straddles i_size.  It must be zeroed out on each and every
+	 * The folio straddles i_size.  It must be zeroed out on each and every
 	 * writepage invocation because it may be mmapped.  "A file is mapped
 	 * in multiples of the page size.  For a file that is not a multiple of
-	 * the  page size, the remaining memory is zeroed when mapped, and
+	 * the page size, the remaining memory is zeroed when mapped, and
 	 * writes to that region are not written out to the file."
 	 */
-	zero_user_segment(page, offset, PAGE_SIZE);
+	folio_zero_segment(folio, offset_in_folio(folio, i_size),
+			folio_size(folio));
 	return __block_write_full_folio(inode, folio, get_block, wbc,
-							end_buffer_async_write);
+			end_buffer_async_write);
 }
 EXPORT_SYMBOL(block_write_full_page);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] buffer: Convert __block_write_full_page() to __block_write_full_folio()
       [not found]   ` <CGME20230517144703eucas1p1550db888e29fc5b182c202f24adddb87@eucas1p1.samsung.com>
@ 2023-05-17 14:47     ` Pankaj Raghav
  2023-05-17 15:43       ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Pankaj Raghav @ 2023-05-17 14:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, Andreas Gruenbacher,
	cluster-devel, Hannes Reinecke, Luis Chamberlain, p.raghav

[-- Attachment #1: Type: text/plain, Size: 467 bytes --]

> @@ -1793,7 +1793,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
>  	blocksize = bh->b_size;
>  	bbits = block_size_bits(blocksize);
>  
> -	block = (sector_t)page->index << (PAGE_SHIFT - bbits);
> +	block = (sector_t)folio->index << (PAGE_SHIFT - bbits);

Shouldn't the PAGE_SHIFT be folio_shift(folio) as you allow larger
folios to be passed to this function in the later patches?

>  	last_block = (i_size_read(inode) - 1) >> bbits;
>  

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] buffer: Convert __block_write_full_page() to __block_write_full_folio()
  2023-05-17 14:47     ` Pankaj Raghav
@ 2023-05-17 15:43       ` Matthew Wilcox
  0 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2023-05-17 15:43 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, Andreas Gruenbacher,
	cluster-devel, Hannes Reinecke, Luis Chamberlain

On Wed, May 17, 2023 at 04:47:01PM +0200, Pankaj Raghav wrote:
> > @@ -1793,7 +1793,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
> >  	blocksize = bh->b_size;
> >  	bbits = block_size_bits(blocksize);
> >  
> > -	block = (sector_t)page->index << (PAGE_SHIFT - bbits);
> > +	block = (sector_t)folio->index << (PAGE_SHIFT - bbits);
> 
> Shouldn't the PAGE_SHIFT be folio_shift(folio) as you allow larger
> folios to be passed to this function in the later patches?

No, the folio->index is expressed in multiples of PAGE_SIZE.

> >  	last_block = (i_size_read(inode) - 1) >> bbits;
> >  



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] gfs2/buffer folio changes
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2023-05-17  3:24 ` [PATCH 6/6] buffer: Make block_write_full_page() handle large folios correctly Matthew Wilcox (Oracle)
@ 2023-05-17 18:06 ` Bob Peterson
  2023-06-02 12:30 ` Bob Peterson
  7 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2023-05-17 18:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-fsdevel, Andrew Morton
  Cc: Andreas Gruenbacher, cluster-devel, Hannes Reinecke, Luis Chamberlain

On 5/16/23 10:24 PM, Matthew Wilcox (Oracle) wrote:
> This kind of started off as a gfs2 patch series, then became entwined
> with buffer heads once I realised that gfs2 was the only remaining
> caller of __block_write_full_page().  For those not in the gfs2 world,
> the big point of this series is that block_write_full_page() should now
> handle large folios correctly.
> 
> It probably makes most sense to take this through Andrew's tree, once
> enough people have signed off on it?
> 
> Matthew Wilcox (Oracle) (6):
>    gfs2: Use a folio inside gfs2_jdata_writepage()
>    gfs2: Pass a folio to __gfs2_jdata_write_folio()
>    gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
>    buffer: Convert __block_write_full_page() to
>      __block_write_full_folio()
>    gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
>    buffer: Make block_write_full_page() handle large folios correctly
> 
>   fs/buffer.c                 | 75 ++++++++++++++++++-------------------
>   fs/gfs2/aops.c              | 66 ++++++++++++++++----------------
>   fs/gfs2/aops.h              |  2 +-
>   fs/ntfs/aops.c              |  2 +-
>   fs/reiserfs/inode.c         |  2 +-
>   include/linux/buffer_head.h |  2 +-
>   6 files changed, 75 insertions(+), 74 deletions(-)
> 
Hi Matthew,

I recently started looking into doing this, too, so this is apropos.
I'll give it a careful review and some testing. The jdata stuff in gfs2 
is very touchy, but this looks like a step in the right direction.
I'll let you know how it fares.

Regards,

Bob Peterson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
  2023-05-17  3:24 ` [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs() Matthew Wilcox (Oracle)
@ 2023-05-23 12:46   ` Andreas Gruenbacher
  2023-05-23 13:37     ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Gruenbacher @ 2023-05-23 12:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

On Wed, May 17, 2023 at 5:24 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> We may someday support folios larger than 4GB, so use a size_t for
> the byte count within a folio to prevent unpleasant truncations.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/gfs2/aops.c | 2 +-
>  fs/gfs2/aops.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index e97462a5302e..8da4397aafc6 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -38,7 +38,7 @@
>
>
>  void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
> -                            unsigned int from, unsigned int len)
> +                            size_t from, size_t len)
>  {
>         struct buffer_head *head = folio_buffers(folio);
>         unsigned int bsize = head->b_size;

This only makes sense if the to, start, and end variables in
gfs2_trans_add_databufs() are changed from unsigned int to size_t as
well.

> diff --git a/fs/gfs2/aops.h b/fs/gfs2/aops.h
> index 09db1914425e..f08322ef41cf 100644
> --- a/fs/gfs2/aops.h
> +++ b/fs/gfs2/aops.h
> @@ -10,6 +10,6 @@
>
>  extern void adjust_fs_space(struct inode *inode);
>  extern void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
> -                                   unsigned int from, unsigned int len);
> +                                   size_t from, size_t len);
>
>  #endif /* __AOPS_DOT_H__ */
> --
> 2.39.2
>

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
  2023-05-23 12:46   ` Andreas Gruenbacher
@ 2023-05-23 13:37     ` Matthew Wilcox
  2023-06-03  9:46       ` Andreas Gruenbacher
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2023-05-23 13:37 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

On Tue, May 23, 2023 at 02:46:07PM +0200, Andreas Gruenbacher wrote:
> >  void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
> > -                            unsigned int from, unsigned int len)
> > +                            size_t from, size_t len)
> >  {
> >         struct buffer_head *head = folio_buffers(folio);
> >         unsigned int bsize = head->b_size;
> 
> This only makes sense if the to, start, and end variables in
> gfs2_trans_add_databufs() are changed from unsigned int to size_t as
> well.

The history of this patch is that I started doing conversions from page
-> folio in gfs2, then you came out with a very similar series.  This
patch is the remainder after rebasing my patches on yours.  So we can
either drop this patch or just apply it.  I wasn't making a concerted
effort to make gfs2 support 4GB+ sized folios, it's just part of the
conversion that I do.

> >  extern void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
> > -                                   unsigned int from, unsigned int len);
> > +                                   size_t from, size_t len);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] gfs2/buffer folio changes
  2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2023-05-17 18:06 ` [PATCH 0/6] gfs2/buffer folio changes Bob Peterson
@ 2023-06-02 12:30 ` Bob Peterson
  2023-06-02 13:25   ` Bob Peterson
  7 siblings, 1 reply; 18+ messages in thread
From: Bob Peterson @ 2023-06-02 12:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-fsdevel, Andrew Morton
  Cc: Andreas Gruenbacher, cluster-devel, Hannes Reinecke, Luis Chamberlain

On 5/16/23 10:24 PM, Matthew Wilcox (Oracle) wrote:
> This kind of started off as a gfs2 patch series, then became entwined
> with buffer heads once I realised that gfs2 was the only remaining
> caller of __block_write_full_page().  For those not in the gfs2 world,
> the big point of this series is that block_write_full_page() should now
> handle large folios correctly.
> 
> It probably makes most sense to take this through Andrew's tree, once
> enough people have signed off on it?
> 
> Matthew Wilcox (Oracle) (6):
>    gfs2: Use a folio inside gfs2_jdata_writepage()
>    gfs2: Pass a folio to __gfs2_jdata_write_folio()
>    gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
>    buffer: Convert __block_write_full_page() to
>      __block_write_full_folio()
>    gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
>    buffer: Make block_write_full_page() handle large folios correctly
> 
>   fs/buffer.c                 | 75 ++++++++++++++++++-------------------
>   fs/gfs2/aops.c              | 66 ++++++++++++++++----------------
>   fs/gfs2/aops.h              |  2 +-
>   fs/ntfs/aops.c              |  2 +-
>   fs/reiserfs/inode.c         |  2 +-
>   include/linux/buffer_head.h |  2 +-
>   6 files changed, 75 insertions(+), 74 deletions(-)
> 
Hi Willy,

I did some fundamental testing with this patch set in a five-node 
cluster, as well as xfstests, and it seemed to work properly. The 
testing was somewhat limited, but it passed basic cluster coherency 
testing. Sorry it took so long.

If you want you can add:
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>

Regards,

Bob Peterson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] gfs2/buffer folio changes
  2023-06-02 12:30 ` Bob Peterson
@ 2023-06-02 13:25   ` Bob Peterson
  0 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2023-06-02 13:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-fsdevel, Andrew Morton
  Cc: Andreas Gruenbacher, cluster-devel, Hannes Reinecke, Luis Chamberlain

On 6/2/23 7:30 AM, Bob Peterson wrote:
> On 5/16/23 10:24 PM, Matthew Wilcox (Oracle) wrote:
>> This kind of started off as a gfs2 patch series, then became entwined
>> with buffer heads once I realised that gfs2 was the only remaining
>> caller of __block_write_full_page().  For those not in the gfs2 world,
>> the big point of this series is that block_write_full_page() should now
>> handle large folios correctly.
>>
>> It probably makes most sense to take this through Andrew's tree, once
>> enough people have signed off on it?
> Hi Willy,
> 
> I did some fundamental testing with this patch set in a five-node 
> cluster, as well as xfstests, and it seemed to work properly. The 
> testing was somewhat limited, but it passed basic cluster coherency 
> testing. Sorry it took so long.
> 
> If you want you can add:
> Tested-by: Bob Peterson <rpeterso@redhat.com>
> Reviewed-by: Bob Peterson <rpeterso@redhat.com>
> 
> Regards,
> 
> Bob Peterson

I was talking with Andreas G and he still has some concerns, so don't 
push this to a repo quite yet.

Regards,

Bob Peterson



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
  2023-05-17  3:24 ` [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio() Matthew Wilcox (Oracle)
@ 2023-06-03  9:34   ` Andreas Gruenbacher
  2023-06-04  3:35     ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Gruenbacher @ 2023-06-03  9:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

Hi Willy,

thanks for these patches. This particular one looks problematic:

On Wed, May 17, 2023 at 5:24 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> This function now supports large folios, even if nothing around it does.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/gfs2/aops.c | 27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 749135252d52..0f92e3e117da 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -82,33 +82,34 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
>  }
>
>  /**
> - * gfs2_write_jdata_page - gfs2 jdata-specific version of block_write_full_page
> - * @page: The page to write
> + * gfs2_write_jdata_folio - gfs2 jdata-specific version of block_write_full_page
> + * @folio: The folio to write
>   * @wbc: The writeback control
>   *
>   * This is the same as calling block_write_full_page, but it also
>   * writes pages outside of i_size
>   */
> -static int gfs2_write_jdata_page(struct page *page,
> +static int gfs2_write_jdata_folio(struct folio *folio,
>                                  struct writeback_control *wbc)
>  {
> -       struct inode * const inode = page->mapping->host;
> +       struct inode * const inode = folio->mapping->host;
>         loff_t i_size = i_size_read(inode);
> -       const pgoff_t end_index = i_size >> PAGE_SHIFT;
> -       unsigned offset;
>
> +       if (folio_pos(folio) >= i_size)
> +               return 0;

Function gfs2_write_jdata_page was originally introduced as
gfs2_write_full_page in commit fd4c5748b8d3 ("gfs2: writeout truncated
pages") to allow writing pages even when they are beyond EOF, as the
function description documents.

This hack was added because simply skipping journaled pages isn't
enough on gfs2; before a journaled page can be freed, it needs to be
marked as "revoked" in the journal. Journal recovery will then skip
the revoked blocks, which allows them to be reused for regular,
non-journaled data. We can end up here in contexts in which we cannot
"revoke" pages, so instead, we write the original pages even when they
are beyond EOF. This hack could be revisited, but it's pretty nasty
code to pick apart.

So at least the above if needs to go for now.

>         /*
> -        * The page straddles i_size.  It must be zeroed out on each and every
> +        * The folio straddles i_size.  It must be zeroed out on each and every
>          * writepage invocation because it may be mmapped.  "A file is mapped
>          * in multiples of the page size.  For a file that is not a multiple of
> -        * the  page size, the remaining memory is zeroed when mapped, and
> +        * the page size, the remaining memory is zeroed when mapped, and
>          * writes to that region are not written out to the file."
>          */
> -       offset = i_size & (PAGE_SIZE - 1);
> -       if (page->index == end_index && offset)
> -               zero_user_segment(page, offset, PAGE_SIZE);
> +       if (i_size < folio_pos(folio) + folio_size(folio))
> +               folio_zero_segment(folio, offset_in_folio(folio, i_size),
> +                               folio_size(folio));
>
> -       return __block_write_full_page(inode, page, gfs2_get_block_noalloc, wbc,
> +       return __block_write_full_page(inode, &folio->page,
> +                                      gfs2_get_block_noalloc, wbc,
>                                        end_buffer_async_write);
>  }
>
> @@ -137,7 +138,7 @@ static int __gfs2_jdata_write_folio(struct folio *folio,
>                 }
>                 gfs2_trans_add_databufs(ip, folio, 0, folio_size(folio));
>         }
> -       return gfs2_write_jdata_page(&folio->page, wbc);
> +       return gfs2_write_jdata_folio(folio, wbc);
>  }
>
>  /**
> --
> 2.39.2
>

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs()
  2023-05-23 13:37     ` Matthew Wilcox
@ 2023-06-03  9:46       ` Andreas Gruenbacher
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2023-06-03  9:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

On Tue, May 23, 2023 at 3:37 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, May 23, 2023 at 02:46:07PM +0200, Andreas Gruenbacher wrote:
> > >  void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
> > > -                            unsigned int from, unsigned int len)
> > > +                            size_t from, size_t len)
> > >  {
> > >         struct buffer_head *head = folio_buffers(folio);
> > >         unsigned int bsize = head->b_size;
> >
> > This only makes sense if the to, start, and end variables in
> > gfs2_trans_add_databufs() are changed from unsigned int to size_t as
> > well.
>
> The history of this patch is that I started doing conversions from page
> -> folio in gfs2, then you came out with a very similar series.  This
> patch is the remainder after rebasing my patches on yours.  So we can
> either drop this patch or just apply it.  I wasn't making a concerted
> effort to make gfs2 support 4GB+ sized folios, it's just part of the
> conversion that I do.

Right. What do we do with these patches now, though? We probably don't
want to put them in the gfs2 tree given the buffer.c changes. Shall I
post a revised version? Will you?

> > >  extern void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
> > > -                                   unsigned int from, unsigned int len);
> > > +                                   size_t from, size_t len);

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
  2023-06-03  9:34   ` Andreas Gruenbacher
@ 2023-06-04  3:35     ` Matthew Wilcox
  2023-06-04  7:27       ` Andreas Gruenbacher
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2023-06-04  3:35 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

On Sat, Jun 03, 2023 at 11:34:14AM +0200, Andreas Gruenbacher wrote:
> >   * This is the same as calling block_write_full_page, but it also
> >   * writes pages outside of i_size
> >   */
> > -static int gfs2_write_jdata_page(struct page *page,
> > +static int gfs2_write_jdata_folio(struct folio *folio,
> >                                  struct writeback_control *wbc)
> >  {
> > -       struct inode * const inode = page->mapping->host;
> > +       struct inode * const inode = folio->mapping->host;
> >         loff_t i_size = i_size_read(inode);
> > -       const pgoff_t end_index = i_size >> PAGE_SHIFT;
> > -       unsigned offset;
> >
> > +       if (folio_pos(folio) >= i_size)
> > +               return 0;
> 
> Function gfs2_write_jdata_page was originally introduced as
> gfs2_write_full_page in commit fd4c5748b8d3 ("gfs2: writeout truncated
> pages") to allow writing pages even when they are beyond EOF, as the
> function description documents.

Well, that was stupid of me.

> This hack was added because simply skipping journaled pages isn't
> enough on gfs2; before a journaled page can be freed, it needs to be
> marked as "revoked" in the journal. Journal recovery will then skip
> the revoked blocks, which allows them to be reused for regular,
> non-journaled data. We can end up here in contexts in which we cannot
> "revoke" pages, so instead, we write the original pages even when they
> are beyond EOF. This hack could be revisited, but it's pretty nasty
> code to pick apart.
> 
> So at least the above if needs to go for now.

Understood.  So we probably don't want to waste time zeroing the folio
if it is entirely beyond i_size, right?  Because at the moment we'd
zero some essentially random part of the folio if I just take out the
check.  Should it look like this?

        if (folio_pos(folio) < i_size &&
            i_size < folio_pos(folio) + folio_size(folio))
               folio_zero_segment(folio, offset_in_folio(folio, i_size),
                                folio_size(folio));


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
  2023-06-04  3:35     ` Matthew Wilcox
@ 2023-06-04  7:27       ` Andreas Gruenbacher
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2023-06-04  7:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-fsdevel, Andrew Morton, Bob Peterson, cluster-devel,
	Hannes Reinecke, Luis Chamberlain

On Sun, Jun 4, 2023 at 5:38 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Jun 03, 2023 at 11:34:14AM +0200, Andreas Gruenbacher wrote:
> > >   * This is the same as calling block_write_full_page, but it also
> > >   * writes pages outside of i_size
> > >   */
> > > -static int gfs2_write_jdata_page(struct page *page,
> > > +static int gfs2_write_jdata_folio(struct folio *folio,
> > >                                  struct writeback_control *wbc)
> > >  {
> > > -       struct inode * const inode = page->mapping->host;
> > > +       struct inode * const inode = folio->mapping->host;
> > >         loff_t i_size = i_size_read(inode);
> > > -       const pgoff_t end_index = i_size >> PAGE_SHIFT;
> > > -       unsigned offset;
> > >
> > > +       if (folio_pos(folio) >= i_size)
> > > +               return 0;
> >
> > Function gfs2_write_jdata_page was originally introduced as
> > gfs2_write_full_page in commit fd4c5748b8d3 ("gfs2: writeout truncated
> > pages") to allow writing pages even when they are beyond EOF, as the
> > function description documents.
>
> Well, that was stupid of me.
>
> > This hack was added because simply skipping journaled pages isn't
> > enough on gfs2; before a journaled page can be freed, it needs to be
> > marked as "revoked" in the journal. Journal recovery will then skip
> > the revoked blocks, which allows them to be reused for regular,
> > non-journaled data. We can end up here in contexts in which we cannot
> > "revoke" pages, so instead, we write the original pages even when they
> > are beyond EOF. This hack could be revisited, but it's pretty nasty
> > code to pick apart.
> >
> > So at least the above if needs to go for now.
>
> Understood.  So we probably don't want to waste time zeroing the folio
> if it is entirely beyond i_size, right?  Because at the moment we'd
> zero some essentially random part of the folio if I just take out the
> check.  Should it look like this?
>
>         if (folio_pos(folio) < i_size &&
>             i_size < folio_pos(folio) + folio_size(folio))
>                folio_zero_segment(folio, offset_in_folio(folio, i_size),
>                                 folio_size(folio));

Yes, looking good, thanks.

If you haven't already, could you please consider my other comment as
well before you repost?

https://lore.kernel.org/linux-fsdevel/CAHc6FU6GowpTfX-MgRiqqwZZJ0r-85C9exc2pNkBkySCGUT0FA@mail.gmail.com/

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-06-04  7:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-17  3:24 [PATCH 0/6] gfs2/buffer folio changes Matthew Wilcox (Oracle)
2023-05-17  3:24 ` [PATCH 1/6] gfs2: Use a folio inside gfs2_jdata_writepage() Matthew Wilcox (Oracle)
2023-05-17  3:24 ` [PATCH 2/6] gfs2: Pass a folio to __gfs2_jdata_write_folio() Matthew Wilcox (Oracle)
2023-05-17  3:24 ` [PATCH 3/6] gfs2: Convert gfs2_write_jdata_page() to gfs2_write_jdata_folio() Matthew Wilcox (Oracle)
2023-06-03  9:34   ` Andreas Gruenbacher
2023-06-04  3:35     ` Matthew Wilcox
2023-06-04  7:27       ` Andreas Gruenbacher
2023-05-17  3:24 ` [PATCH 4/6] buffer: Convert __block_write_full_page() to __block_write_full_folio() Matthew Wilcox (Oracle)
     [not found]   ` <CGME20230517144703eucas1p1550db888e29fc5b182c202f24adddb87@eucas1p1.samsung.com>
2023-05-17 14:47     ` Pankaj Raghav
2023-05-17 15:43       ` Matthew Wilcox
2023-05-17  3:24 ` [PATCH 5/6] gfs2: Support ludicrously large folios in gfs2_trans_add_databufs() Matthew Wilcox (Oracle)
2023-05-23 12:46   ` Andreas Gruenbacher
2023-05-23 13:37     ` Matthew Wilcox
2023-06-03  9:46       ` Andreas Gruenbacher
2023-05-17  3:24 ` [PATCH 6/6] buffer: Make block_write_full_page() handle large folios correctly Matthew Wilcox (Oracle)
2023-05-17 18:06 ` [PATCH 0/6] gfs2/buffer folio changes Bob Peterson
2023-06-02 12:30 ` Bob Peterson
2023-06-02 13:25   ` Bob Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).