Linux-Fsdevel Archive on lore.kernel.org
* [PATCH 00/14] Transparent Huge Page support for XFS
@ 2020-10-14  3:03 Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range Matthew Wilcox (Oracle)
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

This patchset includes all the filesystem and iomap changes needed to
support transparent huge pages.  A separate patchset will enable the MM
side of things.  I'm hoping Darrick agrees to take the first two patches
through the iomap tree.

These patches are for review.  They don't currently apply to the iomap
tree due to some conflicting patches.  I'll rebase on top of -rc1 once
it's out (they do apply to 5.9).  

Matthew Wilcox (Oracle) (14):
  fs: Support THPs in vfs_dedupe_file_range
  fs: Make page_mkwrite_check_truncate thp-aware
  iomap: Support THPs in BIO completion path
  iomap: Support THPs in iomap_adjust_read_range
  iomap: Support THPs in invalidatepage
  iomap: Support THPs in iomap_is_partially_uptodate
  iomap: Support THPs in readpage
  iomap: Support THPs in readahead
  iomap: Change iomap_write_begin calling convention
  iomap: Handle THPs when writing to pages
  iomap: Support THP writeback
  iomap: Inline data shouldn't see THPs
  iomap: Handle tail pages in iomap_page_mkwrite
  xfs: Support THPs

 fs/iomap/buffered-io.c  | 307 +++++++++++++++++++++++++++-------------
 fs/read_write.c         |   8 +-
 fs/xfs/xfs_aops.c       |   2 +-
 fs/xfs/xfs_super.c      |   2 +-
 include/linux/pagemap.h |  10 +-
 5 files changed, 219 insertions(+), 110 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14 16:12   ` Darrick J. Wong
  2020-10-14  3:03 ` [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

We may get tail pages returned from vfs_dedupe_get_page().  If we do,
we have to call page_mapping() instead of dereferencing page->mapping
directly.  We may also deadlock trying to lock the page twice if they're
subpages of the same THP, so compare the head pages instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/read_write.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 19f5c4bf75aa..ead675fef582 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1604,6 +1604,8 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
  */
 static void vfs_lock_two_pages(struct page *page1, struct page *page2)
 {
+	page1 = thp_head(page1);
+	page2 = thp_head(page2);
 	/* Always lock in order of increasing index. */
 	if (page1->index > page2->index)
 		swap(page1, page2);
@@ -1616,6 +1618,8 @@ static void vfs_lock_two_pages(struct page *page1, struct page *page2)
 /* Unlock two pages, being careful not to unlock the same page twice. */
 static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
 {
+	page1 = thp_head(page1);
+	page2 = thp_head(page2);
 	unlock_page(page1);
 	if (page1 != page2)
 		unlock_page(page2);
@@ -1670,8 +1674,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 		 * someone is invalidating pages on us and we lose.
 		 */
 		if (!PageUptodate(src_page) || !PageUptodate(dest_page) ||
-		    src_page->mapping != src->i_mapping ||
-		    dest_page->mapping != dest->i_mapping) {
+		    page_mapping(src_page) != src->i_mapping ||
+		    page_mapping(dest_page) != dest->i_mapping) {
 			same = false;
 			goto unlock;
 		}
-- 
2.28.0



* [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14 16:17   ` Darrick J. Wong
  2020-10-14  3:03 ` [PATCH 03/14] iomap: Support THPs in BIO completion path Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

If the page is compound, check the last index in the page and return
the appropriate size.  Change the return type to ssize_t in case we ever
support pages larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7dc6bf713d27..5083a0efafa8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -983,22 +983,22 @@ static inline unsigned long dir_pages(struct inode *inode)
  * @page: the page to check
  * @inode: the inode to check the page against
  *
- * Returns the number of bytes in the page up to EOF,
+ * Return: The number of bytes in the page up to EOF,
  * or -EFAULT if the page was truncated.
  */
-static inline int page_mkwrite_check_truncate(struct page *page,
+static inline ssize_t page_mkwrite_check_truncate(struct page *page,
 					      struct inode *inode)
 {
 	loff_t size = i_size_read(inode);
 	pgoff_t index = size >> PAGE_SHIFT;
-	int offset = offset_in_page(size);
+	unsigned long offset = offset_in_thp(page, size);
 
 	if (page->mapping != inode->i_mapping)
 		return -EFAULT;
 
 	/* page is wholly inside EOF */
-	if (page->index < index)
-		return PAGE_SIZE;
+	if (page->index + thp_nr_pages(page) - 1 < index)
+		return thp_size(page);
 	/* page is wholly past EOF */
 	if (page->index > index || !offset)
 		return -EFAULT;
-- 
2.28.0



* [PATCH 03/14] iomap: Support THPs in BIO completion path
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-15  9:50   ` Christoph Hellwig
  2020-10-14  3:03 ` [PATCH 04/14] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

bio_for_each_segment_all() iterates once per regular sized page.
Use bio_for_each_bvec_all() to iterate once per bvec and handle
merged THPs ourselves, instead of teaching the block layer about THPs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 62 ++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3e1eb40a73fd..935468d79d9d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -167,32 +167,45 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 		SetPageUptodate(page);
 }
 
-static void
-iomap_read_page_end_io(struct bio_vec *bvec, int error)
+static void iomap_finish_page_read(struct page *page, size_t offset,
+		size_t length, int error)
 {
-	struct page *page = bvec->bv_page;
 	struct iomap_page *iop = to_iomap_page(page);
 
 	if (unlikely(error)) {
 		ClearPageUptodate(page);
 		SetPageError(page);
 	} else {
-		iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
+		iomap_set_range_uptodate(page, offset, length);
 	}
 
-	if (!iop || atomic_sub_and_test(bvec->bv_len, &iop->read_bytes_pending))
+	if (!iop || atomic_sub_and_test(length, &iop->read_bytes_pending))
 		unlock_page(page);
 }
 
-static void
-iomap_read_end_io(struct bio *bio)
+static void iomap_finish_bvec_read(struct page *page, size_t offset,
+		size_t length, int error)
+{
+	while (length > 0) {
+		size_t count = min(thp_size(page) - offset, length);
+
+		iomap_finish_page_read(page, offset, count, error);
+
+		page += (offset + count) / PAGE_SIZE;
+		offset = 0;
+		length -= count;
+	}
+}
+
+static void iomap_read_end_io(struct bio *bio)
 {
-	int error = blk_status_to_errno(bio->bi_status);
+	int i, error = blk_status_to_errno(bio->bi_status);
 	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		iomap_read_page_end_io(bvec, error);
+	bio_for_each_bvec_all(bvec, bio, i)
+		iomap_finish_bvec_read(bvec->bv_page, bvec->bv_offset,
+				bvec->bv_len, error);
+
 	bio_put(bio);
 }
 
@@ -1035,9 +1048,8 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 }
 EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
-static void
-iomap_finish_page_writeback(struct inode *inode, struct page *page,
-		int error, unsigned int len)
+static void iomap_finish_page_write(struct inode *inode, struct page *page,
+		unsigned int len, int error)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
@@ -1053,6 +1065,20 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		end_page_writeback(page);
 }
 
+static void iomap_finish_bvec_write(struct inode *inode, struct page *page,
+		size_t offset, size_t length, int error)
+{
+	while (length > 0) {
+		size_t count = min(thp_size(page) - offset, length);
+
+		iomap_finish_page_write(inode, page, count, error);
+
+		page += (offset + count) / PAGE_SIZE;
+		offset = 0;
+		length -= count;
+	}
+}
+
 /*
  * We're now finished for good with this ioend structure.  Update the page
  * state, release holds on bios, and finally free up memory.  Do not use the
@@ -1070,7 +1096,7 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 
 	for (bio = &ioend->io_inline_bio; bio; bio = next) {
 		struct bio_vec *bv;
-		struct bvec_iter_all iter_all;
+		int i;
 
 		/*
 		 * For the last bio, bi_private points to the ioend, so we
@@ -1082,9 +1108,9 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bv, bio, iter_all)
-			iomap_finish_page_writeback(inode, bv->bv_page, error,
-					bv->bv_len);
+		bio_for_each_bvec_all(bv, bio, i)
+			iomap_finish_bvec_write(inode, bv->bv_page,
+					bv->bv_offset, bv->bv_len, error);
 		bio_put(bio);
 	}
 	/* The ioend has been freed by bio_put() */
-- 
2.28.0



* [PATCH 04/14] iomap: Support THPs in iomap_adjust_read_range
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 03/14] iomap: Support THPs in BIO completion path Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-15  9:50   ` Christoph Hellwig
  2020-10-14  3:03 ` [PATCH 05/14] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

Pass the struct page instead of the iomap_page so we can determine the
size of the page.  Use offset_in_thp() instead of offset_in_page() and
use thp_size() instead of PAGE_SIZE.  Convert the arguments to be size_t
instead of unsigned int, in case pages ever get larger than 2^31 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 935468d79d9d..95ac66731297 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -82,16 +82,16 @@ iomap_page_release(struct page *page)
 /*
  * Calculate the range inside the page that we actually need to read.
  */
-static void
-iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
-		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
+static void iomap_adjust_read_range(struct inode *inode, struct page *page,
+		loff_t *pos, loff_t length, size_t *offp, size_t *lenp)
 {
+	struct iomap_page *iop = to_iomap_page(page);
 	loff_t orig_pos = *pos;
 	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
 	unsigned block_size = (1 << block_bits);
-	unsigned poff = offset_in_page(*pos);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	size_t poff = offset_in_thp(page, *pos);
+	size_t plen = min_t(loff_t, thp_size(page) - poff, length);
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
 
@@ -129,7 +129,7 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
 	if (orig_pos <= isize && orig_pos + length > isize) {
-		unsigned end = offset_in_page(isize - 1) >> block_bits;
+		unsigned end = offset_in_thp(page, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
 			plen -= (last - end) * block_size;
@@ -253,7 +253,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	bool same_page = false, is_contig = false;
 	loff_t orig_pos = pos;
-	unsigned poff, plen;
+	size_t poff, plen;
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -263,7 +263,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iomap_adjust_read_range(inode, page, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
@@ -561,18 +561,19 @@ static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		struct page *page, struct iomap *srcmap)
 {
-	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
+	unsigned from = offset_in_page(pos), to = from + len;
+	size_t poff, plen;
 
 	if (PageUptodate(page))
 		return 0;
 	ClearPageError(page);
+	iomap_page_create(inode, page);
 
 	do {
-		iomap_adjust_read_range(inode, iop, &block_start,
+		iomap_adjust_read_range(inode, page, &block_start,
 				block_end - block_start, &poff, &plen);
 		if (plen == 0)
 			break;
-- 
2.28.0



* [PATCH 05/14] iomap: Support THPs in invalidatepage
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 04/14] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14 16:33   ` Darrick J. Wong
  2020-10-14  3:03 ` [PATCH 06/14] iomap: Support THPs in iomap_is_partially_uptodate Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

If we're punching a hole in a THP, we need to remove the per-page
iomap data as the THP is about to be split and each page will need
its own.  This means that writepage can now come across a page with
no iop allocated, so remove the assertion that there is already one,
and just create one (with the uptodate bits set) if there isn't one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 95ac66731297..4633ebd03a3f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -60,6 +60,8 @@ iomap_page_create(struct inode *inode, struct page *page)
 	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
 			GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
+	if (PageUptodate(page))
+		bitmap_fill(iop->uptodate, nr_blocks);
 	attach_page_private(page, iop);
 	return iop;
 }
@@ -494,10 +496,14 @@ iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
 	 * If we are invalidating the entire page, clear the dirty state from it
 	 * and release it to avoid unnecessary buildup of the LRU.
 	 */
-	if (offset == 0 && len == PAGE_SIZE) {
+	if (offset == 0 && len == thp_size(page)) {
 		WARN_ON_ONCE(PageWriteback(page));
 		cancel_dirty_page(page);
 		iomap_page_release(page);
+	} else if (PageTransHuge(page)) {
+		/* Punching a hole in a THP requires releasing the iop */
+		WARN_ON_ONCE(!PageUptodate(page) && PageDirty(page));
+		iomap_page_release(page);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidatepage);
@@ -1363,14 +1369,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct inode *inode,
 		struct page *page, u64 end_offset)
 {
-	struct iomap_page *iop = to_iomap_page(page);
+	struct iomap_page *iop = iomap_page_create(inode, page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	u64 file_offset; /* file offset of page */
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
 
 	/*
@@ -1415,7 +1420,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 			 */
 			if (wpc->ops->discard_page)
 				wpc->ops->discard_page(page);
-			ClearPageUptodate(page);
 			unlock_page(page);
 			goto done;
 		}
-- 
2.28.0



* [PATCH 06/14] iomap: Support THPs in iomap_is_partially_uptodate
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 05/14] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 07/14] iomap: Support THPs in readpage Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

Factor iomap_range_uptodate() out of iomap_is_partially_uptodate() for use
by iomap_readpage() later.  iomap_is_partially_uptodate() can be called
on a tail page by generic_file_buffered_read(), so handle that correctly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 43 ++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4633ebd03a3f..4ea6c601a183 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -141,6 +141,24 @@ static void iomap_adjust_read_range(struct inode *inode, struct page *page,
 	*lenp = plen;
 }
 
+static bool iomap_range_uptodate(struct inode *inode, struct page *page,
+		size_t start, size_t len)
+{
+	struct iomap_page *iop = to_iomap_page(page);
+	size_t first = start >> inode->i_blkbits;
+	size_t last = (start + len - 1) >> inode->i_blkbits;
+	size_t i;
+
+	VM_BUG_ON_PGFLAGS(!PageLocked(page), page);
+	if (!iop)
+		return false;
+
+	for (i = first; i <= last; i++)
+		if (!test_bit(i, iop->uptodate))
+			return false;
+	return true;
+}
+
 static void
 iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 {
@@ -446,26 +464,15 @@ int
 iomap_is_partially_uptodate(struct page *page, unsigned long from,
 		unsigned long count)
 {
-	struct iomap_page *iop = to_iomap_page(page);
-	struct inode *inode = page->mapping->host;
-	unsigned len, first, last;
-	unsigned i;
-
-	/* Limit range to one page */
-	len = min_t(unsigned, PAGE_SIZE - from, count);
+	struct page *head = thp_head(page);
+	size_t len;
 
-	/* First and last blocks in range within page */
-	first = from >> inode->i_blkbits;
-	last = (from + len - 1) >> inode->i_blkbits;
+	/* 'from' is relative to page, but the bitmap is relative to head */
+	from += (page - head) * PAGE_SIZE;
+	/* Limit range to this page */
+	len = min(thp_size(head) - from, count);
 
-	if (iop) {
-		for (i = first; i <= last; i++)
-			if (!test_bit(i, iop->uptodate))
-				return 0;
-		return 1;
-	}
-
-	return 0;
+	return iomap_range_uptodate(head->mapping->host, head, from, len);
 }
 EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
 
-- 
2.28.0



* [PATCH 07/14] iomap: Support THPs in readpage
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 06/14] iomap: Support THPs in iomap_is_partially_uptodate Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14 16:39   ` Darrick J. Wong
  2020-10-14  3:03 ` [PATCH 08/14] iomap: Support THPs in readahead Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

The VFS only calls readpage if readahead has encountered an error.
Assume that any error requires the page to be split, and attempt to
do so.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4ea6c601a183..ca305fbaf811 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -343,15 +343,50 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	return pos - orig_pos + plen;
 }
 
+/*
+ * The page that was passed in has become Uptodate.  This may be due to
+ * the storage being synchronous or due to a page split finding the page
+ * is actually uptodate.  The page is still locked.
+ * Lift this into the VFS at some point.
+ */
+#define AOP_UPDATED_PAGE       (AOP_TRUNCATED_PAGE + 1)
+
+static int iomap_split_page(struct inode *inode, struct page *page)
+{
+	struct page *head = thp_head(page);
+	bool uptodate = iomap_range_uptodate(inode, head,
+				(page - head) * PAGE_SIZE, PAGE_SIZE);
+
+	iomap_page_release(head);
+	if (split_huge_page(page) < 0) {
+		unlock_page(page);
+		return AOP_TRUNCATED_PAGE;
+	}
+	if (!uptodate)
+		return 0;
+	SetPageUptodate(page);
+	return AOP_UPDATED_PAGE;
+}
+
 int
 iomap_readpage(struct page *page, const struct iomap_ops *ops)
 {
 	struct iomap_readpage_ctx ctx = { .cur_page = page };
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = thp_head(page)->mapping->host;
 	unsigned poff;
 	loff_t ret;
 
-	trace_iomap_readpage(page->mapping->host, 1);
+	trace_iomap_readpage(inode, 1);
+
+	if (PageTransCompound(page)) {
+		int status = iomap_split_page(inode, page);
+		if (status == AOP_UPDATED_PAGE) {
+			unlock_page(page);
+			return 0;
+		}
+		if (status)
+			return status;
+	}
 
 	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
 		ret = iomap_apply(inode, page_offset(page) + poff,
-- 
2.28.0



* [PATCH 08/14] iomap: Support THPs in readahead
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 07/14] iomap: Support THPs in readpage Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-15  9:52   ` Christoph Hellwig
  2020-10-14  3:03 ` [PATCH 09/14] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

Use offset_in_thp() instead of offset_in_page() and change how we estimate
the number of bio_vecs we're going to need.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ca305fbaf811..4ef02afaedc5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -264,6 +264,16 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 		pos >= i_size_read(inode);
 }
 
+/*
+ * Estimate the number of vectors we need based on the current page size;
+ * if we're wrong we'll end up doing an overly large allocation or needing
+ * to do a second allocation, neither of which is a big deal.
+ */
+static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
+{
+	return (length + thp_size(page) - 1) >> page_shift(page);
+}
+
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
@@ -309,7 +319,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
-		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		int nr_vecs = iomap_nr_vecs(page, length);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
@@ -424,7 +434,8 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 	loff_t done, ret;
 
 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+		if (ctx->cur_page &&
+		    offset_in_thp(ctx->cur_page, pos + done) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
-- 
2.28.0



* [PATCH 09/14] iomap: Change iomap_write_begin calling convention
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 08/14] iomap: Support THPs in readahead Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14 16:47   ` Darrick J. Wong
  2020-10-14  3:03 ` [PATCH 10/14] iomap: Handle THPs when writing to pages Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

Pass (up to) the remaining length of the extent to iomap_write_begin()
and have it return the number of bytes that will fit in the page.
That lets us copy more bytes per call to iomap_write_begin() if the page
cache has already allocated a THP (and will in future allow us to pass
a hint to the page cache that it should try to allocate a larger page
if there are none in the cache).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 61 +++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4ef02afaedc5..397795db3ce5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -616,14 +616,14 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 	return submit_bio_wait(&bio);
 }
 
-static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
-		struct page *page, struct iomap *srcmap)
+static ssize_t __iomap_write_begin(struct inode *inode, loff_t pos,
+		size_t len, int flags, struct page *page, struct iomap *srcmap)
 {
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len;
+	size_t from = offset_in_thp(page, pos);
+	size_t to = from + len;
 	size_t poff, plen;
 
 	if (PageUptodate(page))
@@ -658,12 +658,13 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
-static int
-iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
+static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
+		unsigned flags, struct page **pagep, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	struct page *page;
+	size_t offset;
 	int status = 0;
 
 	BUG_ON(pos + len > iomap->offset + iomap->length);
@@ -674,6 +675,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		return -EINTR;
 
 	if (page_ops && page_ops->page_prepare) {
+		if (len > UINT_MAX)
+			len = UINT_MAX;
 		status = page_ops->page_prepare(inode, pos, len, iomap);
 		if (status)
 			return status;
@@ -685,6 +688,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = -ENOMEM;
 		goto out_no_page;
 	}
+	page = thp_head(page);
+	offset = offset_in_thp(page, pos);
+	if (len > thp_size(page) - offset)
+		len = thp_size(page) - offset;
 
 	if (srcmap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, srcmap);
@@ -694,11 +701,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __iomap_write_begin(inode, pos, len, flags, page,
 				srcmap);
 
-	if (unlikely(status))
+	if (status < 0)
 		goto out_unlock;
 
 	*pagep = page;
-	return 0;
+	return len;
 
 out_unlock:
 	unlock_page(page);
@@ -854,8 +861,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
 				srcmap);
-		if (unlikely(status))
+		if (status < 0)
 			break;
+		/* We may be partway through a THP */
+		offset = offset_in_thp(page, pos);
 
 		if (mapping_writably_mapped(inode->i_mapping))
 			flush_dcache_page(page);
@@ -915,7 +924,6 @@ static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
 {
-	long status = 0;
 	loff_t written = 0;
 
 	/* don't bother with blocks that are not shared to start with */
@@ -926,25 +934,24 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return length;
 
 	do {
-		unsigned long offset = offset_in_page(pos);
-		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
+		ssize_t bytes;
 
-		status = iomap_write_begin(inode, pos, bytes,
+		bytes = iomap_write_begin(inode, pos, length,
 				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
-		if (unlikely(status))
-			return status;
+		if (bytes < 0)
+			return bytes;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+		bytes = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
 				srcmap);
-		if (WARN_ON_ONCE(status == 0))
+		if (WARN_ON_ONCE(bytes == 0))
 			return -EIO;
 
 		cond_resched();
 
-		pos += status;
-		written += status;
-		length -= status;
+		pos += bytes;
+		written += bytes;
+		length -= bytes;
 
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 	} while (length);
@@ -975,15 +982,13 @@ static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
 		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
-	int status;
-	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
+	ssize_t bytes;
 
-	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
-	if (status)
-		return status;
+	bytes = iomap_write_begin(inode, pos, length, 0, &page, iomap, srcmap);
+	if (bytes < 0)
+		return bytes;
 
-	zero_user(page, offset, bytes);
+	zero_user(page, offset_in_thp(page, pos), bytes);
 	mark_page_accessed(page);
 
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 10/14] iomap: Handle THPs when writing to pages
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 09/14] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 11/14] iomap: Support THP writeback Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

If we come across a THP that is not uptodate when writing to the page
cache, this must be due to a readahead error, so behave the same way
readpage does and split it.  Make sure to flush the right subpage after
completing the write.  We still only copy up to a page boundary, so
there's no need to flush multiple pages at this time.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 397795db3ce5..0a1fe7d1a27c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -682,12 +682,19 @@ static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
 			return status;
 	}
 
+retry:
 	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
 			AOP_FLAG_NOFS);
 	if (!page) {
 		status = -ENOMEM;
 		goto out_no_page;
 	}
+	if (PageTransCompound(page) && !PageUptodate(page)) {
+		if (iomap_split_page(inode, page) == AOP_TRUNCATED_PAGE) {
+			put_page(page);
+			goto retry;
+		}
+	}
 	page = thp_head(page);
 	offset = offset_in_thp(page, pos);
 	if (len > thp_size(page) - offset)
@@ -724,6 +731,7 @@ iomap_set_page_dirty(struct page *page)
 	struct address_space *mapping = page_mapping(page);
 	int newly_dirty;
 
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 	if (unlikely(!mapping))
 		return !TestSetPageDirty(page);
 
@@ -746,7 +754,9 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		size_t copied, struct page *page)
 {
-	flush_dcache_page(page);
+	size_t offset = offset_in_thp(page, pos);
+
+	flush_dcache_page(page + offset / PAGE_SIZE);
 
 	/*
 	 * The blocks that were entirely written will now be uptodate, so we
@@ -761,7 +771,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 */
 	if (unlikely(copied < len && !PageUptodate(page)))
 		return 0;
-	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_range_uptodate(page, offset, len);
 	iomap_set_page_dirty(page);
 	return copied;
 }
@@ -837,6 +847,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 
+		/*
+		 * XXX: We don't know what size page we'll find in the
+		 * page cache, so only copy up to a regular page boundary.
+		 */
 		offset = offset_in_page(pos);
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
 						iov_iter_count(i));
@@ -867,7 +881,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		offset = offset_in_thp(page, pos);
 
 		if (mapping_writably_mapped(inode->i_mapping))
-			flush_dcache_page(page);
+			flush_dcache_page(page + offset / PAGE_SIZE);
 
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 11/14] iomap: Support THP writeback
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 10/14] iomap: Handle THPs when writing to pages Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 12/14] iomap: Inline data shouldn't see THPs Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

Use offset_in_thp() and similar helpers to submit THPs for writeback.
This simplifies the logic in iomap_do_writepage() around handling the
end of the file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0a1fe7d1a27c..38fd69ebd4cc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1394,7 +1394,7 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 {
 	sector_t sector = iomap_sector(&wpc->iomap, offset);
 	unsigned len = i_blocksize(inode);
-	unsigned poff = offset & (PAGE_SIZE - 1);
+	unsigned poff = offset_in_thp(page, offset);
 	bool merged, same_page = false;
 
 	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, offset, sector)) {
@@ -1444,8 +1444,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
-	u64 file_offset; /* file offset of page */
+	loff_t pos;
 	int error = 0, count = 0, i;
+	int nr_blocks = i_blocks_per_page(inode, page);
 	LIST_HEAD(submit_list);
 
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
@@ -1455,20 +1456,20 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * end of the current map or find the current map invalid, grab a new
 	 * one.
 	 */
-	for (i = 0, file_offset = page_offset(page);
-	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
-	     i++, file_offset += len) {
+	for (i = 0, pos = page_offset(page);
+	     i < nr_blocks && pos < end_offset;
+	     i++, pos += len) {
 		if (iop && !test_bit(i, iop->uptodate))
 			continue;
 
-		error = wpc->ops->map_blocks(wpc, inode, file_offset);
+		error = wpc->ops->map_blocks(wpc, inode, pos);
 		if (error)
 			break;
 		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
 			continue;
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
-		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
+		iomap_add_to_ioend(inode, pos, page, iop, wpc, wbc,
 				 &submit_list);
 		count++;
 	}
@@ -1549,11 +1550,11 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 {
 	struct iomap_writepage_ctx *wpc = data;
 	struct inode *inode = page->mapping->host;
-	pgoff_t end_index;
 	u64 end_offset;
 	loff_t offset;
 
-	trace_iomap_writepage(inode, page_offset(page), PAGE_SIZE);
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	trace_iomap_writepage(inode, page_offset(page), thp_size(page));
 
 	/*
 	 * Refuse to write the page out if we are called from reclaim context.
@@ -1590,10 +1591,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 	 * ---------------------------------^------------------|
 	 */
 	offset = i_size_read(inode);
-	end_index = offset >> PAGE_SHIFT;
-	if (page->index < end_index)
-		end_offset = (loff_t)(page->index + 1) << PAGE_SHIFT;
-	else {
+	end_offset = page_offset(page) + thp_size(page);
+	if (end_offset > offset) {
 		/*
 		 * Check whether the page to write out is beyond or straddles
 		 * i_size or not.
@@ -1605,7 +1604,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * |				    |      Straddles     |
 		 * ---------------------------------^-----------|--------|
 		 */
-		unsigned offset_into_page = offset & (PAGE_SIZE - 1);
+		unsigned offset_into_page = offset_in_thp(page, offset);
+		pgoff_t end_index = offset >> PAGE_SHIFT;
 
 		/*
 		 * Skip the page if it is fully outside i_size, e.g. due to a
@@ -1636,7 +1636,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * memory is zeroed when mapped, and writes to that region are
 		 * not written out to the file."
 		 */
-		zero_user_segment(page, offset_into_page, PAGE_SIZE);
+		zero_user_segment(page, offset_into_page, thp_size(page));
 
 		/* Adjust the end_offset to the end of file */
 		end_offset = offset;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 12/14] iomap: Inline data shouldn't see THPs
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 11/14] iomap: Support THP writeback Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 13/14] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 14/14] xfs: Support THPs Matthew Wilcox (Oracle)
  13 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs
  Cc: Matthew Wilcox (Oracle), linux-mm, Christoph Hellwig

Assert that we're not seeing THPs in functions that read/write
inline data, rather than zeroing out the tail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 38fd69ebd4cc..38b6b75e7639 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -247,6 +247,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 		return;
 
 	BUG_ON(page->index);
+	BUG_ON(PageCompound(page));
 	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -782,6 +783,7 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
+	BUG_ON(PageCompound(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	flush_dcache_page(page);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 13/14] iomap: Handle tail pages in iomap_page_mkwrite
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 12/14] iomap: Inline data shouldn't see THPs Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14  3:03 ` [PATCH 14/14] xfs: Support THPs Matthew Wilcox (Oracle)
  13 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

iomap_page_mkwrite() can be called with a tail page.  If so, operate
on the head page, since we treat the entire THP as a single unit and
the whole page is dirtied at the same time.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 38b6b75e7639..c2540c85f629 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1098,7 +1098,7 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 {
-	struct page *page = vmf->page;
+	struct page *page = thp_head(vmf->page);
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	unsigned long length;
 	loff_t offset;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 14/14] xfs: Support THPs
  2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2020-10-14  3:03 ` [PATCH 13/14] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox (Oracle)
@ 2020-10-14  3:03 ` Matthew Wilcox (Oracle)
  2020-10-14 16:51   ` Darrick J. Wong
  13 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-14  3:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs; +Cc: Matthew Wilcox (Oracle), linux-mm

There is one place which assumes the size of a page; fix it, and set
FS_THP_SUPPORT to advertise THP support to the page cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/xfs/xfs_aops.c  | 2 +-
 fs/xfs/xfs_super.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 55d126d4e096..20968842b2f0 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -548,7 +548,7 @@ xfs_discard_page(
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
-	iomap_invalidatepage(page, 0, PAGE_SIZE);
+	iomap_invalidatepage(page, 0, thp_size(page));
 }
 
 static const struct iomap_writeback_ops xfs_writeback_ops = {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 71ac6c1cdc36..4b6e1cfc57a8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1840,7 +1840,7 @@ static struct file_system_type xfs_fs_type = {
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV,
+	.fs_flags		= FS_REQUIRES_DEV | FS_THP_SUPPORT,
 };
 MODULE_ALIAS_FS("xfs");
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range
  2020-10-14  3:03 ` [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range Matthew Wilcox (Oracle)
@ 2020-10-14 16:12   ` Darrick J. Wong
  2020-10-14 17:16     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-10-14 16:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:44AM +0100, Matthew Wilcox (Oracle) wrote:
> We may get tail pages returned from vfs_dedupe_get_page().  If we do,
> we have to call page_mapping() instead of dereferencing page->mapping
> directly.  We may also deadlock trying to lock the page twice if they're
> subpages of the same THP, so compare the head pages instead.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/read_write.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 19f5c4bf75aa..ead675fef582 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1604,6 +1604,8 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>   */
>  static void vfs_lock_two_pages(struct page *page1, struct page *page2)
>  {
> +	page1 = thp_head(page1);
> +	page2 = thp_head(page2);

Hmm, is this usage (calling thp_head() to extract the head page from an
arbitrary page reference) a common enough idiom that it doesn't need a
comment saying why we need the head page?

I'm asking that genuinely-- thp_head() is new to me but maybe it's super
obvious to everyone else?  Or at least the mm developers?  I suspect
that might be the case....?

Aside from that question, this looks fine to me.

Also, I was sort of thinking about sending a patch to Linus at the end
of the merge window moving all the remap/clone/dedupe common code to a
separate file to declutter fs/read_write.c and mm/filemap.c.  Does that
sound ok?

--D

>  	/* Always lock in order of increasing index. */
>  	if (page1->index > page2->index)
>  		swap(page1, page2);
> @@ -1616,6 +1618,8 @@ static void vfs_lock_two_pages(struct page *page1, struct page *page2)
>  /* Unlock two pages, being careful not to unlock the same page twice. */
>  static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
>  {
> +	page1 = thp_head(page1);
> +	page2 = thp_head(page2);
>  	unlock_page(page1);
>  	if (page1 != page2)
>  		unlock_page(page2);
> @@ -1670,8 +1674,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  		 * someone is invalidating pages on us and we lose.
>  		 */
>  		if (!PageUptodate(src_page) || !PageUptodate(dest_page) ||
> -		    src_page->mapping != src->i_mapping ||
> -		    dest_page->mapping != dest->i_mapping) {
> +		    page_mapping(src_page) != src->i_mapping ||
> +		    page_mapping(dest_page) != dest->i_mapping) {
>  			same = false;
>  			goto unlock;
>  		}
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware
  2020-10-14  3:03 ` [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
@ 2020-10-14 16:17   ` Darrick J. Wong
  2020-10-14 17:23     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-10-14 16:17 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:45AM +0100, Matthew Wilcox (Oracle) wrote:
> If the page is compound, check the last index in the page and return
> the appropriate size.  Change the return type to ssize_t in case we ever
> support pages larger than 2GB.

"But what about 16G pages?" :P

> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pagemap.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 7dc6bf713d27..5083a0efafa8 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -983,22 +983,22 @@ static inline unsigned long dir_pages(struct inode *inode)
>   * @page: the page to check
>   * @inode: the inode to check the page against
>   *
> - * Returns the number of bytes in the page up to EOF,
> + * Return: The number of bytes in the page up to EOF,
>   * or -EFAULT if the page was truncated.
>   */
> -static inline int page_mkwrite_check_truncate(struct page *page,
> +static inline ssize_t page_mkwrite_check_truncate(struct page *page,
>  					      struct inode *inode)
>  {
>  	loff_t size = i_size_read(inode);
>  	pgoff_t index = size >> PAGE_SHIFT;
> -	int offset = offset_in_page(size);
> +	unsigned long offset = offset_in_thp(page, size);
>  
>  	if (page->mapping != inode->i_mapping)
>  		return -EFAULT;
>  
>  	/* page is wholly inside EOF */
> -	if (page->index < index)
> -		return PAGE_SIZE;
> +	if (page->index + thp_nr_pages(page) - 1 < index)

Just curious, is this expression common enough to create a helper for
that too?

#define thp_last_index(page) ((page)->index + thp_nr_pages(page) - 1) ?

(Maybe make that a static inline, I used a macro for brevity)

Otherwise looks fine to me...

--D

> +		return thp_size(page);
>  	/* page is wholly past EOF */
>  	if (page->index > index || !offset)
>  		return -EFAULT;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/14] iomap: Support THPs in invalidatepage
  2020-10-14  3:03 ` [PATCH 05/14] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
@ 2020-10-14 16:33   ` Darrick J. Wong
  2020-10-14 17:26     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-10-14 16:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:48AM +0100, Matthew Wilcox (Oracle) wrote:
> If we're punching a hole in a THP, we need to remove the per-page
> iomap data as the THP is about to be split and each page will need
> its own.  This means that writepage can now come across a page with
> no iop allocated, so remove the assertion that there is already one,
> and just create one (with the uptodate bits set) if there isn't one.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 95ac66731297..4633ebd03a3f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -60,6 +60,8 @@ iomap_page_create(struct inode *inode, struct page *page)
>  	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
>  			GFP_NOFS | __GFP_NOFAIL);
>  	spin_lock_init(&iop->uptodate_lock);
> +	if (PageUptodate(page))
> +		bitmap_fill(iop->uptodate, nr_blocks);
>  	attach_page_private(page, iop);
>  	return iop;
>  }
> @@ -494,10 +496,14 @@ iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
>  	 * If we are invalidating the entire page, clear the dirty state from it
>  	 * and release it to avoid unnecessary buildup of the LRU.
>  	 */
> -	if (offset == 0 && len == PAGE_SIZE) {
> +	if (offset == 0 && len == thp_size(page)) {
>  		WARN_ON_ONCE(PageWriteback(page));
>  		cancel_dirty_page(page);
>  		iomap_page_release(page);
> +	} else if (PageTransHuge(page)) {
> +		/* Punching a hole in a THP requires releasing the iop */
> +		WARN_ON_ONCE(!PageUptodate(page) && PageDirty(page));
> +		iomap_page_release(page);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(iomap_invalidatepage);
> @@ -1363,14 +1369,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  		struct writeback_control *wbc, struct inode *inode,
>  		struct page *page, u64 end_offset)
>  {
> -	struct iomap_page *iop = to_iomap_page(page);
> +	struct iomap_page *iop = iomap_page_create(inode, page);
>  	struct iomap_ioend *ioend, *next;
>  	unsigned len = i_blocksize(inode);
>  	u64 file_offset; /* file offset of page */
>  	int error = 0, count = 0, i;
>  	LIST_HEAD(submit_list);
>  
> -	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
>  	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
>  
>  	/*
> @@ -1415,7 +1420,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  			 */
>  			if (wpc->ops->discard_page)
>  				wpc->ops->discard_page(page);
> -			ClearPageUptodate(page);

Er, I don't get it -- why do we now leave the page up to date after
writeback fails?

--D

>  			unlock_page(page);
>  			goto done;
>  		}
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/14] iomap: Support THPs in readpage
  2020-10-14  3:03 ` [PATCH 07/14] iomap: Support THPs in readpage Matthew Wilcox (Oracle)
@ 2020-10-14 16:39   ` Darrick J. Wong
  2020-10-14 17:35     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-10-14 16:39 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:50AM +0100, Matthew Wilcox (Oracle) wrote:
> The VFS only calls readpage if readahead has encountered an error.
> Assume that any error requires the page to be split, and attempt to
> do so.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4ea6c601a183..ca305fbaf811 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -343,15 +343,50 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	return pos - orig_pos + plen;
>  }
>  
> +/*
> + * The page that was passed in has become Uptodate.  This may be due to
> + * the storage being synchronous or due to a page split finding the page
> + * is actually uptodate.  The page is still locked.
> + * Lift this into the VFS at some point.
> + */
> +#define AOP_UPDATED_PAGE       (AOP_TRUNCATED_PAGE + 1)

Er... why not lift it now?

"Because touching fs.h causes the whole kernel to be rebuilt and that's
annoying"? :D

--D

> +static int iomap_split_page(struct inode *inode, struct page *page)
> +{
> +	struct page *head = thp_head(page);
> +	bool uptodate = iomap_range_uptodate(inode, head,
> +				(page - head) * PAGE_SIZE, PAGE_SIZE);
> +
> +	iomap_page_release(head);
> +	if (split_huge_page(page) < 0) {
> +		unlock_page(page);
> +		return AOP_TRUNCATED_PAGE;
> +	}
> +	if (!uptodate)
> +		return 0;
> +	SetPageUptodate(page);
> +	return AOP_UPDATED_PAGE;
> +}
> +
>  int
>  iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  {
>  	struct iomap_readpage_ctx ctx = { .cur_page = page };
> -	struct inode *inode = page->mapping->host;
> +	struct inode *inode = thp_head(page)->mapping->host;
>  	unsigned poff;
>  	loff_t ret;
>  
> -	trace_iomap_readpage(page->mapping->host, 1);
> +	trace_iomap_readpage(inode, 1);
> +
> +	if (PageTransCompound(page)) {
> +		int status = iomap_split_page(inode, page);
> +		if (status == AOP_UPDATED_PAGE) {
> +			unlock_page(page);

/me wonders why not do the unlock in iomap_split_page?

--D

> +			return 0;
> +		}
> +		if (status)
> +			return status;
> +	}
>  
>  	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
>  		ret = iomap_apply(inode, page_offset(page) + poff,
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/14] iomap: Change iomap_write_begin calling convention
  2020-10-14  3:03 ` [PATCH 09/14] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
@ 2020-10-14 16:47   ` Darrick J. Wong
  2020-10-14 17:41     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-10-14 16:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:52AM +0100, Matthew Wilcox (Oracle) wrote:
> Pass (up to) the remaining length of the extent to iomap_write_begin()
> and have it return the number of bytes that will fit in the page.
> That lets us copy more bytes per call to iomap_write_begin() if the page
> cache has already allocated a THP (and will in future allow us to pass
> a hint to the page cache that it should try to allocate a larger page
> if there are none in the cache).
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 61 +++++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4ef02afaedc5..397795db3ce5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -616,14 +616,14 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  	return submit_bio_wait(&bio);
>  }
>  
> -static int
> -__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> -		struct page *page, struct iomap *srcmap)
> +static ssize_t __iomap_write_begin(struct inode *inode, loff_t pos,
> +		size_t len, int flags, struct page *page, struct iomap *srcmap)
>  {
>  	loff_t block_size = i_blocksize(inode);
>  	loff_t block_start = pos & ~(block_size - 1);
>  	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
> -	unsigned from = offset_in_page(pos), to = from + len;
> +	size_t from = offset_in_thp(page, pos);
> +	size_t to = from + len;
>  	size_t poff, plen;
>  
>  	if (PageUptodate(page))
> @@ -658,12 +658,13 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  	return 0;
>  }
>  
> -static int
> -iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> -		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> +static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
> +		unsigned flags, struct page **pagep, struct iomap *iomap,

loff_t len?  You've been using size_t (ssize_t?) for length elsewhere,
can't return more than ssize_t, and afaik MAX_RW_COUNT will never go
larger than 2GB so I'm confused about types here...?

Mostly because my brain has been trained to think that if it sees
"size_t len" as an input parameter and a ssize_t return value, then
probably the return value is however much of @len we managed to process.

> +		struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	struct page *page;
> +	size_t offset;
>  	int status = 0;
>  
>  	BUG_ON(pos + len > iomap->offset + iomap->length);
> @@ -674,6 +675,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		return -EINTR;
>  
>  	if (page_ops && page_ops->page_prepare) {
> +		if (len > UINT_MAX)
> +			len = UINT_MAX;

I'm not especially familiar with page_prepare (since it's a gfs2 thing);
why do you clamp len to UINT_MAX here?

--D

>  		status = page_ops->page_prepare(inode, pos, len, iomap);
>  		if (status)
>  			return status;
> @@ -685,6 +688,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		status = -ENOMEM;
>  		goto out_no_page;
>  	}
> +	page = thp_head(page);
> +	offset = offset_in_thp(page, pos);
> +	if (len > thp_size(page) - offset)
> +		len = thp_size(page) - offset;
>  
>  	if (srcmap->type == IOMAP_INLINE)
>  		iomap_read_inline_data(inode, page, srcmap);
> @@ -694,11 +701,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		status = __iomap_write_begin(inode, pos, len, flags, page,
>  				srcmap);
>  
> -	if (unlikely(status))
> +	if (status < 0)
>  		goto out_unlock;
>  
>  	*pagep = page;
> -	return 0;
> +	return len;
>  
>  out_unlock:
>  	unlock_page(page);
> @@ -854,8 +861,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
>  				srcmap);
> -		if (unlikely(status))
> +		if (status < 0)
>  			break;
> +		/* We may be partway through a THP */
> +		offset = offset_in_thp(page, pos);
>  
>  		if (mapping_writably_mapped(inode->i_mapping))
>  			flush_dcache_page(page);
> @@ -915,7 +924,6 @@ static loff_t
>  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct iomap *iomap, struct iomap *srcmap)
>  {
> -	long status = 0;
>  	loff_t written = 0;
>  
>  	/* don't bother with blocks that are not shared to start with */
> @@ -926,25 +934,24 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		return length;
>  
>  	do {
> -		unsigned long offset = offset_in_page(pos);
> -		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
>  		struct page *page;
> +		ssize_t bytes;
>  
> -		status = iomap_write_begin(inode, pos, bytes,
> +		bytes = iomap_write_begin(inode, pos, length,
>  				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> -		if (unlikely(status))
> -			return status;
> +		if (bytes < 0)
> +			return bytes;
>  
> -		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> +		bytes = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
>  				srcmap);
> -		if (WARN_ON_ONCE(status == 0))
> +		if (WARN_ON_ONCE(bytes == 0))
>  			return -EIO;
>  
>  		cond_resched();
>  
> -		pos += status;
> -		written += status;
> -		length -= status;
> +		pos += bytes;
> +		written += bytes;
> +		length -= bytes;
>  
>  		balance_dirty_pages_ratelimited(inode->i_mapping);
>  	} while (length);
> @@ -975,15 +982,13 @@ static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
>  		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page;
> -	int status;
> -	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> +	ssize_t bytes;
>  
> -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> -	if (status)
> -		return status;
> +	bytes = iomap_write_begin(inode, pos, length, 0, &page, iomap, srcmap);
> +	if (bytes < 0)
> +		return bytes;
>  
> -	zero_user(page, offset, bytes);
> +	zero_user(page, offset_in_thp(page, pos), bytes);
>  	mark_page_accessed(page);
>  
>  	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 14/14] xfs: Support THPs
  2020-10-14  3:03 ` [PATCH 14/14] xfs: Support THPs Matthew Wilcox (Oracle)
@ 2020-10-14 16:51   ` Darrick J. Wong
  2020-10-14 17:30     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-10-14 16:51 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:57AM +0100, Matthew Wilcox (Oracle) wrote:
> There is one place which assumes the size of a page; fix it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/xfs/xfs_aops.c  | 2 +-
>  fs/xfs/xfs_super.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 55d126d4e096..20968842b2f0 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -548,7 +548,7 @@ xfs_discard_page(
>  	if (error && !XFS_FORCED_SHUTDOWN(mp))
>  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  out_invalidate:
> -	iomap_invalidatepage(page, 0, PAGE_SIZE);
> +	iomap_invalidatepage(page, 0, thp_size(page));
>  }
>  
>  static const struct iomap_writeback_ops xfs_writeback_ops = {
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 71ac6c1cdc36..4b6e1cfc57a8 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1840,7 +1840,7 @@ static struct file_system_type xfs_fs_type = {
>  	.init_fs_context	= xfs_init_fs_context,
>  	.parameters		= xfs_fs_parameters,
>  	.kill_sb		= kill_block_super,
> -	.fs_flags		= FS_REQUIRES_DEV,
> +	.fs_flags		= FS_REQUIRES_DEV | FS_THP_SUPPORT,

Mostly looks reasonable to me so far, though I forget where exactly
FS_THP_SUPPORT got added...?

--D

>  };
>  MODULE_ALIAS_FS("xfs");
>  
> -- 
> 2.28.0
> 


* Re: [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range
  2020-10-14 16:12   ` Darrick J. Wong
@ 2020-10-14 17:16     ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 17:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 09:12:16AM -0700, Darrick J. Wong wrote:
> On Wed, Oct 14, 2020 at 04:03:44AM +0100, Matthew Wilcox (Oracle) wrote:
> > We may get tail pages returned from vfs_dedupe_get_page().  If we do,
> > we have to call page_mapping() instead of dereferencing page->mapping
> > directly.  We may also deadlock trying to lock the page twice if they're
> > subpages of the same THP, so compare the head pages instead.

> >  static void vfs_lock_two_pages(struct page *page1, struct page *page2)
> >  {
> > +	page1 = thp_head(page1);
> > +	page2 = thp_head(page2);
> 
> Hmm, is this usage (calling thp_head() to extract the head page from an
> arbitrary page reference) a common enough idiom that it doesn't need a
> comment saying why we need the head page?

It's pretty common.  Lots of times it gets hidden inside macros,
and sometimes it gets spelled as 'compound_head' instead of
thp_head.  The advantage of thp_head() is that it compiles away if
CONFIG_TRANSPARENT_HUGEPAGE is disabled, while compound pages always
exist.
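[As a rough userspace sketch of the compile-away behaviour described above -- the mocked struct page and head-pointer layout below are made up for illustration, not the kernel's definitions:]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace mock -- not the kernel's struct page. */
struct page {
	struct page *head;	/* head page for tail pages, NULL otherwise */
};

static struct page *compound_head(struct page *page)
{
	return page->head ? page->head : page;
}

#define CONFIG_TRANSPARENT_HUGEPAGE 1	/* pretend THP is enabled for this sketch */

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* With THP enabled, thp_head() must chase the head pointer. */
static struct page *thp_head(struct page *page)
{
	return compound_head(page);
}
#else
/* With THP disabled, no tail page can reach these paths, so
 * thp_head() compiles away to the identity function. */
static struct page *thp_head(struct page *page)
{
	return page;
}
#endif
```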

> I'm asking that genuinely-- thp_head() is new to me but maybe it's super
> obvious to everyone else?  Or at least the mm developers?  I suspect
> that might be the case....?

thp_head is indeed new.  It was merged in August this year, partly in
response to Dave Chinner getting annoyed at the mixing of metaphors --
some things were thp_*, some were hpage_* and some were compound_*.
Now everything is in the thp_* namespace if it refers to THPs.

> Also, I was sort of thinking about sending a patch to Linus at the end
> of the merge window moving all the remap/clone/dedupe common code to a
> separate file to declutter fs/read_write.c and mm/filemap.c.  Does that
> sound ok?

I don't think that would bother me at all.


* Re: [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware
  2020-10-14 16:17   ` Darrick J. Wong
@ 2020-10-14 17:23     ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 17:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 09:17:04AM -0700, Darrick J. Wong wrote:
> On Wed, Oct 14, 2020 at 04:03:45AM +0100, Matthew Wilcox (Oracle) wrote:
> > If the page is compound, check the last index in the page and return
> > the appropriate size.  Change the return type to ssize_t in case we ever
> > support pages larger than 2GB.
> 
> "But what about 16G pages?" :P

They're not practical with today's I/O limits; an x4 PCIe link running at
gen4 speeds will take 2 seconds to do 16GB of I/O.  Assuming doubling of PCIe
speeds every four years, and a reasonable latency of 0.1 seconds, we're
about fifteen years away from that being reasonable.  I'm sure this
code will have bitrotted by then and whoever tries to add support for
them will have work to do ...
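[The estimate above is simple arithmetic; a sketch, taking ~2 GB/s of usable bandwidth per gen4 lane as an approximation rather than a spec value:]

```c
#include <assert.h>

/* Back-of-the-envelope model of the estimate above.  The bandwidth
 * figure passed in (~2 GB/s usable per PCIe gen4 lane) is an
 * approximation, not an exact spec number. */
static double io_seconds(double io_gb, double lanes, double gb_per_sec_per_lane)
{
	return io_gb / (lanes * gb_per_sec_per_lane);
}
```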

> >  	/* page is wholly inside EOF */
> > -	if (page->index < index)
> > -		return PAGE_SIZE;
> > +	if (page->index + thp_nr_pages(page) - 1 < index)
> 
> Just curious, is this expression common enough to create a helper for
> that too?
> 
> #define thp_last_index(page) ((page)->index + thp_nr_pages(page) - 1) ?
> 
> (Maybe make that a static inline, I used a macro for brevity)

I had that in an earlier version and there were two callers, so I
took it out again because it was more effort to explain what it did
than it was to open-code it.
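[The open-coded check quoted above is easy to model in userspace; the function name below is made up for illustration:]

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the check quoted above: a page (possibly a THP
 * spanning nr_pages subpages) lies wholly inside EOF iff its last
 * pagecache index is below the index of the page containing EOF. */
static bool page_wholly_inside_eof(unsigned long page_index,
				   unsigned long nr_pages,
				   unsigned long eof_index)
{
	return page_index + nr_pages - 1 < eof_index;
}
```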


* Re: [PATCH 05/14] iomap: Support THPs in invalidatepage
  2020-10-14 16:33   ` Darrick J. Wong
@ 2020-10-14 17:26     ` Matthew Wilcox
  2020-10-14 20:00       ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 17:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 09:33:47AM -0700, Darrick J. Wong wrote:
> > @@ -1415,7 +1420,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> >  			 */
> >  			if (wpc->ops->discard_page)
> >  				wpc->ops->discard_page(page);
> > -			ClearPageUptodate(page);
> 
> Er, I don't get it -- why do we now leave the page up to date after
> writeback fails?

The page is still uptodate -- every byte in this page is at least as new
as the corresponding bytes on disk.


* Re: [PATCH 14/14] xfs: Support THPs
  2020-10-14 16:51   ` Darrick J. Wong
@ 2020-10-14 17:30     ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 17:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 09:51:16AM -0700, Darrick J. Wong wrote:
> > -	.fs_flags		= FS_REQUIRES_DEV,
> > +	.fs_flags		= FS_REQUIRES_DEV | FS_THP_SUPPORT,
> 
> Mostly looks reasonable to me so far, though I forget where exactly
> FS_THP_SUPPORT got added...?

That's in akpm's tree ... not sure if it's in his current pull request,
but I'm assuming it'll go upstream this merge window.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=akpm&id=b97a1c642769622b2f22f575f624aefcd1cd9b7f


* Re: [PATCH 07/14] iomap: Support THPs in readpage
  2020-10-14 16:39   ` Darrick J. Wong
@ 2020-10-14 17:35     ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 17:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 09:39:36AM -0700, Darrick J. Wong wrote:
> On Wed, Oct 14, 2020 at 04:03:50AM +0100, Matthew Wilcox (Oracle) wrote:
> > +/*
> > + * The page that was passed in has become Uptodate.  This may be due to
> > + * the storage being synchronous or due to a page split finding the page
> > + * is actually uptodate.  The page is still locked.
> > + * Lift this into the VFS at some point.
> > + */
> > +#define AOP_UPDATED_PAGE       (AOP_TRUNCATED_PAGE + 1)
> 
> Er... why not lift it now?
> 
> "Because touching fs.h causes the whole kernel to be rebuilt and that's
> annoying"? :D

Hah!  Because I already have a patch series out that does lift it to
the VFS:

https://lore.kernel.org/linux-fsdevel/20201009143104.22673-1-willy@infradead.org/

which I'm kind of hoping gets merged first.  Then I can adjust this patch.

> > +static int iomap_split_page(struct inode *inode, struct page *page)
> > +{
> > +	struct page *head = thp_head(page);
> > +	bool uptodate = iomap_range_uptodate(inode, head,
> > +				(page - head) * PAGE_SIZE, PAGE_SIZE);
> > +
> > +	iomap_page_release(head);
> > +	if (split_huge_page(page) < 0) {
> > +		unlock_page(page);
> > +		return AOP_TRUNCATED_PAGE;
> > +	}
> > +	if (!uptodate)
> > +		return 0;
> > +	SetPageUptodate(page);
> > +	return AOP_UPDATED_PAGE;
> > +}
> > +
> >  int
> >  iomap_readpage(struct page *page, const struct iomap_ops *ops)
> >  {
> >  	struct iomap_readpage_ctx ctx = { .cur_page = page };
> > -	struct inode *inode = page->mapping->host;
> > +	struct inode *inode = thp_head(page)->mapping->host;
> >  	unsigned poff;
> >  	loff_t ret;
> >  
> > -	trace_iomap_readpage(page->mapping->host, 1);
> > +	trace_iomap_readpage(inode, 1);
> > +
> > +	if (PageTransCompound(page)) {
> > +		int status = iomap_split_page(inode, page);
> > +		if (status == AOP_UPDATED_PAGE) {
> > +			unlock_page(page);
> 
> /me wonders why not do the unlock in iomap_split_page?

This is working around AOP_UPDATED_PAGE not existing in the VFS.  So
adjusting this patch becomes:

		int status = iomap_split_page(inode, page);
		if (status)
			return status;


* Re: [PATCH 09/14] iomap: Change iomap_write_begin calling convention
  2020-10-14 16:47   ` Darrick J. Wong
@ 2020-10-14 17:41     ` Matthew Wilcox
  2020-10-14 18:08       ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 17:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 09:47:44AM -0700, Darrick J. Wong wrote:
> > -static int
> > -iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > -		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> > +static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
> > +		unsigned flags, struct page **pagep, struct iomap *iomap,
> 
> loff_t len?  You've been using size_t (ssize_t?) for length elsewhere,
> can't return more than ssize_t, and afaik MAX_RW_COUNT will never go
> larger than 2GB so I'm confused about types here...?

Yes, you're right.  This one should be size_t.

> >  	if (page_ops && page_ops->page_prepare) {
> > +		if (len > UINT_MAX)
> > +			len = UINT_MAX;
> 
> I'm not especially familiar with page_prepare (since it's a gfs2 thing);
> why do you clamp len to UINT_MAX here?

The len parameter of ->page_prepare is an unsigned int.  I don't want
a 1<<32+1 byte I/O to be seen as a 1 byte I/O.  We could upgrade the
parameter to size_t from unsigned int?
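[The truncation being guarded against here is plain unsigned narrowing; a small sketch (assuming a 32-bit unsigned int, as on the usual kernel targets):

```c
#include <assert.h>
#include <stdint.h>

/* What happens if a >4GB length is passed through an `unsigned int`
 * parameter without clamping: the high bits are simply dropped
 * (reduction modulo 2^32, assuming 32-bit unsigned int). */
static unsigned int narrow(uint64_t len)
{
	return (unsigned int)len;
}
```
]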



* Re: [PATCH 09/14] iomap: Change iomap_write_begin calling convention
  2020-10-14 17:41     ` Matthew Wilcox
@ 2020-10-14 18:08       ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-10-14 18:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 06:41:58PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 14, 2020 at 09:47:44AM -0700, Darrick J. Wong wrote:
> > > -static int
> > > -iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > > -		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> > > +static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
> > > +		unsigned flags, struct page **pagep, struct iomap *iomap,
> > 
> > loff_t len?  You've been using size_t (ssize_t?) for length elsewhere,
> > can't return more than ssize_t, and afaik MAX_RW_COUNT will never go
> > larger than 2GB so I'm confused about types here...?
> 
> Yes, you're right.  This one should be size_t.
> 
> > >  	if (page_ops && page_ops->page_prepare) {
> > > +		if (len > UINT_MAX)
> > > +			len = UINT_MAX;
> > 
> > I'm not especially familiar with page_prepare (since it's a gfs2 thing);
> > why do you clamp len to UINT_MAX here?
> 
> The len parameter of ->page_prepare is an unsigned int.  I don't want
> a 1<<32+1 byte I/O to be seen as a 1 byte I/O.  We could upgrade the
> parameter to size_t from unsigned int?

OK, here's what I have -- two patches (copy-and-wasted):

    iomap: Prepare page ops for larger I/Os
    
    Just adjust the types for the page_prepare() and page_done() callbacks
    to size_t from unsigned int.  While I don't see individual pages that
    large in our near future, it's convenient to be able to pass in larger
    lengths and has no cost.
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0f69fbd4af66..9a94d7e58199 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1028,7 +1028,7 @@ static void gfs2_write_unlock(struct inode *inode)
 }
 
 static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
-                                  unsigned len, struct iomap *iomap)
+                                  size_t len, struct iomap *iomap)
 {
        unsigned int blockmask = i_blocksize(inode) - 1;
        struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -1039,7 +1039,7 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
 }
 
 static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
-                                unsigned copied, struct page *page,
+                                size_t copied, struct page *page,
                                 struct iomap *iomap)
 {
        struct gfs2_trans *tr = current->journal_info;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4d1d3c3469e9..d3b06bffd996 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -106,9 +106,9 @@ iomap_sector(struct iomap *iomap, loff_t pos)
  * associated page could not be obtained.
  */
 struct iomap_page_ops {
-       int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+       int (*page_prepare)(struct inode *inode, loff_t pos, size_t len,
                        struct iomap *iomap);
-       void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
+       void (*page_done)(struct inode *inode, loff_t pos, size_t copied,
                        struct page *page, struct iomap *iomap);
 };
 


The other is an amendment of this:

-static int
-iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags
,
-               struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
+static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, size_t len,
+               unsigned flags, struct page **pagep, struct iomap *iomap,
+               struct iomap *srcmap)

and I drop the clamp to UINT_MAX.  I also added:

-               unsigned long offset;   /* Offset into pagecache page */
-               unsigned long bytes;    /* Bytes to write to page */
+               size_t offset;          /* Offset into pagecache page */
+               size_t bytes;           /* Bytes to write to page */

which won't change anything, but does fit the general pattern of
using size_t for a byte count of in-memory things.  It matches
iomap_unshare_actor(), for example.


* Re: [PATCH 05/14] iomap: Support THPs in invalidatepage
  2020-10-14 17:26     ` Matthew Wilcox
@ 2020-10-14 20:00       ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2020-10-14 20:00 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Darrick J. Wong, linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 06:26:34PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 14, 2020 at 09:33:47AM -0700, Darrick J. Wong wrote:
> > > @@ -1415,7 +1420,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > >  			 */
> > >  			if (wpc->ops->discard_page)
> > >  				wpc->ops->discard_page(page);
> > > -			ClearPageUptodate(page);
> > 
> > Er, I don't get it -- why do we now leave the page up to date after
> > writeback fails?
> 
> The page is still uptodate -- every byte in this page is at least as new
> as the corresponding bytes on disk.
> 

That seems rather odd if the preceding ->discard_page() turned an
underlying delalloc block into a hole. Technically the original written
data is still in the page, but it's no longer allocated/mapped or dirty
so really no longer in sync with on-disk state. Hm?

Brian



* Re: [PATCH 03/14] iomap: Support THPs in BIO completion path
  2020-10-14  3:03 ` [PATCH 03/14] iomap: Support THPs in BIO completion path Matthew Wilcox (Oracle)
@ 2020-10-15  9:50   ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2020-10-15  9:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:46AM +0100, Matthew Wilcox (Oracle) wrote:
> bio_for_each_segment_all() iterates once per regular sized page.
> Use bio_for_each_bvec_all() to iterate once per bvec and handle
> merged THPs ourselves, instead of teaching the block layer about THPs.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

This seems to conflict with your synchronous readpage series...


* Re: [PATCH 04/14] iomap: Support THPs in iomap_adjust_read_range
  2020-10-14  3:03 ` [PATCH 04/14] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
@ 2020-10-15  9:50   ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2020-10-15  9:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:47AM +0100, Matthew Wilcox (Oracle) wrote:
> Pass the struct page instead of the iomap_page so we can determine the
> size of the page.  Use offset_in_thp() instead of offset_in_page() and
> use thp_size() instead of PAGE_SIZE.  Convert the arguments to be size_t
> instead of unsigned int, in case pages ever get larger than 2^31 bytes.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 935468d79d9d..95ac66731297 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -82,16 +82,16 @@ iomap_page_release(struct page *page)
>  /*
>   * Calculate the range inside the page that we actually need to read.
>   */
> -static void
> -iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
> -		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
> +static void iomap_adjust_read_range(struct inode *inode, struct page *page,
> +		loff_t *pos, loff_t length, size_t *offp, size_t *lenp)

Can you please stop the pointless reformatting?


* Re: [PATCH 08/14] iomap: Support THPs in readahead
  2020-10-14  3:03 ` [PATCH 08/14] iomap: Support THPs in readahead Matthew Wilcox (Oracle)
@ 2020-10-15  9:52   ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2020-10-15  9:52 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-fsdevel, linux-xfs, linux-mm

On Wed, Oct 14, 2020 at 04:03:51AM +0100, Matthew Wilcox (Oracle) wrote:
> +/*
> + * Estimate the number of vectors we need based on the current page size;
> + * if we're wrong we'll end up doing an overly large allocation or needing
> + * to do a second allocation, neither of which is a big deal.
> + */
> +static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
> +{
> +	return (length + thp_size(page) - 1) >> page_shift(page);
> +}

This doesn't seem iomap specific, and would seems useful also for e.g.
direct I/O.
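[For reference, the helper quoted above is a plain round-up to whole pages; modelled in userspace (with page_shift/thp_size flattened into an explicit shift argument):]

```c
#include <assert.h>

/* Userspace model of the vector estimate quoted above: round `length`
 * up to whole pages of size (1 << page_shift), mirroring
 * (length + thp_size(page) - 1) >> page_shift(page). */
static unsigned int nr_vecs(unsigned long long length, unsigned int page_shift)
{
	unsigned long long page_size = 1ULL << page_shift;

	return (unsigned int)((length + page_size - 1) >> page_shift);
}
```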


Thread overview: 32+ messages
2020-10-14  3:03 [PATCH 00/14] Transparent Huge Page support for XFS Matthew Wilcox (Oracle)
2020-10-14  3:03 ` [PATCH 01/14] fs: Support THPs in vfs_dedupe_file_range Matthew Wilcox (Oracle)
2020-10-14 16:12   ` Darrick J. Wong
2020-10-14 17:16     ` Matthew Wilcox
2020-10-14  3:03 ` [PATCH 02/14] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
2020-10-14 16:17   ` Darrick J. Wong
2020-10-14 17:23     ` Matthew Wilcox
2020-10-14  3:03 ` [PATCH 03/14] iomap: Support THPs in BIO completion path Matthew Wilcox (Oracle)
2020-10-15  9:50   ` Christoph Hellwig
2020-10-14  3:03 ` [PATCH 04/14] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
2020-10-15  9:50   ` Christoph Hellwig
2020-10-14  3:03 ` [PATCH 05/14] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
2020-10-14 16:33   ` Darrick J. Wong
2020-10-14 17:26     ` Matthew Wilcox
2020-10-14 20:00       ` Brian Foster
2020-10-14  3:03 ` [PATCH 06/14] iomap: Support THPs in iomap_is_partially_uptodate Matthew Wilcox (Oracle)
2020-10-14  3:03 ` [PATCH 07/14] iomap: Support THPs in readpage Matthew Wilcox (Oracle)
2020-10-14 16:39   ` Darrick J. Wong
2020-10-14 17:35     ` Matthew Wilcox
2020-10-14  3:03 ` [PATCH 08/14] iomap: Support THPs in readahead Matthew Wilcox (Oracle)
2020-10-15  9:52   ` Christoph Hellwig
2020-10-14  3:03 ` [PATCH 09/14] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
2020-10-14 16:47   ` Darrick J. Wong
2020-10-14 17:41     ` Matthew Wilcox
2020-10-14 18:08       ` Matthew Wilcox
2020-10-14  3:03 ` [PATCH 10/14] iomap: Handle THPs when writing to pages Matthew Wilcox (Oracle)
2020-10-14  3:03 ` [PATCH 11/14] iomap: Support THP writeback Matthew Wilcox (Oracle)
2020-10-14  3:03 ` [PATCH 12/14] iomap: Inline data shouldn't see THPs Matthew Wilcox (Oracle)
2020-10-14  3:03 ` [PATCH 13/14] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox (Oracle)
2020-10-14  3:03 ` [PATCH 14/14] xfs: Support THPs Matthew Wilcox (Oracle)
2020-10-14 16:51   ` Darrick J. Wong
2020-10-14 17:30     ` Matthew Wilcox
