* [PATCH 00/11] iomap/fs/block patches for 5.11
@ 2020-08-24 15:16 Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 01/11] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

As promised earlier [1], here are the patches which I would like to
merge into 5.11 to support THPs.  They depend on that earlier series.
If there's anything in here that you'd like to see pulled out and added
to that earlier series, let me know.

There are a couple of pieces in here which aren't exactly part of
iomap, but I think make sense to take through the iomap tree.

[1] https://lore.kernel.org/linux-fsdevel/20200824145511.10500-1-willy@infradead.org/

Matthew Wilcox (Oracle) (11):
  fs: Make page_mkwrite_check_truncate thp-aware
  mm: Support THPs in zero_user_segments
  mm: Zero the head page, not the tail page
  block: Add bio_for_each_thp_segment_all
  iomap: Support THPs in iomap_adjust_read_range
  iomap: Support THPs in invalidatepage
  iomap: Support THPs in read paths
  iomap: Change iomap_write_begin calling convention
  iomap: Support THPs in write paths
  iomap: Inline data shouldn't see THPs
  iomap: Handle tail pages in iomap_page_mkwrite

 fs/iomap/buffered-io.c  | 178 ++++++++++++++++++++++++----------------
 include/linux/bio.h     |  13 +++
 include/linux/bvec.h    |  27 ++++++
 include/linux/highmem.h |  15 +++-
 include/linux/pagemap.h |  10 +--
 mm/highmem.c            |  62 +++++++++++++-
 mm/shmem.c              |   7 ++
 mm/truncate.c           |   7 ++
 8 files changed, 236 insertions(+), 83 deletions(-)

-- 
2.28.0




* [PATCH 01/11] fs: Make page_mkwrite_check_truncate thp-aware
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 02/11] mm: Support THPs in zero_user_segments Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

If the page is compound, check the last index in the page and return
the appropriate size.  Change the return type to ssize_t in case we ever
support pages larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 853733286138..50b176b65911 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -876,22 +876,22 @@ static inline unsigned long dir_pages(struct inode *inode)
  * @page: the page to check
  * @inode: the inode to check the page against
  *
- * Returns the number of bytes in the page up to EOF,
+ * Return: The number of bytes in the page up to EOF,
  * or -EFAULT if the page was truncated.
  */
-static inline int page_mkwrite_check_truncate(struct page *page,
+static inline ssize_t page_mkwrite_check_truncate(struct page *page,
 					      struct inode *inode)
 {
 	loff_t size = i_size_read(inode);
 	pgoff_t index = size >> PAGE_SHIFT;
-	int offset = offset_in_page(size);
+	unsigned long offset = offset_in_thp(page, size);
 
 	if (page->mapping != inode->i_mapping)
 		return -EFAULT;
 
 	/* page is wholly inside EOF */
-	if (page->index < index)
-		return PAGE_SIZE;
+	if (page->index + thp_nr_pages(page) - 1 < index)
+		return thp_size(page);
 	/* page is wholly past EOF */
 	if (page->index > index || !offset)
 		return -EFAULT;
-- 
2.28.0




* [PATCH 02/11] mm: Support THPs in zero_user_segments
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 01/11] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 03/11] mm: Zero the head page, not the tail page Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

We can only kmap() one subpage of a THP at a time, so loop over all
relevant subpages, skipping ones which don't need to be zeroed.  This is
too large to inline when THPs are enabled and we actually need highmem,
so put it in highmem.c.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/highmem.h | 15 +++++++---
 mm/highmem.c            | 62 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 14e6202ce47f..5390bfd4bdd3 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -284,13 +284,18 @@ static inline void clear_highpage(struct page *page)
 	kunmap_atomic(kaddr);
 }
 
+#if defined(CONFIG_HIGHMEM) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2);
+#else /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
 static inline void zero_user_segments(struct page *page,
-	unsigned start1, unsigned end1,
-	unsigned start2, unsigned end2)
+		unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2)
 {
 	void *kaddr = kmap_atomic(page);
+	unsigned int i;
 
-	BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);
+	BUG_ON(end1 > thp_size(page) || end2 > thp_size(page));
 
 	if (end1 > start1)
 		memset(kaddr + start1, 0, end1 - start1);
@@ -299,8 +304,10 @@ static inline void zero_user_segments(struct page *page,
 		memset(kaddr + start2, 0, end2 - start2);
 
 	kunmap_atomic(kaddr);
-	flush_dcache_page(page);
+	for (i = 0; i < thp_nr_pages(page); i++)
+		flush_dcache_page(page + i);
 }
+#endif /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
 
 static inline void zero_user_segment(struct page *page,
 	unsigned start, unsigned end)
diff --git a/mm/highmem.c b/mm/highmem.c
index 64d8dea47dd1..c0f1e389a153 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -367,9 +367,67 @@ void kunmap_high(struct page *page)
 	if (need_wakeup)
 		wake_up(pkmap_map_wait);
 }
-
 EXPORT_SYMBOL(kunmap_high);
-#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2)
+{
+	unsigned int i;
+
+	BUG_ON(end1 > thp_size(page) || end2 > thp_size(page));
+
+	for (i = 0; i < thp_nr_pages(page); i++) {
+		void *kaddr;
+		unsigned this_end;
+
+		if (end1 == 0 && start2 >= PAGE_SIZE) {
+			start2 -= PAGE_SIZE;
+			end2 -= PAGE_SIZE;
+			continue;
+		}
+
+		if (start1 >= PAGE_SIZE) {
+			start1 -= PAGE_SIZE;
+			end1 -= PAGE_SIZE;
+			if (start2) {
+				start2 -= PAGE_SIZE;
+				end2 -= PAGE_SIZE;
+			}
+			continue;
+		}
+
+		kaddr = kmap_atomic(page + i);
+
+		this_end = min_t(unsigned, end1, PAGE_SIZE);
+		if (end1 > start1)
+			memset(kaddr + start1, 0, this_end - start1);
+		end1 -= this_end;
+		start1 = 0;
+
+		if (start2 >= PAGE_SIZE) {
+			start2 -= PAGE_SIZE;
+			end2 -= PAGE_SIZE;
+		} else {
+			this_end = min_t(unsigned, end2, PAGE_SIZE);
+			if (end2 > start2)
+				memset(kaddr + start2, 0, this_end - start2);
+			end2 -= this_end;
+			start2 = 0;
+		}
+
+		kunmap_atomic(kaddr);
+		flush_dcache_page(page + i);
+
+		if (!end1 && !end2)
+			break;
+	}
+
+	BUG_ON((start1 | start2 | end1 | end2) != 0);
+}
+EXPORT_SYMBOL(zero_user_segments);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_HIGHMEM */
 
 #if defined(HASHED_PAGE_VIRTUAL)
 
-- 
2.28.0




* [PATCH 03/11] mm: Zero the head page, not the tail page
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 01/11] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 02/11] mm: Support THPs in zero_user_segments Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 04/11] block: Add bio_for_each_thp_segment_all Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

Pass the head page to zero_user_segment(), not the tail page, and adjust
the byte offsets appropriately.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/shmem.c    | 7 +++++++
 mm/truncate.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..77982149b437 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -958,11 +958,18 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		struct page *page = NULL;
 		shmem_getpage(inode, start - 1, &page, SGP_READ);
 		if (page) {
+			struct page *head = thp_head(page);
 			unsigned int top = PAGE_SIZE;
 			if (start > end) {
 				top = partial_end;
 				partial_end = 0;
 			}
+			if (head != page) {
+				unsigned int diff = start - 1 - head->index;
+				partial_start += diff << PAGE_SHIFT;
+				top += diff << PAGE_SHIFT;
+				page = head;
+			}
 			zero_user_segment(page, partial_start, top);
 			set_page_dirty(page);
 			unlock_page(page);
diff --git a/mm/truncate.c b/mm/truncate.c
index dd9ebc1da356..152974888124 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -374,12 +374,19 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	if (partial_start) {
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
+			struct page *head = thp_head(page);
 			unsigned int top = PAGE_SIZE;
 			if (start > end) {
 				/* Truncation within a single page */
 				top = partial_end;
 				partial_end = 0;
 			}
+			if (head != page) {
+				unsigned int diff = start - 1 - head->index;
+				partial_start += diff << PAGE_SHIFT;
+				top += diff << PAGE_SHIFT;
+				page = head;
+			}
 			wait_on_page_writeback(page);
 			zero_user_segment(page, partial_start, top);
 			cleancache_invalidate_page(mapping, page);
-- 
2.28.0




* [PATCH 04/11] block: Add bio_for_each_thp_segment_all
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 03/11] mm: Zero the head page, not the tail page Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-27  8:44   ` Christoph Hellwig
  2020-08-24 15:16 ` [PATCH 05/11] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

Iterate once for each THP instead of once for each base page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/bio.h  | 13 +++++++++++++
 include/linux/bvec.h | 27 +++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..a0e104910097 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -129,12 +129,25 @@ static inline bool bio_next_segment(const struct bio *bio,
 	return true;
 }
 
+static inline bool bio_next_thp_segment(const struct bio *bio,
+				    struct bvec_iter_all *iter)
+{
+	if (iter->idx >= bio->bi_vcnt)
+		return false;
+
+	bvec_thp_advance(&bio->bi_io_vec[iter->idx], iter);
+	return true;
+}
+
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
  */
 #define bio_for_each_segment_all(bvl, bio, iter) \
 	for (bvl = bvec_init_iter_all(&iter); bio_next_segment((bio), &iter); )
+#define bio_for_each_thp_segment_all(bvl, bio, iter) \
+	for (bvl = bvec_init_iter_all(&iter); \
+	     bio_next_thp_segment((bio), &iter); )
 
 static inline void bio_advance_iter(const struct bio *bio,
 				    struct bvec_iter *iter, unsigned int bytes)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ac0c7299d5b8..ea8a37a7515b 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -162,4 +162,31 @@ static inline void bvec_advance(const struct bio_vec *bvec,
 	}
 }
 
+static inline void bvec_thp_advance(const struct bio_vec *bvec,
+				struct bvec_iter_all *iter_all)
+{
+	struct bio_vec *bv = &iter_all->bv;
+	unsigned int page_size;
+
+	if (iter_all->done) {
+		bv->bv_page += thp_nr_pages(bv->bv_page);
+		page_size = thp_size(bv->bv_page);
+		bv->bv_offset = 0;
+	} else {
+		bv->bv_page = thp_head(bvec->bv_page +
+				(bvec->bv_offset >> PAGE_SHIFT));
+		page_size = thp_size(bv->bv_page);
+		bv->bv_offset = bvec->bv_offset -
+				(bv->bv_page - bvec->bv_page) * PAGE_SIZE;
+		BUG_ON(bv->bv_offset >= page_size);
+	}
+	bv->bv_len = min(page_size - bv->bv_offset,
+			 bvec->bv_len - iter_all->done);
+	iter_all->done += bv->bv_len;
+
+	if (iter_all->done == bvec->bv_len) {
+		iter_all->idx++;
+		iter_all->done = 0;
+	}
+}
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.28.0




* [PATCH 05/11] iomap: Support THPs in iomap_adjust_read_range
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 04/11] block: Add bio_for_each_thp_segment_all Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 06/11] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

Pass the struct page instead of the iomap_page so we can determine the
size of the page.  Use offset_in_thp() instead of offset_in_page() and
use thp_size() instead of PAGE_SIZE.  Convert the arguments to be size_t
instead of unsigned int, in case pages ever get larger than 2^31 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2dba054095e8..5cc0343b6a8e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -76,16 +76,16 @@ iomap_page_release(struct page *page)
 /*
  * Calculate the range inside the page that we actually need to read.
  */
-static void
-iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
-		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
+static void iomap_adjust_read_range(struct inode *inode, struct page *page,
+		loff_t *pos, loff_t length, size_t *offp, size_t *lenp)
 {
+	struct iomap_page *iop = to_iomap_page(page);
 	loff_t orig_pos = *pos;
 	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
 	unsigned block_size = (1 << block_bits);
-	unsigned poff = offset_in_page(*pos);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	size_t poff = offset_in_thp(page, *pos);
+	size_t plen = min_t(loff_t, thp_size(page) - poff, length);
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
 
@@ -123,7 +123,7 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
 	if (orig_pos <= isize && orig_pos + length > isize) {
-		unsigned end = offset_in_page(isize - 1) >> block_bits;
+		unsigned end = offset_in_thp(page, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
 			plen -= (last - end) * block_size;
@@ -234,7 +234,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	bool same_page = false, is_contig = false;
 	loff_t orig_pos = pos;
-	unsigned poff, plen;
+	size_t poff, plen;
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -244,7 +244,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iomap_adjust_read_range(inode, page, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
@@ -550,18 +550,19 @@ static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		struct page *page, struct iomap *srcmap)
 {
-	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
+	unsigned from = offset_in_page(pos), to = from + len;
+	size_t poff, plen;
 	int status;
 
 	if (PageUptodate(page))
 		return 0;
+	iomap_page_create(inode, page);
 
 	do {
-		iomap_adjust_read_range(inode, iop, &block_start,
+		iomap_adjust_read_range(inode, page, &block_start,
 				block_end - block_start, &poff, &plen);
 		if (plen == 0)
 			break;
-- 
2.28.0




* [PATCH 06/11] iomap: Support THPs in invalidatepage
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 05/11] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 07/11] iomap: Support THPs in read paths Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

If we're punching a hole in a THP, we need to remove the per-page
iomap data as the THP is about to be split and each page will need
its own.  This means that writepage can now come across a page with
no iop allocated, so remove the assertions that there is already one,
and just create one (with the Uptodate bits set) if there isn't one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 5cc0343b6a8e..9ea162617398 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -54,6 +54,8 @@ iomap_page_create(struct inode *inode, struct page *page)
 	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
 			GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
+	if (PageUptodate(page))
+		bitmap_fill(iop->uptodate, nr_blocks);
 	attach_page_private(page, iop);
 	return iop;
 }
@@ -483,10 +485,17 @@ iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
 	 * If we are invalidating the entire page, clear the dirty state from it
 	 * and release it to avoid unnecessary buildup of the LRU.
 	 */
-	if (offset == 0 && len == PAGE_SIZE) {
+	if (offset == 0 && len == thp_size(page)) {
 		WARN_ON_ONCE(PageWriteback(page));
 		cancel_dirty_page(page);
 		iomap_page_release(page);
+		return;
+	}
+
+	/* Punching a hole in a THP requires releasing the iop */
+	if (PageTransHuge(page)) {
+		VM_BUG_ON_PAGE(!PageUptodate(page), page);
+		iomap_page_release(page);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidatepage);
@@ -1043,14 +1052,13 @@ static void
 iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		int error, unsigned int len)
 {
-	struct iomap_page *iop = to_iomap_page(page);
+	struct iomap_page *iop = iomap_page_create(inode, page);
 
 	if (error) {
 		SetPageError(page);
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
 	if (!iop || atomic_sub_and_test(len, &iop->write_count))
@@ -1340,14 +1348,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct inode *inode,
 		struct page *page, u64 end_offset)
 {
-	struct iomap_page *iop = to_iomap_page(page);
+	struct iomap_page *iop = iomap_page_create(inode, page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	u64 file_offset; /* file offset of page */
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
-- 
2.28.0




* [PATCH 07/11] iomap: Support THPs in read paths
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 06/11] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 08/11] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

Use thp_size() instead of PAGE_SIZE, offset_in_thp() instead
of offset_in_page() and bio_for_each_thp_segment_all().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9ea162617398..d14de8886d5c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -187,7 +187,7 @@ iomap_read_end_io(struct bio *bio)
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
+	bio_for_each_thp_segment_all(bvec, bio, iter_all)
 		iomap_read_page_end_io(bvec, error);
 	bio_put(bio);
 }
@@ -227,6 +227,16 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 		pos >= i_size_read(inode);
 }
 
+/*
+ * Estimate the number of vectors we need based on the current page size;
+ * if we're wrong we'll end up doing an overly large allocation or needing
+ * to do a second allocation, neither of which is a big deal.
+ */
+static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
+{
+	return (length + thp_size(page) - 1) >> page_shift(page);
+}
+
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
@@ -280,7 +290,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
-		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		int nr_vecs = iomap_nr_vecs(page, length);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
@@ -324,9 +334,9 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 
 	trace_iomap_readpage(page->mapping->host, 1);
 
-	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
+	for (poff = 0; poff < thp_size(page); poff += ret) {
 		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
+				thp_size(page) - poff, 0, ops, &ctx,
 				iomap_readpage_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
@@ -360,7 +370,8 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 	loff_t done, ret;
 
 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+		if (ctx->cur_page &&
+		    offset_in_thp(ctx->cur_page, pos + done) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
-- 
2.28.0




* [PATCH 08/11] iomap: Change iomap_write_begin calling convention
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 07/11] iomap: Support THPs in read paths Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 09/11] iomap: Support THPs in write paths Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

Pass (up to) the remaining length of the extent to iomap_write_begin()
and have it return the number of bytes that will fit in the page.
That lets us copy more bytes per call to iomap_write_begin() if the page
cache has already allocated a THP (and will in future allow us to pass
a hint to the page cache that it should try to allocate a larger page
if there are none in the cache).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 61 +++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d14de8886d5c..f43a15aaa381 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -566,14 +566,14 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 	return submit_bio_wait(&bio);
 }
 
-static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
-		struct page *page, struct iomap *srcmap)
+static ssize_t __iomap_write_begin(struct inode *inode, loff_t pos,
+		size_t len, int flags, struct page *page, struct iomap *srcmap)
 {
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len;
+	size_t from = offset_in_thp(page, pos);
+	size_t to = from + len;
 	size_t poff, plen;
 	int status;
 
@@ -609,12 +609,13 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
-static int
-iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
+static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
+		unsigned flags, struct page **pagep, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	struct page *page;
+	size_t offset;
 	int status = 0;
 
 	BUG_ON(pos + len > iomap->offset + iomap->length);
@@ -625,6 +626,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		return -EINTR;
 
 	if (page_ops && page_ops->page_prepare) {
+		if (len > UINT_MAX)
+			len = UINT_MAX;
 		status = page_ops->page_prepare(inode, pos, len, iomap);
 		if (status)
 			return status;
@@ -636,6 +639,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = -ENOMEM;
 		goto out_no_page;
 	}
+	page = thp_head(page);
+	offset = offset_in_thp(page, pos);
+	if (len > thp_size(page) - offset)
+		len = thp_size(page) - offset;
 
 	if (srcmap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, srcmap);
@@ -645,11 +652,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __iomap_write_begin(inode, pos, len, flags, page,
 				srcmap);
 
-	if (unlikely(status))
+	if (status < 0)
 		goto out_unlock;
 
 	*pagep = page;
-	return 0;
+	return len;
 
 out_unlock:
 	unlock_page(page);
@@ -805,8 +812,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
 				srcmap);
-		if (unlikely(status))
+		if (status < 0)
 			break;
+		/* We may be partway through a THP */
+		offset = offset_in_thp(page, pos);
 
 		if (mapping_writably_mapped(inode->i_mapping))
 			flush_dcache_page(page);
@@ -866,7 +875,6 @@ static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
 {
-	long status = 0;
 	loff_t written = 0;
 
 	/* don't bother with blocks that are not shared to start with */
@@ -877,25 +885,24 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return length;
 
 	do {
-		unsigned long offset = offset_in_page(pos);
-		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
+		ssize_t bytes;
 
-		status = iomap_write_begin(inode, pos, bytes,
+		bytes = iomap_write_begin(inode, pos, length,
 				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
-		if (unlikely(status))
-			return status;
+		if (bytes < 0)
+			return bytes;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+		bytes = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
 				srcmap);
-		if (WARN_ON_ONCE(status == 0))
+		if (WARN_ON_ONCE(bytes == 0))
 			return -EIO;
 
 		cond_resched();
 
-		pos += status;
-		written += status;
-		length -= status;
+		pos += bytes;
+		written += bytes;
+		length -= bytes;
 
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 	} while (length);
@@ -926,15 +933,13 @@ static loff_t iomap_zero(struct inode *inode, loff_t pos, u64 length,
 		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
-	int status;
-	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
+	ssize_t bytes;
 
-	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
-	if (status)
-		return status;
+	bytes = iomap_write_begin(inode, pos, length, 0, &page, iomap, srcmap);
+	if (bytes < 0)
+		return bytes;
 
-	zero_user(page, offset, bytes);
+	zero_user(page, offset_in_thp(page, pos), bytes);
 	mark_page_accessed(page);
 
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
-- 
2.28.0




* [PATCH 09/11] iomap: Support THPs in write paths
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 08/11] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:16 ` [PATCH 10/11] iomap: Inline data shouldn't see THPs Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

Use thp_size() instead of PAGE_SIZE and offset_in_thp() instead of
offset_in_page().  Also simplify the logic in iomap_do_writepage() for
determining end of file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 54 ++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f43a15aaa381..52d371c59758 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -452,7 +452,7 @@ iomap_is_partially_uptodate(struct page *page, unsigned long from,
 	unsigned i;
 
 	/* Limit range to one page */
-	len = min_t(unsigned, PAGE_SIZE - from, count);
+	len = min_t(unsigned, thp_size(page) - from, count);
 
 	/* First and last blocks in range within page */
 	first = from >> inode->i_blkbits;
@@ -649,8 +649,8 @@ static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
-		status = __iomap_write_begin(inode, pos, len, flags, page,
-				srcmap);
+		status = __iomap_write_begin(inode, pos, len, flags,
+				thp_head(page), srcmap);
 
 	if (status < 0)
 		goto out_unlock;
@@ -675,6 +675,7 @@ iomap_set_page_dirty(struct page *page)
 	struct address_space *mapping = page_mapping(page);
 	int newly_dirty;
 
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 	if (unlikely(!mapping))
 		return !TestSetPageDirty(page);
 
@@ -697,7 +698,9 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		size_t copied, struct page *page)
 {
-	flush_dcache_page(page);
+	size_t offset = offset_in_thp(page, pos);
+
+	flush_dcache_page(page + offset / PAGE_SIZE);
 
 	/*
 	 * The blocks that were entirely written will now be uptodate, so we
@@ -712,7 +715,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 */
 	if (unlikely(copied < len && !PageUptodate(page)))
 		return 0;
-	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_range_uptodate(page, offset, len);
 	iomap_set_page_dirty(page);
 	return copied;
 }
@@ -749,7 +752,8 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
-		ret = __iomap_write_end(inode, pos, len, copied, page);
+		ret = __iomap_write_end(inode, pos, len, copied,
+				thp_head(page));
 	}
 
 	/*
@@ -788,6 +792,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 
+		/*
+		 * XXX: We don't know what size page we'll find in the
+		 * page cache, so only copy up to a regular page boundary.
+		 */
 		offset = offset_in_page(pos);
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
 						iov_iter_count(i));
@@ -818,7 +826,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		offset = offset_in_thp(page, pos);
 
 		if (mapping_writably_mapped(inode->i_mapping))
-			flush_dcache_page(page);
+			flush_dcache_page(page + offset / PAGE_SIZE);
 
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
@@ -1110,7 +1118,7 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bv, bio, iter_all)
+		bio_for_each_thp_segment_all(bv, bio, iter_all)
 			iomap_finish_page_writeback(inode, bv->bv_page, error,
 					bv->bv_len);
 		bio_put(bio);
@@ -1317,7 +1325,7 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 {
 	sector_t sector = iomap_sector(&wpc->iomap, offset);
 	unsigned len = i_blocksize(inode);
-	unsigned poff = offset & (PAGE_SIZE - 1);
+	unsigned poff = offset_in_thp(page, offset);
 	bool merged, same_page = false;
 
 	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, offset, sector)) {
@@ -1367,8 +1375,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
-	u64 file_offset; /* file offset of page */
+	loff_t pos;
 	int error = 0, count = 0, i;
+	int nr_blocks = i_blocks_per_page(inode, page);
 	LIST_HEAD(submit_list);
 
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
@@ -1378,20 +1387,20 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * end of the current map or find the current map invalid, grab a new
 	 * one.
 	 */
-	for (i = 0, file_offset = page_offset(page);
-	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
-	     i++, file_offset += len) {
+	for (i = 0, pos = page_offset(page);
+	     i < nr_blocks && pos < end_offset;
+	     i++, pos += len) {
 		if (iop && !test_bit(i, iop->uptodate))
 			continue;
 
-		error = wpc->ops->map_blocks(wpc, inode, file_offset);
+		error = wpc->ops->map_blocks(wpc, inode, pos);
 		if (error)
 			break;
 		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
 			continue;
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
-		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
+		iomap_add_to_ioend(inode, pos, page, iop, wpc, wbc,
 				 &submit_list);
 		count++;
 	}
@@ -1473,11 +1482,11 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 {
 	struct iomap_writepage_ctx *wpc = data;
 	struct inode *inode = page->mapping->host;
-	pgoff_t end_index;
 	u64 end_offset;
 	loff_t offset;
 
-	trace_iomap_writepage(inode, page_offset(page), PAGE_SIZE);
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	trace_iomap_writepage(inode, page_offset(page), thp_size(page));
 
 	/*
 	 * Refuse to write the page out if we are called from reclaim context.
@@ -1514,10 +1523,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 	 * ---------------------------------^------------------|
 	 */
 	offset = i_size_read(inode);
-	end_index = offset >> PAGE_SHIFT;
-	if (page->index < end_index)
-		end_offset = (loff_t)(page->index + 1) << PAGE_SHIFT;
-	else {
+	end_offset = page_offset(page) + thp_size(page);
+	if (end_offset > offset) {
 		/*
 		 * Check whether the page to write out is beyond or straddles
 		 * i_size or not.
@@ -1529,7 +1536,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * |				    |      Straddles     |
 		 * ---------------------------------^-----------|--------|
 		 */
-		unsigned offset_into_page = offset & (PAGE_SIZE - 1);
+		unsigned offset_into_page = offset_in_thp(page, offset);
+		pgoff_t end_index = offset >> PAGE_SHIFT;
 
 		/*
 		 * Skip the page if it is fully outside i_size, e.g. due to a
@@ -1560,7 +1568,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * memory is zeroed when mapped, and writes to that region are
 		 * not written out to the file."
 		 */
-		zero_user_segment(page, offset_into_page, PAGE_SIZE);
+		zero_user_segment(page, offset_into_page, thp_size(page));
 
 		/* Adjust the end_offset to the end of file */
 		end_offset = offset;
-- 
2.28.0




* [PATCH 10/11] iomap: Inline data shouldn't see THPs
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 09/11] iomap: Support THPs in write paths Matthew Wilcox (Oracle)
@ 2020-08-24 15:16 ` Matthew Wilcox (Oracle)
  2020-08-24 15:17 ` [PATCH 11/11] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox (Oracle)
  2020-08-25 10:29 ` [PATCH 00/11] iomap/fs/block patches for 5.11 William Kucharski
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:16 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel,
	Christoph Hellwig

Assert that we're not seeing THPs in functions that read/write
inline data, rather than zeroing out the tail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 52d371c59758..ca2aa1995519 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -210,6 +210,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 		return;
 
 	BUG_ON(page->index);
+	BUG_ON(PageCompound(page));
 	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -727,6 +728,7 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 
 	flush_dcache_page(page);
 	WARN_ON_ONCE(!PageUptodate(page));
+	BUG_ON(PageCompound(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
-- 
2.28.0




* [PATCH 11/11] iomap: Handle tail pages in iomap_page_mkwrite
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2020-08-24 15:16 ` [PATCH 10/11] iomap: Inline data shouldn't see THPs Matthew Wilcox (Oracle)
@ 2020-08-24 15:17 ` Matthew Wilcox (Oracle)
  2020-08-25 10:29 ` [PATCH 00/11] iomap/fs/block patches for 5.11 William Kucharski
  11 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 15:17 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-block, linux-mm, linux-kernel

iomap_page_mkwrite() can be called with a tail page.  If so,
operate on the head page, since we're treating the entire thing as a
single unit and the whole page is dirtied at the same time.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ca2aa1995519..eb8202424665 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1043,7 +1043,7 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 {
-	struct page *page = vmf->page;
+	struct page *page = thp_head(vmf->page);
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	unsigned long length;
 	loff_t offset;
-- 
2.28.0




* Re: [PATCH 00/11] iomap/fs/block patches for 5.11
  2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2020-08-24 15:17 ` [PATCH 11/11] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox (Oracle)
@ 2020-08-25 10:29 ` William Kucharski
  11 siblings, 0 replies; 18+ messages in thread
From: William Kucharski @ 2020-08-25 10:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-block,
	linux-mm, linux-kernel

Really nice improvements here.

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Aug 24, 2020, at 9:16 AM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> As promised earlier [1], here are the patches which I would like to
> merge into 5.11 to support THPs.  They depend on that earlier series.
> If there's anything in here that you'd like to see pulled out and added
> to that earlier series, let me know.
> 
> There are a couple of pieces in here which aren't exactly part of
> iomap, but I think make sense to take through the iomap tree.
> 
> [1] https://lore.kernel.org/linux-fsdevel/20200824145511.10500-1-willy@infradead.org/
> 
> Matthew Wilcox (Oracle) (11):
>  fs: Make page_mkwrite_check_truncate thp-aware
>  mm: Support THPs in zero_user_segments
>  mm: Zero the head page, not the tail page
>  block: Add bio_for_each_thp_segment_all
>  iomap: Support THPs in iomap_adjust_read_range
>  iomap: Support THPs in invalidatepage
>  iomap: Support THPs in read paths
>  iomap: Change iomap_write_begin calling convention
>  iomap: Support THPs in write paths
>  iomap: Inline data shouldn't see THPs
>  iomap: Handle tail pages in iomap_page_mkwrite
> 
> fs/iomap/buffered-io.c  | 178 ++++++++++++++++++++++++----------------
> include/linux/bio.h     |  13 +++
> include/linux/bvec.h    |  27 ++++++
> include/linux/highmem.h |  15 +++-
> include/linux/pagemap.h |  10 +--
> mm/highmem.c            |  62 +++++++++++++-
> mm/shmem.c              |   7 ++
> mm/truncate.c           |   7 ++
> 8 files changed, 236 insertions(+), 83 deletions(-)
> 
> -- 
> 2.28.0
> 
> 




* Re: [PATCH 04/11] block: Add bio_for_each_thp_segment_all
  2020-08-24 15:16 ` [PATCH 04/11] block: Add bio_for_each_thp_segment_all Matthew Wilcox (Oracle)
@ 2020-08-27  8:44   ` Christoph Hellwig
  2020-08-31 19:48     ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-block,
	linux-mm, linux-kernel

On Mon, Aug 24, 2020 at 04:16:53PM +0100, Matthew Wilcox (Oracle) wrote:
> Iterate once for each THP instead of once for each base page.

FYI, I've always been wondering if bio_for_each_segment_all is the
right interface for the I/O completions, because we generally don't
need the fake bvecs for each page.  Only the first page can have an
offset, and only the last page can end earlier than the end of
the page.

It would seem way more efficient to just have a helper that extracts
the offset and end, and just use that in a loop that does the way
cheaper iteration over the physical addresses only.  This might (or
might not) be a good time to switch to that model for iomap.



* Re: [PATCH 04/11] block: Add bio_for_each_thp_segment_all
  2020-08-27  8:44   ` Christoph Hellwig
@ 2020-08-31 19:48     ` Matthew Wilcox
  2020-09-01  5:34       ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-08-31 19:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-block,
	linux-mm, linux-kernel

On Thu, Aug 27, 2020 at 09:44:31AM +0100, Christoph Hellwig wrote:
> On Mon, Aug 24, 2020 at 04:16:53PM +0100, Matthew Wilcox (Oracle) wrote:
> > Iterate once for each THP instead of once for each base page.
> 
> FYI, I've always been wondering if bio_for_each_segment_all is the
> right interface for the I/O completions, because we generally don't
> need the fake bvecs for each page.  Only the first page can have an
> offset, and only the last page can be end earlier than the end of
> the page size.
> 
> It would seem way more efficient to just have a helper that extracts
> the offset and end, and just use that in a loop that does the way
> cheaper iteration over the physical addresses only.  This might (or
> might) not be a good time to switch to that model for iomap.

Something like this?

static void iomap_read_end_io(struct bio *bio)
{
        int i, error = blk_status_to_errno(bio->bi_status);

        for (i = 0; i < bio->bi_vcnt; i++) {
                struct bio_vec *bvec = &bio->bi_io_vec[i];
                size_t offset = bvec->bv_offset;
                size_t length = bvec->bv_len;
                struct page *page = bvec->bv_page;

                while (length > 0) { 
                        size_t count = thp_size(page) - offset;
                        
                        if (count > length)
                                count = length;
                        iomap_read_page_end_io(page, offset, count, error);
                        page += (offset + count) / PAGE_SIZE;
                        length -= count;
                        offset = 0;
                }
        }

        bio_put(bio);
}

Maybe I'm missing something important here, but it's significantly
simpler code -- iomap_read_end_io() goes down from 816 bytes to 560 bytes
(256 bytes less!); iomap_read_page_end_io is inlined into it both before
and after.

There is some weirdness going on with regards to bv_offset that I don't
quite understand.  In the original bvec_advance:

                bv->bv_page = bvec->bv_page + (bvec->bv_offset >> PAGE_SHIFT);
                bv->bv_offset = bvec->bv_offset & ~PAGE_MASK;

which I cargo-culted into bvec_thp_advance as:

                bv->bv_page = thp_head(bvec->bv_page +
                                (bvec->bv_offset >> PAGE_SHIFT));
                page_size = thp_size(bv->bv_page);
                bv->bv_offset = bvec->bv_offset -
                                (bv->bv_page - bvec->bv_page) * PAGE_SIZE;

Is it possible to have a bvec with an offset that is larger than the
size of bv_page?  That doesn't seem like a useful thing to do, but
if that needs to be supported, then the code up top doesn't do that.
We maybe gain a little bit by counting length down to 0 instead of
counting it up to bv_len.  I dunno; reading the code over now, it
doesn't seem like that much of a difference.

Maybe you meant something different?



* Re: [PATCH 04/11] block: Add bio_for_each_thp_segment_all
  2020-08-31 19:48     ` Matthew Wilcox
@ 2020-09-01  5:34       ` Christoph Hellwig
  2020-09-01 13:05         ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2020-09-01  5:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, Darrick J . Wong,
	linux-block, linux-mm, linux-kernel

On Mon, Aug 31, 2020 at 08:48:37PM +0100, Matthew Wilcox wrote:
> static void iomap_read_end_io(struct bio *bio)
> {
>         int i, error = blk_status_to_errno(bio->bi_status);
> 
>         for (i = 0; i < bio->bi_vcnt; i++) {
>                 struct bio_vec *bvec = &bio->bi_io_vec[i];

This should probably use bio_for_each_bvec_all instead of directly
poking into the bio.  I'd also be tempted to move the loop body into
a separate helper, but that's just a slight stylistic preference.

>                 size_t offset = bvec->bv_offset;
>                 size_t length = bvec->bv_len;
>                 struct page *page = bvec->bv_page;
> 
>                 while (length > 0) { 
>                         size_t count = thp_size(page) - offset;
>                         
>                         if (count > length)
>                                 count = length;
>                         iomap_read_page_end_io(page, offset, count, error);
>                         page += (offset + count) / PAGE_SIZE;

Shouldn't the page_size here be thp_size?

> Maybe I'm missing something important here, but it's significantly
> simpler code -- iomap_read_end_io() goes down from 816 bytes to 560 bytes
> (256 bytes less!) iomap_read_page_end_io is inlined into it both before
> and after.

Yes, that's exactly why I think avoiding bio_for_each_segment_all is
a good idea in general.

> There is some weirdness going on with regards to bv_offset that I don't
> quite understand.  In the original bvec_advance:
> 
>                 bv->bv_page = bvec->bv_page + (bvec->bv_offset >> PAGE_SHIFT);
>                 bv->bv_offset = bvec->bv_offset & ~PAGE_MASK;
> 
> which I cargo-culted into bvec_thp_advance as:
> 
>                 bv->bv_page = thp_head(bvec->bv_page +
>                                 (bvec->bv_offset >> PAGE_SHIFT));
>                 page_size = thp_size(bv->bv_page);
>                 bv->bv_offset = bvec->bv_offset -
>                                 (bv->bv_page - bvec->bv_page) * PAGE_SIZE;
> 
> Is it possible to have a bvec with an offset that is larger than the
> size of bv_page?  That doesn't seem like a useful thing to do, but
> if that needs to be supported, then the code up top doesn't do that.
> We maybe gain a little bit by counting length down to 0 instead of
> counting it up to bv_len.  I dunno; reading the code over now, it
> doesn't seem like that much of a difference.

Drivers can absolutely see a bv_offset that is larger due to bio
splitting.  However the submitting file system should never see one
unless it creates one, which would be stupid.

And yes, eventually bv_page and bv_offset should be replaced with a

	phys_addr_t		bv_phys;

and life would become simpler in many places (and the bvec would
shrink for most common setups as well).

For now I'd end up with something like:

static void iomap_read_end_bvec(struct page *page, size_t offset,
		size_t length, int error)
{
	while (length > 0) {
		size_t page_size = thp_size(page);
		size_t count = min(page_size - offset, length);

		iomap_read_page_end_io(page, offset, count, error);

		page += (offset + count) / page_size;
		length -= count;
		offset = 0;
	}
}

static void iomap_read_end_io(struct bio *bio)
{
	int i, error = blk_status_to_errno(bio->bi_status);
	struct bio_vec *bvec;

	bio_for_each_bvec_all(bvec, bio, i)
		iomap_read_end_bvec(bvec->bv_page, bvec->bv_offset,
				    bvec->bv_len, error);
        bio_put(bio);
}

and maybe even merge iomap_read_page_end_io into iomap_read_end_bvec.



* Re: [PATCH 04/11] block: Add bio_for_each_thp_segment_all
  2020-09-01  5:34       ` Christoph Hellwig
@ 2020-09-01 13:05         ` Matthew Wilcox
  2020-09-01 14:50           ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-09-01 13:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-block,
	linux-mm, linux-kernel

On Tue, Sep 01, 2020 at 06:34:26AM +0100, Christoph Hellwig wrote:
> On Mon, Aug 31, 2020 at 08:48:37PM +0100, Matthew Wilcox wrote:
> > static void iomap_read_end_io(struct bio *bio)
> > {
> >         int i, error = blk_status_to_errno(bio->bi_status);
> > 
> >         for (i = 0; i < bio->bi_vcnt; i++) {
> >                 struct bio_vec *bvec = &bio->bi_io_vec[i];
> 
> This should probably use bio_for_each_bvec_all instead of directly
> poking into the bio.  I'd also be tempted to move the loop body into
> a separate helper, but that's just a slight stylistic preference.

Ah, got it.

> >                 size_t offset = bvec->bv_offset;
> >                 size_t length = bvec->bv_len;
> >                 struct page *page = bvec->bv_page;
> > 
> >                 while (length > 0) { 
> >                         size_t count = thp_size(page) - offset;
> >                         
> >                         if (count > length)
> >                                 count = length;
> >                         iomap_read_page_end_io(page, offset, count, error);
> >                         page += (offset + count) / PAGE_SIZE;
> 
> Shouldn't the page_size here be thp_size?

No.  Let's suppose we have a 20kB I/O which starts on a page boundary and
the first page is order-2.  To get from the first head page to the second
page, we need to add 4, which is 16kB / 4kB, not 16kB / 16kB.
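
(Walking the loop above through that example, and assuming the remaining
4kB lands in an ordinary order-0 page -- just an illustrative trace, not
tested code:)

        /* bvec: bv_page = order-2 head page, bv_offset = 0, bv_len = 20kB */
        /*
         * pass 1: count = thp_size(page) - 0 = 16kB
         *         page += (0 + 16kB) / PAGE_SIZE, i.e. page += 4
         *         length = 20kB - 16kB = 4kB, offset = 0
         * pass 2: page is now the order-0 page, count = 4kB, length = 0
         *
         * Dividing by thp_size(page) here would only advance page by 1,
         * which lands on a tail page inside the compound page.
         */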

> > Maybe I'm missing something important here, but it's significantly
> > simpler code -- iomap_read_end_io() goes down from 816 bytes to 560 bytes
> > (256 bytes less!) iomap_read_page_end_io is inlined into it both before
> > and after.
> 
> Yes, that's exactly why I think avoiding bio_for_each_segment_all is
> a good idea in general.

I took out all the attempts to cope with insane bv_offset to compare like
with like and I got the bio_for_each_thp_segment_all() version down to
656 bytes:

@@ -166,21 +166,15 @@ static inline void bvec_thp_advance(const struct bio_vec *bvec,
                                struct bvec_iter_all *iter_all)
 {
        struct bio_vec *bv = &iter_all->bv;
-       unsigned int page_size;
 
        if (iter_all->done) {
                bv->bv_page += thp_nr_pages(bv->bv_page);
-               page_size = thp_size(bv->bv_page);
                bv->bv_offset = 0;
        } else {
-               bv->bv_page = thp_head(bvec->bv_page +
-                               (bvec->bv_offset >> PAGE_SHIFT));
-               page_size = thp_size(bv->bv_page);
-               bv->bv_offset = bvec->bv_offset -
-                               (bv->bv_page - bvec->bv_page) * PAGE_SIZE;
-               BUG_ON(bv->bv_offset >= page_size);
+               bv->bv_page = bvec->bv_page;
+               bv->bv_offset = bvec->bv_offset;
        }
-       bv->bv_len = min(page_size - bv->bv_offset,
+       bv->bv_len = min_t(unsigned int, thp_size(bv->bv_page) - bv->bv_offset,
                         bvec->bv_len - iter_all->done);
        iter_all->done += bv->bv_len;

> And yes, eventually bv_page and bv_offset should be replaced with a
> 
> 	phys_addr_t		bv_phys;
> 
> and life would become simpler in many places (and the bvec would
> shrink for most common setups as well).

I'd very much like to see that.  Needing a struct page causes quite
considerable pain for our virtualisation people.  They'd like the
hypervisor not to have struct pages for the guest's memory, but without
them they can't do I/O to that memory.  Perhaps I'll try getting one of
them to work on this.

I'm not entirely sure the bvec would shrink.  On 64-bit systems, it's
currently 8 bytes for the struct page, 4 bytes for the len and 4 bytes
for the offset.  Sure, we can get rid of the offset, but the compiler
will just pad the struct from 12 bytes back to 16.  On 32-bit systems
with 32-bit phys_addr_t, we go from 12 bytes down to 8, but most 32-bit
systems have a 64-bit phys_addr_t these days, don't they?
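
To put rough numbers on that, here is a quick userspace check with
stand-in layouts; these structs are purely illustrative and are not the
kernel's bio_vec definition:

#include <stdio.h>
#include <stdint.h>

struct bvec_page {		/* today's shape: page + len + offset */
	void		*bv_page;
	unsigned int	 bv_len;
	unsigned int	 bv_offset;
};

struct bvec_phys64 {		/* hypothetical: 64-bit phys_addr_t + len */
	uint64_t	 bv_phys;
	unsigned int	 bv_len;
};

struct bvec_phys32 {		/* hypothetical: 32-bit phys_addr_t + len */
	uint32_t	 bv_phys;
	unsigned int	 bv_len;
};

int main(void)
{
	/* On LP64 this prints 16, 16 and 8: the 12-byte layout pads to 16 */
	printf("page/len/offset: %zu\n", sizeof(struct bvec_page));
	printf("phys64/len:      %zu\n", sizeof(struct bvec_phys64));
	printf("phys32/len:      %zu\n", sizeof(struct bvec_phys32));
	return 0;
}

So dropping the offset only shrinks the struct where phys_addr_t is
genuinely 32-bit; on 64-bit the alignment padding eats the saving.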

> For now I'd end up with something like:
> 
> static void iomap_read_end_bvec(struct page *page, size_t offset,
> 		size_t length, int error)
> {
> 	while (length > 0) {
> 		size_t page_size = thp_size(page);
> 		size_t count = min(page_size - offset, length);
> 
> 		iomap_read_page_end_io(page, offset, count, error);
> 
> 		page += (offset + count) / page_size;
> 		length -= count;
> 		offset = 0;
> 	}
> }
> 
> static void iomap_read_end_io(struct bio *bio)
> {
> 	int i, error = blk_status_to_errno(bio->bi_status);
> 	struct bio_vec *bvec;
> 
> 	bio_for_each_bvec_all(bvec, bio, i)
> 		iomap_read_end_bvec(bvec->bv_page, bvec->bv_offset,
> 				    bvec->bv_len, error);
> 	bio_put(bio);
> }
> 
> and maybe even merge iomap_read_page_end_io into iomap_read_end_bvec.

The lines start to get a bit long.  Here's what I currently have on
the write side:

@@ -1104,6 +1117,20 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
                end_page_writeback(page);
 }
 
+static void iomap_finish_bvec_writeback(struct inode *inode,
+               struct page *page, size_t offset, size_t length, int error)
+{
+       while (length > 0) {
+               size_t count = min(thp_size(page) - offset, length);
+
+               iomap_finish_page_writeback(inode, page, error, count);
+
+               page += (offset + count) / PAGE_SIZE;
+               offset = 0;
+               length -= count;
+       }
+}
+
 /*
  * We're now finished for good with this ioend structure.  Update the page
  * state, release holds on bios, and finally free up memory.  Do not use the
@@ -1121,7 +1148,7 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 
        for (bio = &ioend->io_inline_bio; bio; bio = next) {
                struct bio_vec *bv;
-               struct bvec_iter_all iter_all;
+               int i;
 
                /*
                 * For the last bio, bi_private points to the ioend, so we
@@ -1133,9 +1160,9 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
                        next = bio->bi_private;
 
                /* walk each page on bio, ending page IO on them */
-               bio_for_each_thp_segment_all(bv, bio, iter_all)
-                       iomap_finish_page_writeback(inode, bv->bv_page, error,
-                                       bv->bv_len);
+               bio_for_each_bvec_all(bv, bio, i)
+                       iomap_finish_bvec_writeback(inode, bv->bv_page,
+                                       bv->bv_offset, bv->bv_len, error);
                bio_put(bio);
        }
        /* The ioend has been freed by bio_put() */

That's a bit more boilerplate than I'd like, but if bio_vec is going to
lose its bv_page then I don't see a better way.  Unless we come up with
a different page/offset/length struct that bio_vecs are decomposed into.
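
For what it's worth, one hypothetical shape for such a triple (the names
are invented for illustration; nothing like this exists in the tree).
The per-bvec helpers above would then take a single segment instead of
separate page/offset/length arguments:

#include <stdio.h>

/* Hypothetical per-page segment a bvec could be decomposed into */
struct page_seg {
	unsigned long	 pfn;		/* stand-in for struct page * */
	unsigned int	 offset;
	unsigned int	 len;
};

/* A consumer takes the whole triple rather than three scalars */
static void end_seg(const struct page_seg *seg, int error)
{
	printf("pfn %lu offset %u len %u error %d\n",
	       seg->pfn, seg->offset, seg->len, error);
}

int main(void)
{
	struct page_seg seg = { .pfn = 4, .offset = 0, .len = 16384 };

	end_seg(&seg, 0);
	return 0;
}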



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 04/11] block: Add bio_for_each_thp_segment_all
  2020-09-01 13:05         ` Matthew Wilcox
@ 2020-09-01 14:50           ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-09-01 14:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, Darrick J . Wong,
	linux-block, linux-mm, linux-kernel

On Tue, Sep 01, 2020 at 02:05:25PM +0100, Matthew Wilcox wrote:
> > >                 struct page *page = bvec->bv_page;
> > > 
> > >                 while (length > 0) { 
> > >                         size_t count = thp_size(page) - offset;
> > >                         
> > >                         if (count > length)
> > >                                 count = length;
> > >                         iomap_read_page_end_io(page, offset, count, error);
> > >                         page += (offset + count) / PAGE_SIZE;
> > 
> > Shouldn't the page_size here be thp_size?
> 
> No.  Let's suppose we have a 20kB I/O which starts on a page boundary and
> the first page is order-2.  To get from the first head page to the second
> page, we need to add 4, which is 16kB / 4kB, not 16kB / 16kB.

True.

> I'm not entirely sure the bvec would shrink.  On 64-bit systems, it's
> currently 8 bytes for the struct page, 4 bytes for the len and 4 bytes
> for the offset.  Sure, we can get rid of the offset, but the compiler
> will just pad the struct from 12 bytes back to 16.  On 32-bit systems
> with 32-bit phys_addr_t, we go from 12 bytes down to 8, but most 32-bit
> systems have a 64-bit phys_addr_t these days, don't they?

Actually, on those systems that are still 32-bit because they are so
tiny, I'd very much still expect a 32-bit phys_addr_t, e.g. arm
without LPAE or 32-bit RISC-V.

But yeah, point taken on the alignment for the 64-bit ones.

> That's a bit more boilerplate than I'd like, but if bio_vec is going to
> lose its bv_page then I don't see a better way.  Unless we come up with
> a different page/offset/length struct that bio_vecs are decomposed into.

I'm not sure it is going to lose bv_page any time soon.  I'd sure like
it to, but the last time something like that came up Linus wasn't
entirely in favor.  Things might have changed now, though, and I think
it is about time to give it another try.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-01 14:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-24 15:16 [PATCH 00/11] iomap/fs/block patches for 5.11 Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 01/11] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 02/11] mm: Support THPs in zero_user_segments Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 03/11] mm: Zero the head page, not the tail page Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 04/11] block: Add bio_for_each_thp_segment_all Matthew Wilcox (Oracle)
2020-08-27  8:44   ` Christoph Hellwig
2020-08-31 19:48     ` Matthew Wilcox
2020-09-01  5:34       ` Christoph Hellwig
2020-09-01 13:05         ` Matthew Wilcox
2020-09-01 14:50           ` Christoph Hellwig
2020-08-24 15:16 ` [PATCH 05/11] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 06/11] iomap: Support THPs in invalidatepage Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 07/11] iomap: Support THPs in read paths Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 08/11] iomap: Change iomap_write_begin calling convention Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 09/11] iomap: Support THPs in write paths Matthew Wilcox (Oracle)
2020-08-24 15:16 ` [PATCH 10/11] iomap: Inline data shouldn't see THPs Matthew Wilcox (Oracle)
2020-08-24 15:17 ` [PATCH 11/11] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox (Oracle)
2020-08-25 10:29 ` [PATCH 00/11] iomap/fs/block patches for 5.11 William Kucharski
