* [PATCH 0/9] THP iomap patches for 5.10
@ 2020-08-24 14:55 Matthew Wilcox (Oracle)
  2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
                   ` (8 more replies)
  0 siblings, 9 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

These patches are carefully plucked from the THP series.  I would like
them to hit 5.10 to make the THP patchset merge easier.  Some of these
are just generic improvements that make sense on their own terms, but
the overall intent is to support THPs in iomap.

I'll send another patch series later today containing the changes to
iomap which don't pay their own way until we actually have THPs in the
page cache.  I would like those to be reviewed with an eye to merging
them into 5.11.

Matthew Wilcox (Oracle) (9):
  iomap: Fix misplaced page flushing
  fs: Introduce i_blocks_per_page
  iomap: Use kzalloc to allocate iomap_page
  iomap: Use bitmap ops to set uptodate bits
  iomap: Support arbitrarily many blocks per page
  iomap: Convert read_count to byte count
  iomap: Convert write_count to byte count
  iomap: Convert iomap_write_end types
  iomap: Change calling convention for zeroing

 fs/dax.c                |  13 ++--
 fs/iomap/buffered-io.c  | 145 ++++++++++++++++------------------------
 fs/jfs/jfs_metapage.c   |   2 +-
 fs/xfs/xfs_aops.c       |   2 +-
 include/linux/dax.h     |   3 +-
 include/linux/pagemap.h |  16 +++++
 6 files changed, 83 insertions(+), 98 deletions(-)

-- 
2.28.0



* [PATCH 1/9] iomap: Fix misplaced page flushing
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-24 23:51   ` Dave Chinner
                     ` (2 more replies)
  2020-08-24 14:55 ` [PATCH 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

If iomap_unshare_actor() unshares to an inline iomap, the page was
not being flushed.  block_write_end() and __iomap_write_end() already
contain flushes, so adding it to iomap_write_end_inline() seems like
the best place.  That means we can remove it from iomap_write_actor().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..cffd575e57b6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -715,6 +715,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 {
 	void *addr;
 
+	flush_dcache_page(page);
 	WARN_ON_ONCE(!PageUptodate(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
@@ -811,8 +812,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-		flush_dcache_page(page);
-
 		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
 				srcmap);
 		if (unlikely(status < 0))
-- 
2.28.0



* [PATCH 2/9] fs: Introduce i_blocks_per_page
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
  2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-24 23:55   ` Dave Chinner
  2020-08-25 20:49   ` Darrick J. Wong
  2020-08-24 14:55 ` [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-nvdimm, linux-kernel, Christoph Hellwig

This helper is useful for both THPs and for supporting block size larger
than page size.  Convert all users that I could find (we have a few
different ways of writing this idiom, and I may have missed some).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c  |  8 ++++----
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c       |  2 +-
 include/linux/pagemap.h | 16 ++++++++++++++++
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index cffd575e57b6..13d5cdab8dcd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
-	if (iop || i_blocksize(inode) == PAGE_SIZE)
+	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
 	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
 		if (i >= first && i <= last)
 			set_bit(i, iop->uptodate);
 		else if (!test_bit(i, iop->uptodate))
@@ -1078,7 +1078,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -1374,7 +1374,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 	struct inode *inode = page->mapping->host;
 	struct bio *bio = NULL;
 	int block_offset;
-	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+	int blocks_per_page = i_blocks_per_page(inode, page);
 	sector_t page_start;	/* address of page in fs blocks */
 	sector_t pblock;
 	int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index b35611882ff9..55d126d4e096 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -544,7 +544,7 @@ xfs_discard_page(
 			page, ip->i_ino, offset);
 
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-			PAGE_SIZE / i_blocksize(inode));
+			i_blocks_per_page(inode, page));
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7de11dcd534d..853733286138 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -899,4 +899,20 @@ static inline int page_mkwrite_check_truncate(struct page *page,
 	return offset;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The page (head page if the page is a THP).
+ *
+ * If the block size is larger than the size of this page, return zero.
+ *
+ * Context: The caller should hold a refcount on the page to prevent it
+ * from being split.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+	return thp_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */
-- 
2.28.0
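
For a concrete picture of the conversion, here is a minimal sketch of a
caller using the new helper; process_block() is hypothetical, purely for
illustration:

	/*
	 * i_blocks_per_page() is thp_size(page) >> inode->i_blkbits,
	 * so a 4kB page with 4kB blocks iterates once, while a 2MB
	 * THP with 4kB blocks iterates 512 times.
	 */
	unsigned int i, nr_blocks = i_blocks_per_page(inode, page);

	for (i = 0; i < nr_blocks; i++)
		process_block(inode, page, i);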



* [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
  2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
  2020-08-24 14:55 ` [PATCH 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-24 23:56   ` Dave Chinner
  2020-08-25 20:49   ` Darrick J. Wong
  2020-08-24 14:55 ` [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-nvdimm, linux-kernel, Christoph Hellwig

We can skip most of the initialisation, although spinlocks still
need explicit initialisation as architectures may use a non-zero
value to indicate unlocked.  The comment is no longer useful as
attach_page_private() handles the refcount now.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 13d5cdab8dcd..639d54a4177e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -49,16 +49,8 @@ iomap_page_create(struct inode *inode, struct page *page)
 	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
-	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
+	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
-	/*
-	 * migrate_page_move_mapping() assumes that pages with private data have
-	 * their count elevated by 1.
-	 */
 	attach_page_private(page, iop);
 	return iop;
 }
-- 
2.28.0
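
A sketch of the resulting allocation path, with comments spelling out
why the explicit lock initialisation survives (this mirrors the patched
code; the comments are illustrative):

	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
	/*
	 * kzalloc() has zeroed read_count, write_count and the uptodate
	 * bitmap, but all-zero memory is not guaranteed to be a valid
	 * unlocked spinlock on every architecture.
	 */
	spin_lock_init(&iop->uptodate_lock);
	attach_page_private(page, iop);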



* [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-08-24 14:55 ` [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-24 23:56   ` Dave Chinner
  2020-08-25 20:50   ` Darrick J. Wong
  2020-08-24 14:55 ` [PATCH 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, linux-nvdimm, linux-kernel, Christoph Hellwig

Now that the bitmap is protected by a spinlock, we can use the
more efficient bitmap ops instead of individual test/set bit ops.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 639d54a4177e..dbf9572dabe9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -134,19 +134,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	struct inode *inode = page->mapping->host;
 	unsigned first = off >> inode->i_blkbits;
 	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	bool uptodate = true;
 	unsigned long flags;
-	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-		if (i >= first && i <= last)
-			set_bit(i, iop->uptodate);
-		else if (!test_bit(i, iop->uptodate))
-			uptodate = false;
-	}
-
-	if (uptodate)
+	bitmap_set(iop->uptodate, first, last - first + 1);
+	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
 		SetPageUptodate(page);
 	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
 }
-- 
2.28.0
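
A toy example of the two bitmap ops, outside the patched code path
(16 blocks would correspond to, say, a 64kB page of 4kB blocks):

	DECLARE_BITMAP(uptodate, 16);
	bool all_uptodate;

	bitmap_zero(uptodate, 16);
	bitmap_set(uptodate, 4, 3);		/* mark blocks 4, 5 and 6 */
	all_uptodate = bitmap_full(uptodate, 16);	/* false: 0-3, 7-15 unset */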



* [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-08-24 14:55 ` [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-24 23:59   ` Dave Chinner
                     ` (2 more replies)
  2020-08-24 14:55 ` [PATCH 6/9] iomap: Convert read_count to byte count Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

Size the uptodate array dynamically to support larger pages in the
page cache.  With a 64kB page, we're only saving 8 bytes per page today,
but with a 2MB maximum page size, we'd have to allocate more than 4kB
per page.  Add a few debugging assertions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index dbf9572dabe9..844e95cacea8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -22,18 +22,19 @@
 #include "../internal.h"
 
 /*
- * Structure allocated for each page when block size < PAGE_SIZE to track
+ * Structure allocated for each page when block size < page size to track
  * sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
 	spinlock_t		uptodate_lock;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	unsigned long		uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
 {
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 	if (page_has_private(page))
 		return (struct iomap_page *)page_private(page);
 	return NULL;
@@ -45,11 +46,13 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
+	unsigned int nr_blocks = i_blocks_per_page(inode, page);
 
-	if (iop || i_blocks_per_page(inode, page) <= 1)
+	if (iop || nr_blocks <= 1)
 		return iop;
 
-	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
+	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+			GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
 	attach_page_private(page, iop);
 	return iop;
@@ -59,11 +62,14 @@ static void
 iomap_page_release(struct page *page)
 {
 	struct iomap_page *iop = detach_page_private(page);
+	unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
 
 	if (!iop)
 		return;
 	WARN_ON_ONCE(atomic_read(&iop->read_count));
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
+	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+			PageUptodate(page));
 	kfree(iop);
 }
 
-- 
2.28.0
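
Back-of-envelope sizing for the flexible array, assuming 512-byte
blocks and 64-bit longs (illustrative arithmetic only):

	/*
	 *   4kB page:     8 blocks -> BITS_TO_LONGS(8)    =  1 long ->   8 bytes
	 *  64kB page:   128 blocks -> BITS_TO_LONGS(128)  =  2 longs ->  16 bytes
	 *   2MB THP:   4096 blocks -> BITS_TO_LONGS(4096) = 64 longs -> 512 bytes
	 */
	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
			GFP_NOFS | __GFP_NOFAIL);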



* [PATCH 6/9] iomap: Convert read_count to byte count
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-08-24 14:55 ` [PATCH 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-25  0:09   ` Dave Chinner
  2020-08-27  8:35   ` Christoph Hellwig
  2020-08-24 14:55 ` [PATCH 7/9] iomap: Convert write_count " Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 844e95cacea8..c9b44f86d166 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -161,13 +161,6 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 		SetPageUptodate(page);
 }
 
-static void
-iomap_read_finish(struct iomap_page *iop, struct page *page)
-{
-	if (!iop || atomic_dec_and_test(&iop->read_count))
-		unlock_page(page);
-}
-
 static void
 iomap_read_page_end_io(struct bio_vec *bvec, int error)
 {
@@ -181,7 +174,8 @@ iomap_read_page_end_io(struct bio_vec *bvec, int error)
 		iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
 	}
 
-	iomap_read_finish(iop, page);
+	if (!iop || atomic_sub_and_test(bvec->bv_len, &iop->read_count))
+		unlock_page(page);
 }
 
 static void
@@ -269,20 +263,17 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (ctx->bio && bio_end_sector(ctx->bio) == sector)
 		is_contig = true;
 
-	if (is_contig &&
-	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
-		if (!same_page && iop)
-			atomic_inc(&iop->read_count);
-		goto done;
-	}
-
 	/*
-	 * If we start a new segment we need to increase the read count, and we
-	 * need to do so before submitting any previous full bio to make sure
-	 * that we don't prematurely unlock the page.
+	 * We need to increase the read count before submitting any
+	 * previous bio to make sure that we don't prematurely unlock
+	 * the page.
 	 */
 	if (iop)
-		atomic_inc(&iop->read_count);
+		atomic_add(plen, &iop->read_count);
+
+	if (is_contig &&
+	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page))
+		goto done;
 
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
-- 
2.28.0
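
The invariant after this change, reduced to its two halves (same calls
as in the patch, comments added for illustration):

	/* submission: account every byte before the bio can complete */
	atomic_add(plen, &iop->read_count);

	/* completion: unlock only when the last outstanding byte of
	 * read I/O on this page has finished */
	if (atomic_sub_and_test(bvec->bv_len, &iop->read_count))
		unlock_page(page);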



* [PATCH 7/9] iomap: Convert write_count to byte count
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-08-24 14:55 ` [PATCH 6/9] iomap: Convert read_count to byte count Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-27  8:36   ` Christoph Hellwig
  2020-08-24 14:55 ` [PATCH 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
  2020-08-24 14:55 ` [PATCH 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
  8 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c9b44f86d166..8c6187a6f34e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1050,7 +1050,7 @@ EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
 static void
 iomap_finish_page_writeback(struct inode *inode, struct page *page,
-		int error)
+		int error, unsigned int len)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
@@ -1062,7 +1062,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
-	if (!iop || atomic_dec_and_test(&iop->write_count))
+	if (!iop || atomic_sub_and_test(len, &iop->write_count))
 		end_page_writeback(page);
 }
 
@@ -1096,7 +1096,8 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 
 		/* walk each page on bio, ending page IO on them */
 		bio_for_each_segment_all(bv, bio, iter_all)
-			iomap_finish_page_writeback(inode, bv->bv_page, error);
+			iomap_finish_page_writeback(inode, bv->bv_page, error,
+					bv->bv_len);
 		bio_put(bio);
 	}
 	/* The ioend has been freed by bio_put() */
@@ -1312,8 +1313,8 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 
 	merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
 			&same_page);
-	if (iop && !same_page)
-		atomic_inc(&iop->write_count);
+	if (iop)
+		atomic_add(len, &iop->write_count);
 
 	if (!merged) {
 		if (bio_full(wpc->ioend->io_bio, len)) {
-- 
2.28.0



* [PATCH 8/9] iomap: Convert iomap_write_end types
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-08-24 14:55 ` [PATCH 7/9] iomap: Convert write_count " Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-25  0:12   ` Dave Chinner
  2020-08-27  8:41   ` Christoph Hellwig
  2020-08-24 14:55 ` [PATCH 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

iomap_write_end cannot return an error, so switch it to return
size_t instead of int and remove the error checking from the callers.
Also convert the arguments to size_t from unsigned int, in case anyone
ever wants to support a page size larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8c6187a6f34e..7f618ab4b11e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -666,9 +666,8 @@ iomap_set_page_dirty(struct page *page)
 }
 EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 
-static int
-__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+		size_t copied, struct page *page)
 {
 	flush_dcache_page(page);
 
@@ -690,9 +689,8 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	return copied;
 }
 
-static int
-iomap_write_end_inline(struct inode *inode, struct page *page,
-		struct iomap *iomap, loff_t pos, unsigned copied)
+static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
+		struct iomap *iomap, loff_t pos, size_t copied)
 {
 	void *addr;
 
@@ -708,13 +706,14 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	return copied;
 }
 
-static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
-		struct page *page, struct iomap *iomap, struct iomap *srcmap)
+/* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
+static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+		size_t copied, struct page *page, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	loff_t old_size = inode->i_size;
-	int ret;
+	size_t ret;
 
 	if (srcmap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
@@ -793,11 +792,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+		copied = iomap_write_end(inode, pos, bytes, copied, page, iomap,
 				srcmap);
-		if (unlikely(status < 0))
-			break;
-		copied = status;
 
 		cond_resched();
 
@@ -871,11 +867,8 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
 				srcmap);
-		if (unlikely(status <= 0)) {
-			if (WARN_ON_ONCE(status == 0))
-				return -EIO;
-			return status;
-		}
+		if (WARN_ON_ONCE(status == 0))
+			return -EIO;
 
 		cond_resched();
 
-- 
2.28.0



* [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-08-24 14:55 ` [PATCH 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
@ 2020-08-24 14:55 ` Matthew Wilcox (Oracle)
  2020-08-25  0:27   ` Dave Chinner
  2020-08-25 22:23   ` Darrick J. Wong
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-24 14:55 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle), Darrick J . Wong, linux-nvdimm, linux-kernel

Pass the full length to iomap_zero() and dax_iomap_zero(), and have
them return how many bytes they actually handled.  This is preparatory
work for handling THP, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/dax.c               | 13 ++++++-------
 fs/iomap/buffered-io.c | 33 +++++++++++++++------------------
 include/linux/dax.h    |  3 +--
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 95341af1a966..f2b912cb034e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1037,18 +1037,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	return ret;
 }
 
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-		   struct iomap *iomap)
+loff_t dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
 	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
 	pgoff_t pgoff;
 	long rc, id;
 	void *kaddr;
 	bool page_aligned = false;
-
+	unsigned offset = offset_in_page(pos);
+	unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
 	if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-	    IS_ALIGNED(size, PAGE_SIZE))
+	    (size == PAGE_SIZE))
 		page_aligned = true;
 
 	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
@@ -1058,8 +1058,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
 	id = dax_read_lock();
 
 	if (page_aligned)
-		rc = dax_zero_page_range(iomap->dax_dev, pgoff,
-					 size >> PAGE_SHIFT);
+		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
 	else
 		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
 	if (rc < 0) {
@@ -1072,7 +1071,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
 		dax_flush(iomap->dax_dev, kaddr + offset, size);
 	}
 	dax_read_unlock(id);
-	return 0;
+	return size;
 }
 
 static loff_t
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7f618ab4b11e..2dba054095e8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -901,11 +901,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_zero(struct inode *inode, loff_t pos, u64 length,
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
 	int status;
+	unsigned offset = offset_in_page(pos);
+	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
 
 	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
 	if (status)
@@ -917,38 +919,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
-static loff_t
-iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
+		loff_t length, void *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	bool *did_zero = data;
 	loff_t written = 0;
-	int status;
 
 	/* already zeroed?  we're done. */
 	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return count;
+		return length;
 
 	do {
-		unsigned offset, bytes;
-
-		offset = offset_in_page(pos);
-		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
+		loff_t bytes;
 
 		if (IS_DAX(inode))
-			status = dax_iomap_zero(pos, offset, bytes, iomap);
+			bytes = dax_iomap_zero(pos, length, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap,
-					srcmap);
-		if (status < 0)
-			return status;
+			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
+		if (bytes < 0)
+			return bytes;
 
 		pos += bytes;
-		count -= bytes;
+		length -= bytes;
 		written += bytes;
 		if (did_zero)
 			*did_zero = true;
-	} while (count > 0);
+	} while (length > 0);
 
 	return written;
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6904d4e0b2e0..80f17946f940 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -214,8 +214,7 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-			struct iomap *iomap);
+loff_t dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
-- 
2.28.0
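
The new calling convention, reduced to a skeleton; zero_fn() stands in
for either dax_iomap_zero() or iomap_zero():

	do {
		/* each call zeroes as much as it can manage (at least
		 * one byte) and returns the count, or a negative errno */
		loff_t bytes = zero_fn(pos, length, iomap);

		if (bytes < 0)
			return bytes;
		pos += bytes;
		length -= bytes;
		written += bytes;
	} while (length > 0);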



* Re: [PATCH 1/9] iomap: Fix misplaced page flushing
  2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
@ 2020-08-24 23:51   ` Dave Chinner
  2020-08-25 20:47   ` Darrick J. Wong
  2020-08-27  8:24   ` Christoph Hellwig
  2 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-24 23:51 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:02PM +0100, Matthew Wilcox (Oracle) wrote:
> If iomap_unshare_actor() unshares to an inline iomap, the page was
> not being flushed.  block_write_end() and __iomap_write_end() already
> contain flushes, so adding it to iomap_write_end_inline() seems like
> the best place.  That means we can remove it from iomap_write_actor().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/9] fs: Introduce i_blocks_per_page
  2020-08-24 14:55 ` [PATCH 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
@ 2020-08-24 23:55   ` Dave Chinner
  2020-08-25 20:49   ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-24 23:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm,
	linux-kernel, Christoph Hellwig

On Mon, Aug 24, 2020 at 03:55:03PM +0100, Matthew Wilcox (Oracle) wrote:
> This helper is useful for both THPs and for supporting block size larger
> than page size.  Convert all users that I could find (we have a few
> different ways of writing this idiom, and I may have missed some).

There probably are - ISTR having a lot more of these changes for the
block_size > page_size patches as it needed the same {page, inode}
abstraction for calculating the range to iterate. But they can be
dealt with on a case by case basis, I think.

> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c  |  8 ++++----
>  fs/jfs/jfs_metapage.c   |  2 +-
>  fs/xfs/xfs_aops.c       |  2 +-
>  include/linux/pagemap.h | 16 ++++++++++++++++
>  4 files changed, 22 insertions(+), 6 deletions(-)

Otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page
  2020-08-24 14:55 ` [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
@ 2020-08-24 23:56   ` Dave Chinner
  2020-08-25 20:49   ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-24 23:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm,
	linux-kernel, Christoph Hellwig

On Mon, Aug 24, 2020 at 03:55:04PM +0100, Matthew Wilcox (Oracle) wrote:
> We can skip most of the initialisation, although spinlocks still
> need explicit initialisation as architectures may use a non-zero
> value to indicate unlocked.  The comment is no longer useful as
> attach_page_private() handles the refcount now.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)

The sooner this goes in the better :)

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits
  2020-08-24 14:55 ` [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
@ 2020-08-24 23:56   ` Dave Chinner
  2020-08-25 20:50   ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-24 23:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm,
	linux-kernel, Christoph Hellwig

On Mon, Aug 24, 2020 at 03:55:05PM +0100, Matthew Wilcox (Oracle) wrote:
> Now that the bitmap is protected by a spinlock, we can use the
> more efficient bitmap ops instead of individual test/set bit ops.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-24 14:55 ` [PATCH 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
@ 2020-08-24 23:59   ` Dave Chinner
  2020-08-25  0:22     ` Matthew Wilcox
  2020-08-25 21:02   ` Darrick J. Wong
  2020-08-27  8:26   ` Christoph Hellwig
  2 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2020-08-24 23:59 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:06PM +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically to support larger pages in the
> page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> but with a 2MB maximum page size, we'd have to allocate more than 4kB
> per page.  Add a few debugging assertions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index dbf9572dabe9..844e95cacea8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -22,18 +22,19 @@
>  #include "../internal.h"
>  
>  /*
> - * Structure allocated for each page when block size < PAGE_SIZE to track
> + * Structure allocated for each page when block size < page size to track
>   * sub-page uptodate status and I/O completions.
>   */
>  struct iomap_page {
>  	atomic_t		read_count;
>  	atomic_t		write_count;
>  	spinlock_t		uptodate_lock;
> -	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
> +	unsigned long		uptodate[];
>  };
>  
>  static inline struct iomap_page *to_iomap_page(struct page *page)
>  {
> +	VM_BUG_ON_PGFLAGS(PageTail(page), page);
>  	if (page_has_private(page))
>  		return (struct iomap_page *)page_private(page);
>  	return NULL;

Just to confirm: this VM bug check is needed because we only
attach the iomap_page to the head page of a compound page?

Assuming that I've understood the above correctly:

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 6/9] iomap: Convert read_count to byte count
  2020-08-24 14:55 ` [PATCH 6/9] iomap: Convert read_count to byte count Matthew Wilcox (Oracle)
@ 2020-08-25  0:09   ` Dave Chinner
  2020-08-25 22:14     ` Darrick J. Wong
  2020-08-27  8:35   ` Christoph Hellwig
  1 sibling, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2020-08-25  0:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:07PM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of counting bio segments, count the number of bytes submitted.
> This insulates us from the block layer's definition of what a 'same page'
> is, which is not necessarily clear once THPs are involved.

I'd like to see a comment on the definition of struct iomap_page to
indicate that read_count (and write_count) reflect the byte count of
IO currently in flight on the page, not an IO count, because THP
makes tracking this via bio state hard. Otherwise it is not at all
obvious why it is done and why it is intentional...

Otherwise the code looks OK.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
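
For concreteness, the kind of comment being asked for might read
something like this (a sketch, not an actual patch):

	struct iomap_page {
		/*
		 * read_count and write_count hold the number of BYTES
		 * of I/O in flight on this page, not a count of bios
		 * or segments: with THPs the block layer's notion of
		 * a 'same page' merge becomes ambiguous, but byte
		 * counts always sum back to the amount submitted.
		 */
		atomic_t		read_count;
		atomic_t		write_count;
		spinlock_t		uptodate_lock;
		unsigned long		uptodate[];
	};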


* Re: [PATCH 8/9] iomap: Convert iomap_write_end types
  2020-08-24 14:55 ` [PATCH 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
@ 2020-08-25  0:12   ` Dave Chinner
  2020-08-25  1:06     ` Matthew Wilcox
  2020-08-27  8:41   ` Christoph Hellwig
  1 sibling, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2020-08-25  0:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:09PM +0100, Matthew Wilcox (Oracle) wrote:
> iomap_write_end cannot return an error, so switch it to return
> size_t instead of int and remove the error checking from the callers.
> Also convert the arguments to size_t from unsigned int, in case anyone
> ever wants to support a page size larger than 2GB.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 31 ++++++++++++-------------------
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8c6187a6f34e..7f618ab4b11e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -666,9 +666,8 @@ iomap_set_page_dirty(struct page *page)
>  }
>  EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
>  
> -static int
> -__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page)
> +static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> +		size_t copied, struct page *page)
>  {

Please leave the function declarations formatted the same way as
they currently are. They are done that way intentionally so it is
easy to grep for function definitions. Not to mention it's much
easier to read than when the function name is commingled into the
multiline parameter list like...

> -static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> -		struct page *page, struct iomap *iomap, struct iomap *srcmap)
> +/* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
> +static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> +		size_t copied, struct page *page, struct iomap *iomap,
> +		struct iomap *srcmap)

... this.

Otherwise the code looks fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-24 23:59   ` Dave Chinner
@ 2020-08-25  0:22     ` Matthew Wilcox
  0 siblings, 0 replies; 42+ messages in thread
From: Matthew Wilcox @ 2020-08-25  0:22 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 09:59:18AM +1000, Dave Chinner wrote:
> On Mon, Aug 24, 2020 at 03:55:06PM +0100, Matthew Wilcox (Oracle) wrote:
> >  static inline struct iomap_page *to_iomap_page(struct page *page)
> >  {
> > +	VM_BUG_ON_PGFLAGS(PageTail(page), page);
> >  	if (page_has_private(page))
> >  		return (struct iomap_page *)page_private(page);
> >  	return NULL;
> 
> Just to confirm: this VM bug check is needed because we only
> attach the iomap_page to the head page of a compound page?
> 
> Assuming that I've understood the above correctly:

That's correct.  If we get a tail page in this path, something's gone
wrong somewhere upstream of us, and the stack trace should tell us where.
It's PGFLAGS so it's usually compiled out by distro kernels (you need to
enable CONFIG_DEBUG_VM_PGFLAGS for it to be active).

It was definitely useful in development; probably not so useful for
a distro kernel to assert.


* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-24 14:55 ` [PATCH 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
@ 2020-08-25  0:27   ` Dave Chinner
  2020-08-25  3:26     ` Matthew Wilcox
  2020-08-25 22:23   ` Darrick J. Wong
  1 sibling, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2020-08-25  0:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:10PM +0100, Matthew Wilcox (Oracle) wrote:
> Pass the full length to iomap_zero() and dax_iomap_zero(), and have
> them return how many bytes they actually handled.  This is preparatory
> work for handling THP, although it looks like DAX could actually take
> advantage of it if there's a larger contiguous area.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/dax.c               | 13 ++++++-------
>  fs/iomap/buffered-io.c | 33 +++++++++++++++------------------
>  include/linux/dax.h    |  3 +--
>  3 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 95341af1a966..f2b912cb034e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1037,18 +1037,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
>  	return ret;
>  }
>  
> -int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> -		   struct iomap *iomap)
> +loff_t dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
>  {
>  	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
>  	pgoff_t pgoff;
>  	long rc, id;
>  	void *kaddr;
>  	bool page_aligned = false;
> -
> +	unsigned offset = offset_in_page(pos);
> +	unsigned size = min_t(u64, PAGE_SIZE - offset, length);
>  
>  	if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
> -	    IS_ALIGNED(size, PAGE_SIZE))
> +	    (size == PAGE_SIZE))
>  		page_aligned = true;
>  
>  	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
> @@ -1058,8 +1058,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
>  	id = dax_read_lock();
>  
>  	if (page_aligned)
> -		rc = dax_zero_page_range(iomap->dax_dev, pgoff,
> -					 size >> PAGE_SHIFT);
> +		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
>  	else
>  		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
>  	if (rc < 0) {
> @@ -1072,7 +1071,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
>  		dax_flush(iomap->dax_dev, kaddr + offset, size);
>  	}
>  	dax_read_unlock(id);
> -	return 0;
> +	return size;
>  }
>  
>  static loff_t
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7f618ab4b11e..2dba054095e8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -901,11 +901,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
> -static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_zero(struct inode *inode, loff_t pos, u64 length,
> +		struct iomap *iomap, struct iomap *srcmap)

While you are touching this, please convert it back to:

static loff_t
iomap_zero(struct inode *inode, loff_t pos, u64 length,
		struct iomap *iomap, struct iomap *srcmap)


>  {
>  	struct page *page;
>  	int status;
> +	unsigned offset = offset_in_page(pos);
> +	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>  
>  	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
>  	if (status)
> @@ -917,38 +919,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
> -static loff_t
> -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> -		void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
> +		loff_t length, void *data, struct iomap *iomap,
> +		struct iomap *srcmap)

And leave this as it was formatted.

/me missed this when iomap_readahead() was introduced, too.


>  {
>  	bool *did_zero = data;
>  	loff_t written = 0;
> -	int status;
>  
>  	/* already zeroed?  we're done. */
>  	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> -		return count;
> +		return length;
>  
>  	do {
> -		unsigned offset, bytes;
> -
> -		offset = offset_in_page(pos);
> -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> +		loff_t bytes;
>  
>  		if (IS_DAX(inode))
> -			status = dax_iomap_zero(pos, offset, bytes, iomap);
> +			bytes = dax_iomap_zero(pos, length, iomap);

Hmmm. everything is loff_t here, but the callers are defining length
as u64, not loff_t. Is there a potential sign conversion problem
here? (sure 64 bit is way beyond anything we'll pass here, but...)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 8/9] iomap: Convert iomap_write_end types
  2020-08-25  0:12   ` Dave Chinner
@ 2020-08-25  1:06     ` Matthew Wilcox
  2020-08-25  1:33       ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2020-08-25  1:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 10:12:23AM +1000, Dave Chinner wrote:
> > -static int
> > -__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> > -		unsigned copied, struct page *page)
> > +static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> > +		size_t copied, struct page *page)
> >  {
> 
> Please leave the function declarations formatted the same way as
> they currently are. They are done that way intentionally so it is
> easy to grep for function definitions. Not to mention is't much
> easier to read than when the function name is commingled into the
> multiline paramener list like...

I understand that's true for XFS, but it's not true throughout the
rest of the kernel.  This file isn't even consistent:

buffered-io.c:static inline struct iomap_page *to_iomap_page(struct page *page)
buffered-io.c:static inline bool iomap_block_needs_zeroing(struct inode
buffered-io.c:static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
buffered-io.c:static void iomap_writepage_end_bio(struct bio *bio)
buffered-io.c:static int __init iomap_init(void)

(I just grepped for ^static so there're other functions not covered by this)

The other fs/iomap/ files are equally inconsistent.



* Re: [PATCH 8/9] iomap: Convert iomap_write_end types
  2020-08-25  1:06     ` Matthew Wilcox
@ 2020-08-25  1:33       ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-25  1:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 02:06:05AM +0100, Matthew Wilcox wrote:
> On Tue, Aug 25, 2020 at 10:12:23AM +1000, Dave Chinner wrote:
> > > -static int
> > > -__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> > > -		unsigned copied, struct page *page)
> > > +static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> > > +		size_t copied, struct page *page)
> > >  {
> > 
> > Please leave the function declarations formatted the same way as
> > they currently are. They are done that way intentionally so it is
> > easy to grep for function definitions. Not to mention is't much
> > easier to read than when the function name is commingled into the
> > multiline paramener list like...
> 
> I understand that's true for XFS, but it's not true throughout the
> rest of the kernel. 

What other code does is irrelevant. I'm trying to maintain and
improve the consistency of the format used for the fs/iomap code.

> This file isn't even consistent:
> 
> buffered-io.c:static inline struct iomap_page *to_iomap_page(struct page *page)
> buffered-io.c:static inline bool iomap_block_needs_zeroing(struct inode
> buffered-io.c:static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> buffered-io.c:static void iomap_writepage_end_bio(struct bio *bio)
> buffered-io.c:static int __init iomap_init(void)
> 
> (i just grepped for ^static so there're other functions not covered by this)

5 functions have that format, compared to 45 that have the
formatting I asked you to retain. I think it's pretty clear which
way consistency lies here...

> The other fs/iomap/ files are equally inconsistent.

Inconsistency always occurs when multiple people modify the same
code. Often that's simply because reviewers haven't noticed the
inconsistency - it's certainly not intentional.

Saying "No, I'm going to make the code less consistent because it's
already slightly inconsistent" is, IMO, not a valid response to a
review request to conform to the existing code layout in that file,
especially if it improves the consistency of the code being
modified. That's really not negotiable....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-25  0:27   ` Dave Chinner
@ 2020-08-25  3:26     ` Matthew Wilcox
  2020-08-25  3:35       ` Andreas Dilger
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2020-08-25  3:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 10:27:35AM +1000, Dave Chinner wrote:
> >  	do {
> > -		unsigned offset, bytes;
> > -
> > -		offset = offset_in_page(pos);
> > -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> > +		loff_t bytes;
> >  
> >  		if (IS_DAX(inode))
> > -			status = dax_iomap_zero(pos, offset, bytes, iomap);
> > +			bytes = dax_iomap_zero(pos, length, iomap);
> 
> Hmmm. everything is loff_t here, but the callers are defining length
> as u64, not loff_t. Is there a potential sign conversion problem
> here? (sure 64 bit is way beyond anything we'll pass here, but...)

I've gone back and forth on the correct type for 'length' a few times.
size_t is too small (not for zeroing, but for seek()).  An unsigned type
seems right -- a length can't be negative, and we don't want to give
the impression that it can.  But the return value from these functions
definitely needs to be signed so we can represent an error.  So a u64
length with an loff_t return type feels like the best solution.  And
the upper layers have to promise not to pass in a length that's more
than 2^63-1.

I have a patch set which starts the conversion process for all the actors
over to using u64 for the length.  Obviously, that's not mixed in with
the THP patchset.
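
The corner case under discussion, in sketch form (actor() is a
hypothetical stand-in for the zeroing actors):

	u64 length = 1ULL << 63;	/* representable as an unsigned length */
	loff_t ret = actor(inode, pos, length, iomap);

	if (ret < 0)	/* if actor() handled all 2^63 bytes, the byte
			 * count wraps negative and is indistinguishable
			 * from an errno -- hence the 2^63 - 1 promise */
		return ret;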



* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-25  3:26     ` Matthew Wilcox
@ 2020-08-25  3:35       ` Andreas Dilger
  2020-08-25  4:27         ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Andreas Dilger @ 2020-08-25  3:35 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dave Chinner, linux-xfs, linux-fsdevel, Darrick J . Wong,
	linux-nvdimm, linux-kernel


On Aug 24, 2020, at 9:26 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Tue, Aug 25, 2020 at 10:27:35AM +1000, Dave Chinner wrote:
>>> 	do {
>>> -		unsigned offset, bytes;
>>> -
>>> -		offset = offset_in_page(pos);
>>> -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
>>> +		loff_t bytes;
>>> 
>>> 		if (IS_DAX(inode))
>>> -			status = dax_iomap_zero(pos, offset, bytes, iomap);
>>> +			bytes = dax_iomap_zero(pos, length, iomap);
>> 
>> Hmmm. everything is loff_t here, but the callers are defining length
>> as u64, not loff_t. Is there a potential sign conversion problem
>> here? (sure 64 bit is way beyond anything we'll pass here, but...)
> 
> I've gone back and forth on the correct type for 'length' a few times.
> size_t is too small (not for zeroing, but for seek()).  An unsigned type
> seems right -- a length can't be negative, and we don't want to give
> the impression that it can.  But the return value from these functions
> definitely needs to be signed so we can represent an error.  So a u64
> length with an loff_t return type feels like the best solution.  And
> the upper layers have to promise not to pass in a length that's more
> than 2^63-1.

The problem with allowing a u64 as the length is that it leads to the
possibility of an argument value that cannot be returned.  Checking
length < 0 is not worse than checking length > 0x7fffffffffffffff,
and has the benefit of consistency with the other argument types and
signs...

Cheers, Andreas



* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-25  3:35       ` Andreas Dilger
@ 2020-08-25  4:27         ` Dave Chinner
  2020-08-25 12:40           ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2020-08-25  4:27 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, Darrick J . Wong,
	linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 09:35:59PM -0600, Andreas Dilger wrote:
> On Aug 24, 2020, at 9:26 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Tue, Aug 25, 2020 at 10:27:35AM +1000, Dave Chinner wrote:
> >>> 	do {
> >>> -		unsigned offset, bytes;
> >>> -
> >>> -		offset = offset_in_page(pos);
> >>> -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> >>> +		loff_t bytes;
> >>> 
> >>> 		if (IS_DAX(inode))
> >>> -			status = dax_iomap_zero(pos, offset, bytes, iomap);
> >>> +			bytes = dax_iomap_zero(pos, length, iomap);
> >> 
> >> Hmmm. everything is loff_t here, but the callers are defining length
> >> as u64, not loff_t. Is there a potential sign conversion problem
> >> here? (sure 64 bit is way beyond anything we'll pass here, but...)
> > 
> > I've gone back and forth on the correct type for 'length' a few times.
> > size_t is too small (not for zeroing, but for seek()).  An unsigned type
> > seems right -- a length can't be negative, and we don't want to give
> > the impression that it can.  But the return value from these functions
> > definitely needs to be signed so we can represent an error.  So a u64
> > length with an loff_t return type feels like the best solution.  And
> > the upper layers have to promise not to pass in a length that's more
> > than 2^63-1.
> 
> The problem with allowing a u64 as the length is that it leads to the
> possibility of an argument value that cannot be returned.  Checking
> length < 0 is not worse than checking length > 0x7fffffffffffffff,
> and has the benefit of consistency with the other argument types and
> signs...

I think the problem here is that we have no guaranteed 64 bit size
type.  When this was the case with off_t, we created loff_t to always
represent a 64 bit offset value.  However, we never created an
equivalent for the count/size that is passed alongside loff_t in many
places - it was said that "syscalls are limited to 32 bit sizes" and
"size_t is 64 bit on 64 bit platforms" and so on, so we still don't
have a clean way to pass 64 bit sizes through the IO path.

We've been living with this shitty situation for a long time now, so
perhaps it's time for us to define lsize_t for 64 bit lengths and
start using that everywhere that needs a 64 bit clean path
through the code, regardless of whether the arch is 32 or 64 bit...

Thoughts?

-Dave.

-- 
Dave Chinner
david@fromorbit.com
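
For concreteness, a sketch of the hypothetical type being floated here
-- lsize_t does not exist in the tree, so this is purely illustrative:

/*
 * Hypothetical: a length type guaranteed to be 64 bits wide on both
 * 32-bit and 64-bit architectures, by analogy with loff_t.
 */
typedef u64 lsize_t;

/* Patch 9's helper, rewritten against the hypothetical type: */
loff_t dax_iomap_zero(loff_t pos, lsize_t length, struct iomap *iomap);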

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-25  4:27         ` Dave Chinner
@ 2020-08-25 12:40           ` Matthew Wilcox
  2020-08-25 22:05             ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2020-08-25 12:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, linux-xfs, linux-fsdevel, Darrick J . Wong,
	linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 02:27:11PM +1000, Dave Chinner wrote:
> On Mon, Aug 24, 2020 at 09:35:59PM -0600, Andreas Dilger wrote:
> > On Aug 24, 2020, at 9:26 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > > 
> > > On Tue, Aug 25, 2020 at 10:27:35AM +1000, Dave Chinner wrote:
> > >>> 	do {
> > >>> -		unsigned offset, bytes;
> > >>> -
> > >>> -		offset = offset_in_page(pos);
> > >>> -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> > >>> +		loff_t bytes;
> > >>> 
> > >>> 		if (IS_DAX(inode))
> > >>> -			status = dax_iomap_zero(pos, offset, bytes, iomap);
> > >>> +			bytes = dax_iomap_zero(pos, length, iomap);
> > >> 
> > >> Hmmm. everything is loff_t here, but the callers are defining length
> > >> as u64, not loff_t. Is there a potential sign conversion problem
> > >> here? (sure 64 bit is way beyond anything we'll pass here, but...)
> > > 
> > > I've gone back and forth on the correct type for 'length' a few times.
> > > size_t is too small (not for zeroing, but for seek()).  An unsigned type
> > > seems right -- a length can't be negative, and we don't want to give
> > > the impression that it can.  But the return value from these functions
> > > definitely needs to be signed so we can represent an error.  So a u64
> > > length with an loff_t return type feels like the best solution.  And
> > > the upper layers have to promise not to pass in a length that's more
> > > than 2^63-1.
> > 
> > The problem with allowing a u64 as the length is that it leads to the
> > possibility of an argument value that cannot be returned.  Checking
> > length < 0 is not worse than checking length > 0x7fffffffffffffff,
> > and has the benefit of consistency with the other argument types and
> > signs...

The callee should just trust that the caller isn't going to do that.
File sizes can't be more than 2^63-1 bytes, so an extent of a file also
can't be more than 2^63-1 bytes.

> I think the problem here is that we have no guaranteed 64 bit size
> type.  When this was the case with off_t, we created loff_t to always
> represent a 64 bit offset value.  However, we never created an
> equivalent for the count/size that is passed alongside loff_t in many
> places - it was said that "syscalls are limited to 32 bit sizes" and
> "size_t is 64 bit on 64 bit platforms" and so on, so we still don't
> have a clean way to pass 64 bit sizes through the IO path.
> 
> We've been living with this shitty situation for a long time now, so
> perhaps it's time for us to define lsize_t for 64 bit lengths and
> start using that everywhere that needs a 64 bit clean path
> through the code, regardless of whether the arch is 32 or 64 bit...
> 
> Thoughts?

I don't think the THP patches should be blocked on this expedition.

We have a guaranteed 64 bit type -- it's u64.  I don't think defining
lsize_t is going to fix anything.  The next big problem to fix will be
supporting storage >16EiB, but I think that's a project that can start
in ten years and still be here before anyone but the TLAs have that much
storage in a single device.

Any objection to leaving this patch as-is with a u64 length?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] iomap: Fix misplaced page flushing
  2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
  2020-08-24 23:51   ` Dave Chinner
@ 2020-08-25 20:47   ` Darrick J. Wong
  2020-08-27  8:24   ` Christoph Hellwig
  2 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 20:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:02PM +0100, Matthew Wilcox (Oracle) wrote:
> If iomap_unshare_actor() unshares to an inline iomap, the page was
> not being flushed.  block_write_end() and __iomap_write_end() already
> contain flushes, so adding it to iomap_write_end_inline() seems like
> the best place.  That means we can remove it from iomap_write_actor().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Seems reasonable to me...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index bcfc288dba3f..cffd575e57b6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -715,6 +715,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  {
>  	void *addr;
>  
> +	flush_dcache_page(page);
>  	WARN_ON_ONCE(!PageUptodate(page));
>  	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
>  
> @@ -811,8 +812,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
>  
> -		flush_dcache_page(page);
> -
>  		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
>  				srcmap);
>  		if (unlikely(status < 0))
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] fs: Introduce i_blocks_per_page
  2020-08-24 14:55 ` [PATCH 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
  2020-08-24 23:55   ` Dave Chinner
@ 2020-08-25 20:49   ` Darrick J. Wong
  2020-08-25 22:21     ` Dave Chinner
  1 sibling, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 20:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel, Christoph Hellwig

On Mon, Aug 24, 2020 at 03:55:03PM +0100, Matthew Wilcox (Oracle) wrote:
> This helper is useful for both THPs and for supporting block size larger
> than page size.  Convert all users that I could find (we have a few
> different ways of writing this idiom, and I may have missed some).
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

/me wonders what will happen when someone tries to make blocksz >
pagesize work, but as the most likely someone already rvb'd this, I guess
it's fine:

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c  |  8 ++++----
>  fs/jfs/jfs_metapage.c   |  2 +-
>  fs/xfs/xfs_aops.c       |  2 +-
>  include/linux/pagemap.h | 16 ++++++++++++++++
>  4 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index cffd575e57b6..13d5cdab8dcd 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
>  
> -	if (iop || i_blocksize(inode) == PAGE_SIZE)
> +	if (iop || i_blocks_per_page(inode, page) <= 1)
>  		return iop;
>  
>  	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> @@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	unsigned int i;
>  
>  	spin_lock_irqsave(&iop->uptodate_lock, flags);
> -	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
> +	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
>  		if (i >= first && i <= last)
>  			set_bit(i, iop->uptodate);
>  		else if (!test_bit(i, iop->uptodate))
> @@ -1078,7 +1078,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
>  		mapping_set_error(inode->i_mapping, -EIO);
>  	}
>  
> -	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
>  	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
>  
>  	if (!iop || atomic_dec_and_test(&iop->write_count))
> @@ -1374,7 +1374,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	int error = 0, count = 0, i;
>  	LIST_HEAD(submit_list);
>  
> -	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
>  	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
>  
>  	/*
> diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
> index a2f5338a5ea1..176580f54af9 100644
> --- a/fs/jfs/jfs_metapage.c
> +++ b/fs/jfs/jfs_metapage.c
> @@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
>  	struct inode *inode = page->mapping->host;
>  	struct bio *bio = NULL;
>  	int block_offset;
> -	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
> +	int blocks_per_page = i_blocks_per_page(inode, page);
>  	sector_t page_start;	/* address of page in fs blocks */
>  	sector_t pblock;
>  	int xlen;
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index b35611882ff9..55d126d4e096 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -544,7 +544,7 @@ xfs_discard_page(
>  			page, ip->i_ino, offset);
>  
>  	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> -			PAGE_SIZE / i_blocksize(inode));
> +			i_blocks_per_page(inode, page));
>  	if (error && !XFS_FORCED_SHUTDOWN(mp))
>  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  out_invalidate:
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 7de11dcd534d..853733286138 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -899,4 +899,20 @@ static inline int page_mkwrite_check_truncate(struct page *page,
>  	return offset;
>  }
>  
> +/**
> + * i_blocks_per_page - How many blocks fit in this page.
> + * @inode: The inode which contains the blocks.
> + * @page: The page (head page if the page is a THP).
> + *
> + * If the block size is larger than the size of this page, return zero.
> + *
> + * Context: The caller should hold a refcount on the page to prevent it
> + * from being split.
> + * Return: The number of filesystem blocks covered by this page.
> + */
> +static inline
> +unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
> +{
> +	return thp_size(page) >> inode->i_blkbits;
> +}
>  #endif /* _LINUX_PAGEMAP_H */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page
  2020-08-24 14:55 ` [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
  2020-08-24 23:56   ` Dave Chinner
@ 2020-08-25 20:49   ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 20:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel, Christoph Hellwig

On Mon, Aug 24, 2020 at 03:55:04PM +0100, Matthew Wilcox (Oracle) wrote:
> We can skip most of the initialisation, although spinlocks still
> need explicit initialisation as architectures may use a non-zero
> value to indicate unlocked.  The comment is no longer useful as
> attach_page_private() handles the refcount now.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 13d5cdab8dcd..639d54a4177e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -49,16 +49,8 @@ iomap_page_create(struct inode *inode, struct page *page)
>  	if (iop || i_blocks_per_page(inode, page) <= 1)
>  		return iop;
>  
> -	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> -	atomic_set(&iop->read_count, 0);
> -	atomic_set(&iop->write_count, 0);
> +	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
>  	spin_lock_init(&iop->uptodate_lock);
> -	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> -
> -	/*
> -	 * migrate_page_move_mapping() assumes that pages with private data have
> -	 * their count elevated by 1.
> -	 */
>  	attach_page_private(page, iop);
>  	return iop;
>  }
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits
  2020-08-24 14:55 ` [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
  2020-08-24 23:56   ` Dave Chinner
@ 2020-08-25 20:50   ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 20:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel, Christoph Hellwig

On Mon, Aug 24, 2020 at 03:55:05PM +0100, Matthew Wilcox (Oracle) wrote:
> Now that the bitmap is protected by a spinlock, we can use the
> more efficient bitmap ops instead of individual test/set bit ops.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Yay!
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 639d54a4177e..dbf9572dabe9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -134,19 +134,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	struct inode *inode = page->mapping->host;
>  	unsigned first = off >> inode->i_blkbits;
>  	unsigned last = (off + len - 1) >> inode->i_blkbits;
> -	bool uptodate = true;
>  	unsigned long flags;
> -	unsigned int i;
>  
>  	spin_lock_irqsave(&iop->uptodate_lock, flags);
> -	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
> -		if (i >= first && i <= last)
> -			set_bit(i, iop->uptodate);
> -		else if (!test_bit(i, iop->uptodate))
> -			uptodate = false;
> -	}
> -
> -	if (uptodate)
> +	bitmap_set(iop->uptodate, first, last - first + 1);
> +	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
>  		SetPageUptodate(page);
>  	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
>  }
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-24 14:55 ` [PATCH 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
  2020-08-24 23:59   ` Dave Chinner
@ 2020-08-25 21:02   ` Darrick J. Wong
  2020-08-26  2:26     ` Matthew Wilcox
  2020-08-27  8:26   ` Christoph Hellwig
  2 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 21:02 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:06PM +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically to support larger pages in the
> page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> but with a 2MB maximum page size, we'd have to allocate more than 4kB
> per page.  Add a few debugging assertions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index dbf9572dabe9..844e95cacea8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -22,18 +22,19 @@
>  #include "../internal.h"
>  
>  /*
> - * Structure allocated for each page when block size < PAGE_SIZE to track
> + * Structure allocated for each page when block size < page size to track
>   * sub-page uptodate status and I/O completions.

"for each regular page or head page of a huge page"?  Or whatever we're
calling them nowadays?

--D

>   */
>  struct iomap_page {
>  	atomic_t		read_count;
>  	atomic_t		write_count;
>  	spinlock_t		uptodate_lock;
> -	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
> +	unsigned long		uptodate[];
>  };
>  
>  static inline struct iomap_page *to_iomap_page(struct page *page)
>  {
> +	VM_BUG_ON_PGFLAGS(PageTail(page), page);
>  	if (page_has_private(page))
>  		return (struct iomap_page *)page_private(page);
>  	return NULL;
> @@ -45,11 +46,13 @@ static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
> +	unsigned int nr_blocks = i_blocks_per_page(inode, page);
>  
> -	if (iop || i_blocks_per_page(inode, page) <= 1)
> +	if (iop || nr_blocks <= 1)
>  		return iop;
>  
> -	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> +	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
> +			GFP_NOFS | __GFP_NOFAIL);
>  	spin_lock_init(&iop->uptodate_lock);
>  	attach_page_private(page, iop);
>  	return iop;
> @@ -59,11 +62,14 @@ static void
>  iomap_page_release(struct page *page)
>  {
>  	struct iomap_page *iop = detach_page_private(page);
> +	unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
>  
>  	if (!iop)
>  		return;
>  	WARN_ON_ONCE(atomic_read(&iop->read_count));
>  	WARN_ON_ONCE(atomic_read(&iop->write_count));
> +	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> +			PageUptodate(page));
>  	kfree(iop);
>  }
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-25 12:40           ` Matthew Wilcox
@ 2020-08-25 22:05             ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-25 22:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andreas Dilger, linux-xfs, linux-fsdevel, Darrick J . Wong,
	linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 01:40:24PM +0100, Matthew Wilcox wrote:
> Any objection to leaving this patch as-is with a u64 length?

No objection here - I just wanted to make sure that signed/unsigned
overflow was not going to be an issue...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/9] iomap: Convert read_count to byte count
  2020-08-25  0:09   ` Dave Chinner
@ 2020-08-25 22:14     ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 22:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox (Oracle),
	linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 10:09:02AM +1000, Dave Chinner wrote:
> On Mon, Aug 24, 2020 at 03:55:07PM +0100, Matthew Wilcox (Oracle) wrote:
> > Instead of counting bio segments, count the number of bytes submitted.
> > This insulates us from the block layer's definition of what a 'same page'
> > is, which is not necessarily clear once THPs are involved.
> 
> I'd like to see a comment on the definition of struct iomap_page to
> indicate that read_count (and write count) reflect the byte count of
> IO currently in flight on the page, not an IO count, because THP
> makes tracking this via bio state hard. Otherwise it is not at all
> obvious why it is done and why it is intentional...

Agreed. :)

--D

> Otherwise the code looks OK.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
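
A sketch of the comment being asked for, against the struct as it
stands after patch 5 (illustrative wording, not committed text):

/*
 * Structure allocated for each page when block size < page size to
 * track sub-page uptodate status and I/O completions.
 *
 * read_count and write_count hold the number of bytes of I/O currently
 * in flight on the page, not a count of I/Os: once THPs are involved,
 * tracking this via bio state becomes hard.
 */
struct iomap_page {
	atomic_t		read_count;
	atomic_t		write_count;
	spinlock_t		uptodate_lock;
	unsigned long		uptodate[];
};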

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] fs: Introduce i_blocks_per_page
  2020-08-25 20:49   ` Darrick J. Wong
@ 2020-08-25 22:21     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2020-08-25 22:21 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox (Oracle),
	linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel,
	Christoph Hellwig

On Tue, Aug 25, 2020 at 01:49:22PM -0700, Darrick J. Wong wrote:
> On Mon, Aug 24, 2020 at 03:55:03PM +0100, Matthew Wilcox (Oracle) wrote:
> > This helper is useful for both THPs and for supporting block size larger
> > than page size.  Convert all users that I could find (we have a few
> > different ways of writing this idiom, and I may have missed some).
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> /me wonders what will happen when someone tries to make blocksz >
> pagesize work,

I abstract the page/block size stuff into "chunks", i.e. we work on
the smallest contiguous chunk of data that the current combination of
page and inode defines. In the context of this patch, it is simply:

s/i_blocks_per_page/iomap_chunks_per_page/g

i.e. The helper functions end up looking like this:

static inline unsigned
iomap_chunk_size(struct inode *inode, struct page *page)
{
       return min_t(unsigned, page_size(page), i_blocksize(inode));
}

static inline unsigned
iomap_chunk_bits(struct inode *inode, struct page *page)
{
       return min_t(unsigned, page_shift(page), inode->i_blkbits);
}

static inline unsigned
iomap_chunks_per_page(struct inode *inode, struct page *page)
{
       return page_size(page) >> inode->i_blkbits;
}

and the latter is actually the same as what i_blocks_per_page() is
currently implemented as....
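
For concreteness, a sketch of iomap_page_create() from patch 5
rewritten against these helpers -- illustrative only, with no
behavioural change while block size <= page size:

static struct iomap_page *
iomap_page_create(struct inode *inode, struct page *page)
{
	struct iomap_page *iop = to_iomap_page(page);
	unsigned int nr_chunks = iomap_chunks_per_page(inode, page);

	if (iop || nr_chunks <= 1)
		return iop;

	/* One uptodate bit per chunk instead of per block. */
	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_chunks)),
			GFP_NOFS | __GFP_NOFAIL);
	spin_lock_init(&iop->uptodate_lock);
	attach_page_private(page, iop);
	return iop;
}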

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-24 14:55 ` [PATCH 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
  2020-08-25  0:27   ` Dave Chinner
@ 2020-08-25 22:23   ` Darrick J. Wong
  2020-08-27  8:39     ` Christoph Hellwig
  1 sibling, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-25 22:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:10PM +0100, Matthew Wilcox (Oracle) wrote:
> Pass the full length to iomap_zero() and dax_iomap_zero(), and have
> them return how many bytes they actually handled.  This is preparatory
> work for handling THP, although it looks like DAX could actually take
> advantage of it if there's a larger contiguous area.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/dax.c               | 13 ++++++-------
>  fs/iomap/buffered-io.c | 33 +++++++++++++++------------------
>  include/linux/dax.h    |  3 +--
>  3 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 95341af1a966..f2b912cb034e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1037,18 +1037,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
>  	return ret;
>  }
>  
> -int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> -		   struct iomap *iomap)
> +loff_t dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)

Sorry for my ultra-slow response to this.  The u64 length seems ok to me
(or uint64_t, I don't care all /that/ much), but using loff_t as a
return type bothers me because I see that and think that this function
is returning a new file offset, e.g. (pos + number of bytes zeroed).

So please, let's use s64 or something that isn't so misleading.

FWIW, Linus also[0] doesn't[1] like using loff_t for the number of bytes
copied.

--D

[0] https://lore.kernel.org/linux-fsdevel/CAHk-=wgcPAfOSigMf0xwaGfVjw413XN3UPATwYWHrss+QuivhQ@mail.gmail.com/
[1] https://lore.kernel.org/linux-fsdevel/CAHk-=wgvROUnrEVADVR_zTHY8NmYo-_jVjV37O1MdDm2de+Lmw@mail.gmail.com/

>  {
>  	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
>  	pgoff_t pgoff;
>  	long rc, id;
>  	void *kaddr;
>  	bool page_aligned = false;
> -
> +	unsigned offset = offset_in_page(pos);
> +	unsigned size = min_t(u64, PAGE_SIZE - offset, length);
>  
>  	if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
> -	    IS_ALIGNED(size, PAGE_SIZE))
> +	    (size == PAGE_SIZE))
>  		page_aligned = true;
>  
>  	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
> @@ -1058,8 +1058,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
>  	id = dax_read_lock();
>  
>  	if (page_aligned)
> -		rc = dax_zero_page_range(iomap->dax_dev, pgoff,
> -					 size >> PAGE_SHIFT);
> +		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
>  	else
>  		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
>  	if (rc < 0) {
> @@ -1072,7 +1071,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
>  		dax_flush(iomap->dax_dev, kaddr + offset, size);
>  	}
>  	dax_read_unlock(id);
> -	return 0;
> +	return size;
>  }
>  
>  static loff_t
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7f618ab4b11e..2dba054095e8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -901,11 +901,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
> -static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_zero(struct inode *inode, loff_t pos, u64 length,
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page;
>  	int status;
> +	unsigned offset = offset_in_page(pos);
> +	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>  
>  	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
>  	if (status)
> @@ -917,38 +919,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
> -static loff_t
> -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> -		void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
> +		loff_t length, void *data, struct iomap *iomap,
> +		struct iomap *srcmap)
>  {
>  	bool *did_zero = data;
>  	loff_t written = 0;
> -	int status;
>  
>  	/* already zeroed?  we're done. */
>  	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> -		return count;
> +		return length;
>  
>  	do {
> -		unsigned offset, bytes;
> -
> -		offset = offset_in_page(pos);
> -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> +		loff_t bytes;
>  
>  		if (IS_DAX(inode))
> -			status = dax_iomap_zero(pos, offset, bytes, iomap);
> +			bytes = dax_iomap_zero(pos, length, iomap);
>  		else
> -			status = iomap_zero(inode, pos, offset, bytes, iomap,
> -					srcmap);
> -		if (status < 0)
> -			return status;
> +			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
> +		if (bytes < 0)
> +			return bytes;
>  
>  		pos += bytes;
> -		count -= bytes;
> +		length -= bytes;
>  		written += bytes;
>  		if (did_zero)
>  			*did_zero = true;
> -	} while (count > 0);
> +	} while (length > 0);
>  
>  	return written;
>  }
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 6904d4e0b2e0..80f17946f940 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -214,8 +214,7 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
>  int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
>  int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
>  				      pgoff_t index);
> -int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> -			struct iomap *iomap);
> +loff_t dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
>  static inline bool dax_mapping(struct address_space *mapping)
>  {
>  	return mapping->host && IS_DAX(mapping->host);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-25 21:02   ` Darrick J. Wong
@ 2020-08-26  2:26     ` Matthew Wilcox
  2020-08-26  3:32       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2020-08-26  2:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 02:02:03PM -0700, Darrick J. Wong wrote:
> >  /*
> > - * Structure allocated for each page when block size < PAGE_SIZE to track
> > + * Structure allocated for each page when block size < page size to track
> >   * sub-page uptodate status and I/O completions.
> 
> "for each regular page or head page of a huge page"?  Or whatever we're
> calling them nowadays?

Well, that's what I'm calling a "page" ;-)

How about "for each page or THP"?  The fact that it's stored in the
head page is incidental -- it's allocated for the THP.
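
Concretely, the comment would then read something like this
(illustrative wording):

/*
 * Structure allocated for each page or THP when block size < page size
 * to track sub-page uptodate status and I/O completions.  Stored on
 * the head page, but allocated on behalf of the whole THP.
 */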


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-26  2:26     ` Matthew Wilcox
@ 2020-08-26  3:32       ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-08-26  3:32 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Wed, Aug 26, 2020 at 03:26:23AM +0100, Matthew Wilcox wrote:
> On Tue, Aug 25, 2020 at 02:02:03PM -0700, Darrick J. Wong wrote:
> > >  /*
> > > - * Structure allocated for each page when block size < PAGE_SIZE to track
> > > + * Structure allocated for each page when block size < page size to track
> > >   * sub-page uptodate status and I/O completions.
> > 
> > "for each regular page or head page of a huge page"?  Or whatever we're
> > calling them nowadays?
> 
> Well, that's what I'm calling a "page" ;-)
> 
> How about "for each page or THP"?  The fact that it's stored in the
> head page is incidental -- it's allocated for the THP.

Ok.

--D

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] iomap: Fix misplaced page flushing
  2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
  2020-08-24 23:51   ` Dave Chinner
  2020-08-25 20:47   ` Darrick J. Wong
@ 2020-08-27  8:24   ` Christoph Hellwig
  2 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:02PM +0100, Matthew Wilcox (Oracle) wrote:
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index bcfc288dba3f..cffd575e57b6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -715,6 +715,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  {
>  	void *addr;
>  
> +	flush_dcache_page(page);
>  	WARN_ON_ONCE(!PageUptodate(page));
>  	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));

Please move the call down below the asserts.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page
  2020-08-24 14:55 ` [PATCH 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
  2020-08-24 23:59   ` Dave Chinner
  2020-08-25 21:02   ` Darrick J. Wong
@ 2020-08-27  8:26   ` Christoph Hellwig
  2 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

>  static inline struct iomap_page *to_iomap_page(struct page *page)
>  {
> +	VM_BUG_ON_PGFLAGS(PageTail(page), page);
>  	if (page_has_private(page))
>  		return (struct iomap_page *)page_private(page);
>  	return NULL;

Nit: can you add an empty line after the VM_BUG_ON_PGFLAGS assert to
keep the function readable?  Maybe also add a comment on the assert,
as it isn't totally obvious.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/9] iomap: Convert read_count to byte count
  2020-08-24 14:55 ` [PATCH 6/9] iomap: Convert read_count to byte count Matthew Wilcox (Oracle)
  2020-08-25  0:09   ` Dave Chinner
@ 2020-08-27  8:35   ` Christoph Hellwig
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

> @@ -269,20 +263,17 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	if (ctx->bio && bio_end_sector(ctx->bio) == sector)
>  		is_contig = true;
>  
> -
>  	/*
> -	 * If we start a new segment we need to increase the read count, and we
> -	 * need to do so before submitting any previous full bio to make sure
> -	 * that we don't prematurely unlock the page.
> +	 * We need to increase the read count before submitting any
> +	 * previous bio to make sure that we don't prematurely unlock
> +	 * the page.
>  	 */
>  	if (iop)
> -		atomic_inc(&iop->read_count);
> +		atomic_add(plen, &iop->read_count);
> +
> +	if (is_contig &&
> +	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page))
> +		goto done;
>  
>  	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {

I think we can simplify this a bit by reordering the checks:

 	if (iop)
		atomic_add(plen, &iop->read_count);

  	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
		if (__bio_try_merge_page(ctx->bio, page, plen, poff,
				&same_page))
			goto done;
		is_contig = true;
	}

	if (!is_contig || bio_full(ctx->bio, plen)) {
		// the existing case to allocate a new bio
	}

Also I think read_count should be renamed to be more descriptive,
something like read_bytes_pending?  Same for the write side.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] iomap: Convert write_count to byte count
  2020-08-24 14:55 ` [PATCH 7/9] iomap: Convert write_count " Matthew Wilcox (Oracle)
@ 2020-08-27  8:36   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:08PM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of counting bio segments, count the number of bytes submitted.
> This insulates us from the block layer's definition of what a 'same page'
> is, which is not necessarily clear once THPs are involved.

Looks good (module the field naming as comment on the previous patch):

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] iomap: Change calling convention for zeroing
  2020-08-25 22:23   ` Darrick J. Wong
@ 2020-08-27  8:39     ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox (Oracle),
	linux-xfs, linux-fsdevel, linux-nvdimm, linux-kernel

On Tue, Aug 25, 2020 at 03:23:55PM -0700, Darrick J. Wong wrote:
> Sorry for my ultra-slow response to this.  The u64 length seems ok to me
> (or uint64_t, I don't care all /that/ much), but using loff_t as a
> return type bothers me because I see that and think that this function
> is returning a new file offset, e.g. (pos + number of bytes zeroed).
> 
> So please, let's use s64 or something that isn't so misleading.
> 
> FWIW, Linus also[0] doesn't[1] like using loff_t for the number of bytes
> copied.

Let's just switch to u64 and s64 then.  Unless we want to come up with
our own typedefs.
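
A sketch of what that convention would look like on the functions in
question (illustrative prototypes; the final form was settled after
this thread):

/* Length in: unsigned, 64 bits everywhere.  Result out: signed, so a
 * negative errno fits alongside byte counts, and the type does not
 * read as a file offset. */
s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);

static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
		struct iomap *iomap, struct iomap *srcmap);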

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/9] iomap: Convert iomap_write_end types
  2020-08-24 14:55 ` [PATCH 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
  2020-08-25  0:12   ` Dave Chinner
@ 2020-08-27  8:41   ` Christoph Hellwig
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2020-08-27  8:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, linux-nvdimm, linux-kernel

On Mon, Aug 24, 2020 at 03:55:09PM +0100, Matthew Wilcox (Oracle) wrote:
> iomap_write_end cannot return an error, so switch it to return
> size_t instead of int and remove the error checking from the callers.
> Also convert the arguments to size_t from unsigned int, in case anyone
> ever wants to support a page size larger than 2GB.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread

Thread overview: 42+ messages
2020-08-24 14:55 [PATCH 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
2020-08-24 14:55 ` [PATCH 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
2020-08-24 23:51   ` Dave Chinner
2020-08-25 20:47   ` Darrick J. Wong
2020-08-27  8:24   ` Christoph Hellwig
2020-08-24 14:55 ` [PATCH 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
2020-08-24 23:55   ` Dave Chinner
2020-08-25 20:49   ` Darrick J. Wong
2020-08-25 22:21     ` Dave Chinner
2020-08-24 14:55 ` [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
2020-08-24 23:56   ` Dave Chinner
2020-08-25 20:49   ` Darrick J. Wong
2020-08-24 14:55 ` [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
2020-08-24 23:56   ` Dave Chinner
2020-08-25 20:50   ` Darrick J. Wong
2020-08-24 14:55 ` [PATCH 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
2020-08-24 23:59   ` Dave Chinner
2020-08-25  0:22     ` Matthew Wilcox
2020-08-25 21:02   ` Darrick J. Wong
2020-08-26  2:26     ` Matthew Wilcox
2020-08-26  3:32       ` Darrick J. Wong
2020-08-27  8:26   ` Christoph Hellwig
2020-08-24 14:55 ` [PATCH 6/9] iomap: Convert read_count to byte count Matthew Wilcox (Oracle)
2020-08-25  0:09   ` Dave Chinner
2020-08-25 22:14     ` Darrick J. Wong
2020-08-27  8:35   ` Christoph Hellwig
2020-08-24 14:55 ` [PATCH 7/9] iomap: Convert write_count " Matthew Wilcox (Oracle)
2020-08-27  8:36   ` Christoph Hellwig
2020-08-24 14:55 ` [PATCH 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
2020-08-25  0:12   ` Dave Chinner
2020-08-25  1:06     ` Matthew Wilcox
2020-08-25  1:33       ` Dave Chinner
2020-08-27  8:41   ` Christoph Hellwig
2020-08-24 14:55 ` [PATCH 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
2020-08-25  0:27   ` Dave Chinner
2020-08-25  3:26     ` Matthew Wilcox
2020-08-25  3:35       ` Andreas Dilger
2020-08-25  4:27         ` Dave Chinner
2020-08-25 12:40           ` Matthew Wilcox
2020-08-25 22:05             ` Dave Chinner
2020-08-25 22:23   ` Darrick J. Wong
2020-08-27  8:39     ` Christoph Hellwig
