linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/9] THP iomap patches for 5.10
@ 2020-09-10 23:46 Matthew Wilcox (Oracle)
  2020-09-10 23:46 ` [PATCH v2 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:46 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion

These patches are carefully plucked from the THP series.  I would like
them to hit 5.10 to make the THP patchset merge easier.  Some of these
are just generic improvements that make sense on their own terms, but
the overall intent is to support THPs in iomap.

v2:
 - Move the call to flush_dcache_page (Christoph)
 - Clarify comments (Darrick)
 - Rename read_count to read_bytes_pending (Christoph)
 - Rename write_count to write_bytes_pending (Christoph)
 - Restructure iomap_readpage_actor() (Christoph)
 - Change return type of the zeroing functions from loff_t to s64

Matthew Wilcox (Oracle) (9):
  iomap: Fix misplaced page flushing
  fs: Introduce i_blocks_per_page
  iomap: Use kzalloc to allocate iomap_page
  iomap: Use bitmap ops to set uptodate bits
  iomap: Support arbitrarily many blocks per page
  iomap: Convert read_count to read_bytes_pending
  iomap: Convert write_count to write_bytes_pending
  iomap: Convert iomap_write_end types
  iomap: Change calling convention for zeroing

 fs/dax.c                |  13 ++-
 fs/iomap/buffered-io.c  | 173 +++++++++++++++++-----------------------
 fs/jfs/jfs_metapage.c   |   2 +-
 fs/xfs/xfs_aops.c       |   2 +-
 include/linux/dax.h     |   3 +-
 include/linux/pagemap.h |  16 ++++
 6 files changed, 96 insertions(+), 113 deletions(-)

-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/9] iomap: Fix misplaced page flushing
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
@ 2020-09-10 23:46 ` Matthew Wilcox (Oracle)
  2020-09-10 23:47 ` [PATCH v2 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:46 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Dave Chinner, Christoph Hellwig

If iomap_unshare_actor() unshares to an inline iomap, the page was
not being flushed.  block_write_end() and __iomap_write_end() already
contain flushes, so adding it to iomap_write_end_inline() seems like
the best place.  That means we can remove it from iomap_write_actor().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 897ab9a26a74..d81a9a86c5aa 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -717,6 +717,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	WARN_ON_ONCE(!PageUptodate(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
+	flush_dcache_page(page);
 	addr = kmap_atomic(page);
 	memcpy(iomap->inline_data + pos, addr + pos, copied);
 	kunmap_atomic(addr);
@@ -810,8 +811,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-		flush_dcache_page(page);
-
 		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
 				srcmap);
 		if (unlikely(status < 0))
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/9] fs: Introduce i_blocks_per_page
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
  2020-09-10 23:46 ` [PATCH v2 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-15 14:58   ` Dave Kleikamp
  2020-09-15 15:40   ` David Laight
  2020-09-10 23:47 ` [PATCH v2 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig, Dave Chinner

This helper is useful for both THPs and for supporting block size larger
than page size.  Convert all users that I could find (we have a few
different ways of writing this idiom, and I may have missed some).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c  |  8 ++++----
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c       |  2 +-
 include/linux/pagemap.h | 16 ++++++++++++++++
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d81a9a86c5aa..330f86b825d7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
-	if (iop || i_blocksize(inode) == PAGE_SIZE)
+	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
 	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
 		if (i >= first && i <= last)
 			set_bit(i, iop->uptodate);
 		else if (!test_bit(i, iop->uptodate))
@@ -1077,7 +1077,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -1373,7 +1373,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 	struct inode *inode = page->mapping->host;
 	struct bio *bio = NULL;
 	int block_offset;
-	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+	int blocks_per_page = i_blocks_per_page(inode, page);
 	sector_t page_start;	/* address of page in fs blocks */
 	sector_t pblock;
 	int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index b35611882ff9..55d126d4e096 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -544,7 +544,7 @@ xfs_discard_page(
 			page, ip->i_ino, offset);
 
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-			PAGE_SIZE / i_blocksize(inode));
+			i_blocks_per_page(inode, page));
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 50d2c39b47ab..f7f602040913 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -975,4 +975,20 @@ static inline int page_mkwrite_check_truncate(struct page *page,
 	return offset;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The page (head page if the page is a THP).
+ *
+ * If the block size is larger than the size of this page, return zero.
+ *
+ * Context: The caller should hold a refcount on the page to prevent it
+ * from being split.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+	return thp_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 3/9] iomap: Use kzalloc to allocate iomap_page
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
  2020-09-10 23:46 ` [PATCH v2 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
  2020-09-10 23:47 ` [PATCH v2 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-10 23:47 ` [PATCH v2 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig, Dave Chinner

We can skip most of the initialisation, although spinlocks still
need explicit initialisation as architectures may use a non-zero
value to indicate unlocked.  The comment is no longer useful as
attach_page_private() handles the refcount now.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 330f86b825d7..58a1fd83f2a4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -49,16 +49,8 @@ iomap_page_create(struct inode *inode, struct page *page)
 	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
-	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
+	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
-	/*
-	 * migrate_page_move_mapping() assumes that pages with private data have
-	 * their count elevated by 1.
-	 */
 	attach_page_private(page, iop);
 	return iop;
 }
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 4/9] iomap: Use bitmap ops to set uptodate bits
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-09-10 23:47 ` [PATCH v2 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-10 23:47 ` [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig, Dave Chinner

Now that the bitmap is protected by a spinlock, we can use the
more efficient bitmap ops instead of individual test/set bit ops.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 58a1fd83f2a4..7fc0e02d27b0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -134,19 +134,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	struct inode *inode = page->mapping->host;
 	unsigned first = off >> inode->i_blkbits;
 	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	bool uptodate = true;
 	unsigned long flags;
-	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-		if (i >= first && i <= last)
-			set_bit(i, iop->uptodate);
-		else if (!test_bit(i, iop->uptodate))
-			uptodate = false;
-	}
-
-	if (uptodate)
+	bitmap_set(iop->uptodate, first, last - first + 1);
+	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
 		SetPageUptodate(page);
 	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
 }
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-09-10 23:47 ` [PATCH v2 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-11  5:36   ` Christoph Hellwig
                     ` (2 more replies)
  2020-09-10 23:47 ` [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Dave Chinner

Size the uptodate array dynamically to support larger pages in the
page cache.  With a 64kB page, we're only saving 8 bytes per page today,
but with a 2MB maximum page size, we'd have to allocate more than 4kB
per page.  Add a few debugging assertions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/iomap/buffered-io.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7fc0e02d27b0..9670c096b83e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -22,18 +22,25 @@
 #include "../internal.h"
 
 /*
- * Structure allocated for each page when block size < PAGE_SIZE to track
- * sub-page uptodate status and I/O completions.
+ * Structure allocated for each page or THP when block size < page size
+ * to track sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
 	spinlock_t		uptodate_lock;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	unsigned long		uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
 {
+	/*
+	 * per-block data is stored in the head page.  Callers should
+	 * not be dealing with tail pages (and if they are, they can
+	 * call thp_head() first.
+	 */
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+
 	if (page_has_private(page))
 		return (struct iomap_page *)page_private(page);
 	return NULL;
@@ -45,11 +52,13 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
+	unsigned int nr_blocks = i_blocks_per_page(inode, page);
 
-	if (iop || i_blocks_per_page(inode, page) <= 1)
+	if (iop || nr_blocks <= 1)
 		return iop;
 
-	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
+	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+			GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
 	attach_page_private(page, iop);
 	return iop;
@@ -59,11 +68,14 @@ static void
 iomap_page_release(struct page *page)
 {
 	struct iomap_page *iop = detach_page_private(page);
+	unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
 
 	if (!iop)
 		return;
 	WARN_ON_ONCE(atomic_read(&iop->read_count));
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
+	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+			PageUptodate(page));
 	kfree(iop);
 }
 
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-09-10 23:47 ` [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-11  5:36   ` Christoph Hellwig
  2020-09-17 22:02   ` Darrick J. Wong
  2020-09-10 23:47 ` [PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion

Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9670c096b83e..1cf976a8e55c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -26,7 +26,7 @@
  * to track sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
-	atomic_t		read_count;
+	atomic_t		read_bytes_pending;
 	atomic_t		write_count;
 	spinlock_t		uptodate_lock;
 	unsigned long		uptodate[];
@@ -72,7 +72,7 @@ iomap_page_release(struct page *page)
 
 	if (!iop)
 		return;
-	WARN_ON_ONCE(atomic_read(&iop->read_count));
+	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
 	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
 			PageUptodate(page));
@@ -167,13 +167,6 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 		SetPageUptodate(page);
 }
 
-static void
-iomap_read_finish(struct iomap_page *iop, struct page *page)
-{
-	if (!iop || atomic_dec_and_test(&iop->read_count))
-		unlock_page(page);
-}
-
 static void
 iomap_read_page_end_io(struct bio_vec *bvec, int error)
 {
@@ -187,7 +180,8 @@ iomap_read_page_end_io(struct bio_vec *bvec, int error)
 		iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
 	}
 
-	iomap_read_finish(iop, page);
+	if (!iop || atomic_sub_and_test(bvec->bv_len, &iop->read_bytes_pending))
+		unlock_page(page);
 }
 
 static void
@@ -267,30 +261,19 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	ctx->cur_page_in_bio = true;
+	if (iop)
+		atomic_add(plen, &iop->read_bytes_pending);
 
-	/*
-	 * Try to merge into a previous segment if we can.
-	 */
+	/* Try to merge into a previous segment if we can */
 	sector = iomap_sector(iomap, pos);
-	if (ctx->bio && bio_end_sector(ctx->bio) == sector)
+	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
+		if (__bio_try_merge_page(ctx->bio, page, plen, poff,
+				&same_page))
+			goto done;
 		is_contig = true;
-
-	if (is_contig &&
-	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
-		if (!same_page && iop)
-			atomic_inc(&iop->read_count);
-		goto done;
 	}
 
-	/*
-	 * If we start a new segment we need to increase the read count, and we
-	 * need to do so before submitting any previous full bio to make sure
-	 * that we don't prematurely unlock the page.
-	 */
-	if (iop)
-		atomic_inc(&iop->read_count);
-
-	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
+	if (!is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
 		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-09-10 23:47 ` [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-17 22:02   ` Darrick J. Wong
  2020-09-10 23:47 ` [PATCH v2 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
  2020-09-10 23:47 ` [PATCH v2 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
  8 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig

Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1cf976a8e55c..64a5cb383f30 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -27,7 +27,7 @@
  */
 struct iomap_page {
 	atomic_t		read_bytes_pending;
-	atomic_t		write_count;
+	atomic_t		write_bytes_pending;
 	spinlock_t		uptodate_lock;
 	unsigned long		uptodate[];
 };
@@ -73,7 +73,7 @@ iomap_page_release(struct page *page)
 	if (!iop)
 		return;
 	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
-	WARN_ON_ONCE(atomic_read(&iop->write_count));
+	WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
 	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
 			PageUptodate(page));
 	kfree(iop);
@@ -1047,7 +1047,7 @@ EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
 static void
 iomap_finish_page_writeback(struct inode *inode, struct page *page,
-		int error)
+		int error, unsigned int len)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
@@ -1057,9 +1057,9 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 	}
 
 	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
-	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
+	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) <= 0);
 
-	if (!iop || atomic_dec_and_test(&iop->write_count))
+	if (!iop || atomic_sub_and_test(len, &iop->write_bytes_pending))
 		end_page_writeback(page);
 }
 
@@ -1093,7 +1093,8 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 
 		/* walk each page on bio, ending page IO on them */
 		bio_for_each_segment_all(bv, bio, iter_all)
-			iomap_finish_page_writeback(inode, bv->bv_page, error);
+			iomap_finish_page_writeback(inode, bv->bv_page, error,
+					bv->bv_len);
 		bio_put(bio);
 	}
 	/* The ioend has been freed by bio_put() */
@@ -1309,8 +1310,8 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 
 	merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
 			&same_page);
-	if (iop && !same_page)
-		atomic_inc(&iop->write_count);
+	if (iop)
+		atomic_add(len, &iop->write_bytes_pending);
 
 	if (!merged) {
 		if (bio_full(wpc->ioend->io_bio, len)) {
@@ -1353,7 +1354,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	LIST_HEAD(submit_list);
 
 	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
-	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
+	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
 
 	/*
 	 * Walk through the page to find areas to write back. If we run off the
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 8/9] iomap: Convert iomap_write_end types
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-09-10 23:47 ` [PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-17 22:03   ` Darrick J. Wong
  2020-09-10 23:47 ` [PATCH v2 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
  8 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig

iomap_write_end cannot return an error, so switch it to return
size_t instead of int and remove the error checking from the callers.
Also convert the arguments to size_t from unsigned int, in case anyone
ever wants to support a page size larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 64a5cb383f30..cb25a7b70401 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -663,9 +663,8 @@ iomap_set_page_dirty(struct page *page)
 }
 EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 
-static int
-__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+		size_t copied, struct page *page)
 {
 	flush_dcache_page(page);
 
@@ -687,9 +686,8 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	return copied;
 }
 
-static int
-iomap_write_end_inline(struct inode *inode, struct page *page,
-		struct iomap *iomap, loff_t pos, unsigned copied)
+static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
+		struct iomap *iomap, loff_t pos, size_t copied)
 {
 	void *addr;
 
@@ -705,13 +703,14 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	return copied;
 }
 
-static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
-		struct page *page, struct iomap *iomap, struct iomap *srcmap)
+/* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
+static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+		size_t copied, struct page *page, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	loff_t old_size = inode->i_size;
-	int ret;
+	size_t ret;
 
 	if (srcmap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
@@ -790,11 +789,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+		copied = iomap_write_end(inode, pos, bytes, copied, page, iomap,
 				srcmap);
-		if (unlikely(status < 0))
-			break;
-		copied = status;
 
 		cond_resched();
 
@@ -868,11 +864,8 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
 				srcmap);
-		if (unlikely(status <= 0)) {
-			if (WARN_ON_ONCE(status == 0))
-				return -EIO;
-			return status;
-		}
+		if (WARN_ON_ONCE(status == 0))
+			return -EIO;
 
 		cond_resched();
 
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 9/9] iomap: Change calling convention for zeroing
  2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-09-10 23:47 ` [PATCH v2 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
@ 2020-09-10 23:47 ` Matthew Wilcox (Oracle)
  2020-09-11  6:42   ` Christoph Hellwig
  2020-09-17 22:05   ` Darrick J. Wong
  8 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-10 23:47 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: Matthew Wilcox (Oracle),
	Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion

Pass the full length to iomap_zero() and dax_iomap_zero(), and have
them return how many bytes they actually handled.  This is preparatory
work for handling THP, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/dax.c               | 13 ++++++-------
 fs/iomap/buffered-io.c | 33 +++++++++++++++------------------
 include/linux/dax.h    |  3 +--
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 994ab66a9907..6ad346352a8c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1037,18 +1037,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	return ret;
 }
 
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-		   struct iomap *iomap)
+s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
 	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
 	pgoff_t pgoff;
 	long rc, id;
 	void *kaddr;
 	bool page_aligned = false;
-
+	unsigned offset = offset_in_page(pos);
+	unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
 	if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-	    IS_ALIGNED(size, PAGE_SIZE))
+	    (size == PAGE_SIZE))
 		page_aligned = true;
 
 	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
@@ -1058,8 +1058,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
 	id = dax_read_lock();
 
 	if (page_aligned)
-		rc = dax_zero_page_range(iomap->dax_dev, pgoff,
-					 size >> PAGE_SHIFT);
+		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
 	else
 		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
 	if (rc < 0) {
@@ -1072,7 +1071,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
 		dax_flush(iomap->dax_dev, kaddr + offset, size);
 	}
 	dax_read_unlock(id);
-	return 0;
+	return size;
 }
 
 static loff_t
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index cb25a7b70401..3e1eb40a73fd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -898,11 +898,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
+static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
 	int status;
+	unsigned offset = offset_in_page(pos);
+	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
 
 	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
 	if (status)
@@ -914,38 +916,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
-static loff_t
-iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
+		loff_t length, void *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	bool *did_zero = data;
 	loff_t written = 0;
-	int status;
 
 	/* already zeroed?  we're done. */
 	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return count;
+		return length;
 
 	do {
-		unsigned offset, bytes;
-
-		offset = offset_in_page(pos);
-		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
+		s64 bytes;
 
 		if (IS_DAX(inode))
-			status = dax_iomap_zero(pos, offset, bytes, iomap);
+			bytes = dax_iomap_zero(pos, length, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap,
-					srcmap);
-		if (status < 0)
-			return status;
+			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
+		if (bytes < 0)
+			return bytes;
 
 		pos += bytes;
-		count -= bytes;
+		length -= bytes;
 		written += bytes;
 		if (did_zero)
 			*did_zero = true;
-	} while (count > 0);
+	} while (length > 0);
 
 	return written;
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6904d4e0b2e0..951a851a0481 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -214,8 +214,7 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-			struct iomap *iomap);
+s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
-- 
2.28.0
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-10 23:47 ` [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
@ 2020-09-11  5:36   ` Christoph Hellwig
  2020-09-17 22:00   ` Darrick J. Wong
  2020-09-22 16:23   ` Qian Cai
  2 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2020-09-11  5:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner

On Fri, Sep 11, 2020 at 12:47:03AM +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically to support larger pages in the
> page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> but with a 2MB maximum page size, we'd have to allocate more than 4kB
> per page.  Add a few debugging assertions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending
  2020-09-10 23:47 ` [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending Matthew Wilcox (Oracle)
@ 2020-09-11  5:36   ` Christoph Hellwig
  2020-09-17 22:02   ` Darrick J. Wong
  1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2020-09-11  5:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion

On Fri, Sep 11, 2020 at 12:47:04AM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of counting bio segments, count the number of bytes submitted.
> This insulates us from the block layer's definition of what a 'same page'
> is, which is not necessarily clear once THPs are involved.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 9/9] iomap: Change calling convention for zeroing
  2020-09-10 23:47 ` [PATCH v2 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
@ 2020-09-11  6:42   ` Christoph Hellwig
  2020-09-17 22:05   ` Darrick J. Wong
  1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2020-09-11  6:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion

On Fri, Sep 11, 2020 at 12:47:07AM +0100, Matthew Wilcox (Oracle) wrote:
> Pass the full length to iomap_zero() and dax_iomap_zero(), and have
> them return how many bytes they actually handled.  This is preparatory
> work for handling THP, although it looks like DAX could actually take
> advantage of it if there's a larger contiguous area.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/9] fs: Introduce i_blocks_per_page
  2020-09-10 23:47 ` [PATCH v2 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
@ 2020-09-15 14:58   ` Dave Kleikamp
  2020-09-15 15:40   ` David Laight
  1 sibling, 0 replies; 32+ messages in thread
From: Dave Kleikamp @ 2020-09-15 14:58 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-xfs, linux-fsdevel
  Cc: Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig, Dave Chinner

On 9/10/20 6:47 PM, Matthew Wilcox (Oracle) wrote:
> This helper is useful for both THPs and for supporting block size larger
> than page size.  Convert all users that I could find (we have a few
> different ways of writing this idiom, and I may have missed some).
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

For jfs:
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>

> ---
>  fs/iomap/buffered-io.c  |  8 ++++----
>  fs/jfs/jfs_metapage.c   |  2 +-
>  fs/xfs/xfs_aops.c       |  2 +-
>  include/linux/pagemap.h | 16 ++++++++++++++++
>  4 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d81a9a86c5aa..330f86b825d7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
>  
> -	if (iop || i_blocksize(inode) == PAGE_SIZE)
> +	if (iop || i_blocks_per_page(inode, page) <= 1)
>  		return iop;
>  
>  	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> @@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	unsigned int i;
>  
>  	spin_lock_irqsave(&iop->uptodate_lock, flags);
> -	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
> +	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
>  		if (i >= first && i <= last)
>  			set_bit(i, iop->uptodate);
>  		else if (!test_bit(i, iop->uptodate))
> @@ -1077,7 +1077,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
>  		mapping_set_error(inode->i_mapping, -EIO);
>  	}
>  
> -	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
>  	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
>  
>  	if (!iop || atomic_dec_and_test(&iop->write_count))
> @@ -1373,7 +1373,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	int error = 0, count = 0, i;
>  	LIST_HEAD(submit_list);
>  
> -	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
> +	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
>  	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
>  
>  	/*
> diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
> index a2f5338a5ea1..176580f54af9 100644
> --- a/fs/jfs/jfs_metapage.c
> +++ b/fs/jfs/jfs_metapage.c
> @@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
>  	struct inode *inode = page->mapping->host;
>  	struct bio *bio = NULL;
>  	int block_offset;
> -	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
> +	int blocks_per_page = i_blocks_per_page(inode, page);
>  	sector_t page_start;	/* address of page in fs blocks */
>  	sector_t pblock;
>  	int xlen;
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index b35611882ff9..55d126d4e096 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -544,7 +544,7 @@ xfs_discard_page(
>  			page, ip->i_ino, offset);
>  
>  	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> -			PAGE_SIZE / i_blocksize(inode));
> +			i_blocks_per_page(inode, page));
>  	if (error && !XFS_FORCED_SHUTDOWN(mp))
>  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  out_invalidate:
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 50d2c39b47ab..f7f602040913 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -975,4 +975,20 @@ static inline int page_mkwrite_check_truncate(struct page *page,
>  	return offset;
>  }
>  
> +/**
> + * i_blocks_per_page - How many blocks fit in this page.
> + * @inode: The inode which contains the blocks.
> + * @page: The page (head page if the page is a THP).
> + *
> + * If the block size is larger than the size of this page, return zero.
> + *
> + * Context: The caller should hold a refcount on the page to prevent it
> + * from being split.
> + * Return: The number of filesystem blocks covered by this page.
> + */
> +static inline
> +unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
> +{
> +	return thp_size(page) >> inode->i_blkbits;
> +}
>  #endif /* _LINUX_PAGEMAP_H */
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 2/9] fs: Introduce i_blocks_per_page
  2020-09-10 23:47 ` [PATCH v2 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
  2020-09-15 14:58   ` Dave Kleikamp
@ 2020-09-15 15:40   ` David Laight
  2020-09-15 15:49     ` Matthew Wilcox
  1 sibling, 1 reply; 32+ messages in thread
From: David Laight @ 2020-09-15 15:40 UTC (permalink / raw)
  To: 'Matthew Wilcox (Oracle)', linux-xfs, linux-fsdevel
  Cc: Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Christoph Hellwig, Dave

From: Matthew Wilcox (Oracle)
> Sent: 11 September 2020 00:47
> This helper is useful for both THPs and for supporting block size larger
> than page size.  Convert all users that I could find (we have a few
> different ways of writing this idiom, and I may have missed some).
> 
...
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d81a9a86c5aa..330f86b825d7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
> 
> -	if (iop || i_blocksize(inode) == PAGE_SIZE)
> +	if (iop || i_blocks_per_page(inode, page) <= 1)
>  		return iop;
> 
>  	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> @@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	unsigned int i;
> 
>  	spin_lock_irqsave(&iop->uptodate_lock, flags);
> -	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
> +	for (i = 0; i < i_blocks_per_page(inode, page); i++) {

You probably don't want to call the helper every time
around the loop.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/9] fs: Introduce i_blocks_per_page
  2020-09-15 15:40   ` David Laight
@ 2020-09-15 15:49     ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-09-15 15:49 UTC (permalink / raw)
  To: David Laight
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Christoph Hellwig, Dave Chinner

On Tue, Sep 15, 2020 at 03:40:52PM +0000, David Laight wrote:
> > @@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
> >  	unsigned int i;
> > 
> >  	spin_lock_irqsave(&iop->uptodate_lock, flags);
> > -	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
> > +	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
> 
> You probably don't want to call the helper every time
> around the loop.

This is a classic example of focusing on the details and missing the
larger picture.  We don't want the loop at all, and if you'd kept reading
the patch series, you'd see it disappear later.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-10 23:47 ` [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
  2020-09-11  5:36   ` Christoph Hellwig
@ 2020-09-17 22:00   ` Darrick J. Wong
  2020-09-22 16:23   ` Qian Cai
  2 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-17 22:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion, Dave Chinner

On Fri, Sep 11, 2020 at 12:47:03AM +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically to support larger pages in the
> page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> but with a 2MB maximum page size, we'd have to allocate more than 4kB
> per page.  Add a few debugging assertions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7fc0e02d27b0..9670c096b83e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -22,18 +22,25 @@
>  #include "../internal.h"
>  
>  /*
> - * Structure allocated for each page when block size < PAGE_SIZE to track
> - * sub-page uptodate status and I/O completions.
> + * Structure allocated for each page or THP when block size < page size
> + * to track sub-page uptodate status and I/O completions.
>   */
>  struct iomap_page {
>  	atomic_t		read_count;
>  	atomic_t		write_count;
>  	spinlock_t		uptodate_lock;
> -	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
> +	unsigned long		uptodate[];
>  };
>  
>  static inline struct iomap_page *to_iomap_page(struct page *page)
>  {
> +	/*
> +	 * per-block data is stored in the head page.  Callers should
> +	 * not be dealing with tail pages (and if they are, they can
> +	 * call thp_head() first.
> +	 */
> +	VM_BUG_ON_PGFLAGS(PageTail(page), page);
> +
>  	if (page_has_private(page))
>  		return (struct iomap_page *)page_private(page);
>  	return NULL;
> @@ -45,11 +52,13 @@ static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
> +	unsigned int nr_blocks = i_blocks_per_page(inode, page);
>  
> -	if (iop || i_blocks_per_page(inode, page) <= 1)
> +	if (iop || nr_blocks <= 1)
>  		return iop;
>  
> -	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> +	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
> +			GFP_NOFS | __GFP_NOFAIL);
>  	spin_lock_init(&iop->uptodate_lock);
>  	attach_page_private(page, iop);
>  	return iop;
> @@ -59,11 +68,14 @@ static void
>  iomap_page_release(struct page *page)
>  {
>  	struct iomap_page *iop = detach_page_private(page);
> +	unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
>  
>  	if (!iop)
>  		return;
>  	WARN_ON_ONCE(atomic_read(&iop->read_count));
>  	WARN_ON_ONCE(atomic_read(&iop->write_count));
> +	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> +			PageUptodate(page));
>  	kfree(iop);
>  }
>  
> -- 
> 2.28.0
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending
  2020-09-10 23:47 ` [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending Matthew Wilcox (Oracle)
  2020-09-11  5:36   ` Christoph Hellwig
@ 2020-09-17 22:02   ` Darrick J. Wong
  1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-17 22:02 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion

On Fri, Sep 11, 2020 at 12:47:04AM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of counting bio segments, count the number of bytes submitted.
> This insulates us from the block layer's definition of what a 'same page'
> is, which is not necessarily clear once THPs are involved.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 41 ++++++++++++-----------------------------
>  1 file changed, 12 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9670c096b83e..1cf976a8e55c 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -26,7 +26,7 @@
>   * to track sub-page uptodate status and I/O completions.
>   */
>  struct iomap_page {
> -	atomic_t		read_count;
> +	atomic_t		read_bytes_pending;
>  	atomic_t		write_count;
>  	spinlock_t		uptodate_lock;
>  	unsigned long		uptodate[];
> @@ -72,7 +72,7 @@ iomap_page_release(struct page *page)
>  
>  	if (!iop)
>  		return;
> -	WARN_ON_ONCE(atomic_read(&iop->read_count));
> +	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
>  	WARN_ON_ONCE(atomic_read(&iop->write_count));
>  	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
>  			PageUptodate(page));
> @@ -167,13 +167,6 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  		SetPageUptodate(page);
>  }
>  
> -static void
> -iomap_read_finish(struct iomap_page *iop, struct page *page)
> -{
> -	if (!iop || atomic_dec_and_test(&iop->read_count))
> -		unlock_page(page);
> -}
> -
>  static void
>  iomap_read_page_end_io(struct bio_vec *bvec, int error)
>  {
> @@ -187,7 +180,8 @@ iomap_read_page_end_io(struct bio_vec *bvec, int error)
>  		iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
>  	}
>  
> -	iomap_read_finish(iop, page);
> +	if (!iop || atomic_sub_and_test(bvec->bv_len, &iop->read_bytes_pending))
> +		unlock_page(page);
>  }
>  
>  static void
> @@ -267,30 +261,19 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	}
>  
>  	ctx->cur_page_in_bio = true;
> +	if (iop)
> +		atomic_add(plen, &iop->read_bytes_pending);
>  
> -	/*
> -	 * Try to merge into a previous segment if we can.
> -	 */
> +	/* Try to merge into a previous segment if we can */
>  	sector = iomap_sector(iomap, pos);
> -	if (ctx->bio && bio_end_sector(ctx->bio) == sector)
> +	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
> +		if (__bio_try_merge_page(ctx->bio, page, plen, poff,
> +				&same_page))
> +			goto done;
>  		is_contig = true;
> -
> -	if (is_contig &&
> -	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
> -		if (!same_page && iop)
> -			atomic_inc(&iop->read_count);
> -		goto done;
>  	}
>  
> -	/*
> -	 * If we start a new segment we need to increase the read count, and we
> -	 * need to do so before submitting any previous full bio to make sure
> -	 * that we don't prematurely unlock the page.
> -	 */
> -	if (iop)
> -		atomic_inc(&iop->read_count);
> -
> -	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
> +	if (!is_contig || bio_full(ctx->bio, plen)) {
>  		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
>  		gfp_t orig_gfp = gfp;
>  		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -- 
> 2.28.0
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending
  2020-09-10 23:47 ` [PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending Matthew Wilcox (Oracle)
@ 2020-09-17 22:02   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-17 22:02 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion, Christoph Hellwig

On Fri, Sep 11, 2020 at 12:47:05AM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of counting bio segments, count the number of bytes submitted.
> This insulates us from the block layer's definition of what a 'same page'
> is, which is not necessarily clear once THPs are involved.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 1cf976a8e55c..64a5cb383f30 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -27,7 +27,7 @@
>   */
>  struct iomap_page {
>  	atomic_t		read_bytes_pending;
> -	atomic_t		write_count;
> +	atomic_t		write_bytes_pending;
>  	spinlock_t		uptodate_lock;
>  	unsigned long		uptodate[];
>  };
> @@ -73,7 +73,7 @@ iomap_page_release(struct page *page)
>  	if (!iop)
>  		return;
>  	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
> -	WARN_ON_ONCE(atomic_read(&iop->write_count));
> +	WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
>  	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
>  			PageUptodate(page));
>  	kfree(iop);
> @@ -1047,7 +1047,7 @@ EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
>  
>  static void
>  iomap_finish_page_writeback(struct inode *inode, struct page *page,
> -		int error)
> +		int error, unsigned int len)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
>  
> @@ -1057,9 +1057,9 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
>  	}
>  
>  	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
> -	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
> +	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) <= 0);
>  
> -	if (!iop || atomic_dec_and_test(&iop->write_count))
> +	if (!iop || atomic_sub_and_test(len, &iop->write_bytes_pending))
>  		end_page_writeback(page);
>  }
>  
> @@ -1093,7 +1093,8 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
>  
>  		/* walk each page on bio, ending page IO on them */
>  		bio_for_each_segment_all(bv, bio, iter_all)
> -			iomap_finish_page_writeback(inode, bv->bv_page, error);
> +			iomap_finish_page_writeback(inode, bv->bv_page, error,
> +					bv->bv_len);
>  		bio_put(bio);
>  	}
>  	/* The ioend has been freed by bio_put() */
> @@ -1309,8 +1310,8 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
>  
>  	merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
>  			&same_page);
> -	if (iop && !same_page)
> -		atomic_inc(&iop->write_count);
> +	if (iop)
> +		atomic_add(len, &iop->write_bytes_pending);
>  
>  	if (!merged) {
>  		if (bio_full(wpc->ioend->io_bio, len)) {
> @@ -1353,7 +1354,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	LIST_HEAD(submit_list);
>  
>  	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
> -	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
> +	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
>  
>  	/*
>  	 * Walk through the page to find areas to write back. If we run off the
> -- 
> 2.28.0
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 8/9] iomap: Convert iomap_write_end types
  2020-09-10 23:47 ` [PATCH v2 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
@ 2020-09-17 22:03   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-17 22:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion, Christoph Hellwig

On Fri, Sep 11, 2020 at 12:47:06AM +0100, Matthew Wilcox (Oracle) wrote:
> iomap_write_end cannot return an error, so switch it to return
> size_t instead of int and remove the error checking from the callers.
> Also convert the arguments to size_t from unsigned int, in case anyone
> ever wants to support a page size larger than 2GB.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

LGTM
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 31 ++++++++++++-------------------
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 64a5cb383f30..cb25a7b70401 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -663,9 +663,8 @@ iomap_set_page_dirty(struct page *page)
>  }
>  EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
>  
> -static int
> -__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page)
> +static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> +		size_t copied, struct page *page)
>  {
>  	flush_dcache_page(page);
>  
> @@ -687,9 +686,8 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	return copied;
>  }
>  
> -static int
> -iomap_write_end_inline(struct inode *inode, struct page *page,
> -		struct iomap *iomap, loff_t pos, unsigned copied)
> +static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
> +		struct iomap *iomap, loff_t pos, size_t copied)
>  {
>  	void *addr;
>  
> @@ -705,13 +703,14 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  	return copied;
>  }
>  
> -static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> -		struct page *page, struct iomap *iomap, struct iomap *srcmap)
> +/* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
> +static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> +		size_t copied, struct page *page, struct iomap *iomap,
> +		struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	loff_t old_size = inode->i_size;
> -	int ret;
> +	size_t ret;
>  
>  	if (srcmap->type == IOMAP_INLINE) {
>  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> @@ -790,11 +789,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
>  
> -		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> +		copied = iomap_write_end(inode, pos, bytes, copied, page, iomap,
>  				srcmap);
> -		if (unlikely(status < 0))
> -			break;
> -		copied = status;
>  
>  		cond_resched();
>  
> @@ -868,11 +864,8 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
>  				srcmap);
> -		if (unlikely(status <= 0)) {
> -			if (WARN_ON_ONCE(status == 0))
> -				return -EIO;
> -			return status;
> -		}
> +		if (WARN_ON_ONCE(status == 0))
> +			return -EIO;
>  
>  		cond_resched();
>  
> -- 
> 2.28.0
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 9/9] iomap: Change calling convention for zeroing
  2020-09-10 23:47 ` [PATCH v2 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
  2020-09-11  6:42   ` Christoph Hellwig
@ 2020-09-17 22:05   ` Darrick J. Wong
  2020-09-17 22:11     ` Matthew Wilcox
  1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-17 22:05 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion

On Fri, Sep 11, 2020 at 12:47:07AM +0100, Matthew Wilcox (Oracle) wrote:
> Pass the full length to iomap_zero() and dax_iomap_zero(), and have
> them return how many bytes they actually handled.  This is preparatory
> work for handling THP, although it looks like DAX could actually take
> advantage of it if there's a larger contiguous area.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/dax.c               | 13 ++++++-------
>  fs/iomap/buffered-io.c | 33 +++++++++++++++------------------
>  include/linux/dax.h    |  3 +--
>  3 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 994ab66a9907..6ad346352a8c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1037,18 +1037,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
>  	return ret;
>  }
>  
> -int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> -		   struct iomap *iomap)
> +s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
>  {
>  	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
>  	pgoff_t pgoff;
>  	long rc, id;
>  	void *kaddr;
>  	bool page_aligned = false;
> -
> +	unsigned offset = offset_in_page(pos);
> +	unsigned size = min_t(u64, PAGE_SIZE - offset, length);
>  
>  	if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
> -	    IS_ALIGNED(size, PAGE_SIZE))
> +	    (size == PAGE_SIZE))
>  		page_aligned = true;
>  
>  	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
> @@ -1058,8 +1058,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
>  	id = dax_read_lock();
>  
>  	if (page_aligned)
> -		rc = dax_zero_page_range(iomap->dax_dev, pgoff,
> -					 size >> PAGE_SHIFT);
> +		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
>  	else
>  		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
>  	if (rc < 0) {
> @@ -1072,7 +1071,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
>  		dax_flush(iomap->dax_dev, kaddr + offset, size);
>  	}
>  	dax_read_unlock(id);
> -	return 0;
> +	return size;
>  }
>  
>  static loff_t
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index cb25a7b70401..3e1eb40a73fd 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -898,11 +898,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
> -static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> +static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page;
>  	int status;
> +	unsigned offset = offset_in_page(pos);
> +	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>  
>  	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
>  	if (status)
> @@ -914,38 +916,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
> -static loff_t
> -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> -		void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
> +		loff_t length, void *data, struct iomap *iomap,

Any reason not to change @length and the return value to s64?

--D

> +		struct iomap *srcmap)
>  {
>  	bool *did_zero = data;
>  	loff_t written = 0;
> -	int status;
>  
>  	/* already zeroed?  we're done. */
>  	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> -		return count;
> +		return length;
>  
>  	do {
> -		unsigned offset, bytes;
> -
> -		offset = offset_in_page(pos);
> -		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> +		s64 bytes;
>  
>  		if (IS_DAX(inode))
> -			status = dax_iomap_zero(pos, offset, bytes, iomap);
> +			bytes = dax_iomap_zero(pos, length, iomap);
>  		else
> -			status = iomap_zero(inode, pos, offset, bytes, iomap,
> -					srcmap);
> -		if (status < 0)
> -			return status;
> +			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
> +		if (bytes < 0)
> +			return bytes;
>  
>  		pos += bytes;
> -		count -= bytes;
> +		length -= bytes;
>  		written += bytes;
>  		if (did_zero)
>  			*did_zero = true;
> -	} while (count > 0);
> +	} while (length > 0);
>  
>  	return written;
>  }
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 6904d4e0b2e0..951a851a0481 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -214,8 +214,7 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
>  int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
>  int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
>  				      pgoff_t index);
> -int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> -			struct iomap *iomap);
> +s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
>  static inline bool dax_mapping(struct address_space *mapping)
>  {
>  	return mapping->host && IS_DAX(mapping->host);
> -- 
> 2.28.0
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 9/9] iomap: Change calling convention for zeroing
  2020-09-17 22:05   ` Darrick J. Wong
@ 2020-09-17 22:11     ` Matthew Wilcox
  2020-09-17 22:18       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-09-17 22:11 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion

On Thu, Sep 17, 2020 at 03:05:00PM -0700, Darrick J. Wong wrote:
> > -static loff_t
> > -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> > -		void *data, struct iomap *iomap, struct iomap *srcmap)
> > +static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
> > +		loff_t length, void *data, struct iomap *iomap,
> 
> Any reason not to change @length and the return value to s64?

Because it's an actor, passed to iomap_apply, so its types have to match.
I can change that, but it'll be a separate patch series.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 9/9] iomap: Change calling convention for zeroing
  2020-09-17 22:11     ` Matthew Wilcox
@ 2020-09-17 22:18       ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-17 22:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, linux-nvdimm,
	linux-kernel, Dave Kleikamp, jfs-discussion

On Thu, Sep 17, 2020 at 11:11:15PM +0100, Matthew Wilcox wrote:
> On Thu, Sep 17, 2020 at 03:05:00PM -0700, Darrick J. Wong wrote:
> > > -static loff_t
> > > -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> > > -		void *data, struct iomap *iomap, struct iomap *srcmap)
> > > +static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
> > > +		loff_t length, void *data, struct iomap *iomap,
> > 
> > Any reason not to change @length and the return value to s64?
> 
> Because it's an actor, passed to iomap_apply, so its types have to match.
> I can change that, but it'll be a separate patch series.

Ah, right.  I seemingly forgot that. :(

Carry on.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-10 23:47 ` [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
  2020-09-11  5:36   ` Christoph Hellwig
  2020-09-17 22:00   ` Darrick J. Wong
@ 2020-09-22 16:23   ` Qian Cai
  2020-09-22 17:05     ` Matthew Wilcox
  2 siblings, 1 reply; 32+ messages in thread
From: Qian Cai @ 2020-09-22 16:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-xfs, linux-fsdevel
  Cc: Darrick J . Wong, Christoph Hellwig, linux-nvdimm, linux-kernel,
	Dave Kleikamp, jfs-discussion, Dave Chinner, Stephen Rothwell,
	linux-next

On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically to support larger pages in the
> page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> but with a 2MB maximum page size, we'd have to allocate more than 4kB
> per page.  Add a few debugging assertions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Some syscall fuzzing will trigger this on powerpc:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

[ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270
[ 8805.895376][T445431] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci mdio libphy firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
[ 8805.895521][T445431] CPU: 61 PID: 445431 Comm: trinity-c61 Not tainted 5.9.0-rc6-next-20200922 #3
[ 8805.895551][T445431] NIP:  c0000000004734a0 LR: c00000000047335c CTR: 0000000000000000
[ 8805.895571][T445431] REGS: c000001fe5427620 TRAP: 0700   Not tainted  (5.9.0-rc6-next-20200922)
[ 8805.895609][T445431] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44002244  XER: 20040000
[ 8805.895671][T445431] CFAR: c000000000473394 IRQMASK: 0 
[ 8805.895671][T445431] GPR00: c00000000029b698 c000001fe54278b0 c000000005687e00 0000000000000000 
[ 8805.895671][T445431] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[ 8805.895671][T445431] GPR08: c00c000002c4b5c7 007fff8000000015 0000000000000001 c000000005f5e028 
[ 8805.895671][T445431] GPR12: 0000000000002000 c000001ffffc0580 0000000010037be0 00000000109ec8e0 
[ 8805.895671][T445431] GPR16: 00000000109eccb0 c000001fe54279c0 c000001fe5427a40 0000000000000000 
[ 8805.895671][T445431] GPR20: c000000000a4aa80 0000000000000008 0000000000000000 000000000000000f 
[ 8805.895671][T445431] GPR24: ffffffffffffffff c000001fe54279b8 c000001fe5427940 0000000000000001 
[ 8805.895671][T445431] GPR28: 0000000000000001 c0000014dd4f9128 0000000000000000 c00c000006273a80 
[ 8805.895989][T445431] NIP [c0000000004734a0] iomap_page_release+0x250/0x270
[ 8805.896016][T445431] LR [c00000000047335c] iomap_page_release+0x10c/0x270
[ 8805.896041][T445431] Call Trace:
[ 8805.896065][T445431] [c000001fe54278b0] [c000001fe5427940] 0xc000001fe5427940 (unreliable)
[ 8805.896101][T445431] [c000001fe54278f0] [c00000000029b698] truncate_cleanup_page+0x188/0x2e0
[ 8805.896144][T445431] [c000001fe5427920] [c00000000029c61c] truncate_inode_pages_range+0x23c/0x9f0
[ 8805.896187][T445431] [c000001fe5427b40] [c00000000029ceec] truncate_pagecache+0x5c/0x90
[ 8805.896229][T445431] [c000001fe5427b80] [c000000000569858] xfs_setattr_size+0xb8/0x5d0
[ 8805.896270][T445431] [c000001fe5427c10] [c00000000056a26c] xfs_vn_setattr+0x8c/0x130
[ 8805.896321][T445431] [c000001fe5427c60] [c0000000003dbe60] notify_change+0x390/0x5b0
[ 8805.896372][T445431] [c000001fe5427cd0] [c0000000003a5294] do_truncate+0x94/0x130
[ 8805.896412][T445431] [c000001fe5427d60] [c0000000003a57a8] do_sys_ftruncate+0xe8/0x160
[ 8805.896455][T445431] [c000001fe5427dc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 8805.896496][T445431] [c000001fe5427e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 8805.896544][T445431] Instruction dump:
[ 8805.896569][T445431] 3c82fb3a 388490c8 4be65be1 60000000 0fe00000 60000000 60000000 60000000 
[ 8805.896623][T445431] 0fe00000 4bfffea8 60000000 60000000 <0fe00000> 4bfffef4 60000000 60000000 
[ 8805.896678][T445431] CPU: 61 PID: 445431 Comm: trinity-c61 Not tainted 5.9.0-rc6-next-20200922 #3
[ 8805.896716][T445431] Call Trace:
[ 8805.896750][T445431] [c000001fe5427410] [c0000000006440e8] dump_stack+0xec/0x144 (unreliable)
[ 8805.896796][T445431] [c000001fe5427450] [c0000000000b0a24] __warn+0xc4/0x144
[ 8805.896837][T445431] [c000001fe54274e0] [c000000000642cf8] report_bug+0x108/0x1f0
[ 8805.896878][T445431] [c000001fe5427580] [c000000000021714] program_check_exception+0x104/0x2e0
[ 8805.896922][T445431] [c000001fe54275b0] [c000000000009664] program_check_common_virt+0x2c4/0x310
[ 8805.896965][T445431] --- interrupt: 700 at iomap_page_release+0x250/0x270
[ 8805.896965][T445431]     LR = iomap_page_release+0x10c/0x270
[ 8805.897018][T445431] [c000001fe54278b0] [c000001fe5427940] 0xc000001fe5427940 (unreliable)
[ 8805.897053][T445431] [c000001fe54278f0] [c00000000029b698] truncate_cleanup_page+0x188/0x2e0
[ 8805.897104][T445431] [c000001fe5427920] [c00000000029c61c] truncate_inode_pages_range+0x23c/0x9f0
[ 8805.897146][T445431] [c000001fe5427b40] [c00000000029ceec] truncate_pagecache+0x5c/0x90
[ 8805.897186][T445431] [c000001fe5427b80] [c000000000569858] xfs_setattr_size+0xb8/0x5d0
[ 8805.897226][T445431] [c000001fe5427c10] [c00000000056a26c] xfs_vn_setattr+0x8c/0x130
[ 8805.897266][T445431] [c000001fe5427c60] [c0000000003dbe60] notify_change+0x390/0x5b0
[ 8805.897307][T445431] [c000001fe5427cd0] [c0000000003a5294] do_truncate+0x94/0x130
[ 8805.897329][T445431] [c000001fe5427d60] [c0000000003a57a8] do_sys_ftruncate+0xe8/0x160
[ 8805.897369][T445431] [c000001fe5427dc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 8805.897410][T445431] [c000001fe5427e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 8805.897448][T445431] irq event stamp: 360936
[ 8805.897475][T445431] hardirqs last  enabled at (360935): [<c0000000002bd5b8>] dec_zone_page_state+0x78/0xb0
[ 8805.897515][T445431] hardirqs last disabled at (360936): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
[ 8805.897556][T445431] softirqs last  enabled at (342438): [<c0000000008d3d38>] __do_softirq+0x378/0x6e4
[ 8805.897597][T445431] softirqs last disabled at (342433): [<c0000000000bc838>] irq_exit+0x118/0x140

> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7fc0e02d27b0..9670c096b83e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -22,18 +22,25 @@
>  #include "../internal.h"
>  
>  /*
> - * Structure allocated for each page when block size < PAGE_SIZE to track
> - * sub-page uptodate status and I/O completions.
> + * Structure allocated for each page or THP when block size < page size
> + * to track sub-page uptodate status and I/O completions.
>   */
>  struct iomap_page {
>  	atomic_t		read_count;
>  	atomic_t		write_count;
>  	spinlock_t		uptodate_lock;
> -	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
> +	unsigned long		uptodate[];
>  };
>  
>  static inline struct iomap_page *to_iomap_page(struct page *page)
>  {
> +	/*
> +	 * per-block data is stored in the head page.  Callers should
> +	 * not be dealing with tail pages (and if they are, they can
> +	 * call thp_head() first.
> +	 */
> +	VM_BUG_ON_PGFLAGS(PageTail(page), page);
> +
>  	if (page_has_private(page))
>  		return (struct iomap_page *)page_private(page);
>  	return NULL;
> @@ -45,11 +52,13 @@ static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
> +	unsigned int nr_blocks = i_blocks_per_page(inode, page);
>  
> -	if (iop || i_blocks_per_page(inode, page) <= 1)
> +	if (iop || nr_blocks <= 1)
>  		return iop;
>  
> -	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
> +	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
> +			GFP_NOFS | __GFP_NOFAIL);
>  	spin_lock_init(&iop->uptodate_lock);
>  	attach_page_private(page, iop);
>  	return iop;
> @@ -59,11 +68,14 @@ static void
>  iomap_page_release(struct page *page)
>  {
>  	struct iomap_page *iop = detach_page_private(page);
> +	unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
>  
>  	if (!iop)
>  		return;
>  	WARN_ON_ONCE(atomic_read(&iop->read_count));
>  	WARN_ON_ONCE(atomic_read(&iop->write_count));
> +	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> +			PageUptodate(page));
>  	kfree(iop);
>  }
>  
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-22 16:23   ` Qian Cai
@ 2020-09-22 17:05     ` Matthew Wilcox
  2020-09-22 17:25       ` Qian Cai
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-09-22 17:05 UTC (permalink / raw)
  To: Qian Cai
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > Size the uptodate array dynamically to support larger pages in the
> > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > per page.  Add a few debugging assertions.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> Some syscall fuzzing will trigger this on powerpc:
> 
> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> 
> [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270

Well, I'm glad it triggered.  That warning is:
        WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
                        PageUptodate(page));
so there was definitely a problem of some kind.

truncate_cleanup_page() calls
do_invalidatepage() calls
iomap_invalidatepage() calls
iomap_page_release()

Is this the first warning?  I'm wondering if maybe there was an I/O error
earlier which caused PageUptodate to get cleared again.  If it's easy to
reproduce, perhaps you could try something like this?

+void dump_iomap_page(struct page *page, const char *reason)
+{
+       struct iomap_page *iop = to_iomap_page(page);
+       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
+
+       dump_page(page, reason);
+       if (iop)
+               printk("iop:reads %d writes %d uptodate %*pb\n",
+                               atomic_read(&iop->read_bytes_pending),
+                               atomic_read(&iop->write_bytes_pending),
+                               nr_blocks, iop->uptodate);
+       else
+               printk("iop:none\n");
+}

and then do something like:

	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
		dump_iomap_page(page, NULL);
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-22 17:05     ` Matthew Wilcox
@ 2020-09-22 17:25       ` Qian Cai
  2020-09-23  1:06       ` Qian Cai
  2020-09-24  1:07       ` Matthew Wilcox
  2 siblings, 0 replies; 32+ messages in thread
From: Qian Cai @ 2020-09-22 17:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Darrick J .,
	Wong,  <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@infradead.org>, ,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > Size the uptodate array dynamically to support larger pages in the
> > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > per page.  Add a few debugging assertions.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Some syscall fuzzing will trigger this on powerpc:
> > 
> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > 
> > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > io.c:78 iomap_page_release+0x250/0x270
> 
> Well, I'm glad it triggered.  That warning is:
>         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
>                         PageUptodate(page));
> so there was definitely a problem of some kind.
> 
> truncate_cleanup_page() calls
> do_invalidatepage() calls
> iomap_invalidatepage() calls
> iomap_page_release()
> 
> Is this the first warning?  I'm wondering if maybe there was an I/O error
> earlier which caused PageUptodate to get cleared again.  If it's easy to
> reproduce, perhaps you could try something like this?

Yes, this is the first warning. BTW, I did run the reproducer of a805c111650c
("iomap: fix WARN_ON_ONCE() from unprivileged users") earlier, so I am wondering
if this is just another victim WARN_ON_ONCE() from it.

> 
> +void dump_iomap_page(struct page *page, const char *reason)
> +{
> +       struct iomap_page *iop = to_iomap_page(page);
> +       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> +
> +       dump_page(page, reason);
> +       if (iop)
> +               printk("iop:reads %d writes %d uptodate %*pb\n",
> +                               atomic_read(&iop->read_bytes_pending),
> +                               atomic_read(&iop->write_bytes_pending),
> +                               nr_blocks, iop->uptodate);
> +       else
> +               printk("iop:none\n");
> +}
> 
> and then do something like:
> 
> 	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> 		dump_iomap_page(page, NULL);
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-22 17:05     ` Matthew Wilcox
  2020-09-22 17:25       ` Qian Cai
@ 2020-09-23  1:06       ` Qian Cai
  2020-09-23  2:48         ` Matthew Wilcox
  2020-09-24  1:07       ` Matthew Wilcox
  2 siblings, 1 reply; 32+ messages in thread
From: Qian Cai @ 2020-09-23  1:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Darrick J .,
	Wong,  <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@infradead.org>, ,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > Size the uptodate array dynamically to support larger pages in the
> > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > per page.  Add a few debugging assertions.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Some syscall fuzzing will trigger this on powerpc:
> > 
> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > 
> > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > io.c:78 iomap_page_release+0x250/0x270
> 
> Well, I'm glad it triggered.  That warning is:
>         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
>                         PageUptodate(page));
> so there was definitely a problem of some kind.
> 
> truncate_cleanup_page() calls
> do_invalidatepage() calls
> iomap_invalidatepage() calls
> iomap_page_release()
> 
> Is this the first warning?  I'm wondering if maybe there was an I/O error
> earlier which caused PageUptodate to get cleared again.  If it's easy to
> reproduce, perhaps you could try something like this?
> 
> +void dump_iomap_page(struct page *page, const char *reason)
> +{
> +       struct iomap_page *iop = to_iomap_page(page);
> +       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> +
> +       dump_page(page, reason);
> +       if (iop)
> +               printk("iop:reads %d writes %d uptodate %*pb\n",
> +                               atomic_read(&iop->read_bytes_pending),
> +                               atomic_read(&iop->write_bytes_pending),
> +                               nr_blocks, iop->uptodate);
> +       else
> +               printk("iop:none\n");
> +}
> 
> and then do something like:
> 
> 	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> 		dump_iomap_page(page, NULL);

This:

[ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
[ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
[ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
[ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
[ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
[ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
[ 1683.158506][T164965] iop:none

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-23  1:06       ` Qian Cai
@ 2020-09-23  2:48         ` Matthew Wilcox
  2020-09-23  5:00           ` Darrick J. Wong
  2020-09-23 16:55           ` Qian Cai
  0 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-09-23  2:48 UTC (permalink / raw)
  To: Qian Cai
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > Size the uptodate array dynamically to support larger pages in the
> > > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > per page.  Add a few debugging assertions.
> > > > 
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Some syscall fuzzing will trigger this on powerpc:
> > > 
> > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > 
> > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > > io.c:78 iomap_page_release+0x250/0x270
> > 
> > Well, I'm glad it triggered.  That warning is:
> >         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> >                         PageUptodate(page));
> > so there was definitely a problem of some kind.
> > 
> > truncate_cleanup_page() calls
> > do_invalidatepage() calls
> > iomap_invalidatepage() calls
> > iomap_page_release()
> > 
> > Is this the first warning?  I'm wondering if maybe there was an I/O error
> > earlier which caused PageUptodate to get cleared again.  If it's easy to
> > reproduce, perhaps you could try something like this?
> > 
> > +void dump_iomap_page(struct page *page, const char *reason)
> > +{
> > +       struct iomap_page *iop = to_iomap_page(page);
> > +       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > +
> > +       dump_page(page, reason);
> > +       if (iop)
> > +               printk("iop:reads %d writes %d uptodate %*pb\n",
> > +                               atomic_read(&iop->read_bytes_pending),
> > +                               atomic_read(&iop->write_bytes_pending),
> > +                               nr_blocks, iop->uptodate);
> > +       else
> > +               printk("iop:none\n");
> > +}
> > 
> > and then do something like:
> > 
> > 	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > 		dump_iomap_page(page, NULL);
> 
> This:
> 
> [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
> [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
> [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
> [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
> [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> [ 1683.158506][T164965] iop:none

Oh, I'm a fool.  This is after the call to detach_page_private() so
page->private is NULL and we don't get the iop dumped.

Nevertheless, this is interesting.  Somehow, the page is marked Uptodate,
but the bitmap is deemed not full.  There are three places where we set
an iomap page Uptodate:

1.      if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
                SetPageUptodate(page);

2.      if (page_has_private(page))
                iomap_iop_set_range_uptodate(page, off, len);
        else
                SetPageUptodate(page);

3.      BUG_ON(page->index);
...
        SetPageUptodate(page);

It can't be #2 because the page has an iop.  It can't be #3 because the
page->index is not 0.  So at some point in the past, the bitmap was full.

I don't think it's possible for inode->i_blksize to change, and you
aren't running with THPs, so it's definitely not possible for thp_size()
to change.  So i_blocks_per_page() isn't going to change.

We seem to have allocated enough memory for ->iop because that's also
based on i_blocks_per_page().

I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
I've been trying to reproduce this on x86 with a 1kB block size
filesystem, and haven't been able to yet.  Maybe I'll try to setup a
powerpc cross-compilation environment tomorrow.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-23  2:48         ` Matthew Wilcox
@ 2020-09-23  5:00           ` Darrick J. Wong
  2020-09-23 18:59             ` Matthew Wilcox
  2020-09-23 16:55           ` Qian Cai
  1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2020-09-23  5:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Qian Cai, linux-xfs, linux-fsdevel, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Wed, Sep 23, 2020 at 03:48:59AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> > On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > > Size the uptodate array dynamically to support larger pages in the
> > > > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > > per page.  Add a few debugging assertions.
> > > > > 
> > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Some syscall fuzzing will trigger this on powerpc:
> > > > 
> > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > > 
> > > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > > > io.c:78 iomap_page_release+0x250/0x270
> > > 
> > > Well, I'm glad it triggered.  That warning is:
> > >         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > >                         PageUptodate(page));
> > > so there was definitely a problem of some kind.
> > > 
> > > truncate_cleanup_page() calls
> > > do_invalidatepage() calls
> > > iomap_invalidatepage() calls
> > > iomap_page_release()
> > > 
> > > Is this the first warning?  I'm wondering if maybe there was an I/O error
> > > earlier which caused PageUptodate to get cleared again.  If it's easy to
> > > reproduce, perhaps you could try something like this?
> > > 
> > > +void dump_iomap_page(struct page *page, const char *reason)
> > > +{
> > > +       struct iomap_page *iop = to_iomap_page(page);
> > > +       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > > +
> > > +       dump_page(page, reason);
> > > +       if (iop)
> > > +               printk("iop:reads %d writes %d uptodate %*pb\n",
> > > +                               atomic_read(&iop->read_bytes_pending),
> > > +                               atomic_read(&iop->write_bytes_pending),
> > > +                               nr_blocks, iop->uptodate);
> > > +       else
> > > +               printk("iop:none\n");
> > > +}
> > > 
> > > and then do something like:
> > > 
> > > 	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > > 		dump_iomap_page(page, NULL);
> > 
> > This:
> > 
> > [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
> > [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> > [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
> > [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
> > [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
> > [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> > [ 1683.158506][T164965] iop:none
> 
> Oh, I'm a fool.  This is after the call to detach_page_private() so
> page->private is NULL and we don't get the iop dumped.
> 
> Nevertheless, this is interesting.  Somehow, the page is marked Uptodate,
> but the bitmap is deemed not full.  There are three places where we set
> an iomap page Uptodate:
> 
> 1.      if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
>                 SetPageUptodate(page);
> 
> 2.      if (page_has_private(page))
>                 iomap_iop_set_range_uptodate(page, off, len);
>         else
>                 SetPageUptodate(page);
> 
> 3.      BUG_ON(page->index);
> ...
>         SetPageUptodate(page);
> 
> It can't be #2 because the page has an iop.  It can't be #3 because the
> page->index is not 0.  So at some point in the past, the bitmap was full.
> 
> I don't think it's possible for inode->i_blksize to change, and you
> aren't running with THPs, so it's definitely not possible for thp_size()
> to change.  So i_blocks_per_page() isn't going to change.
> 
> We seem to have allocated enough memory for ->iop because that's also
> based on i_blocks_per_page().
> 
> I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
> I've been trying to reproduce this on x86 with a 1kB block size
> filesystem, and haven't been able to yet.  Maybe I'll try to setup a
> powerpc cross-compilation environment tomorrow.

FWIW I managed to reproduce it with the following fstests configuration
on a 1k block size fs on a x86 machinE:

SECTION      -- -no-sections-
FSTYP        -- xfs
MKFS_OPTIONS --  -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024
MOUNT_OPTIONS --  -o usrquota,grpquota,prjquota
HOST_OPTIONS -- local.config
CHECK_OPTIONS -- -g auto
XFS_MKFS_OPTIONS -- -bsize=4096
TIME_FACTOR  -- 1
LOAD_FACTOR  -- 1
TEST_DIR     -- /mnt
TEST_DEV     -- /dev/sde
SCRATCH_DEV  -- /dev/sdd
SCRATCH_MNT  -- /opt
OVL_UPPER    -- ovl-upper
OVL_LOWER    -- ovl-lower
OVL_WORK     -- ovl-work
KERNEL       -- 5.9.0-rc4-djw

The kernel is more or less iomap-for-next.

--D
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-23  2:48         ` Matthew Wilcox
  2020-09-23  5:00           ` Darrick J. Wong
@ 2020-09-23 16:55           ` Qian Cai
  1 sibling, 0 replies; 32+ messages in thread
From: Qian Cai @ 2020-09-23 16:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Darrick J .,
	Wong,  <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@infradead.org>, ,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Wed, 2020-09-23 at 03:48 +0100, Matthew Wilcox wrote:
> I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
> I've been trying to reproduce this on x86 with a 1kB block size
> filesystem, and haven't been able to yet.  Maybe I'll try to setup a
> powerpc cross-compilation environment tomorrow.

Another data point is that this can be reproduced on both arm64 and powerpc
(both have 64k-size pages) by running this LTP test case:

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/preadv2/preadv203.c

[ 2397.723289] preadv203 (80525): drop_caches: 3
[ 2400.796402] XFS (loop0): Unmounting Filesystem
[ 2401.431293] ------------[ cut here ]------------
[ 2401.431411] WARNING: CPU: 0 PID: 80501 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270
[ 2401.431435] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x mdio ahci libahci tg3 libphy libata firmware_class dm_mirror dm_reg
ion_hash dm_log dm_mod
[ 2401.431534] CPU: 0 PID: 80501 Comm: preadv203 Not tainted 5.9.0-rc6-next-20200923+ #4
[ 2401.431567] NIP:  c000000000473b30 LR: c0000000004739ec CTR: c000000000473e80
[ 2401.431590] REGS: c0000011a7bbf7a0 TRAP: 0700   Not tainted  (5.9.0-rc6-next-20200923+)
[ 2401.431613] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44000244  XER: 20040000
[ 2401.431655] CFAR: c000000000473a24 IRQMASK: 0 
               GPR00: c00000000029b6a8 c0000011a7bbfa30 c000000007e87e00 0000000000000005 
               GPR04: 0000000000000000 0000000000000005 0000000000000005 0000000000000000 
               GPR08: c00c000806f1f587 087fff8000000037 0000000000000001 c00000000875e028 
               GPR12: c000000000473e80 c00000000c180000 0000000000000000 0000000000000000 
               GPR16: 0000000000000000 c0000011a7bbfbb0 c0000011a7bbfbc0 0000000000000000 
               GPR20: c000000000a4aad8 0000000000000000 0000000000000000 000000000000000f 
               GPR24: ffffffffffffffff c0000011a7bbfb38 c0000011a7bbfac0 0000000000000000 
               GPR28: 0000000000000001 c000000ea4fe0a28 0000000000000000 c00c0008071b46c0 
[ 2401.431856] NIP [c000000000473b30] iomap_page_release+0x250/0x270
[ 2401.431878] LR [c0000000004739ec] iomap_page_release+0x10c/0x270
[ 2401.431899] Call Trace:
[ 2401.431926] [c0000011a7bbfa30] [c0000011a7bbfac0] 0xc0000011a7bbfac0 (unreliable)
[ 2401.431962] [c0000011a7bbfa70] [c00000000029b6a8] truncate_cleanup_page+0x188/0x2e0
[ 2401.431987] [c0000011a7bbfaa0] [c00000000029c62c] truncate_inode_pages_range+0x23c/0x9f0
[ 2401.432023] [c0000011a7bbfcc0] [c0000000003d9588] evict+0x1e8/0x200
[ 2401.432055] [c0000011a7bbfd00] [c0000000003c64b0] do_unlinkat+0x2d0/0x3a0
[ 2401.432081] [c0000011a7bbfdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 2401.432097] [c0000011a7bbfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 2401.432120] Instruction dump:
[ 2401.432140] 3c82f8ba 388490b8 4be655b1 60000000 0fe00000 60000000 60000000 60000000 
[ 2401.432176] 0fe00000 4bfffea8 60000000 60000000 <0fe00000> 4bfffef4 60000000 60000000 
[ 2401.432213] CPU: 0 PID: 80501 Comm: preadv203 Not tainted 5.9.0-rc6-next-20200923+ #4
[ 2401.432245] Call Trace:
[ 2401.432255] [c0000011a7bbf590] [c000000000644778] dump_stack+0xec/0x144 (unreliable)
[ 2401.432284] [c0000011a7bbf5d0] [c0000000000b0a24] __warn+0xc4/0x144
[ 2401.432298] [c0000011a7bbf660] [c000000000643388] report_bug+0x108/0x1f0
[ 2401.432331] [c0000011a7bbf700] [c000000000021714] program_check_exception+0x104/0x2e0
[ 2401.432359] [c0000011a7bbf730] [c000000000009664] program_check_common_virt+0x2c4/0x310
[ 2401.432385] --- interrupt: 700 at iomap_page_release+0x250/0x270
                   LR = iomap_page_release+0x10c/0x270
[ 2401.432420] [c0000011a7bbfa30] [c0000011a7bbfac0] 0xc0000011a7bbfac0 (unreliable)
[ 2401.432446] [c0000011a7bbfa70] [c00000000029b6a8] truncate_cleanup_page+0x188/0x2e0
[ 2401.432470] [c0000011a7bbfaa0] [c00000000029c62c] truncate_inode_pages_range+0x23c/0x9f0
[ 2401.432505] [c0000011a7bbfcc0] [c0000000003d9588] evict+0x1e8/0x200
[ 2401.432528] [c0000011a7bbfd00] [c0000000003c64b0] do_unlinkat+0x2d0/0x3a0
[ 2401.432552] [c0000011a7bbfdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 2401.432576] [c0000011a7bbfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 2401.432599] irq event stamp: 210662
[ 2401.432619] hardirqs last  enabled at (210661): [<c0000000008d32bc>] _raw_spin_unlock_irq+0x3c/0x70
[ 2401.432652] hardirqs last disabled at (210662): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
[ 2401.432688] softirqs last  enabled at (210280): [<c000000000547de4>] xfs_buf_find.isra.31+0xb54/0x1060
[ 2401.432722] softirqs last disabled at (210278): [<c0000000005476fc>] xfs_buf_find.isra.31+0x46c/0x1060

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-23  5:00           ` Darrick J. Wong
@ 2020-09-23 18:59             ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-09-23 18:59 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Qian Cai, linux-xfs, linux-fsdevel, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Tue, Sep 22, 2020 at 10:00:01PM -0700, Darrick J. Wong wrote:
> On Wed, Sep 23, 2020 at 03:48:59AM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> > > On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > > > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > > > Size the uptodate array dynamically to support larger pages in the
> > > > > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > > > per page.  Add a few debugging assertions.
> > > > > > 
> > > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > Some syscall fuzzing will trigger this on powerpc:
> > > > > 
> > > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > > > 
> > > > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > > > > io.c:78 iomap_page_release+0x250/0x270
> > > > 
> > > > Well, I'm glad it triggered.  That warning is:
> > > >         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > > >                         PageUptodate(page));
> > > > so there was definitely a problem of some kind.
> > > > 
> > > > truncate_cleanup_page() calls
> > > > do_invalidatepage() calls
> > > > iomap_invalidatepage() calls
> > > > iomap_page_release()
> > > > 
> > > > Is this the first warning?  I'm wondering if maybe there was an I/O error
> > > > earlier which caused PageUptodate to get cleared again.  If it's easy to
> > > > reproduce, perhaps you could try something like this?
> > > > 
> > > > +void dump_iomap_page(struct page *page, const char *reason)
> > > > +{
> > > > +       struct iomap_page *iop = to_iomap_page(page);
> > > > +       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > > > +
> > > > +       dump_page(page, reason);
> > > > +       if (iop)
> > > > +               printk("iop:reads %d writes %d uptodate %*pb\n",
> > > > +                               atomic_read(&iop->read_bytes_pending),
> > > > +                               atomic_read(&iop->write_bytes_pending),
> > > > +                               nr_blocks, iop->uptodate);
> > > > +       else
> > > > +               printk("iop:none\n");
> > > > +}
> > > > 
> > > > and then do something like:
> > > > 
> > > > 	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > > > 		dump_iomap_page(page, NULL);
> > > 
> > > This:
> > > 
> > > [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
> > > [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> > > [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
> > > [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
> > > [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
> > > [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> > > [ 1683.158506][T164965] iop:none
> > 
> > Oh, I'm a fool.  This is after the call to detach_page_private() so
> > page->private is NULL and we don't get the iop dumped.
> > 
> > Nevertheless, this is interesting.  Somehow, the page is marked Uptodate,
> > but the bitmap is deemed not full.  There are three places where we set
> > an iomap page Uptodate:
> > 
> > 1.      if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
> >                 SetPageUptodate(page);
> > 
> > 2.      if (page_has_private(page))
> >                 iomap_iop_set_range_uptodate(page, off, len);
> >         else
> >                 SetPageUptodate(page);
> > 
> > 3.      BUG_ON(page->index);
> > ...
> >         SetPageUptodate(page);
> > 
> > It can't be #2 because the page has an iop.  It can't be #3 because the
> > page->index is not 0.  So at some point in the past, the bitmap was full.
> > 
> > I don't think it's possible for inode->i_blksize to change, and you
> > aren't running with THPs, so it's definitely not possible for thp_size()
> > to change.  So i_blocks_per_page() isn't going to change.
> > 
> > We seem to have allocated enough memory for ->iop because that's also
> > based on i_blocks_per_page().
> > 
> > I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
> > I've been trying to reproduce this on x86 with a 1kB block size
> > filesystem, and haven't been able to yet.  Maybe I'll try to setup a
> > powerpc cross-compilation environment tomorrow.
> 
> FWIW I managed to reproduce it with the following fstests configuration
> on a 1k block size fs on a x86 machinE:
> 
> SECTION      -- -no-sections-
> FSTYP        -- xfs
> MKFS_OPTIONS --  -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024
> MOUNT_OPTIONS --  -o usrquota,grpquota,prjquota
> HOST_OPTIONS -- local.config
> CHECK_OPTIONS -- -g auto
> XFS_MKFS_OPTIONS -- -bsize=4096
> TIME_FACTOR  -- 1
> LOAD_FACTOR  -- 1
> TEST_DIR     -- /mnt
> TEST_DEV     -- /dev/sde
> SCRATCH_DEV  -- /dev/sdd
> SCRATCH_MNT  -- /opt
> OVL_UPPER    -- ovl-upper
> OVL_LOWER    -- ovl-lower
> OVL_WORK     -- ovl-work
> KERNEL       -- 5.9.0-rc4-djw

It just survived another 3-hour run for me:

FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 bobo-kvm 5.9.0-rc4 #40 SMP Tue Sep 22 14:18:21 EDT 2020
MKFS_OPTIONS  -- -f -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024 /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

The only warning I hit was in generic/019:

0172 WARNING: CPU: 1 PID: 6933 at fs/iomap/buffered-io.c:997 iomap_page_mkwrite_actor+0x72/0x80

which is the:
                WARN_ON_ONCE(!PageUptodate(page));
that happens as a result of the ClearPageUptodate() in iomap_writepage_map()
which has been happening approximately forever.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
  2020-09-22 17:05     ` Matthew Wilcox
  2020-09-22 17:25       ` Qian Cai
  2020-09-23  1:06       ` Qian Cai
@ 2020-09-24  1:07       ` Matthew Wilcox
  2 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-09-24  1:07 UTC (permalink / raw)
  To: Qian Cai
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Christoph Hellwig,
	linux-nvdimm, linux-kernel, Dave Kleikamp, jfs-discussion,
	Dave Chinner, Stephen Rothwell, linux-next

On Tue, Sep 22, 2020 at 06:05:26PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > Size the uptodate array dynamically to support larger pages in the
> > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > per page.  Add a few debugging assertions.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Some syscall fuzzing will trigger this on powerpc:
> > 
> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > 
> > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270
> 
> Well, I'm glad it triggered.  That warning is:
>         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
>                         PageUptodate(page));
> so there was definitely a problem of some kind.

OK, I'm pretty sure the problem predated this commit, and it's simply
that I added a warning (looking for something else) that caught this.

I have a tree completly gunked up with debugging code now to try to
understand the problem better, but my guess is that if you put this
warning into a previous version, you'd see the same problem occurring
(and it is a real problem, because we skip writeback of parts of the
page which are !uptodate).
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-09-24  1:07 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-10 23:46 [PATCH v2 0/9] THP iomap patches for 5.10 Matthew Wilcox (Oracle)
2020-09-10 23:46 ` [PATCH v2 1/9] iomap: Fix misplaced page flushing Matthew Wilcox (Oracle)
2020-09-10 23:47 ` [PATCH v2 2/9] fs: Introduce i_blocks_per_page Matthew Wilcox (Oracle)
2020-09-15 14:58   ` Dave Kleikamp
2020-09-15 15:40   ` David Laight
2020-09-15 15:49     ` Matthew Wilcox
2020-09-10 23:47 ` [PATCH v2 3/9] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
2020-09-10 23:47 ` [PATCH v2 4/9] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
2020-09-10 23:47 ` [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
2020-09-11  5:36   ` Christoph Hellwig
2020-09-17 22:00   ` Darrick J. Wong
2020-09-22 16:23   ` Qian Cai
2020-09-22 17:05     ` Matthew Wilcox
2020-09-22 17:25       ` Qian Cai
2020-09-23  1:06       ` Qian Cai
2020-09-23  2:48         ` Matthew Wilcox
2020-09-23  5:00           ` Darrick J. Wong
2020-09-23 18:59             ` Matthew Wilcox
2020-09-23 16:55           ` Qian Cai
2020-09-24  1:07       ` Matthew Wilcox
2020-09-10 23:47 ` [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending Matthew Wilcox (Oracle)
2020-09-11  5:36   ` Christoph Hellwig
2020-09-17 22:02   ` Darrick J. Wong
2020-09-10 23:47 ` [PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending Matthew Wilcox (Oracle)
2020-09-17 22:02   ` Darrick J. Wong
2020-09-10 23:47 ` [PATCH v2 8/9] iomap: Convert iomap_write_end types Matthew Wilcox (Oracle)
2020-09-17 22:03   ` Darrick J. Wong
2020-09-10 23:47 ` [PATCH v2 9/9] iomap: Change calling convention for zeroing Matthew Wilcox (Oracle)
2020-09-11  6:42   ` Christoph Hellwig
2020-09-17 22:05   ` Darrick J. Wong
2020-09-17 22:11     ` Matthew Wilcox
2020-09-17 22:18       ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).