linux-fsdevel.vger.kernel.org archive mirror
* iomap based buffered reads & iomap cleanups v4
@ 2018-05-30  9:58 Christoph Hellwig
  2018-05-30  9:58 ` [PATCH 01/13] block: add a lower-level bio_add_page interface Christoph Hellwig
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Hi all,

this series adds support for buffered reads without buffer heads to
the iomap and XFS code.  It has been split from the larger series
for easier review.


A git tree is available at:

    git://git.infradead.org/users/hch/xfs.git xfs-iomap-read.4

Gitweb:

    http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-iomap-read.4

Changes since v3:
 - remove iomap_read_bio_alloc
 - set REQ_RAHEAD flag for readpages
 - better comment on the add_to_page_cache_lru semantics for readpages
 - move all write related patches to a separate series

Changes since v2:
 - minor page_seek_hole_data tweaks
 - don't read data entirely covered by the write operation in write_begin
 - fix zeroing on write_begin I/O failure
 - remove iomap_block_needs_zeroing to make the code more clear
 - update comments on __do_page_cache_readahead

Changes since v1:
 - fix the iomap_readpages error handling
 - use unsigned file offsets in a few places to avoid arithmetic overflows
 - allocate an iomap_page in iomap_page_mkwrite to fix generic/095
 - improve a few comments
 - add more asserts
 - warn about truncated block numbers from ->bmap
 - new patch to change the __do_page_cache_readahead return value to
   unsigned int
 - remove an incorrectly added empty line
 - make inline data an explicit iomap type instead of a flag
 - add an IOMAP_F_BUFFER_HEAD flag to force use of buffer heads for gfs2,
   and keep the basic buffer head infrastructure around for now.


* [PATCH 01/13] block: add a lower-level bio_add_page interface
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 10:12   ` Ming Lei
  2018-05-30  9:58 ` [PATCH 02/13] mm: give the 'ret' variable a better name in __do_page_cache_readahead Christoph Hellwig
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

For the upcoming removal of buffer heads in XFS we need to keep track of
the number of outstanding writeback requests per page.  For this we need
to know if bio_add_page merged a region with the previous bvec or not.
Instead of adding additional arguments, this refactors bio_add_page to
be implemented using three lower-level helpers which users like XFS can
use directly if they care about the merge decisions.
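
To illustrate the intended use, a caller that cares about the merge
decision can open-code the sequence roughly like this (a sketch only,
not code from this patch; chain_new_bio() and the per-page accounting
helper are hypothetical placeholders):

	if (!__bio_try_merge_page(bio, page, len, off)) {
		if (bio_full(bio))
			bio = chain_new_bio(bio);	/* hypothetical */
		/* a new bvec is consumed, so account for it */
		__bio_add_page(bio, page, len, off);
		page_io_count_inc(page);		/* hypothetical */
	}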

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 block/bio.c         | 96 +++++++++++++++++++++++++++++----------------
 include/linux/bio.h |  9 +++++
 2 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 53e0f0a1ed94..fdf635d42bbd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -773,7 +773,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 			return 0;
 	}
 
-	if (bio->bi_vcnt >= bio->bi_max_vecs)
+	if (bio_full(bio))
 		return 0;
 
 	/*
@@ -821,52 +821,82 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 EXPORT_SYMBOL(bio_add_pc_page);
 
 /**
- *	bio_add_page	-	attempt to add page to bio
- *	@bio: destination bio
- *	@page: page to add
- *	@len: vec entry length
- *	@offset: vec entry offset
+ * __bio_try_merge_page - try appending data to an existing bvec.
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the data to add
+ * @off: offset of the data in @page
  *
- *	Attempt to add a page to the bio_vec maplist. This will only fail
- *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
+ * Try to add the data at @page + @off to the last bvec of @bio.  This is a
+ * useful optimisation for file systems with a block size smaller than the
+ * page size.
+ *
+ * Return %true on success or %false on failure.
  */
-int bio_add_page(struct bio *bio, struct page *page,
-		 unsigned int len, unsigned int offset)
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off)
 {
-	struct bio_vec *bv;
-
-	/*
-	 * cloned bio must not modify vec list
-	 */
 	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
-		return 0;
+		return false;
 
-	/*
-	 * For filesystems with a blocksize smaller than the pagesize
-	 * we will often be called with the same page as last time and
-	 * a consecutive offset.  Optimize this special case.
-	 */
 	if (bio->bi_vcnt > 0) {
-		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
-		if (page == bv->bv_page &&
-		    offset == bv->bv_offset + bv->bv_len) {
+		if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
 			bv->bv_len += len;
-			goto done;
+			bio->bi_iter.bi_size += len;
+			return true;
 		}
 	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(__bio_try_merge_page);
 
-	if (bio->bi_vcnt >= bio->bi_max_vecs)
-		return 0;
+/**
+ * __bio_add_page - add page to a bio in a new segment
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the data to add
+ * @off: offset of the data in @page
+ *
+ * Add the data at @page + @off to @bio as a new bvec.  The caller must ensure
+ * that @bio has space for another bvec.
+ */
+void __bio_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off)
+{
+	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];
 
-	bv		= &bio->bi_io_vec[bio->bi_vcnt];
-	bv->bv_page	= page;
-	bv->bv_len	= len;
-	bv->bv_offset	= offset;
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+	WARN_ON_ONCE(bio_full(bio));
+
+	bv->bv_page = page;
+	bv->bv_offset = off;
+	bv->bv_len = len;
 
-	bio->bi_vcnt++;
-done:
 	bio->bi_iter.bi_size += len;
+	bio->bi_vcnt++;
+}
+EXPORT_SYMBOL_GPL(__bio_add_page);
+
+/**
+ *	bio_add_page	-	attempt to add page to bio
+ *	@bio: destination bio
+ *	@page: page to add
+ *	@len: vec entry length
+ *	@offset: vec entry offset
+ *
+ *	Attempt to add a page to the bio_vec maplist. This will only fail
+ *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
+ */
+int bio_add_page(struct bio *bio, struct page *page,
+		 unsigned int len, unsigned int offset)
+{
+	if (!__bio_try_merge_page(bio, page, len, offset)) {
+		if (bio_full(bio))
+			return 0;
+		__bio_add_page(bio, page, len, offset);
+	}
 	return len;
 }
 EXPORT_SYMBOL(bio_add_page);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ce547a25e8ae..3e73c8bc25ea 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -123,6 +123,11 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
+static inline bool bio_full(struct bio *bio)
+{
+	return bio->bi_vcnt >= bio->bi_max_vecs;
+}
+
 /*
  * will die
  */
@@ -470,6 +475,10 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off);
+void __bio_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
-- 
2.17.0


* [PATCH 02/13] mm: give the 'ret' variable a better name in __do_page_cache_readahead
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
  2018-05-30  9:58 ` [PATCH 01/13] block: add a lower-level bio_add_page interface Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:01   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 03/13] mm: return an unsigned int from __do_page_cache_readahead Christoph Hellwig
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

It counts the number of pages acted on, so name it nr_pages to make that
obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mm/readahead.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 539bbb6c1fad..16d0cb1e2616 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -156,7 +156,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
 	int page_idx;
-	int ret = 0;
+	int nr_pages = 0;
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 
@@ -187,7 +187,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		list_add(&page->lru, &page_pool);
 		if (page_idx == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
-		ret++;
+		nr_pages++;
 	}
 
 	/*
@@ -195,11 +195,11 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	if (ret)
-		read_pages(mapping, filp, &page_pool, ret, gfp_mask);
+	if (nr_pages)
+		read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
 	BUG_ON(!list_empty(&page_pool));
 out:
-	return ret;
+	return nr_pages;
 }
 
 /*
-- 
2.17.0


* [PATCH 03/13] mm: return an unsigned int from __do_page_cache_readahead
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
  2018-05-30  9:58 ` [PATCH 01/13] block: add a lower-level bio_add_page interface Christoph Hellwig
  2018-05-30  9:58 ` [PATCH 02/13] mm: give the 'ret' variable a better name in __do_page_cache_readahead Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:02   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 04/13] mm: split ->readpages calls to avoid non-contiguous pages lists Christoph Hellwig
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

We never return an error, so switch to returning an unsigned int.  Most
callers already did implicit casts to an unsigned type, and the one that
didn't can be simplified now.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mm/internal.h  |  2 +-
 mm/readahead.c | 15 +++++----------
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 62d8c34e63d5..954003ac766a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -53,7 +53,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
-extern int __do_page_cache_readahead(struct address_space *mapping,
+extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
 		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
 		unsigned long lookahead_size);
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 16d0cb1e2616..fa4d4b767130 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -147,16 +147,16 @@ static int read_pages(struct address_space *mapping, struct file *filp,
  *
  * Returns the number of pages requested, or the maximum amount of I/O allowed.
  */
-int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			pgoff_t offset, unsigned long nr_to_read,
-			unsigned long lookahead_size)
+unsigned int __do_page_cache_readahead(struct address_space *mapping,
+		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
+		unsigned long lookahead_size)
 {
 	struct inode *inode = mapping->host;
 	struct page *page;
 	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
 	int page_idx;
-	int nr_pages = 0;
+	unsigned int nr_pages = 0;
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 
@@ -223,16 +223,11 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
 	nr_to_read = min(nr_to_read, max_pages);
 	while (nr_to_read) {
-		int err;
-
 		unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;
 
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
-		err = __do_page_cache_readahead(mapping, filp,
-						offset, this_chunk, 0);
-		if (err < 0)
-			return err;
+		__do_page_cache_readahead(mapping, filp, offset, this_chunk, 0);
 
 		offset += this_chunk;
 		nr_to_read -= this_chunk;
-- 
2.17.0


* [PATCH 04/13] mm: split ->readpages calls to avoid non-contiguous pages lists
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 03/13] mm: return an unsigned int from __do_page_cache_readahead Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:02   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 05/13] iomap: inline data should be an iomap type, not a flag Christoph Hellwig
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

That way file systems don't have to check for non-contiguous pages and
work around them.  It also kicks off I/O earlier, allowing it to finish
earlier and reducing latency.
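
For example, if readahead covers pages 0-7 and page 3 is already in the
page cache, the page pool is now submitted as two contiguous batches
(pages 0-2, then pages 4-7) instead of a single ->readpages call with a
hole in the list.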

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mm/readahead.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index fa4d4b767130..e273f0de3376 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -140,8 +140,8 @@ static int read_pages(struct address_space *mapping, struct file *filp,
 }
 
 /*
- * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates all
- * the pages first, then submits them all for I/O. This avoids the very bad
+ * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates
+ * the pages first, then submits them for I/O. This avoids the very bad
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  *
@@ -177,8 +177,18 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->i_pages, page_offset);
 		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page))
+		if (page && !radix_tree_exceptional_entry(page)) {
+			/*
+			 * Page already present?  Kick off the current batch of
+			 * contiguous pages before continuing with the next
+			 * batch.
+			 */
+			if (nr_pages)
+				read_pages(mapping, filp, &page_pool, nr_pages,
+						gfp_mask);
+			nr_pages = 0;
 			continue;
+		}
 
 		page = __page_cache_alloc(gfp_mask);
 		if (!page)
-- 
2.17.0


* [PATCH 05/13] iomap: inline data should be an iomap type, not a flag
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 04/13] mm: split ->readpages calls to avoid non-contiguous pages lists Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:05   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 06/13] iomap: fix the comment describing IOMAP_NOWAIT Christoph Hellwig
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it,
it should be an entirely separate iomap range type.
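
For example, a file system returning inline data from ->iomap_begin now
fills in the mapping along these lines (a minimal sketch mirroring the
ext4 and gfs2 hunks below; inline_size stands in for the fs-specific
inline size):

	iomap->offset = 0;
	iomap->length = inline_size;	/* placeholder */
	iomap->type = IOMAP_INLINE;
	iomap->flags = 0;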

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/ext4/inline.c      |  4 ++--
 fs/gfs2/bmap.c        |  3 +--
 fs/iomap.c            | 21 ++++++++++++---------
 include/linux/iomap.h |  2 +-
 4 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..e1f00891ef95 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1835,8 +1835,8 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
 	iomap->offset = 0;
 	iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
 			      i_size_read(inode));
-	iomap->type = 0;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 
 out:
 	up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 278ed0869c3c..cbeedd3cfb36 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -680,8 +680,7 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
 	iomap->length = i_size_read(inode);
-	iomap->type = IOMAP_MAPPED;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
 }
 
 /**
diff --git a/fs/iomap.c b/fs/iomap.c
index f2456d0d8ddd..df2652b0d85d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -502,10 +502,13 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 	case IOMAP_DELALLOC:
 		flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
 		break;
+	case IOMAP_MAPPED:
+		break;
 	case IOMAP_UNWRITTEN:
 		flags |= FIEMAP_EXTENT_UNWRITTEN;
 		break;
-	case IOMAP_MAPPED:
+	case IOMAP_INLINE:
+		flags |= FIEMAP_EXTENT_DATA_INLINE;
 		break;
 	}
 
@@ -513,8 +516,6 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 		flags |= FIEMAP_EXTENT_MERGED;
 	if (iomap->flags & IOMAP_F_SHARED)
 		flags |= FIEMAP_EXTENT_SHARED;
-	if (iomap->flags & IOMAP_F_DATA_INLINE)
-		flags |= FIEMAP_EXTENT_DATA_INLINE;
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
 			iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
@@ -1214,14 +1215,16 @@ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
 	struct iomap_swapfile_info *isi = data;
 	int error;
 
-	/* No inline data. */
-	if (iomap->flags & IOMAP_F_DATA_INLINE) {
+	switch (iomap->type) {
+	case IOMAP_MAPPED:
+	case IOMAP_UNWRITTEN:
+		/* Only real or unwritten extents. */
+		break;
+	case IOMAP_INLINE:
+		/* No inline data. */
 		pr_err("swapon: file is inline\n");
 		return -EINVAL;
-	}
-
-	/* Only real or unwritten extents. */
-	if (iomap->type != IOMAP_MAPPED && iomap->type != IOMAP_UNWRITTEN) {
+	default:
 		pr_err("swapon: file has unallocated extents\n");
 		return -EINVAL;
 	}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4bd87294219a..8f7095fc514e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -18,6 +18,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
 #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE	0x05	/* data inline in the inode */
 
 /*
  * Flags for all iomap mappings:
@@ -34,7 +35,6 @@ struct vm_fault;
  */
 #define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED		0x20	/* block shared with another file */
-#define IOMAP_F_DATA_INLINE	0x40	/* data inline in the inode */
 
 /*
  * Magic value for addr:
-- 
2.17.0


* [PATCH 06/13] iomap: fix the comment describing IOMAP_NOWAIT
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 05/13] iomap: inline data should be an iomap type, not a flag Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:05   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 07/13] iomap: move IOMAP_F_BOUNDARY to gfs2 Christoph Hellwig
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/linux/iomap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8f7095fc514e..13d19b4c29a9 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -59,7 +59,7 @@ struct iomap {
 #define IOMAP_REPORT		(1 << 2) /* report extent status, e.g. FIEMAP */
 #define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
 #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
-#define IOMAP_NOWAIT		(1 << 5) /* Don't wait for writeback */
+#define IOMAP_NOWAIT		(1 << 5) /* do not block */
 
 struct iomap_ops {
 	/*
-- 
2.17.0


* [PATCH 07/13] iomap: move IOMAP_F_BOUNDARY to gfs2
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 06/13] iomap: fix the comment describing IOMAP_NOWAIT Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:08   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 08/13] iomap: use __bio_add_page in iomap_dio_zero Christoph Hellwig
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Just define a range of fs-specific flags and use one of them in gfs2
instead of exposing this internal flag globally.
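
For example (the first define is from this patch; the second shows how a
hypothetical second file system would claim the next bit in the private
range):

	#define IOMAP_F_GFS2_BOUNDARY	IOMAP_F_PRIVATE
	#define IOMAP_F_MYFS_SPECIAL	(IOMAP_F_PRIVATE << 1)	/* hypothetical */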

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/gfs2/bmap.c        | 8 +++++---
 include/linux/iomap.h | 9 +++++++--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index cbeedd3cfb36..8efa6297e19c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -683,6 +683,8 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 	iomap->type = IOMAP_INLINE;
 }
 
+#define IOMAP_F_GFS2_BOUNDARY IOMAP_F_PRIVATE
+
 /**
  * gfs2_iomap_begin - Map blocks from an inode to disk blocks
  * @inode: The inode
@@ -774,7 +776,7 @@ int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 	bh = mp.mp_bh[ip->i_height - 1];
 	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, lend - lblock, &eob);
 	if (eob)
-		iomap->flags |= IOMAP_F_BOUNDARY;
+		iomap->flags |= IOMAP_F_GFS2_BOUNDARY;
 	iomap->length = (u64)len << inode->i_blkbits;
 
 out_release:
@@ -846,12 +848,12 @@ int gfs2_block_map(struct inode *inode, sector_t lblock,
 
 	if (iomap.length > bh_map->b_size) {
 		iomap.length = bh_map->b_size;
-		iomap.flags &= ~IOMAP_F_BOUNDARY;
+		iomap.flags &= ~IOMAP_F_GFS2_BOUNDARY;
 	}
 	if (iomap.addr != IOMAP_NULL_ADDR)
 		map_bh(bh_map, inode->i_sb, iomap.addr >> inode->i_blkbits);
 	bh_map->b_size = iomap.length;
-	if (iomap.flags & IOMAP_F_BOUNDARY)
+	if (iomap.flags & IOMAP_F_GFS2_BOUNDARY)
 		set_buffer_boundary(bh_map);
 	if (iomap.flags & IOMAP_F_NEW)
 		set_buffer_new(bh_map);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 13d19b4c29a9..819e0cd2a950 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -27,8 +27,7 @@ struct vm_fault;
  * written data and requires fdatasync to commit them to persistent storage.
  */
 #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
-#define IOMAP_F_BOUNDARY	0x02	/* mapping ends at metadata boundary */
-#define IOMAP_F_DIRTY		0x04	/* uncommitted metadata */
+#define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
@@ -36,6 +35,12 @@ struct vm_fault;
 #define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED		0x20	/* block shared with another file */
 
+/*
+ * Flags from 0x1000 up are for file system specific usage:
+ */
+#define IOMAP_F_PRIVATE		0x1000
+
+
 /*
  * Magic value for addr:
  */
-- 
2.17.0


* [PATCH 08/13] iomap: use __bio_add_page in iomap_dio_zero
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 07/13] iomap: move IOMAP_F_BOUNDARY to gfs2 Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:09   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 09/13] iomap: add an iomap_sector helper Christoph Hellwig
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

We don't need any merging logic, and this also replaces a BUG_ON with a
WARN_ON_ONCE inside __bio_add_page for the impossible overflow condition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index df2652b0d85d..85901b449146 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -845,8 +845,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
 	get_page(page);
-	if (bio_add_page(bio, page, len, 0) != len)
-		BUG();
+	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);
 
 	atomic_inc(&dio->ref);
-- 
2.17.0


* [PATCH 09/13] iomap: add an iomap_sector helper
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 08/13] iomap: use __bio_add_page in iomap_dio_zero Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:10   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 10/13] iomap: add an iomap-based bmap implementation Christoph Hellwig
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Factor the repeated calculation of the on-disk sector for a given logical
block into a little helper.
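
For example, an extent with iomap->offset = 4096 and iomap->addr = 1048576
maps pos = 8192 to sector (1048576 + 8192 - 4096) >> 9 = 2056.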

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 85901b449146..74cdf8b5bbb0 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -96,6 +96,12 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	return written ? written : ret;
 }
 
+static sector_t
+iomap_sector(struct iomap *iomap, loff_t pos)
+{
+	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
+}
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -353,11 +359,8 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 		struct iomap *iomap)
 {
-	sector_t sector = (iomap->addr +
-			   (pos & PAGE_MASK) - iomap->offset) >> 9;
-
-	return __dax_zero_page_range(iomap->bdev, iomap->dax_dev, sector,
-			offset, bytes);
+	return __dax_zero_page_range(iomap->bdev, iomap->dax_dev,
+			iomap_sector(iomap, pos & PAGE_MASK), offset, bytes);
 }
 
 static loff_t
@@ -839,8 +842,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 
 	bio = bio_alloc(GFP_KERNEL, 1);
 	bio_set_dev(bio, iomap->bdev);
-	bio->bi_iter.bi_sector =
-		(iomap->addr + pos - iomap->offset) >> 9;
+	bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
@@ -934,8 +936,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
 
 		bio = bio_alloc(GFP_KERNEL, nr_pages);
 		bio_set_dev(bio, iomap->bdev);
-		bio->bi_iter.bi_sector =
-			(iomap->addr + pos - iomap->offset) >> 9;
+		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
 		bio->bi_write_hint = dio->iocb->ki_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
-- 
2.17.0


* [PATCH 10/13] iomap: add an iomap-based bmap implementation
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 09/13] iomap: add an iomap_sector helper Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:11   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation Christoph Hellwig
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

This adds a simple iomap-based implementation of the legacy ->bmap
interface.  Note that we can't easily add checks for rt or reflink
files, so these will have to remain in the callers.  This interface
just needs to die...
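
With this a file system's ->bmap method reduces to a thin wrapper (a
sketch with hypothetical names; patch 12 does exactly this for XFS):

	static sector_t myfs_bmap(struct address_space *mapping, sector_t block)
	{
		return iomap_bmap(mapping, block, &myfs_iomap_ops);
	}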

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c            | 34 ++++++++++++++++++++++++++++++++++
 include/linux/iomap.h |  3 +++
 2 files changed, 37 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index 74cdf8b5bbb0..b0bc928672af 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1307,3 +1307,37 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
 }
 EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
 #endif /* CONFIG_SWAP */
+
+static loff_t
+iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
+		void *data, struct iomap *iomap)
+{
+	sector_t *bno = data, addr;
+
+	if (iomap->type == IOMAP_MAPPED) {
+		addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
+		if (addr > INT_MAX)
+			WARN(1, "would truncate bmap result\n");
+		else
+			*bno = addr;
+	}
+	return 0;
+}
+
+/* legacy ->bmap interface.  0 is the error return (!) */
+sector_t
+iomap_bmap(struct address_space *mapping, sector_t bno,
+		const struct iomap_ops *ops)
+{
+	struct inode *inode = mapping->host;
+	loff_t pos = bno << inode->i_blkbits;
+	unsigned blocksize = i_blocksize(inode);
+
+	if (filemap_write_and_wait(mapping))
+		return 0;
+
+	bno = 0;
+	iomap_apply(inode, pos, blocksize, 0, ops, &bno, iomap_bmap_actor);
+	return bno;
+}
+EXPORT_SYMBOL_GPL(iomap_bmap);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 819e0cd2a950..a044a824da85 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 
+struct address_space;
 struct fiemap_extent_info;
 struct inode;
 struct iov_iter;
@@ -100,6 +101,8 @@ loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
 		const struct iomap_ops *ops);
 loff_t iomap_seek_data(struct inode *inode, loff_t offset,
 		const struct iomap_ops *ops);
+sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
+		const struct iomap_ops *ops);
 
 /*
  * Flags for direct I/O ->end_io:
-- 
2.17.0


* [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 10/13] iomap: add an iomap-based bmap implementation Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 16:22   ` Darrick J. Wong
  2018-05-30 23:45   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 12/13] xfs: use iomap_bmap Christoph Hellwig
  2018-05-30  9:58 ` [PATCH 13/13] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
  12 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Simply use iomap_apply to iterate over the file, submit a bio for each
non-uptodate but mapped region, and zero everything else.  Note that
as-is this cannot be used for file systems with a blocksize smaller than
the page size, but that support will be added later.
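
Wiring this up in a file system then only requires calling the new
helpers from its address_space operations (a sketch with hypothetical
names; patch 13 does the real conversion for XFS):

	static int myfs_readpage(struct file *unused, struct page *page)
	{
		return iomap_readpage(page, &myfs_iomap_ops);
	}

	static int myfs_readpages(struct file *unused, struct address_space *mapping,
			struct list_head *pages, unsigned nr_pages)
	{
		return iomap_readpages(mapping, pages, nr_pages, &myfs_iomap_ops);
	}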

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c            | 203 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/iomap.h |   4 +
 2 files changed, 206 insertions(+), 1 deletion(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index b0bc928672af..5e5a266e3325 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) 2010 Red Hat, Inc.
- * Copyright (c) 2016 Christoph Hellwig.
+ * Copyright (c) 2016-2018 Christoph Hellwig.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/gfp.h>
 #include <linux/mm.h>
+#include <linux/mm_inline.h>
 #include <linux/swap.h>
 #include <linux/pagemap.h>
 #include <linux/file.h>
@@ -102,6 +103,206 @@ iomap_sector(struct iomap *iomap, loff_t pos)
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
 }
 
+static void
+iomap_read_end_io(struct bio *bio)
+{
+	int error = blk_status_to_errno(bio->bi_status);
+	struct bio_vec *bvec;
+	int i;
+
+	bio_for_each_segment_all(bvec, bio, i)
+		page_endio(bvec->bv_page, false, error);
+	bio_put(bio);
+}
+
+struct iomap_readpage_ctx {
+	struct page		*cur_page;
+	bool			cur_page_in_bio;
+	bool			is_readahead;
+	struct bio		*bio;
+	struct list_head	*pages;
+};
+
+static loff_t
+iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
+		struct iomap *iomap)
+{
+	struct iomap_readpage_ctx *ctx = data;
+	struct page *page = ctx->cur_page;
+	unsigned poff = pos & (PAGE_SIZE - 1);
+	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	bool is_contig = false;
+	sector_t sector;
+
+	/* we don't support blocksize < PAGE_SIZE quite yet: */
+	WARN_ON_ONCE(pos != page_offset(page));
+	WARN_ON_ONCE(plen != PAGE_SIZE);
+
+	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
+		zero_user(page, poff, plen);
+		SetPageUptodate(page);
+		goto done;
+	}
+
+	ctx->cur_page_in_bio = true;
+
+	/*
+	 * Try to merge into a previous segment if we can.
+	 */
+	sector = iomap_sector(iomap, pos);
+	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
+		if (__bio_try_merge_page(ctx->bio, page, plen, poff))
+			goto done;
+		is_contig = true;
+	}
+
+	if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
+		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
+		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+		if (ctx->bio)
+			submit_bio(ctx->bio);
+
+		if (ctx->is_readahead) /* same as readahead_gfp_mask */
+			gfp |= __GFP_NORETRY | __GFP_NOWARN;
+		ctx->bio = bio_alloc(gfp, min(BIO_MAX_PAGES, nr_vecs));
+		ctx->bio->bi_opf = REQ_OP_READ;
+		if (ctx->is_readahead)
+			ctx->bio->bi_opf |= REQ_RAHEAD;
+		ctx->bio->bi_iter.bi_sector = sector;
+		bio_set_dev(ctx->bio, iomap->bdev);
+		ctx->bio->bi_end_io = iomap_read_end_io;
+	}
+
+	__bio_add_page(ctx->bio, page, plen, poff);
+done:
+	return plen;
+}
+
+int
+iomap_readpage(struct page *page, const struct iomap_ops *ops)
+{
+	struct iomap_readpage_ctx ctx = { .cur_page = page };
+	struct inode *inode = page->mapping->host;
+	unsigned poff;
+	loff_t ret;
+
+	WARN_ON_ONCE(page_has_buffers(page));
+
+	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
+		ret = iomap_apply(inode, page_offset(page) + poff,
+				PAGE_SIZE - poff, 0, ops, &ctx,
+				iomap_readpage_actor);
+		if (ret <= 0) {
+			WARN_ON_ONCE(ret == 0);
+			SetPageError(page);
+			break;
+		}
+	}
+
+	if (ctx.bio) {
+		submit_bio(ctx.bio);
+		WARN_ON_ONCE(!ctx.cur_page_in_bio);
+	} else {
+		WARN_ON_ONCE(ctx.cur_page_in_bio);
+		unlock_page(page);
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iomap_readpage);
+
+static struct page *
+iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
+		loff_t length, loff_t *done)
+{
+	while (!list_empty(pages)) {
+		struct page *page = lru_to_page(pages);
+
+		if (page_offset(page) >= (u64)pos + length)
+			break;
+
+		list_del(&page->lru);
+		if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
+				GFP_NOFS))
+			return page;
+
+		/*
+		 * If we already have a page in the page cache at index we are
+		 * done.  Upper layers don't care if it is uptodate after the
+		 * readpages call itself as every page gets checked again once
+		 * actually needed.
+		 */
+		*done += PAGE_SIZE;
+		put_page(page);
+	}
+
+	return NULL;
+}
+
+static loff_t
+iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
+		void *data, struct iomap *iomap)
+{
+	struct iomap_readpage_ctx *ctx = data;
+	loff_t done, ret;
+
+	for (done = 0; done < length; done += ret) {
+		if (ctx->cur_page && ((pos + done) & (PAGE_SIZE - 1)) == 0) {
+			if (!ctx->cur_page_in_bio)
+				unlock_page(ctx->cur_page);
+			put_page(ctx->cur_page);
+			ctx->cur_page = NULL;
+		}
+		if (!ctx->cur_page) {
+			ctx->cur_page = iomap_next_page(inode, ctx->pages,
+					pos, length, &done);
+			if (!ctx->cur_page)
+				break;
+			ctx->cur_page_in_bio = false;
+		}
+		ret = iomap_readpage_actor(inode, pos + done, length - done,
+				ctx, iomap);
+	}
+
+	return done;
+}
+
+int
+iomap_readpages(struct address_space *mapping, struct list_head *pages,
+		unsigned nr_pages, const struct iomap_ops *ops)
+{
+	struct iomap_readpage_ctx ctx = {
+		.pages		= pages,
+		.is_readahead	= true,
+	};
+	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
+	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
+	loff_t length = last - pos + PAGE_SIZE, ret = 0;
+
+	while (length > 0) {
+		ret = iomap_apply(mapping->host, pos, length, 0, ops,
+				&ctx, iomap_readpages_actor);
+		if (ret <= 0) {
+			WARN_ON_ONCE(ret == 0);
+			goto done;
+		}
+		pos += ret;
+		length -= ret;
+	}
+	ret = 0;
+done:
+	if (ctx.bio)
+		submit_bio(ctx.bio);
+	if (ctx.cur_page) {
+		if (!ctx.cur_page_in_bio)
+			unlock_page(ctx.cur_page);
+		put_page(ctx.cur_page);
+	}
+	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iomap_readpages);
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a044a824da85..7300d30ca495 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -9,6 +9,7 @@ struct fiemap_extent_info;
 struct inode;
 struct iov_iter;
 struct kiocb;
+struct page;
 struct vm_area_struct;
 struct vm_fault;
 
@@ -88,6 +89,9 @@ struct iomap_ops {
 
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
+int iomap_readpage(struct page *page, const struct iomap_ops *ops);
+int iomap_readpages(struct address_space *mapping, struct list_head *pages,
+		unsigned nr_pages, const struct iomap_ops *ops);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
-- 
2.17.0


* [PATCH 12/13] xfs: use iomap_bmap
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:46   ` Dave Chinner
  2018-05-30  9:58 ` [PATCH 13/13] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Switch to the iomap based bmap implementation to get rid of one of the
last users of xfs_get_blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 80de476cecf8..56e405572909 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1378,10 +1378,9 @@ xfs_vm_bmap(
 	struct address_space	*mapping,
 	sector_t		block)
 {
-	struct inode		*inode = (struct inode *)mapping->host;
-	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_inode	*ip = XFS_I(mapping->host);
 
-	trace_xfs_vm_bmap(XFS_I(inode));
+	trace_xfs_vm_bmap(ip);
 
 	/*
 	 * The swap code (ab-)uses ->bmap to get a block mapping and then
@@ -1394,9 +1393,7 @@ xfs_vm_bmap(
 	 */
 	if (xfs_is_reflink_inode(ip) || XFS_IS_REALTIME_INODE(ip))
 		return 0;
-
-	filemap_write_and_wait(mapping);
-	return generic_block_bmap(mapping, block, xfs_get_blocks);
+	return iomap_bmap(mapping, block, &xfs_iomap_ops);
 }
 
 STATIC int
-- 
2.17.0


* [PATCH 13/13] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages
  2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2018-05-30  9:58 ` [PATCH 12/13] xfs: use iomap_bmap Christoph Hellwig
@ 2018-05-30  9:58 ` Christoph Hellwig
  2018-05-30 23:47   ` Dave Chinner
  12 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

For file systems with a block size that equals the page size we never do
partial reads, so we can use the buffer_head-less iomap versions of
readpage and readpages without conflicting with the buffer_head structures
created later in write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 56e405572909..c631c457b444 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1402,6 +1402,8 @@ xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
+	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
+		return iomap_readpage(page, &xfs_iomap_ops);
 	return mpage_readpage(page, xfs_get_blocks);
 }
 
@@ -1413,6 +1415,8 @@ xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
+	if (i_blocksize(mapping->host) == PAGE_SIZE)
+		return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
 	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
 }
 
-- 
2.17.0


* Re: [PATCH 01/13] block: add a lower-level bio_add_page interface
  2018-05-30  9:58 ` [PATCH 01/13] block: add a lower-level bio_add_page interface Christoph Hellwig
@ 2018-05-30 10:12   ` Ming Lei
  0 siblings, 0 replies; 32+ messages in thread
From: Ming Lei @ 2018-05-30 10:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: open list:XFS FILESYSTEM, Linux FS Devel, linux-mm

On Wed, May 30, 2018 at 5:58 PM, Christoph Hellwig <hch@lst.de> wrote:
> For the upcoming removal of buffer heads in XFS we need to keep track of
> the number of outstanding writeback requests per page.  For this we need
> to know if bio_add_page merged a region with the previous bvec or not.
> Instead of adding additional arguments, this refactors bio_add_page to
> be implemented using three lower-level helpers which users like XFS can
> use directly if they care about the merge decisions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jens Axboe <axboe@kernel.dk>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> [diff snipped]

Looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming Lei


* Re: [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation
  2018-05-30  9:58 ` [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation Christoph Hellwig
@ 2018-05-30 16:22   ` Darrick J. Wong
  2018-05-30 23:45   ` Dave Chinner
  1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2018-05-30 16:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:11AM +0200, Christoph Hellwig wrote:
> Simply use iomap_apply to iterate over the file, submit a bio for each
> non-uptodate but mapped region, and zero everything else.  Note that
> as-is this cannot be used for file systems with a blocksize smaller than
> the page size, but that support will be added later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap.c            | 203 +++++++++++++++++++++++++++++++++++++++++-
>  include/linux/iomap.h |   4 +
>  2 files changed, 206 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index b0bc928672af..5e5a266e3325 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (C) 2010 Red Hat, Inc.
> - * Copyright (c) 2016 Christoph Hellwig.
> + * Copyright (c) 2016-2018 Christoph Hellwig.
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms and conditions of the GNU General Public License,
> @@ -18,6 +18,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/gfp.h>
>  #include <linux/mm.h>
> +#include <linux/mm_inline.h>
>  #include <linux/swap.h>
>  #include <linux/pagemap.h>
>  #include <linux/file.h>
> @@ -102,6 +103,206 @@ iomap_sector(struct iomap *iomap, loff_t pos)
>  	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>  }
>  
> +static void
> +iomap_read_end_io(struct bio *bio)
> +{
> +	int error = blk_status_to_errno(bio->bi_status);
> +	struct bio_vec *bvec;
> +	int i;
> +
> +	bio_for_each_segment_all(bvec, bio, i)
> +		page_endio(bvec->bv_page, false, error);
> +	bio_put(bio);
> +}
> +
> +struct iomap_readpage_ctx {
> +	struct page		*cur_page;
> +	bool			cur_page_in_bio;
> +	bool			is_readahead;
> +	struct bio		*bio;
> +	struct list_head	*pages;
> +};
> +
> +static loff_t
> +iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> +		struct iomap *iomap)
> +{
> +	struct iomap_readpage_ctx *ctx = data;
> +	struct page *page = ctx->cur_page;
> +	unsigned poff = pos & (PAGE_SIZE - 1);
> +	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
> +	bool is_contig = false;
> +	sector_t sector;
> +
> +	/* we don't support blocksize < PAGE_SIZE quite yet: */
> +	WARN_ON_ONCE(pos != page_offset(page));
> +	WARN_ON_ONCE(plen != PAGE_SIZE);
> +
> +	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
> +		zero_user(page, poff, plen);
> +		SetPageUptodate(page);
> +		goto done;
> +	}
> +
> +	ctx->cur_page_in_bio = true;
> +
> +	/*
> +	 * Try to merge into a previous segment if we can.
> +	 */
> +	sector = iomap_sector(iomap, pos);
> +	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
> +		if (__bio_try_merge_page(ctx->bio, page, plen, poff))
> +			goto done;
> +		is_contig = true;
> +	}
> +
> +	if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
> +		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
> +		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +
> +		if (ctx->bio)
> +			submit_bio(ctx->bio);
> +
> +		if (ctx->is_readahead) /* same as readahead_gfp_mask */
> +			gfp |= __GFP_NORETRY | __GFP_NOWARN;
> +		ctx->bio = bio_alloc(gfp, min(BIO_MAX_PAGES, nr_vecs));
> +		ctx->bio->bi_opf = REQ_OP_READ;
> +		if (ctx->is_readahead)
> +			ctx->bio->bi_opf |= REQ_RAHEAD;
> +		ctx->bio->bi_iter.bi_sector = sector;
> +		bio_set_dev(ctx->bio, iomap->bdev);
> +		ctx->bio->bi_end_io = iomap_read_end_io;
> +	}
> +
> +	__bio_add_page(ctx->bio, page, plen, poff);
> +done:
> +	return plen;
> +}
> +
> +int
> +iomap_readpage(struct page *page, const struct iomap_ops *ops)
> +{
> +	struct iomap_readpage_ctx ctx = { .cur_page = page };
> +	struct inode *inode = page->mapping->host;
> +	unsigned poff;
> +	loff_t ret;
> +
> +	WARN_ON_ONCE(page_has_buffers(page));
> +
> +	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
> +		ret = iomap_apply(inode, page_offset(page) + poff,
> +				PAGE_SIZE - poff, 0, ops, &ctx,
> +				iomap_readpage_actor);
> +		if (ret <= 0) {
> +			WARN_ON_ONCE(ret == 0);
> +			SetPageError(page);
> +			break;
> +		}
> +	}
> +
> +	if (ctx.bio) {
> +		submit_bio(ctx.bio);
> +		WARN_ON_ONCE(!ctx.cur_page_in_bio);
> +	} else {
> +		WARN_ON_ONCE(ctx.cur_page_in_bio);
> +		unlock_page(page);
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iomap_readpage);
> +
> +static struct page *
> +iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
> +		loff_t length, loff_t *done)
> +{
> +	while (!list_empty(pages)) {
> +		struct page *page = lru_to_page(pages);
> +
> +		if (page_offset(page) >= (u64)pos + length)
> +			break;
> +
> +		list_del(&page->lru);
> +		if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
> +				GFP_NOFS))
> +			return page;
> +
> +		/*
> +		 * If we already have a page in the page cache at index we are
> +		 * done.  Upper layers don't care if it is uptodate after the
> +		 * readpages call itself as every page gets checked again once
> +		 * actually needed.
> +		 */
> +		*done += PAGE_SIZE;
> +		put_page(page);
> +	}
> +
> +	return NULL;
> +}
> +
> +static loff_t
> +iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> +		void *data, struct iomap *iomap)
> +{
> +	struct iomap_readpage_ctx *ctx = data;
> +	loff_t done, ret;
> +
> +	for (done = 0; done < length; done += ret) {
> +		if (ctx->cur_page && ((pos + done) & (PAGE_SIZE - 1)) == 0) {
> +			if (!ctx->cur_page_in_bio)
> +				unlock_page(ctx->cur_page);
> +			put_page(ctx->cur_page);
> +			ctx->cur_page = NULL;
> +		}
> +		if (!ctx->cur_page) {
> +			ctx->cur_page = iomap_next_page(inode, ctx->pages,
> +					pos, length, &done);
> +			if (!ctx->cur_page)
> +				break;
> +			ctx->cur_page_in_bio = false;
> +		}
> +		ret = iomap_readpage_actor(inode, pos + done, length - done,
> +				ctx, iomap);
> +	}
> +
> +	return done;
> +}
> +
> +int
> +iomap_readpages(struct address_space *mapping, struct list_head *pages,
> +		unsigned nr_pages, const struct iomap_ops *ops)
> +{
> +	struct iomap_readpage_ctx ctx = {
> +		.pages		= pages,
> +		.is_readahead	= true,
> +	};
> +	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
> +	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
> +	loff_t length = last - pos + PAGE_SIZE, ret = 0;
> +
> +	while (length > 0) {
> +		ret = iomap_apply(mapping->host, pos, length, 0, ops,
> +				&ctx, iomap_readpages_actor);
> +		if (ret <= 0) {
> +			WARN_ON_ONCE(ret == 0);
> +			goto done;
> +		}
> +		pos += ret;
> +		length -= ret;
> +	}
> +	ret = 0;
> +done:
> +	if (ctx.bio)
> +		submit_bio(ctx.bio);
> +	if (ctx.cur_page) {
> +		if (!ctx.cur_page_in_bio)
> +			unlock_page(ctx.cur_page);
> +		put_page(ctx.cur_page);
> +	}
> +	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iomap_readpages);
> +
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index a044a824da85..7300d30ca495 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -9,6 +9,7 @@ struct fiemap_extent_info;
>  struct inode;
>  struct iov_iter;
>  struct kiocb;
> +struct page;
>  struct vm_area_struct;
>  struct vm_fault;
>  
> @@ -88,6 +89,9 @@ struct iomap_ops {
>  
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
> +int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> +int iomap_readpages(struct address_space *mapping, struct list_head *pages,
> +		unsigned nr_pages, const struct iomap_ops *ops);
>  int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>  		const struct iomap_ops *ops);
>  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread
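A side note for readers following along: the merge-or-allocate dance in
iomap_readpage_actor quoted above can be restated as a standalone sketch.
The wrapper name and the reduced argument list below are invented for this
illustration; bio_full(), bio_end_sector(), __bio_try_merge_page(),
__bio_add_page(), bio_alloc() and submit_bio() are the real block layer
interfaces:

	static void bio_add_or_alloc(struct iomap_readpage_ctx *ctx,
			struct page *page, unsigned len, unsigned off,
			sector_t sector, gfp_t gfp)
	{
		bool contig = ctx->bio && bio_end_sector(ctx->bio) == sector;

		/* Cheapest case: extend the last bvec of the current bio. */
		if (contig && __bio_try_merge_page(ctx->bio, page, len, off))
			return;

		/* Discontiguous or full: ship the old bio, start a fresh one. */
		if (!contig || bio_full(ctx->bio)) {
			if (ctx->bio)
				submit_bio(ctx->bio);
			ctx->bio = bio_alloc(gfp, BIO_MAX_PAGES);
			ctx->bio->bi_iter.bi_sector = sector;
			/* op flags, bdev and end_io setup omitted for brevity */
		}

		/* The page is known to fit now, so the unchecked variant is safe. */
		__bio_add_page(ctx->bio, page, len, off);
	}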

* Re: [PATCH 02/13] mm: give the 'ret' variable a better name __do_page_cache_readahead
  2018-05-30  9:58 ` [PATCH 02/13] mm: give the 'ret' variable a better name __do_page_cache_readahead Christoph Hellwig
@ 2018-05-30 23:01   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:02AM +0200, Christoph Hellwig wrote:
> It counts the number of pages acted on, so name it nr_pages to make that
> obvious.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

*nod*

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/13] mm: return an unsigned int from __do_page_cache_readahead
  2018-05-30  9:58 ` [PATCH 03/13] mm: return an unsigned int from __do_page_cache_readahead Christoph Hellwig
@ 2018-05-30 23:02   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:03AM +0200, Christoph Hellwig wrote:
> We never return an error, so switch to returning an unsigned int.  Most
> callers already did implicit casts to an unsigned type, and the one that
> didn't can be simplified now.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Makes sense.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
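For context, the entire change is in the function's return type; sketched
below with the argument list abbreviated, so this is not the literal
prototype:

	-int __do_page_cache_readahead(struct address_space *mapping, ...);
	+unsigned int __do_page_cache_readahead(struct address_space *mapping, ...);

With that, callers can treat the result directly as the page count it
always was.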

* Re: [PATCH 04/13] mm: split ->readpages calls to avoid non-contiguous pages lists
  2018-05-30  9:58 ` [PATCH 04/13] mm: split ->readpages calls to avoid non-contiguous pages lists Christoph Hellwig
@ 2018-05-30 23:02   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:04AM +0200, Christoph Hellwig wrote:
> That way file systems don't have to go looking for non-contiguous pages
> and work around them.  It also kicks off I/O earlier, allowing it to
> finish earlier and reduce latency.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
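The mechanics, paraphrased as a sketch of the readahead scan loosely
modelled on mm/readahead.c (read_pages() is the real submission helper
there; page_cached() and add_to_pool() are simplified placeholders for
the page-cache lookup and page allocation):

	/*
	 * Whenever the scan hits a page that is already cached, submit
	 * the run gathered so far instead of handing ->readpages a list
	 * with a hole in it, so every submitted run is contiguous.
	 */
	for (i = 0; i < nr_to_read; i++) {
		if (page_cached(mapping, index + i)) {
			if (nr_pages)
				read_pages(mapping, filp, &page_pool, nr_pages, gfp);
			nr_pages = 0;	/* start a new contiguous run */
			continue;
		}
		add_to_pool(&page_pool, index + i);
		nr_pages++;
	}
	if (nr_pages)
		read_pages(mapping, filp, &page_pool, nr_pages, gfp);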

* Re: [PATCH 05/13] iomap: inline data should be an iomap type, not a flag
  2018-05-30  9:58 ` [PATCH 05/13] iomap: inline data should be an iomap type, not a flag Christoph Hellwig
@ 2018-05-30 23:05   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:05AM +0200, Christoph Hellwig wrote:
> Inline data is fundamentally different from our normal mapped case in that
> it doesn't even have a block address.  So instead of having a flag for it,
> it should be an entirely separate iomap range type.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/13] iomap: fix the comment describing IOMAP_NOWAIT
  2018-05-30  9:58 ` [PATCH 06/13] iomap: fix the comment describing IOMAP_NOWAIT Christoph Hellwig
@ 2018-05-30 23:05   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:06AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

simple one :)

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/13] iomap: move IOMAP_F_BOUNDARY to gfs2
  2018-05-30  9:58 ` [PATCH 07/13] iomap: move IOMAP_F_BOUNDARY to gfs2 Christoph Hellwig
@ 2018-05-30 23:08   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:07AM +0200, Christoph Hellwig wrote:
> Just define a range of fs specific flags and use that in gfs2 instead of
> exposing this internal flag globally.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Makes sense to have a private range for the flags - cleans this up
nicely.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
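The shape of the change, sketched with illustrative values (the precise
flag value is the patch's choice, not gospel):

	/* include/linux/iomap.h: flags from IOMAP_F_PRIVATE upwards are
	 * reserved for filesystem-private use. */
	#define IOMAP_F_PRIVATE		0x1000

	/* fs/gfs2/bmap.c: the old global IOMAP_F_BOUNDARY becomes a
	 * gfs2-local alias inside the private range. */
	#define IOMAP_F_BOUNDARY	IOMAP_F_PRIVATE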

* Re: [PATCH 08/13] iomap: use __bio_add_page in iomap_dio_zero
  2018-05-30  9:58 ` [PATCH 08/13] iomap: use __bio_add_page in iomap_dio_zero Christoph Hellwig
@ 2018-05-30 23:09   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:08AM +0200, Christoph Hellwig wrote:
> We don't need any merging logic, and this also replaces a BUG_ON with a
> WARN_ON_ONCE inside __bio_add_page for the impossible overflow condition.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
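The pattern in question, as a minimal sketch (op flags, bdev and
completion handler setup omitted): a freshly allocated single-segment
bio can never be full, so the unchecked helper suffices and the old
BUG_ON overflow check becomes unreachable by construction.

	bio = bio_alloc(GFP_KERNEL, 1);			/* room for one bvec */
	bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
	__bio_add_page(bio, ZERO_PAGE(0), len, 0);	/* cannot overflow here */
	submit_bio(bio);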

* Re: [PATCH 09/13] iomap: add a iomap_sector helper
  2018-05-30  9:58 ` [PATCH 09/13] iomap: add a iomap_sector helper Christoph Hellwig
@ 2018-05-30 23:10   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:09AM +0200, Christoph Hellwig wrote:
> Factor the repeated calculation of the on-disk sector for a given logical
> block into a little helper.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
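A worked example of the calculation the helper wraps,
(iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT, with made-up
numbers: an extent at disk address 0x100000 (1 MiB) maps file offset
0x10000 (64 KiB), and a read at pos = 0x11000 sits 4 KiB into it:

	(0x100000 + 0x11000 - 0x10000) >> 9 = 0x101000 >> 9 = 0x808

i.e. 512-byte sector 2056.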

* Re: [PATCH 10/13] iomap: add an iomap-based bmap implementation
  2018-05-30  9:58 ` [PATCH 10/13] iomap: add an iomap-based bmap implementation Christoph Hellwig
@ 2018-05-30 23:11   ` Dave Chinner
  2018-05-31  6:07     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:10AM +0200, Christoph Hellwig wrote:
> This adds a simple iomap-based implementation of the legacy ->bmap
> interface.  Note that we can't easily add checks for rt or reflink
> files, so these will have to remain in the callers.  This interface
> just needs to die..
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/iomap.c            | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/iomap.h |  3 +++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 74cdf8b5bbb0..b0bc928672af 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1307,3 +1307,37 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
>  }
>  EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
>  #endif /* CONFIG_SWAP */
> +
> +static loff_t
> +iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> +		void *data, struct iomap *iomap)
> +{
> +	sector_t *bno = data, addr;

Can you split these? maybe scope addr inside the if() branch it is
used in?

Otherwise looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
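For readers who have not met the legacy interface: from userspace it is
the FIBMAP ioctl, which translates one logical block number to a
physical one through a single int.  A minimal sketch (fd is assumed to
be an open file descriptor, and the caller needs CAP_SYS_RAWIO):

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>

	int blk = 0;				/* in: logical block number */
	if (ioctl(fd, FIBMAP, &blk) == 0)	/* out: physical block, 0 for a hole */
		printf("block 0 -> physical block %d\n", blk);

The through-one-int design - block numbers get truncated on large
devices, and a hole is indistinguishable from block zero - is why the
interface "just needs to die".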

* Re: [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation
  2018-05-30  9:58 ` [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation Christoph Hellwig
  2018-05-30 16:22   ` Darrick J. Wong
@ 2018-05-30 23:45   ` Dave Chinner
  2018-05-31  6:13     ` Christoph Hellwig
  1 sibling, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:11AM +0200, Christoph Hellwig wrote:
> Simply use iomap_apply to iterate over the file and submit a bio for
> each non-uptodate but mapped region and zero everything else.  Note that
> as-is this can not be used for file systems with a blocksize smaller than
> the page size, but that support will be added later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap.c            | 203 +++++++++++++++++++++++++++++++++++++++++-
>  include/linux/iomap.h |   4 +
>  2 files changed, 206 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index b0bc928672af..5e5a266e3325 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (C) 2010 Red Hat, Inc.
> - * Copyright (c) 2016 Christoph Hellwig.
> + * Copyright (c) 2016-2018 Christoph Hellwig.
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms and conditions of the GNU General Public License,
> @@ -18,6 +18,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/gfp.h>
>  #include <linux/mm.h>
> +#include <linux/mm_inline.h>
>  #include <linux/swap.h>
>  #include <linux/pagemap.h>
>  #include <linux/file.h>
> @@ -102,6 +103,206 @@ iomap_sector(struct iomap *iomap, loff_t pos)
>  	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>  }
>  
> +static void
> +iomap_read_end_io(struct bio *bio)
> +{
> +	int error = blk_status_to_errno(bio->bi_status);
> +	struct bio_vec *bvec;
> +	int i;
> +
> +	bio_for_each_segment_all(bvec, bio, i)
> +		page_endio(bvec->bv_page, false, error);
> +	bio_put(bio);
> +}
> +
> +struct iomap_readpage_ctx {
> +	struct page		*cur_page;
> +	bool			cur_page_in_bio;
> +	bool			is_readahead;
> +	struct bio		*bio;
> +	struct list_head	*pages;
> +};
> +
> +static loff_t
> +iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> +		struct iomap *iomap)
> +{
> +	struct iomap_readpage_ctx *ctx = data;
> +	struct page *page = ctx->cur_page;
> +	unsigned poff = pos & (PAGE_SIZE - 1);
> +	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
> +	bool is_contig = false;
> +	sector_t sector;
> +
> +	/* we don't support blocksize < PAGE_SIZE quite yet: */

sentence ends with a ".". :)

> +	WARN_ON_ONCE(pos != page_offset(page));
> +	WARN_ON_ONCE(plen != PAGE_SIZE);
> +
> +	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {

In what situation do we get a read request completely beyond EOF?
(comment, please!)

> +		zero_user(page, poff, plen);
> +		SetPageUptodate(page);
> +		goto done;
> +	}

[...]

> +int
> +iomap_readpage(struct page *page, const struct iomap_ops *ops)
> +{
> +	struct iomap_readpage_ctx ctx = { .cur_page = page };
> +	struct inode *inode = page->mapping->host;
> +	unsigned poff;
> +	loff_t ret;
> +
> +	WARN_ON_ONCE(page_has_buffers(page));
> +
> +	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
> +		ret = iomap_apply(inode, page_offset(page) + poff,
> +				PAGE_SIZE - poff, 0, ops, &ctx,
> +				iomap_readpage_actor);
> +		if (ret <= 0) {
> +			WARN_ON_ONCE(ret == 0);
> +			SetPageError(page);
> +			break;
> +		}
> +	}
> +
> +	if (ctx.bio) {
> +		submit_bio(ctx.bio);
> +		WARN_ON_ONCE(!ctx.cur_page_in_bio);
> +	} else {
> +		WARN_ON_ONCE(ctx.cur_page_in_bio);
> +		unlock_page(page);
> +	}
> +	return 0;

Hmmm. If we had an error from iomap_apply, shouldn't we be returning
it here instead just throwing it away? some ->readpage callers
appear to ignore the PageError() state on return but do expect
errors to be returned.

[...]

> +iomap_readpages(struct address_space *mapping, struct list_head *pages,
> +		unsigned nr_pages, const struct iomap_ops *ops)
> +{
> +	struct iomap_readpage_ctx ctx = {
> +		.pages		= pages,
> +		.is_readahead	= true,
> +	};
> +	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
> +	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
> +	loff_t length = last - pos + PAGE_SIZE, ret = 0;

Two lines, please.

> +	while (length > 0) {
> +		ret = iomap_apply(mapping->host, pos, length, 0, ops,
> +				&ctx, iomap_readpages_actor);
> +		if (ret <= 0) {
> +			WARN_ON_ONCE(ret == 0);
> +			goto done;
> +		}
> +		pos += ret;
> +		length -= ret;
> +	}
> +	ret = 0;
> +done:
> +	if (ctx.bio)
> +		submit_bio(ctx.bio);
> +	if (ctx.cur_page) {
> +		if (!ctx.cur_page_in_bio)
> +			unlock_page(ctx.cur_page);
> +		put_page(ctx.cur_page);
> +	}
> +	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));

What error condition is this warning about?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/13] xfs: use iomap_bmap
  2018-05-30  9:58 ` [PATCH 12/13] xfs: use iomap_bmap Christoph Hellwig
@ 2018-05-30 23:46   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:12AM +0200, Christoph Hellwig wrote:
> Switch to the iomap based bmap implementation to get rid of one of the
> last users of xfs_get_blocks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/13] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages
  2018-05-30  9:58 ` [PATCH 13/13] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
@ 2018-05-30 23:47   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-30 23:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:58:13AM +0200, Christoph Hellwig wrote:
> For file systems with a block size that equals the page size we never do
> partial reads, so we can use the buffer_head-less iomap versions of
> readpage and readpages without conflicting with the buffer_head structures
> created later in write_begin.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/13] iomap: add an iomap-based bmap implementation
  2018-05-30 23:11   ` Dave Chinner
@ 2018-05-31  6:07     ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-31  6:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Thu, May 31, 2018 at 09:11:56AM +1000, Dave Chinner wrote:
> On Wed, May 30, 2018 at 11:58:10AM +0200, Christoph Hellwig wrote:
> > This adds a simple iomap-based implementation of the legacy ->bmap
> > interface.  Note that we can't easily add checks for rt or reflink
> > files, so these will have to remain in the callers.  This interface
> > just needs to die..
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/iomap.c            | 34 ++++++++++++++++++++++++++++++++++
> >  include/linux/iomap.h |  3 +++
> >  2 files changed, 37 insertions(+)
> > 
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 74cdf8b5bbb0..b0bc928672af 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -1307,3 +1307,37 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
> >  }
> >  EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
> >  #endif /* CONFIG_SWAP */
> > +
> > +static loff_t
> > +iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> > +		void *data, struct iomap *iomap)
> > +{
> > +	sector_t *bno = data, addr;
> 
> Can you split these? maybe scope addr inside the if() branch it is
> used in?

This was intentional to avoid wasting another two lines on this
trivial, deprecated functionality..

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation
  2018-05-30 23:45   ` Dave Chinner
@ 2018-05-31  6:13     ` Christoph Hellwig
  2018-05-31 11:59       ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-31  6:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Thu, May 31, 2018 at 09:45:57AM +1000, Dave Chinner wrote:
> sentence ends with a ".". :)

Ok.  This was intended to point to the WARN_ON calls below, but a "."
is fine with me, too.

> 
> > +	WARN_ON_ONCE(pos != page_offset(page));
> > +	WARN_ON_ONCE(plen != PAGE_SIZE);
> > +
> > +	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
> 
> In what situation do we get a read request completely beyond EOF?
> (comment, please!)

This is generally to cover a racing read beyond EOF.  That being said
I'd have to look up if it can really happen for blocksize == pagesize.

All this becomes moot once small block size support is added, so I think
I'd rather skip the comment and research here for now.

> > +	if (ctx.bio) {
> > +		submit_bio(ctx.bio);
> > +		WARN_ON_ONCE(!ctx.cur_page_in_bio);
> > +	} else {
> > +		WARN_ON_ONCE(ctx.cur_page_in_bio);
> > +		unlock_page(page);
> > +	}
> > +	return 0;
> 
> Hmmm. If we had an error from iomap_apply, shouldn't we be returning
> it here instead just throwing it away? some ->readpage callers
> appear to ignore the PageError() state on return but do expect
> errors to be returned.

Both mpage_readpage and block_read_full_page always return 0, so for
> now I'd like to stay compatible with them.  Might be worth a full audit
later.

> > +	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
> > +	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
> > +	loff_t length = last - pos + PAGE_SIZE, ret = 0;
> 
> Two lines, please.

I really like it that way, though..

> > +done:
> > +	if (ctx.bio)
> > +		submit_bio(ctx.bio);
> > +	if (ctx.cur_page) {
> > +		if (!ctx.cur_page_in_bio)
> > +			unlock_page(ctx.cur_page);
> > +		put_page(ctx.cur_page);
> > +	}
> > +	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
> 
> What error condition is this warning about?

Not finishing all the pages despite returning no error.  Which wasn't too
hard to get wrong given the arcane readpages calling convention.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation
  2018-05-31  6:13     ` Christoph Hellwig
@ 2018-05-31 11:59       ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-05-31 11:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Thu, May 31, 2018 at 08:13:15AM +0200, Christoph Hellwig wrote:
> On Thu, May 31, 2018 at 09:45:57AM +1000, Dave Chinner wrote:
> > sentence ends with a ".". :)
> 
> Ok.  This was intended to point to the WARN_ON calls below, but a "."
> is fine with me, too.
> 
> > 
> > > +	WARN_ON_ONCE(pos != page_offset(page));
> > > +	WARN_ON_ONCE(plen != PAGE_SIZE);
> > > +
> > > +	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
> > 
> > In what situation do we get a read request completely beyond EOF?
> > (comment, please!)
> 
> This is generally to cover a racing read beyond EOF.  That being said
> I'd have to look up if it can really happen for blocksize == pagesize.
> 
> All this becomes moot once small block size support is added, so I think
> I'd rather skip the comment and research here for now.

OK.

> > > +	if (ctx.bio) {
> > > +		submit_bio(ctx.bio);
> > > +		WARN_ON_ONCE(!ctx.cur_page_in_bio);
> > > +	} else {
> > > +		WARN_ON_ONCE(ctx.cur_page_in_bio);
> > > +		unlock_page(page);
> > > +	}
> > > +	return 0;
> > 
> > Hmmm. If we had an error from iomap_apply, shouldn't we be returning
> > it here instead just throwing it away? some ->readpage callers
> > appear to ignore the PageError() state on return but do expect
> > errors to be returned.
> 
> Both mpage_readpage and block_read_full_page always return 0, so for
> now I'd like to stay compatible to them.  Might be worth a full audit
> later.
> 
> > > +	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
> > > +	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
> > > +	loff_t length = last - pos + PAGE_SIZE, ret = 0;
> > 
> > Two lines, please.
> 
> I really like it that way, though..

Except for the fact most people's eyes are trained for one line per
declaration and one variable assignment per line. I don't care about
an extra line of code or two, but it's so easy to lose a declaration
of a short variable in all those long declarations and initialisers.
I found myself asking several times through these patchsets "now
where was /that/ variable declared/initialised?".  That's why I'm
asking for it to be changed.

> > > +done:
> > > +	if (ctx.bio)
> > > +		submit_bio(ctx.bio);
> > > +	if (ctx.cur_page) {
> > > +		if (!ctx.cur_page_in_bio)
> > > +			unlock_page(ctx.cur_page);
> > > +		put_page(ctx.cur_page);
> > > +	}
> > > +	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
> > 
> > What error condition is this warning about?
> 
> Not finishing all the pages despite returning no error.  Which wasn't too
> hard to get wrong given the arcane readpages calling convention.

It's crusty old code like this that makes me realise why we have so
many problems with IO error reporting - instead of fixing error
propagation problems when we come across them, we just layer more
crap on top with some undocumented warnings for good measure.

Not really happy about it. Please add comments explaining the crap
you're adding to work around the crappy error propagation issues.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 05/13] iomap: inline data should be an iomap type, not a flag
  2018-05-31 18:06 iomap based buffered reads & iomap cleanups v5 Christoph Hellwig
@ 2018-05-31 18:06 ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:06 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it,
it should be an entirely separate iomap range type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/ext4/inline.c      |  4 ++--
 fs/gfs2/bmap.c        |  3 +--
 fs/iomap.c            | 21 ++++++++++++---------
 include/linux/iomap.h |  2 +-
 4 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..e1f00891ef95 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1835,8 +1835,8 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
 	iomap->offset = 0;
 	iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
 			      i_size_read(inode));
-	iomap->type = 0;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 
 out:
 	up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 278ed0869c3c..cbeedd3cfb36 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -680,8 +680,7 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
 	iomap->length = i_size_read(inode);
-	iomap->type = IOMAP_MAPPED;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
 }
 
 /**
diff --git a/fs/iomap.c b/fs/iomap.c
index f2456d0d8ddd..df2652b0d85d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -502,10 +502,13 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 	case IOMAP_DELALLOC:
 		flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
 		break;
+	case IOMAP_MAPPED:
+		break;
 	case IOMAP_UNWRITTEN:
 		flags |= FIEMAP_EXTENT_UNWRITTEN;
 		break;
-	case IOMAP_MAPPED:
+	case IOMAP_INLINE:
+		flags |= FIEMAP_EXTENT_DATA_INLINE;
 		break;
 	}
 
@@ -513,8 +516,6 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 		flags |= FIEMAP_EXTENT_MERGED;
 	if (iomap->flags & IOMAP_F_SHARED)
 		flags |= FIEMAP_EXTENT_SHARED;
-	if (iomap->flags & IOMAP_F_DATA_INLINE)
-		flags |= FIEMAP_EXTENT_DATA_INLINE;
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
 			iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
@@ -1214,14 +1215,16 @@ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
 	struct iomap_swapfile_info *isi = data;
 	int error;
 
-	/* No inline data. */
-	if (iomap->flags & IOMAP_F_DATA_INLINE) {
+	switch (iomap->type) {
+	case IOMAP_MAPPED:
+	case IOMAP_UNWRITTEN:
+		/* Only real or unwritten extents. */
+		break;
+	case IOMAP_INLINE:
+		/* No inline data. */
 		pr_err("swapon: file is inline\n");
 		return -EINVAL;
-	}
-
-	/* Only real or unwritten extents. */
-	if (iomap->type != IOMAP_MAPPED && iomap->type != IOMAP_UNWRITTEN) {
+	default:
 		pr_err("swapon: file has unallocated extents\n");
 		return -EINVAL;
 	}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4bd87294219a..8f7095fc514e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -18,6 +18,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
 #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE	0x05	/* data inline in the inode */
 
 /*
  * Flags for all iomap mappings:
@@ -34,7 +35,6 @@ struct vm_fault;
  */
 #define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED		0x20	/* block shared with another file */
-#define IOMAP_F_DATA_INLINE	0x40	/* data inline in the inode */
 
 /*
  * Magic value for addr:
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread
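One practical upshot of modelling inline data as its own type: consumers
can handle it in the same switch as the other range types instead of
testing a flag on the side.  A hypothetical consumer - not code from
this series - might branch like:

	switch (iomap->type) {
	case IOMAP_INLINE:
		/* no block address at all: copy straight from the inode */
		break;
	case IOMAP_MAPPED:
		/* normal case: do I/O against iomap_sector(iomap, pos) */
		break;
	case IOMAP_HOLE:
	case IOMAP_UNWRITTEN:
	default:
		/* reads see zeroes */
		break;
	}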

Thread overview: 32+ messages
2018-05-30  9:58 iomap based buffered reads & iomap cleanups v4 Christoph Hellwig
2018-05-30  9:58 ` [PATCH 01/13] block: add a lower-level bio_add_page interface Christoph Hellwig
2018-05-30 10:12   ` Ming Lei
2018-05-30  9:58 ` [PATCH 02/13] mm: give the 'ret' variable a better name __do_page_cache_readahead Christoph Hellwig
2018-05-30 23:01   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 03/13] mm: return an unsigned int from __do_page_cache_readahead Christoph Hellwig
2018-05-30 23:02   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 04/13] mm: split ->readpages calls to avoid non-contiguous pages lists Christoph Hellwig
2018-05-30 23:02   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 05/13] iomap: inline data should be an iomap type, not a flag Christoph Hellwig
2018-05-30 23:05   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 06/13] iomap: fix the comment describing IOMAP_NOWAIT Christoph Hellwig
2018-05-30 23:05   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 07/13] iomap: move IOMAP_F_BOUNDARY to gfs2 Christoph Hellwig
2018-05-30 23:08   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 08/13] iomap: use __bio_add_page in iomap_dio_zero Christoph Hellwig
2018-05-30 23:09   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 09/13] iomap: add a iomap_sector helper Christoph Hellwig
2018-05-30 23:10   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 10/13] iomap: add an iomap-based bmap implementation Christoph Hellwig
2018-05-30 23:11   ` Dave Chinner
2018-05-31  6:07     ` Christoph Hellwig
2018-05-30  9:58 ` [PATCH 11/13] iomap: add an iomap-based readpage and readpages implementation Christoph Hellwig
2018-05-30 16:22   ` Darrick J. Wong
2018-05-30 23:45   ` Dave Chinner
2018-05-31  6:13     ` Christoph Hellwig
2018-05-31 11:59       ` Dave Chinner
2018-05-30  9:58 ` [PATCH 12/13] xfs: use iomap_bmap Christoph Hellwig
2018-05-30 23:46   ` Dave Chinner
2018-05-30  9:58 ` [PATCH 13/13] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
2018-05-30 23:47   ` Dave Chinner
2018-05-31 18:06 iomap based buffered reads & iomap cleanups v5 Christoph Hellwig
2018-05-31 18:06 ` [PATCH 05/13] iomap: inline data should be an iomap type, not a flag Christoph Hellwig
