linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v5 0/6] Support for RWF_UNCACHED
@ 2019-12-17 14:39 Jens Axboe
  2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block; +Cc: willy, clm, torvalds, david

Recently someone asked me how io_uring buffered IO compares to mmaped
IO in terms of performance. So I ran some tests with buffered IO, and
found the experience to be somewhat painful. The test case is pretty
basic, random reads over a dataset that's 10x the size of RAM.
Performance starts out fine, and then the page cache fills up and we
hit a throughput cliff. CPU usage of the IO threads go up, and we have
kswapd spending 100% of a core trying to keep up. Seeing that, I was
reminded of the many complaints I here about buffered IO, and the fact
that most of the folks complaining will ultimately bite the bullet and
move to O_DIRECT to just get the kernel out of the way.

But I don't think it needs to be like that. Switching to O_DIRECT isn't
always easily doable. The buffers have different life times, size and
alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
can be painful.

Seems to me that we have an opportunity to provide something that sits
somewhere in between buffered and O_DIRECT, and this is where
RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
following behavior:

- If the data is in cache, it remains in cache and the copy (in or out)
  is served to/from that. This is true for both reads and writes.

- For writes, if the data is NOT in cache, we add it while performing the
  IO. When the IO is done, we remove it again.

- For reads, if the data is NOT in the cache, we allocate a private page
  and use that for IO. When the IO is done, we free this page. The page
  never sees the page cache.

With this, I can do 100% smooth buffered reads or writes without pushing
the kernel to the state where kswapd is sweating bullets. In fact it
doesn't even register.

Comments appreciated! This should work on any standard file system,
using either the generic helpers or iomap. I have tested ext4 and xfs
for the right read/write behavior, but no further validation has been
done yet. This version contains the bigger prep patch of switching
iomap_apply() and actors to struct iomap_data, and I hope I didn't
mess that up too badly. I'll try and exercise it all, I've done XFS
mounts and reads+writes and it seems happy from that POV at least.

The core of the changes are actually really small, the majority of
the diff is just prep work to get there.

Patches are against current git, and can also be found here:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

 fs/ceph/file.c          |   2 +-
 fs/dax.c                |  25 +++--
 fs/ext4/file.c          |   2 +-
 fs/iomap/apply.c        |  61 ++++++++---
 fs/iomap/buffered-io.c  | 230 +++++++++++++++++++++++++---------------
 fs/iomap/direct-io.c    |  57 +++++-----
 fs/iomap/fiemap.c       |  48 +++++----
 fs/iomap/seek.c         |  64 ++++++-----
 fs/iomap/swapfile.c     |  27 ++---
 fs/iomap/trace.h        |   4 +-
 fs/nfs/file.c           |   2 +-
 fs/xfs/xfs_iomap.c      |   7 +-
 include/linux/fs.h      |   7 +-
 include/linux/iomap.h   |  20 +++-
 include/uapi/linux/fs.h |   5 +-
 mm/filemap.c            |  87 +++++++++++++--
 16 files changed, 439 insertions(+), 209 deletions(-)

Changes since v4:
- Add patch for disabling delalloc on buffered writes on xfs
- Fixup an XFS flags mishap
- Add iomap flag trace definitions
- Fixup silly add_to_page_cache() vs add_to_page_cache_lru() mistake

Changes since v3:
- Add iomap_actor_data to cut down on arguments
- Fix bad flag drop in iomap_write_begin()
- Remove unused IOMAP_WRITE_F_UNCACHED flag
- Don't use the page cache at all for reads

Changes since v2:
- Rework the write side according to Chinners suggestions. Much cleaner
  this way. It does mean that we invalidate the full write region if just
  ONE page (or more) had to be created, where before it was more granular.
  I don't think that's a concern, and on the plus side, we now no longer
  have to chunk invalidations into 15/16 pages at the time.
- Cleanups

Changes since v1:
- Switch to pagevecs for write_drop_cached_pages()
- Use page_offset() instead of manual shift
- Ensure we hold a reference on the page between calling ->write_end()
  and checking the mapping on the locked page
- Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag after
  the first page

--
Jens Axboe



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] fs: add read support for RWF_UNCACHED
  2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
@ 2019-12-17 14:39 ` Jens Axboe
  2019-12-17 15:16   ` Guoqing Jiang
                     ` (2 more replies)
  2019-12-17 14:39 ` [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb Jens Axboe
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david, Jens Axboe

If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private
pages for the buffered reads. These pages will never be inserted into
the page cache, and they are simply droped when we have done the copy at
the end of IO.

If pages in the read range are already in the page cache, then use those
for just copying the data instead of starting IO on private pages.

A previous solution used the page cache even for non-cached ranges, but
the cost of doing so was too high. Removing nodes at the end is
expensive, even with LRU bypass. On top of that, repeatedly
instantiating new xarray nodes is very costly, as it needs to memset 576
bytes of data, and freeing said nodes involve an RCU call per node as
well. All that adds up, making uncached somewhat slower than O_DIRECT.

With the current solition, we're basically at O_DIRECT levels of
performance for RWF_UNCACHED IO.

Protect against truncate the same way O_DIRECT does, by calling
inode_dio_begin() to elevate the inode->i_dio_count.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h      |  3 +++
 include/uapi/linux/fs.h |  5 ++++-
 mm/filemap.c            | 38 ++++++++++++++++++++++++++++++++------
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98e0349adb52..092ea2a4319b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -314,6 +314,7 @@ enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+#define IOCB_UNCACHED		(1 << 8)
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -3418,6 +3419,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
 	if (flags & RWF_APPEND)
 		ki->ki_flags |= IOCB_APPEND;
+	if (flags & RWF_UNCACHED)
+		ki->ki_flags |= IOCB_UNCACHED;
 	return 0;
 }
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 379a612f8f1d..357ebb0e0c5d 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO O_APPEND */
 #define RWF_APPEND	((__force __kernel_rwf_t)0x00000010)
 
+/* drop cache after reading or writing data */
+#define RWF_UNCACHED	((__force __kernel_rwf_t)0x00000040)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-			 RWF_APPEND)
+			 RWF_APPEND | RWF_UNCACHED)
 
 #endif /* _UAPI_LINUX_FS_H */
diff --git a/mm/filemap.c b/mm/filemap.c
index bf6aa30be58d..7ddc4d8386cf 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1990,6 +1990,13 @@ static void shrink_readahead_size_eio(struct file *filp,
 	ra->ra_pages /= 4;
 }
 
+static void buffered_put_page(struct page *page, bool clear_mapping)
+{
+	if (clear_mapping)
+		page->mapping = NULL;
+	put_page(page);
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2013,6 +2020,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
 	struct file_ra_state *ra = &filp->f_ra;
+	bool did_dio_begin = false;
 	loff_t *ppos = &iocb->ki_pos;
 	pgoff_t index;
 	pgoff_t last_index;
@@ -2032,6 +2040,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	offset = *ppos & ~PAGE_MASK;
 
 	for (;;) {
+		bool clear_mapping = false;
 		struct page *page;
 		pgoff_t end_index;
 		loff_t isize;
@@ -2048,6 +2057,13 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		if (!page) {
 			if (iocb->ki_flags & IOCB_NOWAIT)
 				goto would_block;
+			/* UNCACHED implies no read-ahead */
+			if (iocb->ki_flags & IOCB_UNCACHED) {
+				did_dio_begin = true;
+				/* block truncate for UNCACHED reads */
+				inode_dio_begin(inode);
+				goto no_cached_page;
+			}
 			page_cache_sync_readahead(mapping,
 					ra, filp,
 					index, last_index - index);
@@ -2106,7 +2122,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		isize = i_size_read(inode);
 		end_index = (isize - 1) >> PAGE_SHIFT;
 		if (unlikely(!isize || index > end_index)) {
-			put_page(page);
+			buffered_put_page(page, clear_mapping);
 			goto out;
 		}
 
@@ -2115,7 +2131,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		if (index == end_index) {
 			nr = ((isize - 1) & ~PAGE_MASK) + 1;
 			if (nr <= offset) {
-				put_page(page);
+				buffered_put_page(page, clear_mapping);
 				goto out;
 			}
 		}
@@ -2147,7 +2163,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		offset &= ~PAGE_MASK;
 		prev_offset = offset;
 
-		put_page(page);
+		buffered_put_page(page, clear_mapping);
 		written += ret;
 		if (!iov_iter_count(iter))
 			goto out;
@@ -2189,7 +2205,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 		if (unlikely(error)) {
 			if (error == AOP_TRUNCATED_PAGE) {
-				put_page(page);
+				buffered_put_page(page, clear_mapping);
 				error = 0;
 				goto find_page;
 			}
@@ -2206,7 +2222,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 					 * invalidate_mapping_pages got it
 					 */
 					unlock_page(page);
-					put_page(page);
+					buffered_put_page(page, clear_mapping);
 					goto find_page;
 				}
 				unlock_page(page);
@@ -2221,7 +2237,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 readpage_error:
 		/* UHHUH! A synchronous read error occurred. Report it */
-		put_page(page);
+		buffered_put_page(page, clear_mapping);
 		goto out;
 
 no_cached_page:
@@ -2234,6 +2250,14 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 			error = -ENOMEM;
 			goto out;
 		}
+		if (iocb->ki_flags & IOCB_UNCACHED) {
+			__SetPageLocked(page);
+			page->mapping = mapping;
+			page->index = index;
+			clear_mapping = true;
+			goto readpage;
+		}
+
 		error = add_to_page_cache_lru(page, mapping, index,
 				mapping_gfp_constraint(mapping, GFP_KERNEL));
 		if (error) {
@@ -2250,6 +2274,8 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 would_block:
 	error = -EAGAIN;
 out:
+	if (did_dio_begin)
+		inode_dio_end(inode);
 	ra->prev_pos = prev_index;
 	ra->prev_pos <<= PAGE_SHIFT;
 	ra->prev_pos |= prev_offset;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb
  2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
  2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
@ 2019-12-17 14:39 ` Jens Axboe
  2019-12-17 14:39 ` [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david, Jens Axboe

Right now all callers pass in iocb->ki_pos, just pass in the iocb. This
is in preparation for using the iocb flags in generic_perform_write().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/ceph/file.c     | 2 +-
 fs/ext4/file.c     | 2 +-
 fs/nfs/file.c      | 2 +-
 include/linux/fs.h | 3 ++-
 mm/filemap.c       | 8 +++++---
 5 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 11929d2bb594..096c009f188f 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1538,7 +1538,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		 * are pending vmtruncate. So write and vmtruncate
 		 * can not run at the same time
 		 */
-		written = generic_perform_write(file, from, pos);
+		written = generic_perform_write(file, from, iocb);
 		if (likely(written >= 0))
 			iocb->ki_pos = pos + written;
 		ceph_end_io_write(inode);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 6a7293a5cda2..9ffb857765d5 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -249,7 +249,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 		goto out;
 
 	current->backing_dev_info = inode_to_bdi(inode);
-	ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos);
+	ret = generic_perform_write(iocb->ki_filp, from, iocb);
 	current->backing_dev_info = NULL;
 
 out:
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 8eb731d9be3e..d8f51a702a4e 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -624,7 +624,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 	result = generic_write_checks(iocb, from);
 	if (result > 0) {
 		current->backing_dev_info = inode_to_bdi(inode);
-		result = generic_perform_write(file, from, iocb->ki_pos);
+		result = generic_perform_write(file, from, iocb);
 		current->backing_dev_info = NULL;
 	}
 	nfs_end_io_write(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 092ea2a4319b..bf58db1bc032 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3103,7 +3103,8 @@ extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
-extern ssize_t generic_perform_write(struct file *, struct iov_iter *, loff_t);
+extern ssize_t generic_perform_write(struct file *, struct iov_iter *,
+				     struct kiocb *);
 
 ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
 		rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index 7ddc4d8386cf..522152ed86d8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3292,10 +3292,11 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping,
 EXPORT_SYMBOL(grab_cache_page_write_begin);
 
 ssize_t generic_perform_write(struct file *file,
-				struct iov_iter *i, loff_t pos)
+				struct iov_iter *i, struct kiocb *iocb)
 {
 	struct address_space *mapping = file->f_mapping;
 	const struct address_space_operations *a_ops = mapping->a_ops;
+	loff_t pos = iocb->ki_pos;
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = 0;
@@ -3429,7 +3430,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
 			goto out;
 
-		status = generic_perform_write(file, from, pos = iocb->ki_pos);
+		pos = iocb->ki_pos;
+		status = generic_perform_write(file, from, iocb);
 		/*
 		 * If generic_perform_write() returned a synchronous error
 		 * then we want to return the number of bytes which were
@@ -3461,7 +3463,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			 */
 		}
 	} else {
-		written = generic_perform_write(file, from, iocb->ki_pos);
+		written = generic_perform_write(file, from, iocb);
 		if (likely(written > 0))
 			iocb->ki_pos += written;
 	}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED
  2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
  2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
  2019-12-17 14:39 ` [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb Jens Axboe
@ 2019-12-17 14:39 ` Jens Axboe
  2019-12-17 14:39 ` [PATCH 4/6] iomap: add struct iomap_ctx Jens Axboe
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david, Jens Axboe

If RWF_UNCACHED is set for io_uring (or pwritev2(2)), we'll drop the
cache instantiated for buffered writes. If new pages aren't
instantiated, we leave them alone. This provides similar semantics to
reads with RWF_UNCACHED set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h |  1 +
 mm/filemap.c       | 41 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index bf58db1bc032..5ea5fc167524 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -285,6 +285,7 @@ enum positive_aop_returns {
 #define AOP_FLAG_NOFS			0x0002 /* used by filesystem to direct
 						* helper code (eg buffer layer)
 						* to clear GFP_FS from alloc */
+#define AOP_FLAG_UNCACHED		0x0004
 
 /*
  * oh the beauties of C type declarations.
diff --git a/mm/filemap.c b/mm/filemap.c
index 522152ed86d8..0b5f29b30c34 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3277,10 +3277,12 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping,
 					pgoff_t index, unsigned flags)
 {
 	struct page *page;
-	int fgp_flags = FGP_LOCK|FGP_WRITE|FGP_CREAT;
+	int fgp_flags = FGP_LOCK|FGP_WRITE;
 
 	if (flags & AOP_FLAG_NOFS)
 		fgp_flags |= FGP_NOFS;
+	if (!(flags & AOP_FLAG_UNCACHED))
+		fgp_flags |= FGP_CREAT;
 
 	page = pagecache_get_page(mapping, index, fgp_flags,
 			mapping_gfp_mask(mapping));
@@ -3301,6 +3303,9 @@ ssize_t generic_perform_write(struct file *file,
 	ssize_t written = 0;
 	unsigned int flags = 0;
 
+	if (iocb->ki_flags & IOCB_UNCACHED)
+		flags |= AOP_FLAG_UNCACHED;
+
 	do {
 		struct page *page;
 		unsigned long offset;	/* Offset into pagecache page */
@@ -3333,10 +3338,16 @@ ssize_t generic_perform_write(struct file *file,
 			break;
 		}
 
+retry:
 		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
 						&page, &fsdata);
-		if (unlikely(status < 0))
+		if (unlikely(status < 0)) {
+			if (status == -ENOMEM && (flags & AOP_FLAG_UNCACHED)) {
+				flags &= ~AOP_FLAG_UNCACHED;
+				goto retry;
+			}
 			break;
+		}
 
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_page(page);
@@ -3372,6 +3383,32 @@ ssize_t generic_perform_write(struct file *file,
 		balance_dirty_pages_ratelimited(mapping);
 	} while (iov_iter_count(i));
 
+	if (written && (iocb->ki_flags & IOCB_UNCACHED)) {
+		loff_t end;
+
+		pos = iocb->ki_pos;
+		end = pos + written;
+
+		status = filemap_write_and_wait_range(mapping, pos, end);
+		if (status)
+			goto out;
+
+		/*
+		 * No pages were created for this range, we're done
+		 */
+		if (flags & AOP_FLAG_UNCACHED)
+			 goto out;
+
+		/*
+		 * Try to invalidate cache pages for the range we just wrote.
+		 * We don't care if invalidation fails as the write has still
+		 * worked and leaving clean uptodate pages in the page cache
+		 * isn't a corruption vector for uncached IO.
+		 */
+		 invalidate_inode_pages2_range(mapping,
+					pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+	}
+out:
 	return written ? written : status;
 }
 EXPORT_SYMBOL(generic_perform_write);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] iomap: add struct iomap_ctx
  2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
                   ` (2 preceding siblings ...)
  2019-12-17 14:39 ` [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
@ 2019-12-17 14:39 ` Jens Axboe
  2019-12-17 19:39   ` Linus Torvalds
  2019-12-17 14:39 ` [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
  2019-12-17 14:39 ` [PATCH 6/6] xfs: don't do delayed allocations for uncached " Jens Axboe
  5 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david, Jens Axboe

We pass a lot of arguments to iomap_apply(), and subsequently to the
actors that it calls. In preparation for adding one more argument,
switch them to using a struct iomap_ctx instead. The actor gets a const
version of that, they are not supposed to change anything in it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/dax.c               |  25 +++--
 fs/iomap/apply.c       |  26 +++---
 fs/iomap/buffered-io.c | 202 +++++++++++++++++++++++++----------------
 fs/iomap/direct-io.c   |  57 +++++++-----
 fs/iomap/fiemap.c      |  48 ++++++----
 fs/iomap/seek.c        |  64 ++++++++-----
 fs/iomap/swapfile.c    |  27 +++---
 include/linux/iomap.h  |  15 ++-
 8 files changed, 278 insertions(+), 186 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1f1f0201cad1..2637afec30b2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1090,13 +1090,16 @@ int __dax_zero_page_range(struct block_device *bdev,
 EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
 static loff_t
-dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+dax_iomap_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	struct block_device *bdev = iomap->bdev;
 	struct dax_device *dax_dev = iomap->dax_dev;
-	struct iov_iter *iter = data;
+	struct iov_iter *iter = data->priv;
+	loff_t pos = data->pos;
+	loff_t length = pos + data->len;
 	loff_t end = pos + length, done = 0;
+	struct inode *inode = data->inode;
 	ssize_t ret = 0;
 	size_t xfer;
 	int id;
@@ -1197,22 +1200,26 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = mapping->host;
-	loff_t pos = iocb->ki_pos, ret = 0, done = 0;
-	unsigned flags = 0;
+	loff_t ret = 0, done = 0;
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= iocb->ki_pos,
+		.priv	= iter,
+	};
 
 	if (iov_iter_rw(iter) == WRITE) {
 		lockdep_assert_held_write(&inode->i_rwsem);
-		flags |= IOMAP_WRITE;
+		data.flags |= IOMAP_WRITE;
 	} else {
 		lockdep_assert_held(&inode->i_rwsem);
 	}
 
 	while (iov_iter_count(iter)) {
-		ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
-				iter, dax_iomap_actor);
+		data.len = iov_iter_count(iter);
+		ret = iomap_apply(&data, ops, dax_iomap_actor);
 		if (ret <= 0)
 			break;
-		pos += ret;
+		data.pos += ret;
 		done += ret;
 	}
 
diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 76925b40b5fd..792079403a22 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -21,15 +21,16 @@
  * iomap_end call.
  */
 loff_t
-iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
-		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
+iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops,
+	    iomap_actor_t actor)
 {
 	struct iomap iomap = { .type = IOMAP_HOLE };
 	struct iomap srcmap = { .type = IOMAP_HOLE };
 	loff_t written = 0, ret;
 	u64 end;
 
-	trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_);
+	trace_iomap_apply(data->inode, data->pos, data->len, data->flags, ops,
+				actor, _RET_IP_);
 
 	/*
 	 * Need to map a range from start position for length bytes. This can
@@ -43,17 +44,18 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * expose transient stale data. If the reserve fails, we can safely
 	 * back out at this point as there is nothing to undo.
 	 */
-	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
+	ret = ops->iomap_begin(data->inode, data->pos, data->len, data->flags,
+				&iomap, &srcmap);
 	if (ret)
 		return ret;
-	if (WARN_ON(iomap.offset > pos))
+	if (WARN_ON(iomap.offset > data->pos))
 		return -EIO;
 	if (WARN_ON(iomap.length == 0))
 		return -EIO;
 
-	trace_iomap_apply_dstmap(inode, &iomap);
+	trace_iomap_apply_dstmap(data->inode, &iomap);
 	if (srcmap.type != IOMAP_HOLE)
-		trace_iomap_apply_srcmap(inode, &srcmap);
+		trace_iomap_apply_srcmap(data->inode, &srcmap);
 
 	/*
 	 * Cut down the length to the one actually provided by the filesystem,
@@ -62,8 +64,8 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	end = iomap.offset + iomap.length;
 	if (srcmap.type != IOMAP_HOLE)
 		end = min(end, srcmap.offset + srcmap.length);
-	if (pos + length > end)
-		length = end - pos;
+	if (data->pos + data->len > end)
+		data->len = end - data->pos;
 
 	/*
 	 * Now that we have guaranteed that the space allocation will succeed,
@@ -77,7 +79,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * iomap into the actors so that they don't need to have special
 	 * handling for the two cases.
 	 */
-	written = actor(inode, pos, length, data, &iomap,
+	written = actor(data, &iomap,
 			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
 
 	/*
@@ -85,9 +87,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * should not fail unless the filesystem has had a fatal error.
 	 */
 	if (ops->iomap_end) {
-		ret = ops->iomap_end(inode, pos, length,
+		ret = ops->iomap_end(data->inode, data->pos, data->len,
 				     written > 0 ? written : 0,
-				     flags, &iomap);
+				     data->flags, &iomap);
 	}
 
 	return written ? written : ret;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 828444e14d09..7f8300bce767 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -248,14 +248,15 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 }
 
 static loff_t
-iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+iomap_readpage_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		     struct iomap *srcmap)
 {
-	struct iomap_readpage_ctx *ctx = data;
+	struct iomap_readpage_ctx *ctx = data->priv;
+	struct inode *inode = data->inode;
 	struct page *page = ctx->cur_page;
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	bool same_page = false, is_contig = false;
-	loff_t orig_pos = pos;
+	loff_t pos = data->pos, orig_pos = data->pos;
 	unsigned poff, plen;
 	sector_t sector;
 
@@ -266,7 +267,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iomap_adjust_read_range(inode, iop, &pos, data->len, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
@@ -302,7 +303,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
-		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		int nr_vecs = (data->len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
@@ -333,16 +334,20 @@ int
 iomap_readpage(struct page *page, const struct iomap_ops *ops)
 {
 	struct iomap_readpage_ctx ctx = { .cur_page = page };
-	struct inode *inode = page->mapping->host;
+	struct iomap_ctx data = {
+		.inode	= page->mapping->host,
+		.priv	= &ctx,
+		.flags	= 0
+	};
 	unsigned poff;
 	loff_t ret;
 
 	trace_iomap_readpage(page->mapping->host, 1);
 
 	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
-		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
-				iomap_readpage_actor);
+		data.pos = page_offset(page) + poff;
+		data.len = PAGE_SIZE - poff;
+		ret = iomap_apply(&data, ops, iomap_readpage_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
 			SetPageError(page);
@@ -396,28 +401,34 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 }
 
 static loff_t
-iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_readpages_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		      struct iomap *srcmap)
 {
-	struct iomap_readpage_ctx *ctx = data;
+	struct iomap_readpage_ctx *ctx = data->priv;
 	loff_t done, ret;
 
-	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+	for (done = 0; done < data->len; done += ret) {
+		struct iomap_ctx rp_data = {
+			.inode	= data->inode,
+			.pos	= data->pos + done,
+			.len	= data->len - done,
+			.priv	= ctx,
+		};
+
+		if (ctx->cur_page && offset_in_page(rp_data.pos) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
 			ctx->cur_page = NULL;
 		}
 		if (!ctx->cur_page) {
-			ctx->cur_page = iomap_next_page(inode, ctx->pages,
-					pos, length, &done);
+			ctx->cur_page = iomap_next_page(data->inode, ctx->pages,
+					data->pos, data->len, &done);
 			if (!ctx->cur_page)
 				break;
 			ctx->cur_page_in_bio = false;
 		}
-		ret = iomap_readpage_actor(inode, pos + done, length - done,
-				ctx, iomap, srcmap);
+		ret = iomap_readpage_actor(&rp_data, iomap, srcmap);
 	}
 
 	return done;
@@ -431,21 +442,27 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		.pages		= pages,
 		.is_readahead	= true,
 	};
-	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
+	struct iomap_ctx data = {
+		.inode	= mapping->host,
+		.priv	= &ctx,
+		.flags	= 0
+	};
 	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
-	loff_t length = last - pos + PAGE_SIZE, ret = 0;
+	loff_t ret = 0;
+
+	data.pos = page_offset(list_entry(pages->prev, struct page, lru));
+	data.len = last - data.pos + PAGE_SIZE;
 
-	trace_iomap_readpages(mapping->host, nr_pages);
+	trace_iomap_readpages(data.inode, nr_pages);
 
-	while (length > 0) {
-		ret = iomap_apply(mapping->host, pos, length, 0, ops,
-				&ctx, iomap_readpages_actor);
+	while (data.len > 0) {
+		ret = iomap_apply(&data, ops, iomap_readpages_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
 			goto done;
 		}
-		pos += ret;
-		length -= ret;
+		data.pos += ret;
+		data.len -= ret;
 	}
 	ret = 0;
 done:
@@ -796,10 +813,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
 }
 
 static loff_t
-iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		  struct iomap *srcmap)
 {
-	struct iov_iter *i = data;
+	struct inode *inode = data->inode;
+	struct iov_iter *i = data->priv;
+	loff_t length = data->len;
+	loff_t pos = data->pos;
 	long status = 0;
 	ssize_t written = 0;
 
@@ -879,15 +899,20 @@ ssize_t
 iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops)
 {
-	struct inode *inode = iocb->ki_filp->f_mapping->host;
-	loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+	struct iomap_ctx data = {
+		.inode	= iocb->ki_filp->f_mapping->host,
+		.pos	= iocb->ki_pos,
+		.priv	= iter,
+		.flags	= IOMAP_WRITE
+	};
+	loff_t ret = 0, written = 0;
 
 	while (iov_iter_count(iter)) {
-		ret = iomap_apply(inode, pos, iov_iter_count(iter),
-				IOMAP_WRITE, ops, iter, iomap_write_actor);
+		data.len = iov_iter_count(iter);
+		ret = iomap_apply(&data, ops, iomap_write_actor);
 		if (ret <= 0)
 			break;
-		pos += ret;
+		data.pos += ret;
 		written += ret;
 	}
 
@@ -896,9 +921,11 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
 static loff_t
-iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+iomap_unshare_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		    struct iomap *srcmap)
 {
+	loff_t pos = data->pos;
+	loff_t length = data->len;
 	long status = 0;
 	ssize_t written = 0;
 
@@ -914,13 +941,13 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
 
-		status = iomap_write_begin(inode, pos, bytes,
+		status = iomap_write_begin(data->inode, pos, bytes,
 				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
 		if (unlikely(status))
 			return status;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
-				srcmap);
+		status = iomap_write_end(data->inode, pos, bytes, bytes, page,
+						iomap, srcmap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -933,7 +960,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		written += status;
 		length -= status;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(data->inode->i_mapping);
 	} while (length);
 
 	return written;
@@ -943,15 +970,20 @@ int
 iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops)
 {
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= pos,
+		.len	= len,
+		.flags	= IOMAP_WRITE,
+	};
 	loff_t ret;
 
-	while (len) {
-		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
-				iomap_unshare_actor);
+	while (data.len) {
+		ret = iomap_apply(&data, ops, iomap_unshare_actor);
 		if (ret <= 0)
 			return ret;
-		pos += ret;
-		len -= ret;
+		data.pos += ret;
+		data.len -= ret;
 	}
 
 	return 0;
@@ -982,16 +1014,18 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 }
 
 static loff_t
-iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_zero_range_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		       struct iomap *srcmap)
 {
-	bool *did_zero = data;
+	bool *did_zero = data->priv;
+	loff_t count = data->len;
+	loff_t pos = data->pos;
 	loff_t written = 0;
 	int status;
 
 	/* already zeroed?  we're done. */
 	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return count;
+		return data->len;
 
 	do {
 		unsigned offset, bytes;
@@ -999,11 +1033,11 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		offset = offset_in_page(pos);
 		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
 
-		if (IS_DAX(inode))
+		if (IS_DAX(data->inode))
 			status = iomap_dax_zero(pos, offset, bytes, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap,
-					srcmap);
+			status = iomap_zero(data->inode, pos, offset, bytes,
+						iomap, srcmap);
 		if (status < 0)
 			return status;
 
@@ -1021,16 +1055,22 @@ int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= pos,
+		.len	= len,
+		.priv	= did_zero,
+		.flags	= IOMAP_ZERO
+	};
 	loff_t ret;
 
-	while (len > 0) {
-		ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
-				ops, did_zero, iomap_zero_range_actor);
+	while (data.len > 0) {
+		ret = iomap_apply(&data, ops, iomap_zero_range_actor);
 		if (ret <= 0)
 			return ret;
 
-		pos += ret;
-		len -= ret;
+		data.pos += ret;
+		data.len -= ret;
 	}
 
 	return 0;
@@ -1052,57 +1092,59 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
 
 static loff_t
-iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_page_mkwrite_actor(const struct iomap_ctx *data,
+			 struct iomap *iomap, struct iomap *srcmap)
 {
-	struct page *page = data;
+	struct page *page = data->priv;
 	int ret;
 
 	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
-		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
+		ret = __block_write_begin_int(page, data->pos, data->len, NULL,
+						iomap);
 		if (ret)
 			return ret;
-		block_commit_write(page, 0, length);
+		block_commit_write(page, 0, data->len);
 	} else {
 		WARN_ON_ONCE(!PageUptodate(page));
-		iomap_page_create(inode, page);
+		iomap_page_create(data->inode, page);
 		set_page_dirty(page);
 	}
 
-	return length;
+	return data->len;
 }
 
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 {
 	struct page *page = vmf->page;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
-	unsigned long length;
-	loff_t offset, size;
+	struct iomap_ctx data = {
+		.inode	= file_inode(vmf->vma->vm_file),
+		.pos	= page_offset(page),
+		.flags	= IOMAP_WRITE | IOMAP_FAULT,
+		.priv	= page,
+	};
 	ssize_t ret;
+	loff_t size;
 
 	lock_page(page);
-	size = i_size_read(inode);
-	offset = page_offset(page);
-	if (page->mapping != inode->i_mapping || offset > size) {
+	size = i_size_read(data.inode);
+	if (page->mapping != data.inode->i_mapping || data.pos > size) {
 		/* We overload EFAULT to mean page got truncated */
 		ret = -EFAULT;
 		goto out_unlock;
 	}
 
 	/* page is wholly or partially inside EOF */
-	if (offset > size - PAGE_SIZE)
-		length = offset_in_page(size);
+	if (data.pos > size - PAGE_SIZE)
+		data.len = offset_in_page(size);
 	else
-		length = PAGE_SIZE;
+		data.len = PAGE_SIZE;
 
-	while (length > 0) {
-		ret = iomap_apply(inode, offset, length,
-				IOMAP_WRITE | IOMAP_FAULT, ops, page,
-				iomap_page_mkwrite_actor);
+	while (data.len > 0) {
+		ret = iomap_apply(&data, ops, iomap_page_mkwrite_actor);
 		if (unlikely(ret <= 0))
 			goto out_unlock;
-		offset += ret;
-		length -= ret;
+		data.pos += ret;
+		data.len -= ret;
 	}
 
 	wait_for_stable_page(page);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 23837926c0c5..7f1bffa262e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -364,24 +364,27 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 }
 
 static loff_t
-iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_dio_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
-	struct iomap_dio *dio = data;
+	struct iomap_dio *dio = data->priv;
 
 	switch (iomap->type) {
 	case IOMAP_HOLE:
 		if (WARN_ON_ONCE(dio->flags & IOMAP_DIO_WRITE))
 			return -EIO;
-		return iomap_dio_hole_actor(length, dio);
+		return iomap_dio_hole_actor(data->len, dio);
 	case IOMAP_UNWRITTEN:
 		if (!(dio->flags & IOMAP_DIO_WRITE))
-			return iomap_dio_hole_actor(length, dio);
-		return iomap_dio_bio_actor(inode, pos, length, dio, iomap);
+			return iomap_dio_hole_actor(data->len, dio);
+		return iomap_dio_bio_actor(data->inode, data->pos, data->len,
+						dio, iomap);
 	case IOMAP_MAPPED:
-		return iomap_dio_bio_actor(inode, pos, length, dio, iomap);
+		return iomap_dio_bio_actor(data->inode, data->pos, data->len,
+						dio, iomap);
 	case IOMAP_INLINE:
-		return iomap_dio_inline_actor(inode, pos, length, dio, iomap);
+		return iomap_dio_inline_actor(data->inode, data->pos, data->len,
+						dio, iomap);
 	default:
 		WARN_ON_ONCE(1);
 		return -EIO;
@@ -404,16 +407,19 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = file_inode(iocb->ki_filp);
-	size_t count = iov_iter_count(iter);
-	loff_t pos = iocb->ki_pos;
-	loff_t end = iocb->ki_pos + count - 1, ret = 0;
-	unsigned int flags = IOMAP_DIRECT;
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= iocb->ki_pos,
+		.len	= iov_iter_count(iter),
+		.flags	= IOMAP_DIRECT
+	};
+	loff_t end = data.pos + data.len - 1, ret = 0;
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
 	lockdep_assert_held(&inode->i_rwsem);
 
-	if (!count)
+	if (!data.len)
 		return 0;
 
 	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
@@ -436,14 +442,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->submit.cookie = BLK_QC_T_NONE;
 	dio->submit.last_queue = NULL;
 
+	data.priv = dio;
+
 	if (iov_iter_rw(iter) == READ) {
-		if (pos >= dio->i_size)
+		if (data.pos >= dio->i_size)
 			goto out_free_dio;
 
 		if (iter_is_iovec(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
 	} else {
-		flags |= IOMAP_WRITE;
+		data.flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
 		/* for data sync or sync, we need sync completion processing */
@@ -461,14 +469,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	}
 
 	if (iocb->ki_flags & IOCB_NOWAIT) {
-		if (filemap_range_has_page(mapping, pos, end)) {
+		if (filemap_range_has_page(mapping, data.pos, end)) {
 			ret = -EAGAIN;
 			goto out_free_dio;
 		}
-		flags |= IOMAP_NOWAIT;
+		data.flags |= IOMAP_NOWAIT;
 	}
 
-	ret = filemap_write_and_wait_range(mapping, pos, end);
+	ret = filemap_write_and_wait_range(mapping, data.pos, end);
 	if (ret)
 		goto out_free_dio;
 
@@ -479,7 +487,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	 * pretty crazy thing to do, so we don't support it 100%.
 	 */
 	ret = invalidate_inode_pages2_range(mapping,
-			pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+			data.pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
 	if (ret)
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
@@ -495,8 +503,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 
 	blk_start_plug(&plug);
 	do {
-		ret = iomap_apply(inode, pos, count, flags, ops, dio,
-				iomap_dio_actor);
+		ret = iomap_apply(&data, ops, iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
 			if (ret == -ENOTBLK) {
@@ -505,18 +512,18 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			}
 			break;
 		}
-		pos += ret;
+		data.pos += ret;
 
-		if (iov_iter_rw(iter) == READ && pos >= dio->i_size) {
+		if (iov_iter_rw(iter) == READ && data.pos >= dio->i_size) {
 			/*
 			 * We only report that we've read data up to i_size.
 			 * Revert iter to a state corresponding to that as
 			 * some callers (such as splice code) rely on it.
 			 */
-			iov_iter_revert(iter, pos - dio->i_size);
+			iov_iter_revert(iter, data.pos - dio->i_size);
 			break;
 		}
-	} while ((count = iov_iter_count(iter)) > 0);
+	} while ((data.len = iov_iter_count(iter)) > 0);
 	blk_finish_plug(&plug);
 
 	if (ret < 0)
diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index bccf305ea9ce..61cd264a5d36 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -43,20 +43,20 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 }
 
 static loff_t
-iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+iomap_fiemap_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		   struct iomap *srcmap)
 {
-	struct fiemap_ctx *ctx = data;
-	loff_t ret = length;
+	struct fiemap_ctx *ctx = data->priv;
+	loff_t ret = data->len;
 
 	if (iomap->type == IOMAP_HOLE)
-		return length;
+		return data->len;
 
 	ret = iomap_to_fiemap(ctx->fi, &ctx->prev, 0);
 	ctx->prev = *iomap;
 	switch (ret) {
 	case 0:		/* success */
-		return length;
+		return data->len;
 	case 1:		/* extent array full */
 		return 0;
 	default:
@@ -68,6 +68,13 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 		loff_t start, loff_t len, const struct iomap_ops *ops)
 {
 	struct fiemap_ctx ctx;
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= start,
+		.len	= len,
+		.flags	= IOMAP_REPORT,
+		.priv	= &ctx
+	};
 	loff_t ret;
 
 	memset(&ctx, 0, sizeof(ctx));
@@ -84,9 +91,8 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 			return ret;
 	}
 
-	while (len > 0) {
-		ret = iomap_apply(inode, start, len, IOMAP_REPORT, ops, &ctx,
-				iomap_fiemap_actor);
+	while (data.len > 0) {
+		ret = iomap_apply(&data, ops, iomap_fiemap_actor);
 		/* inode with no (attribute) mapping will give ENOENT */
 		if (ret == -ENOENT)
 			break;
@@ -95,8 +101,8 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 		if (ret == 0)
 			break;
 
-		start += ret;
-		len -= ret;
+		data.pos += ret;
+		data.len -= ret;
 	}
 
 	if (ctx.prev.type != IOMAP_HOLE) {
@@ -110,13 +116,14 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 EXPORT_SYMBOL_GPL(iomap_fiemap);
 
 static loff_t
-iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_bmap_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		 struct iomap *srcmap)
 {
-	sector_t *bno = data, addr;
+	sector_t *bno = data->priv, addr;
 
 	if (iomap->type == IOMAP_MAPPED) {
-		addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
+		addr = (data->pos - iomap->offset + iomap->addr) >>
+				data->inode->i_blkbits;
 		if (addr > INT_MAX)
 			WARN(1, "would truncate bmap result\n");
 		else
@@ -131,16 +138,19 @@ iomap_bmap(struct address_space *mapping, sector_t bno,
 		const struct iomap_ops *ops)
 {
 	struct inode *inode = mapping->host;
-	loff_t pos = bno << inode->i_blkbits;
-	unsigned blocksize = i_blocksize(inode);
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= bno << inode->i_blkbits,
+		.len	= i_blocksize(inode),
+		.priv	= &bno
+	};
 	int ret;
 
 	if (filemap_write_and_wait(mapping))
 		return 0;
 
 	bno = 0;
-	ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno,
-			  iomap_bmap_actor);
+	ret = iomap_apply(&data, ops, iomap_bmap_actor);
 	if (ret)
 		return 0;
 	return bno;
diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index 89f61d93c0bc..5501be02557a 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -118,21 +118,23 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
 
 
 static loff_t
-iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
-		      void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_seek_hole_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		      struct iomap *srcmap)
 {
+	loff_t offset = data->pos;
+
 	switch (iomap->type) {
 	case IOMAP_UNWRITTEN:
-		offset = page_cache_seek_hole_data(inode, offset, length,
-						   SEEK_HOLE);
+		offset = page_cache_seek_hole_data(data->inode, offset,
+						   data->len, SEEK_HOLE);
 		if (offset < 0)
-			return length;
+			return data->len;
 		/* fall through */
 	case IOMAP_HOLE:
-		*(loff_t *)data = offset;
+		*(loff_t *)data->priv = offset;
 		return 0;
 	default:
-		return length;
+		return data->len;
 	}
 }
 
@@ -140,23 +142,28 @@ loff_t
 iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
 {
 	loff_t size = i_size_read(inode);
-	loff_t length = size - offset;
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.len	= size - offset,
+		.priv	= &offset,
+		.flags	= IOMAP_REPORT
+	};
 	loff_t ret;
 
 	/* Nothing to be found before or beyond the end of the file. */
 	if (offset < 0 || offset >= size)
 		return -ENXIO;
 
-	while (length > 0) {
-		ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops,
-				  &offset, iomap_seek_hole_actor);
+	while (data.len > 0) {
+		data.pos = offset;
+		ret = iomap_apply(&data, ops, iomap_seek_hole_actor);
 		if (ret < 0)
 			return ret;
 		if (ret == 0)
 			break;
 
 		offset += ret;
-		length -= ret;
+		data.len -= ret;
 	}
 
 	return offset;
@@ -164,20 +171,22 @@ iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
 EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
 static loff_t
-iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
-		      void *data, struct iomap *iomap, struct iomap *srcmap)
+iomap_seek_data_actor(const struct iomap_ctx *data, struct iomap *iomap,
+		      struct iomap *srcmap)
 {
+	loff_t offset = data->pos;
+
 	switch (iomap->type) {
 	case IOMAP_HOLE:
-		return length;
+		return data->len;
 	case IOMAP_UNWRITTEN:
-		offset = page_cache_seek_hole_data(inode, offset, length,
-						   SEEK_DATA);
+		offset = page_cache_seek_hole_data(data->inode, offset,
+						   data->len, SEEK_DATA);
 		if (offset < 0)
-			return length;
+			return data->len;
 		/*FALLTHRU*/
 	default:
-		*(loff_t *)data = offset;
+		*(loff_t *)data->priv = offset;
 		return 0;
 	}
 }
@@ -186,26 +195,31 @@ loff_t
 iomap_seek_data(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
 {
 	loff_t size = i_size_read(inode);
-	loff_t length = size - offset;
+	struct iomap_ctx data = {
+		.inode  = inode,
+		.len    = size - offset,
+		.priv   = &offset,
+		.flags  = IOMAP_REPORT
+	};
 	loff_t ret;
 
 	/* Nothing to be found before or beyond the end of the file. */
 	if (offset < 0 || offset >= size)
 		return -ENXIO;
 
-	while (length > 0) {
-		ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops,
-				  &offset, iomap_seek_data_actor);
+	while (data.len > 0) {
+		data.pos = offset;
+		ret = iomap_apply(&data, ops, iomap_seek_data_actor);
 		if (ret < 0)
 			return ret;
 		if (ret == 0)
 			break;
 
 		offset += ret;
-		length -= ret;
+		data.len -= ret;
 	}
 
-	if (length <= 0)
+	if (data.len <= 0)
 		return -ENXIO;
 	return offset;
 }
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index a648dbf6991e..aae04b40a3b7 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -75,11 +75,10 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
  * swap only cares about contiguous page-aligned physical extents and makes no
  * distinction between written and unwritten extents.
  */
-static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
-		loff_t count, void *data, struct iomap *iomap,
-		struct iomap *srcmap)
+static loff_t iomap_swapfile_activate_actor(const struct iomap_ctx *data,
+		struct iomap *iomap, struct iomap *srcmap)
 {
-	struct iomap_swapfile_info *isi = data;
+	struct iomap_swapfile_info *isi = data->priv;
 	int error;
 
 	switch (iomap->type) {
@@ -125,7 +124,7 @@ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
 			return error;
 		memcpy(&isi->iomap, iomap, sizeof(isi->iomap));
 	}
-	return count;
+	return data->len;
 }
 
 /*
@@ -142,8 +141,13 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
 	};
 	struct address_space *mapping = swap_file->f_mapping;
 	struct inode *inode = mapping->host;
-	loff_t pos = 0;
-	loff_t len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE);
+	struct iomap_ctx data = {
+		.inode	= inode,
+		.pos	= 0,
+		.len 	= ALIGN_DOWN(i_size_read(inode), PAGE_SIZE),
+		.priv	= &isi,
+		.flags	= IOMAP_REPORT
+	};
 	loff_t ret;
 
 	/*
@@ -154,14 +158,13 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
 	if (ret)
 		return ret;
 
-	while (len > 0) {
-		ret = iomap_apply(inode, pos, len, IOMAP_REPORT,
-				ops, &isi, iomap_swapfile_activate_actor);
+	while (data.len > 0) {
+		ret = iomap_apply(&data, ops, iomap_swapfile_activate_actor);
 		if (ret <= 0)
 			return ret;
 
-		pos += ret;
-		len -= ret;
+		data.pos += ret;
+		data.len -= ret;
 	}
 
 	if (isi.iomap.length) {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b09463dae0d..00e439aac8ea 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -145,11 +145,18 @@ struct iomap_ops {
 /*
  * Main iomap iterator function.
  */
-typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
-		void *data, struct iomap *iomap, struct iomap *srcmap);
+struct iomap_ctx {
+	struct inode	*inode;
+	loff_t		pos;
+	loff_t		len;
+	void		*priv;
+	unsigned	flags;
+};
+
+typedef loff_t (*iomap_actor_t)(const struct iomap_ctx *data,
+		struct iomap *iomap, struct iomap *srcmap);
 
-loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
-		unsigned flags, const struct iomap_ops *ops, void *data,
+loff_t iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops,
 		iomap_actor_t actor);
 
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes
  2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
                   ` (3 preceding siblings ...)
  2019-12-17 14:39 ` [PATCH 4/6] iomap: add struct iomap_ctx Jens Axboe
@ 2019-12-17 14:39 ` Jens Axboe
  2019-12-18  1:52   ` Darrick J. Wong
  2019-12-17 14:39 ` [PATCH 6/6] xfs: don't do delayed allocations for uncached " Jens Axboe
  5 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david, Jens Axboe

This adds support for RWF_UNCACHED for file systems using iomap to
perform buffered writes. We use the generic infrastructure for this,
by tracking pages we created and calling write_drop_cached_pages()
to issue writeback and prune those pages.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/iomap/apply.c       | 35 +++++++++++++++++++++++++++++++++++
 fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
 fs/iomap/trace.h       |  4 +++-
 include/linux/iomap.h  |  5 +++++
 4 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 792079403a22..687e86945b27 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -92,5 +92,40 @@ iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops,
 				     data->flags, &iomap);
 	}
 
+	if (written <= 0)
+		goto out;
+
+	/*
+	 * If this is an uncached write, then we need to write and sync this
+	 * range of data. This is only true for a buffered write, not for
+	 * O_DIRECT.
+	 */
+	if ((data->flags & (IOMAP_WRITE|IOMAP_DIRECT|IOMAP_UNCACHED)) ==
+			(IOMAP_WRITE|IOMAP_UNCACHED)) {
+		struct address_space *mapping = data->inode->i_mapping;
+
+		end = data->pos + written;
+		ret = filemap_write_and_wait_range(mapping, data->pos, end);
+		if (ret)
+			goto out;
+
+		/*
+		 * No pages were created for this range, we're done. We only
+		 * invalidate the range if no pages were created for the
+		 * entire range.
+		 */
+		if (!(iomap.flags & IOMAP_F_PAGE_CREATE))
+			goto out;
+
+		/*
+		 * Try to invalidate cache pages for the range we just wrote.
+		 * We don't care if invalidation fails as the write has still
+		 * worked and leaving clean uptodate pages in the page cache
+		 * isn't a corruption vector for uncached IO.
+		 */
+		invalidate_inode_pages2_range(mapping,
+				data->pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+	}
+out:
 	return written ? written : ret;
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7f8300bce767..328afeba950f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -582,6 +582,7 @@ EXPORT_SYMBOL_GPL(iomap_migrate_page);
 
 enum {
 	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
+	IOMAP_WRITE_F_UNCACHED		= (1 << 1),
 };
 
 static void
@@ -659,6 +660,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
+	unsigned aop_flags;
 	struct page *page;
 	int status = 0;
 
@@ -675,8 +677,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 			return status;
 	}
 
+	aop_flags = AOP_FLAG_NOFS;
+	if (flags & IOMAP_WRITE_F_UNCACHED)
+		aop_flags |= AOP_FLAG_UNCACHED;
 	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
-			AOP_FLAG_NOFS);
+						aop_flags);
 	if (!page) {
 		status = -ENOMEM;
 		goto out_no_page;
@@ -820,9 +825,13 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
 	struct iov_iter *i = data->priv;
 	loff_t length = data->len;
 	loff_t pos = data->pos;
+	unsigned flags = 0;
 	long status = 0;
 	ssize_t written = 0;
 
+	if (data->flags & IOMAP_UNCACHED)
+		flags |= IOMAP_WRITE_F_UNCACHED;
+
 	do {
 		struct page *page;
 		unsigned long offset;	/* Offset into pagecache page */
@@ -851,10 +860,18 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
-				srcmap);
-		if (unlikely(status))
+retry:
+		status = iomap_write_begin(inode, pos, bytes, flags,
+						&page, iomap, srcmap);
+		if (unlikely(status)) {
+			if (status == -ENOMEM &&
+			    (flags & IOMAP_WRITE_F_UNCACHED)) {
+				iomap->flags |= IOMAP_F_PAGE_CREATE;
+				flags &= ~IOMAP_WRITE_F_UNCACHED;
+				goto retry;
+			}
 			break;
+		}
 
 		if (mapping_writably_mapped(inode->i_mapping))
 			flush_dcache_page(page);
@@ -907,6 +924,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 	};
 	loff_t ret = 0, written = 0;
 
+	if (iocb->ki_flags & IOCB_UNCACHED)
+		data.flags |= IOMAP_UNCACHED;
+
 	while (iov_iter_count(iter)) {
 		data.len = iov_iter_count(iter);
 		ret = iomap_apply(&data, ops, iomap_write_actor);
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 6dc227b8c47e..63c771e3eef5 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -93,7 +93,8 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage);
 	{ IOMAP_REPORT,		"REPORT" }, \
 	{ IOMAP_FAULT,		"FAULT" }, \
 	{ IOMAP_DIRECT,		"DIRECT" }, \
-	{ IOMAP_NOWAIT,		"NOWAIT" }
+	{ IOMAP_NOWAIT,		"NOWAIT" }, \
+	{ IOMAP_UNCACHED,	"UNCACHED" }
 
 #define IOMAP_F_FLAGS_STRINGS \
 	{ IOMAP_F_NEW,		"NEW" }, \
@@ -101,6 +102,7 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage);
 	{ IOMAP_F_SHARED,	"SHARED" }, \
 	{ IOMAP_F_MERGED,	"MERGED" }, \
 	{ IOMAP_F_BUFFER_HEAD,	"BH" }, \
+	{ IOMAP_F_PAGE_CREATE,	"PAGE_CREATE" }, \
 	{ IOMAP_F_SIZE_CHANGED,	"SIZE_CHANGED" }
 
 DECLARE_EVENT_CLASS(iomap_class,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 00e439aac8ea..58311b6fdfdd 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -48,12 +48,16 @@ struct vm_fault;
  *
  * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
  * buffer heads for this mapping.
+ *
+ * IOMAP_F_PAGE_CREATE indicates that pages had to be allocated to satisfy
+ * this operation.
  */
 #define IOMAP_F_NEW		0x01
 #define IOMAP_F_DIRTY		0x02
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_PAGE_CREATE	0x20
 
 /*
  * Flags set by the core iomap code during operations:
@@ -121,6 +125,7 @@ struct iomap_page_ops {
 #define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
 #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
 #define IOMAP_NOWAIT		(1 << 5) /* do not block */
+#define IOMAP_UNCACHED		(1 << 6) /* uncached IO */
 
 struct iomap_ops {
 	/*
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] xfs: don't do delayed allocations for uncached buffered writes
  2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
                   ` (4 preceding siblings ...)
  2019-12-17 14:39 ` [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
@ 2019-12-17 14:39 ` Jens Axboe
  2019-12-18  1:57   ` Darrick J. Wong
  5 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 14:39 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david, Jens Axboe

This data is going to be written immediately, so don't bother trying
to do delayed allocation for it.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/xfs/xfs_iomap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 28e2d1f37267..d0cd4a05d59f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -847,8 +847,11 @@ xfs_buffered_write_iomap_begin(
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
 
-	/* we can't use delayed allocations when using extent size hints */
-	if (xfs_get_extsz_hint(ip))
+	/*
+	 * Don't do delayed allocations when using extent size hints, or
+	 * if we were asked to do uncached buffered writes.
+	 */
+	if (xfs_get_extsz_hint(ip) || (flags & IOMAP_UNCACHED))
 		return xfs_direct_write_iomap_begin(inode, offset, count,
 				flags, iomap, srcmap);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fs: add read support for RWF_UNCACHED
  2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
@ 2019-12-17 15:16   ` Guoqing Jiang
  2019-12-17 16:42     ` Jens Axboe
  2019-12-17 15:57   ` Matthew Wilcox
  2019-12-18  3:17   ` Dave Chinner
  2 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2019-12-17 15:16 UTC (permalink / raw)
  To: Jens Axboe, linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david


On 12/17/19 3:39 PM, Jens Axboe wrote:
> If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private
> pages for the buffered reads. These pages will never be inserted into
> the page cache, and they are simply droped when we have done the copy at
> the end of IO.
>
> If pages in the read range are already in the page cache, then use those
> for just copying the data instead of starting IO on private pages.
>
> A previous solution used the page cache even for non-cached ranges, but
> the cost of doing so was too high. Removing nodes at the end is
> expensive, even with LRU bypass. On top of that, repeatedly
> instantiating new xarray nodes is very costly, as it needs to memset 576
> bytes of data, and freeing said nodes involve an RCU call per node as
> well. All that adds up, making uncached somewhat slower than O_DIRECT.
>
> With the current*solition*, we're basically at O_DIRECT levels of

Maybe it is 'solution' here.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fs: add read support for RWF_UNCACHED
  2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
  2019-12-17 15:16   ` Guoqing Jiang
@ 2019-12-17 15:57   ` Matthew Wilcox
  2019-12-17 16:41     ` Jens Axboe
  2019-12-18  3:17   ` Dave Chinner
  2 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2019-12-17 15:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-mm, linux-fsdevel, linux-block, clm, torvalds, david

On Tue, Dec 17, 2019 at 07:39:43AM -0700, Jens Axboe wrote:
> +static void buffered_put_page(struct page *page, bool clear_mapping)
> +{
> +	if (clear_mapping)
> +		page->mapping = NULL;
> +	put_page(page);
> +}

I'm not a huge fan of the variable name 'clear_mapping'.  It describes
what it does rather than why it does it.  So maybe 'drop_immediate'?
Or 'uncached'?

>  		if (!page) {
>  			if (iocb->ki_flags & IOCB_NOWAIT)
>  				goto would_block;
> +			/* UNCACHED implies no read-ahead */
> +			if (iocb->ki_flags & IOCB_UNCACHED) {
> +				did_dio_begin = true;
> +				/* block truncate for UNCACHED reads */
> +				inode_dio_begin(inode);

I think this needs to be:

				if (!did_dio_begin)
					inode_dio_begin(inode);
				did_dio_begin = true;

otherwise inode->i_dio_count is going to be increased once per uncached
page.  Do you have a test in your test-suite that does I/O to more than
one page at a time?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fs: add read support for RWF_UNCACHED
  2019-12-17 15:57   ` Matthew Wilcox
@ 2019-12-17 16:41     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 16:41 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, linux-block, clm, torvalds, david

On 12/17/19 8:57 AM, Matthew Wilcox wrote:
> On Tue, Dec 17, 2019 at 07:39:43AM -0700, Jens Axboe wrote:
>> +static void buffered_put_page(struct page *page, bool clear_mapping)
>> +{
>> +	if (clear_mapping)
>> +		page->mapping = NULL;
>> +	put_page(page);
>> +}
> 
> I'm not a huge fan of the variable name 'clear_mapping'.  It describes
> what it does rather than why it does it.  So maybe 'drop_immediate'?
> Or 'uncached'?

I do like 'uncached' a lot better, I've made that change.

> I think this needs to be:
> 
> 				if (!did_dio_begin)
> 					inode_dio_begin(inode);
> 				did_dio_begin = true;
> 
> otherwise inode->i_dio_count is going to be increased once per uncached
> page.  Do you have a test in your test-suite that does I/O to more than
> one page at a time?

Good catch! Yes it does, I have fixed that up. Thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fs: add read support for RWF_UNCACHED
  2019-12-17 15:16   ` Guoqing Jiang
@ 2019-12-17 16:42     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2019-12-17 16:42 UTC (permalink / raw)
  To: Guoqing Jiang, linux-mm, linux-fsdevel, linux-block
  Cc: willy, clm, torvalds, david

On 12/17/19 8:16 AM, Guoqing Jiang wrote:
> 
> On 12/17/19 3:39 PM, Jens Axboe wrote:
>> If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private
>> pages for the buffered reads. These pages will never be inserted into
>> the page cache, and they are simply droped when we have done the copy at
>> the end of IO.
>>
>> If pages in the read range are already in the page cache, then use those
>> for just copying the data instead of starting IO on private pages.
>>
>> A previous solution used the page cache even for non-cached ranges, but
>> the cost of doing so was too high. Removing nodes at the end is
>> expensive, even with LRU bypass. On top of that, repeatedly
>> instantiating new xarray nodes is very costly, as it needs to memset 576
>> bytes of data, and freeing said nodes involve an RCU call per node as
>> well. All that adds up, making uncached somewhat slower than O_DIRECT.
>>
>> With the current*solition*, we're basically at O_DIRECT levels of
> 
> Maybe it is 'solution' here.

Indeed, fixed up, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] iomap: add struct iomap_ctx
  2019-12-17 14:39 ` [PATCH 4/6] iomap: add struct iomap_ctx Jens Axboe
@ 2019-12-17 19:39   ` Linus Torvalds
  2019-12-17 20:26     ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2019-12-17 19:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linux-MM, linux-fsdevel, linux-block, Matthew Wilcox,
	Chris Mason, Dave Chinner

On Tue, Dec 17, 2019 at 6:40 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> We pass a lot of arguments to iomap_apply(), and subsequently to the
> actors that it calls. In preparation for adding one more argument,
> switch them to using a struct iomap_ctx instead. The actor gets a const
> version of that, they are not supposed to change anything in it.

Looks generally like what I expected, but when looking at the patch I
notice that the type of 'len' is crazy and wrong.

It was wrong before too, though:

> -dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,

'loff_t length' is not right.

> +       loff_t pos = data->pos;
> +       loff_t length = pos + data->len;

And WTH is that? "pos + data->len" is not "length", that's end. And this:

>         loff_t end = pos + length, done = 0;

What? Now 'end' is 'pos+length', which is 'pos+pos+data->len'.

WHAA?

> @@ -1197,22 +1200,26 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
>  {
> +       loff_t ret = 0, done = 0;

More insanity. "ret" shouldn't be loff_t.

dax_iomap_rw() returns a ssize_t.

> +       loff_t count = data->len;

More of this crazy things.

>  iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,

This was wrong before.

> +struct iomap_ctx {
> +       struct inode    *inode;
> +       loff_t          pos;
> +       loff_t          len;
> +       void            *priv;
> +       unsigned        flags;
> +};

Please make 'len' be 'size_t' or something.

If you're on a 32-bit architecture, you shouldn't be writing more than
4GB in a go anyway.

Is there some reason for this horrible case of "let's allow 64-bit sizes?"

Because even if there is, it shouldn't be "loff_t". That's an
_offset_. Not a length.

            Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] iomap: add struct iomap_ctx
  2019-12-17 19:39   ` Linus Torvalds
@ 2019-12-17 20:26     ` Linus Torvalds
  2019-12-18  0:15       ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2019-12-17 20:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linux-MM, linux-fsdevel, linux-block, Matthew Wilcox,
	Chris Mason, Dave Chinner

On Tue, Dec 17, 2019 at 11:39 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> 'loff_t length' is not right.

Looking around, it does seem to get used that way. Too much, though.

> > +       loff_t pos = data->pos;
> > +       loff_t length = pos + data->len;
>
> And WTH is that? "pos + data->len" is not "length", that's end. And this:
>
> >         loff_t end = pos + length, done = 0;
>
> What? Now 'end' is 'pos+length', which is 'pos+pos+data->len'.

But this is unrelated to the crazy types. That just can't bve right.

> Is there some reason for this horrible case of "let's allow 64-bit sizes?"
>
> Because even if there is, it shouldn't be "loff_t". That's an
> _offset_. Not a length.

We do seem to have a lot of these across filesystems. And a lot of
confusion. Most of the IO reoutines clearly take or return a size_t
(returning ssize_t) as the IO size. And then you have the
zeroing/truncation stuff that tends to take loff_t. Which still smells
wrong, and s64 would look like a better case, but whatever.

The "iomap_zero_range() for truncate" case really does seem to need a
64-bit value, because people do the difference of two loff_t's for it.
In fact, it almost looks like that function should take a "start ,
end" pair, which would make loff_t be the _right_ thing.

Because "length" really is just (a positive) size_t normally.

                Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] iomap: add struct iomap_ctx
  2019-12-17 20:26     ` Linus Torvalds
@ 2019-12-18  0:15       ` Jens Axboe
  2019-12-18  1:25         ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2019-12-18  0:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux-MM, linux-fsdevel, linux-block, Matthew Wilcox,
	Chris Mason, Dave Chinner

On 12/17/19 1:26 PM, Linus Torvalds wrote:
> On Tue, Dec 17, 2019 at 11:39 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> 'loff_t length' is not right.
> 
> Looking around, it does seem to get used that way. Too much, though.
> 
>>> +       loff_t pos = data->pos;
>>> +       loff_t length = pos + data->len;
>>
>> And WTH is that? "pos + data->len" is not "length", that's end. And this:
>>
>>>         loff_t end = pos + length, done = 0;
>>
>> What? Now 'end' is 'pos+length', which is 'pos+pos+data->len'.
> 
> But this is unrelated to the crazy types. That just can't bve right.

Yeah, I fixed that one up, that was my error.

>> Is there some reason for this horrible case of "let's allow 64-bit sizes?"
>>
>> Because even if there is, it shouldn't be "loff_t". That's an
>> _offset_. Not a length.
> 
> We do seem to have a lot of these across filesystems. And a lot of
> confusion. Most of the IO reoutines clearly take or return a size_t
> (returning ssize_t) as the IO size. And then you have the
> zeroing/truncation stuff that tends to take loff_t. Which still smells
> wrong, and s64 would look like a better case, but whatever.
> 
> The "iomap_zero_range() for truncate" case really does seem to need a
> 64-bit value, because people do the difference of two loff_t's for it.
> In fact, it almost looks like that function should take a "start ,
> end" pair, which would make loff_t be the _right_ thing.
> 
> Because "length" really is just (a positive) size_t normally.

Honestly, I'd much rather leave the loff_t -> size_t/ssize_t to
Darrick/Dave, it's really outside the scope of this patch, and I'd
prefer not to have to muck with it. They probably feel the same way!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] iomap: add struct iomap_ctx
  2019-12-18  0:15       ` Jens Axboe
@ 2019-12-18  1:25         ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-12-18  1:25 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Linux-MM, linux-fsdevel, linux-block,
	Matthew Wilcox, Chris Mason, Dave Chinner, Christoph Hellwig

On Tue, Dec 17, 2019 at 05:15:46PM -0700, Jens Axboe wrote:
> On 12/17/19 1:26 PM, Linus Torvalds wrote:
> > On Tue, Dec 17, 2019 at 11:39 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> 'loff_t length' is not right.
> > 
> > Looking around, it does seem to get used that way. Too much, though.
> > 
> >>> +       loff_t pos = data->pos;
> >>> +       loff_t length = pos + data->len;
> >>
> >> And WTH is that? "pos + data->len" is not "length", that's end. And this:
> >>
> >>>         loff_t end = pos + length, done = 0;
> >>
> >> What? Now 'end' is 'pos+length', which is 'pos+pos+data->len'.
> > 
> > But this is unrelated to the crazy types. That just can't bve right.
> 
> Yeah, I fixed that one up, that was my error.
> 
> >> Is there some reason for this horrible case of "let's allow 64-bit sizes?"
> >>
> >> Because even if there is, it shouldn't be "loff_t". That's an
> >> _offset_. Not a length.
> > 
> > We do seem to have a lot of these across filesystems. And a lot of
> > confusion. Most of the IO reoutines clearly take or return a size_t
> > (returning ssize_t) as the IO size. And then you have the
> > zeroing/truncation stuff that tends to take loff_t. Which still smells
> > wrong, and s64 would look like a better case, but whatever.
> > 
> > The "iomap_zero_range() for truncate" case really does seem to need a
> > 64-bit value, because people do the difference of two loff_t's for it.
> > In fact, it almost looks like that function should take a "start ,
> > end" pair, which would make loff_t be the _right_ thing.

Yeah.  "loff_t length" always struck me as a little odd, but until now I
hadn't heard enough complaining about it to put any effort into fixing
the iomap_apply code that (afaict) mostly worked ok.  But it shouldn't
be a difficult change.

> > Because "length" really is just (a positive) size_t normally.

However, I don't think it's a good idea to reduce the @length argument
to size_t (and the iomap_apply return value to ssize_t) because they're
32-bit values and doing that will force iomap to clamp lengths and
return values to S32_MAX.  Instituting a ~2G max on read and write calls
is fine because those operate directly on file data (== slow), but the
vfs already clamps the length before the iov gets to iomap.

For the other iomap users that care more about the mappings and less
about the data in those mappings (seek hole, seek data, fiemap, swap) it
doesn't make much sense.  If the filesystem can send back a 100GB extent
map (e.g. holes in a sparse file, or we just have superstar allocation
strategies), the fs should send that straight to the iomap actor
function without having to cut that into 50x loop iterations.  Looking
ahead to things like file mapping leases, a (formerly wealthy) client
should be able to request a mmap lease on 100GB worth of pmem and get
the whole lease if the fs can allocate 100G at once.

I like the idea of making the length parameter and the return value
int64_t instead of loff_t.  Is int64_t the preferred typedef or s64?  I
forget.

> Honestly, I'd much rather leave the loff_t -> size_t/ssize_t to
> Darrick/Dave, it's really outside the scope of this patch, and I'd
> prefer not to have to muck with it. They probably feel the same way!

Don't forget Christoph.  Heh, we /did/ forget Christoph. :(
Maybe they have better historical context since they invented this iomap
mechanism for pnfs or something long before I came along.

--D

> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes
  2019-12-17 14:39 ` [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
@ 2019-12-18  1:52   ` Darrick J. Wong
  2019-12-18  3:18     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-12-18  1:52 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-mm, linux-fsdevel, linux-block, willy, clm, torvalds, david

On Tue, Dec 17, 2019 at 07:39:47AM -0700, Jens Axboe wrote:
> This adds support for RWF_UNCACHED for file systems using iomap to
> perform buffered writes. We use the generic infrastructure for this,
> by tracking pages we created and calling write_drop_cached_pages()
> to issue writeback and prune those pages.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/iomap/apply.c       | 35 +++++++++++++++++++++++++++++++++++
>  fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
>  fs/iomap/trace.h       |  4 +++-
>  include/linux/iomap.h  |  5 +++++
>  4 files changed, 67 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 792079403a22..687e86945b27 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -92,5 +92,40 @@ iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops,
>  				     data->flags, &iomap);
>  	}
>  
> +	if (written <= 0)
> +		goto out;
> +
> +	/*
> +	 * If this is an uncached write, then we need to write and sync this
> +	 * range of data. This is only true for a buffered write, not for
> +	 * O_DIRECT.
> +	 */

I tracked down the original conversation, where Dave had this to say:

"Hence, IMO, this is the wrong layer in iomap to be dealing with¬
writeback and cache residency for uncached IO. We should be caching¬
residency/invalidation at a per-IO level, not a per-page level."

He's right, but I still think it doesn't quite smell right to be putting
this in iomap_apply, since that's a generic function that implements
iteration and shouldn't be messing with cache invalidation.

So I have two possible suggestions for where to put this:

(1) Add the "flush and maybe invalidate" behavior to the bottom of
iomap_write_actor like I said in the v4 patchset.  That will issue
writeback and invalidate pagecache in smallish quantities.

(2) Find a way to pass the IOMAP_F_PAGE_CREATE state from
iomap_write_actor back to iomap_file_buffered_write and do the
flush-and-invalidate for the entire write request once at the end.

> +	if ((data->flags & (IOMAP_WRITE|IOMAP_DIRECT|IOMAP_UNCACHED)) ==
> +			(IOMAP_WRITE|IOMAP_UNCACHED)) {
> +		struct address_space *mapping = data->inode->i_mapping;
> +
> +		end = data->pos + written;
> +		ret = filemap_write_and_wait_range(mapping, data->pos, end);
> +		if (ret)
> +			goto out;
> +
> +		/*
> +		 * No pages were created for this range, we're done. We only
> +		 * invalidate the range if no pages were created for the
> +		 * entire range.
> +		 */
> +		if (!(iomap.flags & IOMAP_F_PAGE_CREATE))
> +			goto out;
> +
> +		/*
> +		 * Try to invalidate cache pages for the range we just wrote.
> +		 * We don't care if invalidation fails as the write has still
> +		 * worked and leaving clean uptodate pages in the page cache
> +		 * isn't a corruption vector for uncached IO.
> +		 */
> +		invalidate_inode_pages2_range(mapping,
> +				data->pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> +	}
> +out:
>  	return written ? written : ret;
>  }
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7f8300bce767..328afeba950f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -582,6 +582,7 @@ EXPORT_SYMBOL_GPL(iomap_migrate_page);
>  
>  enum {
>  	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
> +	IOMAP_WRITE_F_UNCACHED		= (1 << 1),
>  };
>  
>  static void
> @@ -659,6 +660,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
> +	unsigned aop_flags;
>  	struct page *page;
>  	int status = 0;
>  
> @@ -675,8 +677,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  			return status;
>  	}
>  
> +	aop_flags = AOP_FLAG_NOFS;
> +	if (flags & IOMAP_WRITE_F_UNCACHED)
> +		aop_flags |= AOP_FLAG_UNCACHED;
>  	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
> -			AOP_FLAG_NOFS);
> +						aop_flags);
>  	if (!page) {
>  		status = -ENOMEM;
>  		goto out_no_page;
> @@ -820,9 +825,13 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
>  	struct iov_iter *i = data->priv;
>  	loff_t length = data->len;
>  	loff_t pos = data->pos;
> +	unsigned flags = 0;
>  	long status = 0;
>  	ssize_t written = 0;
>  
> +	if (data->flags & IOMAP_UNCACHED)
> +		flags |= IOMAP_WRITE_F_UNCACHED;
> +
>  	do {
>  		struct page *page;
>  		unsigned long offset;	/* Offset into pagecache page */
> @@ -851,10 +860,18 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
>  			break;
>  		}
>  
> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> -				srcmap);
> -		if (unlikely(status))
> +retry:
> +		status = iomap_write_begin(inode, pos, bytes, flags,
> +						&page, iomap, srcmap);
> +		if (unlikely(status)) {
> +			if (status == -ENOMEM &&
> +			    (flags & IOMAP_WRITE_F_UNCACHED)) {
> +				iomap->flags |= IOMAP_F_PAGE_CREATE;
> +				flags &= ~IOMAP_WRITE_F_UNCACHED;

What's the strategy here?  We couldn't get a page for an uncached write,
so try again as a regular cached write?

Thanks for making the updates, it's looking better.

--D

> +				goto retry;
> +			}
>  			break;
> +		}
>  
>  		if (mapping_writably_mapped(inode->i_mapping))
>  			flush_dcache_page(page);
> @@ -907,6 +924,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
>  	};
>  	loff_t ret = 0, written = 0;
>  
> +	if (iocb->ki_flags & IOCB_UNCACHED)
> +		data.flags |= IOMAP_UNCACHED;
> +
>  	while (iov_iter_count(iter)) {
>  		data.len = iov_iter_count(iter);
>  		ret = iomap_apply(&data, ops, iomap_write_actor);
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 6dc227b8c47e..63c771e3eef5 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -93,7 +93,8 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage);
>  	{ IOMAP_REPORT,		"REPORT" }, \
>  	{ IOMAP_FAULT,		"FAULT" }, \
>  	{ IOMAP_DIRECT,		"DIRECT" }, \
> -	{ IOMAP_NOWAIT,		"NOWAIT" }
> +	{ IOMAP_NOWAIT,		"NOWAIT" }, \
> +	{ IOMAP_UNCACHED,	"UNCACHED" }
>  
>  #define IOMAP_F_FLAGS_STRINGS \
>  	{ IOMAP_F_NEW,		"NEW" }, \
> @@ -101,6 +102,7 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage);
>  	{ IOMAP_F_SHARED,	"SHARED" }, \
>  	{ IOMAP_F_MERGED,	"MERGED" }, \
>  	{ IOMAP_F_BUFFER_HEAD,	"BH" }, \
> +	{ IOMAP_F_PAGE_CREATE,	"PAGE_CREATE" }, \
>  	{ IOMAP_F_SIZE_CHANGED,	"SIZE_CHANGED" }
>  
>  DECLARE_EVENT_CLASS(iomap_class,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 00e439aac8ea..58311b6fdfdd 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -48,12 +48,16 @@ struct vm_fault;
>   *
>   * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
>   * buffer heads for this mapping.
> + *
> + * IOMAP_F_PAGE_CREATE indicates that pages had to be allocated to satisfy
> + * this operation.
>   */
>  #define IOMAP_F_NEW		0x01
>  #define IOMAP_F_DIRTY		0x02
>  #define IOMAP_F_SHARED		0x04
>  #define IOMAP_F_MERGED		0x08
>  #define IOMAP_F_BUFFER_HEAD	0x10
> +#define IOMAP_F_PAGE_CREATE	0x20
>  
>  /*
>   * Flags set by the core iomap code during operations:
> @@ -121,6 +125,7 @@ struct iomap_page_ops {
>  #define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
>  #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
>  #define IOMAP_NOWAIT		(1 << 5) /* do not block */
> +#define IOMAP_UNCACHED		(1 << 6) /* uncached IO */
>  
>  struct iomap_ops {
>  	/*
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] xfs: don't do delayed allocations for uncached buffered writes
  2019-12-17 14:39 ` [PATCH 6/6] xfs: don't do delayed allocations for uncached " Jens Axboe
@ 2019-12-18  1:57   ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-12-18  1:57 UTC (permalink / raw)
  To: Jens Axboe, '
  Cc: linux-mm, linux-fsdevel, linux-block, willy, clm, torvalds, david

On Tue, Dec 17, 2019 at 07:39:48AM -0700, Jens Axboe wrote:
> This data is going to be written immediately, so don't bother trying
> to do delayed allocation for it.
> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks reasonable once all the previous patches go in,

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 28e2d1f37267..d0cd4a05d59f 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -847,8 +847,11 @@ xfs_buffered_write_iomap_begin(
>  	int			allocfork = XFS_DATA_FORK;
>  	int			error = 0;
>  
> -	/* we can't use delayed allocations when using extent size hints */
> -	if (xfs_get_extsz_hint(ip))
> +	/*
> +	 * Don't do delayed allocations when using extent size hints, or
> +	 * if we were asked to do uncached buffered writes.
> +	 */
> +	if (xfs_get_extsz_hint(ip) || (flags & IOMAP_UNCACHED))
>  		return xfs_direct_write_iomap_begin(inode, offset, count,
>  				flags, iomap, srcmap);
>  
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fs: add read support for RWF_UNCACHED
  2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
  2019-12-17 15:16   ` Guoqing Jiang
  2019-12-17 15:57   ` Matthew Wilcox
@ 2019-12-18  3:17   ` Dave Chinner
  2 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2019-12-18  3:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-mm, linux-fsdevel, linux-block, willy, clm, torvalds

On Tue, Dec 17, 2019 at 07:39:43AM -0700, Jens Axboe wrote:
> If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private
> pages for the buffered reads. These pages will never be inserted into
> the page cache, and they are simply droped when we have done the copy at
> the end of IO.

Ok, so unlike the uncached write case, this /isn't/ coherent with
the page cache. IOWs, it can race with other reads and page faults
and mmap can change the data it has faulted into the page cache and
write it back before the RWF_UNCACHED read completes, resulting
in RWF_UNCACHED potential returning torn data.

And it's not coherent with uncached writes, either, if the
filesystem does not provide it's own serialisation between buffered
reads and writes. i.e. simple/legacy filesystems just use the page
lock to serialise buffered reads against buffered writes, while
buffered writes are serialised against each other via the
inode_lock() in generic_file_write_iter().

Further, the use of inode_dio_wait() for truncate serialisation is
optional for filesystems - it's not a generic mechanism. Filesystems
that only support buffered IO only use page locks to provide
truncate serialisation and don't call inode_dio_wait() in their
->setattr methods. Hence to serialise truncates against uncached
reads in such filesystems the uncached read needs to be page cache
coherent.

As I said previously: I don't think this is a viable approach
because it has page cache coherency issues that are just as bad, if
not worse, than direct IO.

> If pages in the read range are already in the page cache, then use those
> for just copying the data instead of starting IO on private pages.
> 
> A previous solution used the page cache even for non-cached ranges, but
> the cost of doing so was too high. Removing nodes at the end is
> expensive, even with LRU bypass.

If you want to bypass the page cache overhead all together, then
use direct IO. We should not make the same mistakes as O_DIRECT
for the same reasons (performance!). Instead, once we have the
page cache coherent path working we should then work to optimise
it with better page cache insert/remove primitives to lower the
overhead.

> On top of that, repeatedly
> instantiating new xarray nodes is very costly, as it needs to memset 576
> bytes of data, and freeing said nodes involve an RCU call per node as
> well. All that adds up, making uncached somewhat slower than O_DIRECT.

I think some of that has to do with implementation details and the
fact you appear to be focussing on PAGE_SIZE IOs. This means you are
instantiating and removing a single cached page at a time from the
mapping tree, and that means we need to allocate and free an xarray
node per IO.

I would say this is a future Xarray optimisation target, not a
reason for introducing a new incoherent IO API we'll have to support
long after we've fixed the xarray inefficiency....

> @@ -2048,6 +2057,13 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
>  		if (!page) {
>  			if (iocb->ki_flags & IOCB_NOWAIT)
>  				goto would_block;
> +			/* UNCACHED implies no read-ahead */
> +			if (iocb->ki_flags & IOCB_UNCACHED) {
> +				did_dio_begin = true;
> +				/* block truncate for UNCACHED reads */
> +				inode_dio_begin(inode);
> +				goto no_cached_page;
> +			}

Ok, so for every page we don't find in the cache, we go issue IO.
We also call inode_dio_begin() on every page we miss, but only call
inode_dio_end() a single time. So we leak i_dio_count on every IO
that needs to read more than a single page, which means we'll hang
in the next call to inode_dio_wait()...

>  no_cached_page:
> @@ -2234,6 +2250,14 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
>  			error = -ENOMEM;
>  			goto out;
>  		}
> +		if (iocb->ki_flags & IOCB_UNCACHED) {
> +			__SetPageLocked(page);
> +			page->mapping = mapping;
> +			page->index = index;
> +			clear_mapping = true;
> +			goto readpage;
> +		}

And we go through the ->readpage path, which means we a building and
issuing a single bio per page we miss. THis is highly inefficient,
and relies on bio merging to form large IOs in the block layer.
Sure, you won't notice the impact if all you do is PAGE_SIZE read()
calls, but if you are doing multi-megabyte IOs because you have
spinning rust storage then it will make a big difference.

IOWs, this does not take advantage of either the mpage_readpages or
the iomap_readpages many-pages-to-a-bio read optimisations that the
readahead path gives us, so there's more CPU and IO overhead from this
RWF_UNCACHED path than there is from the normal readahead based
buffered IO path.

Oh, and if we have random pages in the cache, this
single-page-at-atime approach that will break up large sequential
IOs in smaller, non-sequential IOs that will be dispatched
separately. It's much more CPU efficient to do a single large IO
that pulls new data into the random page in middle of the range than
it is to build, issue and complete two separate IOs. It's also much
more friendly to spinning rust to do a single large IO than two
small separated-by-a-few-sectors IOs...

IMO, this looks like an implementation hyper-focussed on brute
performance of PAGE_SIZE IOs and it compromises on coherency and
efficiency to attain that performance. Quite frankly, if performance
is so critical that you need to compromise the IO path in the way,
then just use direct IO.

Let's get a sane, clean, efficient page cache coherent read IO path
in place for RWF_UNCACHED, then optimise it for performance. If it's
done right, then the page cache/xarray insert/remove overhead should
largely disappear in the noise.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes
  2019-12-18  1:52   ` Darrick J. Wong
@ 2019-12-18  3:18     ` Jens Axboe
  2019-12-18  4:15       ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2019-12-18  3:18 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-mm, linux-fsdevel, linux-block, willy, clm, torvalds, david

On 12/17/19 6:52 PM, Darrick J. Wong wrote:
> On Tue, Dec 17, 2019 at 07:39:47AM -0700, Jens Axboe wrote:
>> This adds support for RWF_UNCACHED for file systems using iomap to
>> perform buffered writes. We use the generic infrastructure for this,
>> by tracking pages we created and calling write_drop_cached_pages()
>> to issue writeback and prune those pages.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  fs/iomap/apply.c       | 35 +++++++++++++++++++++++++++++++++++
>>  fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
>>  fs/iomap/trace.h       |  4 +++-
>>  include/linux/iomap.h  |  5 +++++
>>  4 files changed, 67 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
>> index 792079403a22..687e86945b27 100644
>> --- a/fs/iomap/apply.c
>> +++ b/fs/iomap/apply.c
>> @@ -92,5 +92,40 @@ iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops,
>>  				     data->flags, &iomap);
>>  	}
>>  
>> +	if (written <= 0)
>> +		goto out;
>> +
>> +	/*
>> +	 * If this is an uncached write, then we need to write and sync this
>> +	 * range of data. This is only true for a buffered write, not for
>> +	 * O_DIRECT.
>> +	 */
> 
> I tracked down the original conversation, where Dave had this to say:
> 
> "Hence, IMO, this is the wrong layer in iomap to be dealing with¬
> writeback and cache residency for uncached IO. We should be caching¬
> residency/invalidation at a per-IO level, not a per-page level."
> 
> He's right, but I still think it doesn't quite smell right to be putting
> this in iomap_apply, since that's a generic function that implements
> iteration and shouldn't be messing with cache invalidation.
> 
> So I have two possible suggestions for where to put this:
> 
> (1) Add the "flush and maybe invalidate" behavior to the bottom of
> iomap_write_actor like I said in the v4 patchset.  That will issue
> writeback and invalidate pagecache in smallish quantities.
> 
> (2) Find a way to pass the IOMAP_F_PAGE_CREATE state from
> iomap_write_actor back to iomap_file_buffered_write and do the
> flush-and-invalidate for the entire write request once at the end.

Thanks for your suggestion, I'll look into option 2. Option 1 isn't
going to work, as smaller quantities is going to cause a performance
issue for streamed IO.

>> @@ -851,10 +860,18 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
>>  			break;
>>  		}
>>  
>> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
>> -				srcmap);
>> -		if (unlikely(status))
>> +retry:
>> +		status = iomap_write_begin(inode, pos, bytes, flags,
>> +						&page, iomap, srcmap);
>> +		if (unlikely(status)) {
>> +			if (status == -ENOMEM &&
>> +			    (flags & IOMAP_WRITE_F_UNCACHED)) {
>> +				iomap->flags |= IOMAP_F_PAGE_CREATE;
>> +				flags &= ~IOMAP_WRITE_F_UNCACHED;
> 
> What's the strategy here?  We couldn't get a page for an uncached write,
> so try again as a regular cached write?

The idea is that we start with IOMAP_WRITE_F_UNCACHED set, in which case
we only do page lookup, not create. If that works, then we know that the
given page was already in the page cache. If it fails with -ENOMEM, we
store this information as IOMAP_F_PAGE_CREATE, and then clear
IOMAP_WRITE_F_UNCACHED and retry. The retry will create the page, and
now the caller knows that we had to create pages to satisfy this
write. The caller uses this information to invalidate the entire range.

Hope that explains better!

> Thanks for making the updates, it's looking better.

Thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes
  2019-12-18  3:18     ` Jens Axboe
@ 2019-12-18  4:15       ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-12-18  4:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-mm, linux-fsdevel, linux-block, willy, clm, torvalds, david

On Tue, Dec 17, 2019 at 08:18:07PM -0700, Jens Axboe wrote:
> On 12/17/19 6:52 PM, Darrick J. Wong wrote:
> > On Tue, Dec 17, 2019 at 07:39:47AM -0700, Jens Axboe wrote:
> >> This adds support for RWF_UNCACHED for file systems using iomap to
> >> perform buffered writes. We use the generic infrastructure for this,
> >> by tracking pages we created and calling write_drop_cached_pages()
> >> to issue writeback and prune those pages.
> >>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> ---
> >>  fs/iomap/apply.c       | 35 +++++++++++++++++++++++++++++++++++
> >>  fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
> >>  fs/iomap/trace.h       |  4 +++-
> >>  include/linux/iomap.h  |  5 +++++
> >>  4 files changed, 67 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> >> index 792079403a22..687e86945b27 100644
> >> --- a/fs/iomap/apply.c
> >> +++ b/fs/iomap/apply.c
> >> @@ -92,5 +92,40 @@ iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops,
> >>  				     data->flags, &iomap);
> >>  	}
> >>  
> >> +	if (written <= 0)
> >> +		goto out;
> >> +
> >> +	/*
> >> +	 * If this is an uncached write, then we need to write and sync this
> >> +	 * range of data. This is only true for a buffered write, not for
> >> +	 * O_DIRECT.
> >> +	 */
> > 
> > I tracked down the original conversation, where Dave had this to say:
> > 
> > "Hence, IMO, this is the wrong layer in iomap to be dealing with¬
> > writeback and cache residency for uncached IO. We should be caching¬
> > residency/invalidation at a per-IO level, not a per-page level."
> > 
> > He's right, but I still think it doesn't quite smell right to be putting
> > this in iomap_apply, since that's a generic function that implements
> > iteration and shouldn't be messing with cache invalidation.
> > 
> > So I have two possible suggestions for where to put this:
> > 
> > (1) Add the "flush and maybe invalidate" behavior to the bottom of
> > iomap_write_actor like I said in the v4 patchset.  That will issue
> > writeback and invalidate pagecache in smallish quantities.
> > 
> > (2) Find a way to pass the IOMAP_F_PAGE_CREATE state from
> > iomap_write_actor back to iomap_file_buffered_write and do the
> > flush-and-invalidate for the entire write request once at the end.
> 
> Thanks for your suggestion, I'll look into option 2. Option 1 isn't
> going to work, as smaller quantities is going to cause a performance
> issue for streamed IO.

<nod> I would also conjecture that pushing fewer flushes through xfs
might reduce the amount of seeking writing around that it has to do,
which may or may not help with storage these days.

> >> @@ -851,10 +860,18 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap,
> >>  			break;
> >>  		}
> >>  
> >> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> >> -				srcmap);
> >> -		if (unlikely(status))
> >> +retry:
> >> +		status = iomap_write_begin(inode, pos, bytes, flags,
> >> +						&page, iomap, srcmap);
> >> +		if (unlikely(status)) {
> >> +			if (status == -ENOMEM &&
> >> +			    (flags & IOMAP_WRITE_F_UNCACHED)) {
> >> +				iomap->flags |= IOMAP_F_PAGE_CREATE;
> >> +				flags &= ~IOMAP_WRITE_F_UNCACHED;
> > 
> > What's the strategy here?  We couldn't get a page for an uncached write,
> > so try again as a regular cached write?
> 
> The idea is that we start with IOMAP_WRITE_F_UNCACHED set, in which case
> we only do page lookup, not create. If that works, then we know that the
> given page was already in the page cache. If it fails with -ENOMEM, we
> store this information as IOMAP_F_PAGE_CREATE, and then clear
> IOMAP_WRITE_F_UNCACHED and retry. The retry will create the page, and
> now the caller knows that we had to create pages to satisfy this
> write. The caller uses this information to invalidate the entire range.

Ah, ok, it's intrinsic to the operation.  Would you mind putting that in
as a comment?

--D

> Hope that explains better!
> 
> > Thanks for making the updates, it's looking better.
> 
> Thanks!
> 
> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-12-18  4:16 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
2019-12-17 15:16   ` Guoqing Jiang
2019-12-17 16:42     ` Jens Axboe
2019-12-17 15:57   ` Matthew Wilcox
2019-12-17 16:41     ` Jens Axboe
2019-12-18  3:17   ` Dave Chinner
2019-12-17 14:39 ` [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb Jens Axboe
2019-12-17 14:39 ` [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
2019-12-17 14:39 ` [PATCH 4/6] iomap: add struct iomap_ctx Jens Axboe
2019-12-17 19:39   ` Linus Torvalds
2019-12-17 20:26     ` Linus Torvalds
2019-12-18  0:15       ` Jens Axboe
2019-12-18  1:25         ` Darrick J. Wong
2019-12-17 14:39 ` [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
2019-12-18  1:52   ` Darrick J. Wong
2019-12-18  3:18     ` Jens Axboe
2019-12-18  4:15       ` Darrick J. Wong
2019-12-17 14:39 ` [PATCH 6/6] xfs: don't do delayed allocations for uncached " Jens Axboe
2019-12-18  1:57   ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).