linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* iomap preparations for GFS2
@ 2018-06-06 10:40 Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 1/6] fs: factor out a __generic_write_end helper Christoph Hellwig
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

Hi all,

this is a slight rework of the patches from Andreas to prepare for gfs2
using the iomap code.  I'd like to start with an immutable branch for
iomap patches in either the XFS tree or a tree of mine own with something
like this so that we can have a nice common base for both the major
iomap changes for XFS and the GFS2 work.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] fs: factor out a __generic_write_end helper
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
@ 2018-06-06 10:40 ` Christoph Hellwig
  2018-06-06 10:59   ` Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 2/6] iomap: move bdev and dax_dev in a union Christoph Hellwig
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

Bits of the buffer.c based write_end implementations that don't know
about buffer_heads and can be reused by other implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c   | 67 +++++++++++++++++++++++++++++++----------------------------
 fs/internal.h |  2 ++
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index cabc045f483d..aba2a948b235 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2076,6 +2076,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 }
 EXPORT_SYMBOL(block_write_begin);
 
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page)
+{
+	loff_t old_size = inode->i_size;
+	bool i_size_changed = false;
+
+	/*
+	 * No need to use i_size_read() here, the i_size cannot change under us
+	 * because we hold i_rwsem.
+	 *
+	 * But it's important to update i_size while still holding page lock:
+	 * page writeout could otherwise come in and zero beyond i_size.
+	 */
+	if (pos + copied > inode->i_size) {
+		i_size_write(inode, pos + copied);
+		i_size_changed = true;
+	}
+
+	unlock_page(page);
+	put_page(page);
+
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
+	/*
+	 * Don't mark the inode dirty under page lock. First, it unnecessarily
+	 * makes the holding time of page lock longer. Second, it forces lock
+	 * ordering of page lock and transaction start for journaling
+	 * filesystems.
+	 */
+	if (i_size_changed)
+		mark_inode_dirty(inode);
+	return copied;
+}
+
 int block_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
@@ -2116,39 +2150,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
 {
-	struct inode *inode = mapping->host;
-	loff_t old_size = inode->i_size;
-	int i_size_changed = 0;
-
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-
-	/*
-	 * No need to use i_size_read() here, the i_size
-	 * cannot change under us because we hold i_mutex.
-	 *
-	 * But it's important to update i_size while still holding page lock:
-	 * page writeout could otherwise come in and zero beyond i_size.
-	 */
-	if (pos+copied > inode->i_size) {
-		i_size_write(inode, pos+copied);
-		i_size_changed = 1;
-	}
-
-	unlock_page(page);
-	put_page(page);
-
-	if (old_size < pos)
-		pagecache_isize_extended(inode, old_size, pos);
-	/*
-	 * Don't mark the inode dirty under page lock. First, it unnecessarily
-	 * makes the holding time of page lock longer. Second, it forces lock
-	 * ordering of page lock and transaction start for journaling
-	 * filesystems.
-	 */
-	if (i_size_changed)
-		mark_inode_dirty(inode);
-
-	return copied;
+	return __generic_write_end(mapping->host, pos, copied, page);
 }
 EXPORT_SYMBOL(generic_write_end);
 
diff --git a/fs/internal.h b/fs/internal.h
index 980d005b21b4..4a18bdbd2214 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 extern void guard_bio_eod(int rw, struct bio *bio);
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page);
 
 /*
  * char_dev.c
-- 
2.14.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/6] iomap: move bdev and dax_dev in a union
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 1/6] fs: factor out a __generic_write_end helper Christoph Hellwig
@ 2018-06-06 10:40 ` Christoph Hellwig
  2018-06-06 11:37   ` Andreas Gruenbacher
  2018-06-06 10:40 ` [PATCH 3/6] iomap: mark newly allocated buffer heads as new Christoph Hellwig
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

We always either ask for a block device or DAX device mapping, so we can
use the same space for both.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext2/inode.c       | 6 ++++--
 fs/ext4/inode.c       | 6 ++++--
 fs/xfs/xfs_iomap.c    | 6 ++++--
 include/linux/iomap.h | 6 ++++--
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 71635909df3b..8aead4e9dbc1 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -815,9 +815,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		return ret;
 
 	iomap->flags = 0;
-	iomap->bdev = inode->i_sb->s_bdev;
 	iomap->offset = (u64)first_block << blkbits;
-	iomap->dax_dev = sbi->s_daxdev;
+	if (IS_DAX(inode))
+		iomap->dax_dev = sbi->s_daxdev;
+	else
+		iomap->bdev = inode->i_sb->s_bdev;
 
 	if (ret == 0) {
 		iomap->type = IOMAP_HOLE;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2ea07efbe016..79027e99118f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3524,8 +3524,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	iomap->flags = 0;
 	if (ext4_inode_datasync_dirty(inode))
 		iomap->flags |= IOMAP_F_DIRTY;
-	iomap->bdev = inode->i_sb->s_bdev;
-	iomap->dax_dev = sbi->s_daxdev;
+	if (IS_DAX(inode))
+		iomap->dax_dev = sbi->s_daxdev;
+	else
+		iomap->bdev = inode->i_sb->s_bdev;
 	iomap->offset = (u64)first_block << blkbits;
 	iomap->length = (u64)map.m_len << blkbits;
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c6ce6f9335b6..587110db1ad2 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -70,8 +70,10 @@ xfs_bmbt_to_iomap(
 	}
 	iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
 	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
-	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
-	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
+	if (IS_DAX(VFS_I(ip)))
+		iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
+	else
+		iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
 }
 
 xfs_extlen_t
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a044a824da85..212f4d59bcbf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,8 +53,10 @@ struct iomap {
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
-	struct block_device	*bdev;	/* block device for I/O */
-	struct dax_device	*dax_dev; /* dax_dev for dax operations */
+	union {
+		struct block_device *bdev;
+		struct dax_device   *dax_dev;
+	};
 };
 
 /*
-- 
2.14.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/6] iomap: mark newly allocated buffer heads as new
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 1/6] fs: factor out a __generic_write_end helper Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 2/6] iomap: move bdev and dax_dev in a union Christoph Hellwig
@ 2018-06-06 10:40 ` Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 4/6] iomap: complete partial direct I/O writes synchronously Christoph Hellwig
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

From: Andreas Gruenbacher <agruenba@redhat.com>

In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set.  This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index aba2a948b235..c8c2b7d8b8d6 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1900,15 +1900,16 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 		break;
 	case IOMAP_UNWRITTEN:
 		/*
-		 * For unwritten regions, we always need to ensure that
-		 * sub-block writes cause the regions in the block we are not
-		 * writing to are zeroed. Set the buffer as new to ensure this.
+		 * For unwritten regions, we always need to ensure that regions
+		 * in the block we are not writing to are zeroed. Mark the
+		 * buffer as new to ensure this.
 		 */
 		set_buffer_new(bh);
 		set_buffer_unwritten(bh);
 		/* FALLTHRU */
 	case IOMAP_MAPPED:
-		if (offset >= i_size_read(inode))
+		if ((iomap->flags & IOMAP_F_NEW) ||
+		    offset >= i_size_read(inode))
 			set_buffer_new(bh);
 		bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
 				inode->i_blkbits;
-- 
2.14.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/6] iomap: complete partial direct I/O writes synchronously
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-06-06 10:40 ` [PATCH 3/6] iomap: mark newly allocated buffer heads as new Christoph Hellwig
@ 2018-06-06 10:40 ` Christoph Hellwig
  2018-06-06 10:40 ` [PATCH 5/6] iomap: generic inline data handling Christoph Hellwig
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

From: Andreas Gruenbacher <agruenba@redhat.com>

According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write.  In that
case, wait for the short write to complete to allow gfs2 to recover.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 206539d369a8..f428baa32185 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -811,6 +811,7 @@ struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	bool			wait_for_completion;
 
 	union {
 		/* used during submission and for synchronous completion: */
@@ -914,9 +915,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (is_sync_kiocb(dio->iocb)) {
+		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
-
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -1130,13 +1130,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
-	if (is_sync_kiocb(iocb)) {
-		dio->submit.waiter = current;
-		dio->submit.cookie = BLK_QC_T_NONE;
-		dio->submit.last_queue = NULL;
-	}
+	dio->submit.waiter = current;
+	dio->submit.cookie = BLK_QC_T_NONE;
+	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
 		if (pos >= dio->i_size)
@@ -1186,7 +1185,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1201,8 +1200,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK)
+			if (ret == -ENOTBLK) {
+				dio->wait_for_completion = true;
 				ret = 0;
+			}
 			break;
 		}
 		pos += ret;
@@ -1223,7 +1224,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!is_sync_kiocb(iocb))
+		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
-- 
2.14.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 5/6] iomap: generic inline data handling
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-06-06 10:40 ` [PATCH 4/6] iomap: complete partial direct I/O writes synchronously Christoph Hellwig
@ 2018-06-06 10:40 ` Christoph Hellwig
  2018-06-07 13:50   ` Andreas Grünbacher
  2018-06-06 10:40 ` [PATCH 6/6] iomap: add a page_done callback Christoph Hellwig
  2018-06-06 11:38 ` iomap preparations for GFS2 Andreas Gruenbacher
  6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

From: Andreas Gruenbacher <agruenba@redhat.com>

Add generic inline data handling by adding a pointer to the inline data
region to struct iomap.  When handling a buffered IOMAP_INLINE write,
iomap_write_begin will copy the current inline data from the inline data
region into the page cache, and iomap_write_end will copy the changes in
the page cache back to the inline data region.

This doesn't cover inline data reads and direct I/O yet because so far,
we have no users.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: small cleanups to better fit in with other iomap work]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index f428baa32185..f2a491b98b7c 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,6 +116,26 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
+static void
+iomap_read_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap)
+{
+	size_t size = i_size_read(inode);
+	void *addr;
+
+	if (PageUptodate(page))
+		return;
+
+	BUG_ON(page->index);
+	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
+
+	addr = kmap_atomic(page);
+	memcpy(addr, iomap->inline_data, size);
+	memset(addr + size, 0, PAGE_SIZE - size);
+	kunmap_atomic(addr);
+	SetPageUptodate(page);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
@@ -133,7 +153,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (!page)
 		return -ENOMEM;
 
-	status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	if (iomap->type == IOMAP_INLINE)
+		iomap_read_inline_data(inode, page, iomap);
+	else
+		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
@@ -146,14 +170,37 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	return status;
 }
 
+static int
+iomap_write_end_inline(struct inode *inode, struct page *page,
+		struct iomap *iomap, loff_t pos, unsigned copied)
+{
+	void *addr;
+
+	WARN_ON_ONCE(!PageUptodate(page));
+	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
+
+	addr = kmap_atomic(page);
+	memcpy(iomap->inline_data + pos, addr + pos, copied);
+	kunmap_atomic(addr);
+
+	mark_inode_dirty(inode);
+	__generic_write_end(inode, pos, copied, page);
+	return copied;
+}
+
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+		unsigned copied, struct page *page, struct iomap *iomap)
 {
 	int ret;
 
-	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-			copied, page, NULL);
+	if (iomap->type == IOMAP_INLINE) {
+		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
+	} else {
+		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
+				copied, page, NULL);
+	}
+
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
 	return ret;
@@ -208,7 +255,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = iomap_write_end(inode, pos, bytes, copied, page,
+				iomap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -302,7 +350,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -354,7 +402,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 212f4d59bcbf..69ef622f650e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -56,6 +56,7 @@ struct iomap {
 	union {
 		struct block_device *bdev;
 		struct dax_device   *dax_dev;
+		void		    *inline_data;
 	};
 };
 
-- 
2.14.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 6/6] iomap: add a page_done callback
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-06-06 10:40 ` [PATCH 5/6] iomap: generic inline data handling Christoph Hellwig
@ 2018-06-06 10:40 ` Christoph Hellwig
  2018-06-06 11:37   ` Andreas Gruenbacher
  2018-06-06 11:38 ` iomap preparations for GFS2 Andreas Gruenbacher
  6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:40 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

This will be used by gfs2 to attach data to transactions for the journaled
data mode.  But the concept is generic enough that we might be able to
use it for other purposes like encryption/integrity post-processing in the
future.

Based on a patch from Andreas Gruenbacher.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c            | 3 +++
 include/linux/iomap.h | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index f2a491b98b7c..a7ecdd5ddd17 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -201,6 +201,9 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 				copied, page, NULL);
 	}
 
+	if (iomap->page_done)
+		iomap->page_done(inode, pos, copied, page, iomap);
+
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
 	return ret;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 69ef622f650e..474e2255ac6e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -58,6 +58,14 @@ struct iomap {
 		struct dax_device   *dax_dev;
 		void		    *inline_data;
 	};
+
+	/*
+	 * Called when finished processing a page in the mapping returned in
+	 * thus iomap.  At least for now this is only supported in the buffered
+	 * write path.
+	 */
+	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
+			struct page *page, struct iomap *iomap);
 };
 
 /*
-- 
2.14.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] fs: factor out a __generic_write_end helper
  2018-06-06 10:40 ` [PATCH 1/6] fs: factor out a __generic_write_end helper Christoph Hellwig
@ 2018-06-06 10:59   ` Christoph Hellwig
  2018-06-06 11:09     ` Andreas Grünbacher
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:59 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

On Wed, Jun 06, 2018 at 12:40:28PM +0200, Christoph Hellwig wrote:
> Bits of the buffer.c based write_end implementations that don't know
> about buffer_heads and can be reused by other implementations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This actually already had two reviews:

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

That got lost through the detour via gfs2.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] fs: factor out a __generic_write_end helper
  2018-06-06 10:59   ` Christoph Hellwig
@ 2018-06-06 11:09     ` Andreas Grünbacher
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Grünbacher @ 2018-06-06 11:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, linux-xfs, Dan Williams,
	Linux FS-devel Mailing List, cluster-devel, linux-ext4

2018-06-06 12:59 GMT+02:00 Christoph Hellwig <hch@lst.de>:
> On Wed, Jun 06, 2018 at 12:40:28PM +0200, Christoph Hellwig wrote:
>> Bits of the buffer.c based write_end implementations that don't know
>> about buffer_heads and can be reused by other implementations.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> This actually already had two reviews:
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>
> That got lost through the detour via gfs2.

And one by me which has been dropped this time around, but I suppose
it doesn't matter.

Andreas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] iomap: move bdev and dax_dev in a union
  2018-06-06 10:40 ` [PATCH 2/6] iomap: move bdev and dax_dev in a union Christoph Hellwig
@ 2018-06-06 11:37   ` Andreas Gruenbacher
  2018-06-06 12:05     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Gruenbacher @ 2018-06-06 11:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

On 6 June 2018 at 12:40, Christoph Hellwig <hch@lst.de> wrote:
> We always either ask for a block device or DAX device mapping, so we can
> use the same space for both.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/ext2/inode.c       | 6 ++++--
>  fs/ext4/inode.c       | 6 ++++--
>  fs/xfs/xfs_iomap.c    | 6 ++++--
>  include/linux/iomap.h | 6 ++++--
>  4 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 71635909df3b..8aead4e9dbc1 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -815,9 +815,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>                 return ret;
>
>         iomap->flags = 0;
> -       iomap->bdev = inode->i_sb->s_bdev;
>         iomap->offset = (u64)first_block << blkbits;
> -       iomap->dax_dev = sbi->s_daxdev;
> +       if (IS_DAX(inode))
> +               iomap->dax_dev = sbi->s_daxdev;
> +       else
> +               iomap->bdev = inode->i_sb->s_bdev;
>
>         if (ret == 0) {
>                 iomap->type = IOMAP_HOLE;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2ea07efbe016..79027e99118f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3524,8 +3524,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>         iomap->flags = 0;
>         if (ext4_inode_datasync_dirty(inode))
>                 iomap->flags |= IOMAP_F_DIRTY;
> -       iomap->bdev = inode->i_sb->s_bdev;
> -       iomap->dax_dev = sbi->s_daxdev;
> +       if (IS_DAX(inode))
> +               iomap->dax_dev = sbi->s_daxdev;
> +       else
> +               iomap->bdev = inode->i_sb->s_bdev;
>         iomap->offset = (u64)first_block << blkbits;
>         iomap->length = (u64)map.m_len << blkbits;
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c6ce6f9335b6..587110db1ad2 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -70,8 +70,10 @@ xfs_bmbt_to_iomap(
>         }
>         iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
>         iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
> -       iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
> -       iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
> +       if (IS_DAX(VFS_I(ip)))
> +               iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
> +       else
> +               iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));

This seems backwards.

>  }
>
>  xfs_extlen_t
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index a044a824da85..212f4d59bcbf 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -53,8 +53,10 @@ struct iomap {
>         u64                     length; /* length of mapping, bytes */
>         u16                     type;   /* type of mapping */
>         u16                     flags;  /* flags for mapping */
> -       struct block_device     *bdev;  /* block device for I/O */
> -       struct dax_device       *dax_dev; /* dax_dev for dax operations */
> +       union {
> +               struct block_device *bdev;
> +               struct dax_device   *dax_dev;
> +       };
>  };
>
>  /*
> --
> 2.14.2
>

Andreas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] iomap: add a page_done callback
  2018-06-06 10:40 ` [PATCH 6/6] iomap: add a page_done callback Christoph Hellwig
@ 2018-06-06 11:37   ` Andreas Gruenbacher
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Gruenbacher @ 2018-06-06 11:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

On 6 June 2018 at 12:40, Christoph Hellwig <hch@lst.de> wrote:
> This will be used by gfs2 to attach data to transactions for the journaled
> data mode.  But the concept is generic enough that we might be able to
> use it for other purposes like encryption/integrity post-processing in the
> future.
>
> Based on a patch from Andreas Gruenbacher.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap.c            | 3 +++
>  include/linux/iomap.h | 8 ++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index f2a491b98b7c..a7ecdd5ddd17 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -201,6 +201,9 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>                                 copied, page, NULL);
>         }
>
> +       if (iomap->page_done)
> +               iomap->page_done(inode, pos, copied, page, iomap);
> +
>         if (ret < len)
>                 iomap_write_failed(inode, pos, len);
>         return ret;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 69ef622f650e..474e2255ac6e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -58,6 +58,14 @@ struct iomap {
>                 struct dax_device   *dax_dev;
>                 void                *inline_data;
>         };
> +
> +       /*
> +        * Called when finished processing a page in the mapping returned in
> +        * thus iomap.  At least for now this is only supported in the buffered

"thus" -> "this"

> +        * write path.
> +        */
> +       void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
> +                       struct page *page, struct iomap *iomap);
>  };
>
>  /*
> --
> 2.14.2
>

Andreas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: iomap preparations for GFS2
  2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-06-06 10:40 ` [PATCH 6/6] iomap: add a page_done callback Christoph Hellwig
@ 2018-06-06 11:38 ` Andreas Gruenbacher
  2018-06-06 15:31   ` Andreas Grünbacher
  6 siblings, 1 reply; 17+ messages in thread
From: Andreas Gruenbacher @ 2018-06-06 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

On 6 June 2018 at 12:40, Christoph Hellwig <hch@lst.de> wrote:
> Hi all,
>
> this is a slight rework of the patches from Andreas to prepare for gfs2
> using the iomap code.  I'd like to start with an immutable branch for
> iomap patches in either the XFS tree or a tree of mine own with something
> like this so that we can have a nice common base for both the major
> iomap changes for XFS and the GFS2 work.

Okay.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] iomap: move bdev and dax_dev in a union
  2018-06-06 11:37   ` Andreas Gruenbacher
@ 2018-06-06 12:05     ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 12:05 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, linux-xfs, Dan Williams, linux-fsdevel,
	cluster-devel, linux-ext4

On Wed, Jun 06, 2018 at 01:37:02PM +0200, Andreas Gruenbacher wrote:
> This seems backwards.

It is.  So much for last minute polarity fixups..

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: iomap preparations for GFS2
  2018-06-06 11:38 ` iomap preparations for GFS2 Andreas Gruenbacher
@ 2018-06-06 15:31   ` Andreas Grünbacher
  2018-06-06 15:39     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Grünbacher @ 2018-06-06 15:31 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, linux-xfs, Dan Williams, linux-fsdevel,
	cluster-devel, linux-ext4

2018-06-06 13:38 GMT+02:00 Andreas Gruenbacher <agruenba@redhat.com>:
> On 6 June 2018 at 12:40, Christoph Hellwig <hch@lst.de> wrote:
>> Hi all,
>>
>> this is a slight rework of the patches from Andreas to prepare for gfs2
>> using the iomap code.  I'd like to start with an immutable branch for
>> iomap patches in either the XFS tree or a tree of mine own with something
>> like this so that we can have a nice common base for both the major
>> iomap changes for XFS and the GFS2 work.

When do you want to push this upstream? We've got the gfs2 patches
that depend on it, and we cannot make progress with those before this
set is pushed.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: iomap preparations for GFS2
  2018-06-06 15:31   ` Andreas Grünbacher
@ 2018-06-06 15:39     ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-06 15:39 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Andreas Gruenbacher, Christoph Hellwig, linux-xfs, Dan Williams,
	linux-fsdevel, cluster-devel, linux-ext4

On Wed, Jun 06, 2018 at 05:31:25PM +0200, Andreas Gr�nbacher wrote:
> 2018-06-06 13:38 GMT+02:00 Andreas Gruenbacher <agruenba@redhat.com>:
> > On 6 June 2018 at 12:40, Christoph Hellwig <hch@lst.de> wrote:
> >> Hi all,
> >>
> >> this is a slight rework of the patches from Andreas to prepare for gfs2
> >> using the iomap code.  I'd like to start with an immutable branch for
> >> iomap patches in either the XFS tree or a tree of mine own with something
> >> like this so that we can have a nice common base for both the major
> >> iomap changes for XFS and the GFS2 work.
> 
> When do you want to push this upstream? We've got the gfs2 patches
> that depend on it, and we cannot make progress with those before this
> set is pushed.

As soon as we have 4.18r-c1 can start work for the next merge
window.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] iomap: generic inline data handling
  2018-06-06 10:40 ` [PATCH 5/6] iomap: generic inline data handling Christoph Hellwig
@ 2018-06-07 13:50   ` Andreas Grünbacher
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Grünbacher @ 2018-06-07 13:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, linux-xfs, Dan Williams,
	Linux FS-devel Mailing List, cluster-devel, linux-ext4

2018-06-06 12:40 GMT+02:00 Christoph Hellwig <hch@lst.de>:
> From: Andreas Gruenbacher <agruenba@redhat.com>
>
> Add generic inline data handling by adding a pointer to the inline data
> region to struct iomap.  When handling a buffered IOMAP_INLINE write,
> iomap_write_begin will copy the current inline data from the inline data
> region into the page cache, and iomap_write_end will copy the changes in
> the page cache back to the inline data region.
>
> This doesn't cover inline data reads and direct I/O yet because so far,
> we have no users.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> [hch: small cleanups to better fit in with other iomap work]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++------
>  include/linux/iomap.h |  1 +
>  2 files changed, 56 insertions(+), 7 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index f428baa32185..f2a491b98b7c 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -116,6 +116,26 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>                 truncate_pagecache_range(inode, max(pos, i_size), pos + len);
>  }
>
> +static void
> +iomap_read_inline_data(struct inode *inode, struct page *page,
> +               struct iomap *iomap)
> +{
> +       size_t size = i_size_read(inode);
> +       void *addr;
> +
> +       if (PageUptodate(page))
> +               return;
> +
> +       BUG_ON(page->index);
> +       BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +
> +       addr = kmap_atomic(page);
> +       memcpy(addr, iomap->inline_data, size);
> +       memset(addr + size, 0, PAGE_SIZE - size);
> +       kunmap_atomic(addr);
> +       SetPageUptodate(page);
> +}

Can you please move iomap_read_inline_data above iomap_write_failed in
fs/iomap.c so that it will end up above the readpage code? We'll need
iomap_read_inline_data in the readpage code.

>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>                 struct page **pagep, struct iomap *iomap)
> @@ -133,7 +153,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>         if (!page)
>                 return -ENOMEM;
>
> -       status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +       if (iomap->type == IOMAP_INLINE)
> +               iomap_read_inline_data(inode, page, iomap);
> +       else
> +               status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +
>         if (unlikely(status)) {
>                 unlock_page(page);
>                 put_page(page);
> @@ -146,14 +170,37 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>         return status;
>  }
>
> +static int
> +iomap_write_end_inline(struct inode *inode, struct page *page,
> +               struct iomap *iomap, loff_t pos, unsigned copied)
> +{
> +       void *addr;
> +
> +       WARN_ON_ONCE(!PageUptodate(page));
> +       BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +
> +       addr = kmap_atomic(page);
> +       memcpy(iomap->inline_data + pos, addr + pos, copied);
> +       kunmap_atomic(addr);
> +
> +       mark_inode_dirty(inode);
> +       __generic_write_end(inode, pos, copied, page);
> +       return copied;
> +}
> +
>  static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -               unsigned copied, struct page *page)
> +               unsigned copied, struct page *page, struct iomap *iomap)
>  {
>         int ret;
>
> -       ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> -                       copied, page, NULL);
> +       if (iomap->type == IOMAP_INLINE) {
> +               ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> +       } else {
> +               ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> +                               copied, page, NULL);
> +       }
> +
>         if (ret < len)
>                 iomap_write_failed(inode, pos, len);
>         return ret;
> @@ -208,7 +255,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
>                 flush_dcache_page(page);
>
> -               status = iomap_write_end(inode, pos, bytes, copied, page);
> +               status = iomap_write_end(inode, pos, bytes, copied, page,
> +                               iomap);
>                 if (unlikely(status < 0))
>                         break;
>                 copied = status;
> @@ -302,7 +350,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
>                 WARN_ON_ONCE(!PageUptodate(page));
>
> -               status = iomap_write_end(inode, pos, bytes, bytes, page);
> +               status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>                 if (unlikely(status <= 0)) {
>                         if (WARN_ON_ONCE(status == 0))
>                                 return -EIO;
> @@ -354,7 +402,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>         zero_user(page, offset, bytes);
>         mark_page_accessed(page);
>
> -       return iomap_write_end(inode, pos, bytes, bytes, page);
> +       return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  }
>
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 212f4d59bcbf..69ef622f650e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -56,6 +56,7 @@ struct iomap {
>         union {
>                 struct block_device *bdev;
>                 struct dax_device   *dax_dev;
> +               void                *inline_data;
>         };
>  };
>
> --
> 2.14.2
>

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/6] iomap: complete partial direct I/O writes synchronously
  2018-06-14 12:04 iomap preparations for GFS2 v2 Christoph Hellwig
@ 2018-06-14 12:04 ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-06-14 12:04 UTC (permalink / raw)
  To: Andreas Gruenbacher, linux-xfs
  Cc: Dan Williams, linux-fsdevel, cluster-devel, linux-ext4

From: Andreas Gruenbacher <agruenba@redhat.com>

According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write.  In that
case, wait for the short write to complete to allow gfs2 to recover.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 7d1e9f45f098..bf1e81b08962 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -811,6 +811,7 @@ struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	bool			wait_for_completion;
 
 	union {
 		/* used during submission and for synchronous completion: */
@@ -914,9 +915,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (is_sync_kiocb(dio->iocb)) {
+		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
-
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -1131,13 +1131,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
-	if (is_sync_kiocb(iocb)) {
-		dio->submit.waiter = current;
-		dio->submit.cookie = BLK_QC_T_NONE;
-		dio->submit.last_queue = NULL;
-	}
+	dio->submit.waiter = current;
+	dio->submit.cookie = BLK_QC_T_NONE;
+	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
 		if (pos >= dio->i_size)
@@ -1187,7 +1186,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1202,8 +1201,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK)
+			if (ret == -ENOTBLK) {
+				dio->wait_for_completion = true;
 				ret = 0;
+			}
 			break;
 		}
 		pos += ret;
@@ -1224,7 +1225,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!is_sync_kiocb(iocb))
+		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
-- 
2.17.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-06-14 12:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-06 10:40 iomap preparations for GFS2 Christoph Hellwig
2018-06-06 10:40 ` [PATCH 1/6] fs: factor out a __generic_write_end helper Christoph Hellwig
2018-06-06 10:59   ` Christoph Hellwig
2018-06-06 11:09     ` Andreas Grünbacher
2018-06-06 10:40 ` [PATCH 2/6] iomap: move bdev and dax_dev in a union Christoph Hellwig
2018-06-06 11:37   ` Andreas Gruenbacher
2018-06-06 12:05     ` Christoph Hellwig
2018-06-06 10:40 ` [PATCH 3/6] iomap: mark newly allocated buffer heads as new Christoph Hellwig
2018-06-06 10:40 ` [PATCH 4/6] iomap: complete partial direct I/O writes synchronously Christoph Hellwig
2018-06-06 10:40 ` [PATCH 5/6] iomap: generic inline data handling Christoph Hellwig
2018-06-07 13:50   ` Andreas Grünbacher
2018-06-06 10:40 ` [PATCH 6/6] iomap: add a page_done callback Christoph Hellwig
2018-06-06 11:37   ` Andreas Gruenbacher
2018-06-06 11:38 ` iomap preparations for GFS2 Andreas Gruenbacher
2018-06-06 15:31   ` Andreas Grünbacher
2018-06-06 15:39     ` Christoph Hellwig
2018-06-14 12:04 iomap preparations for GFS2 v2 Christoph Hellwig
2018-06-14 12:04 ` [PATCH 4/6] iomap: complete partial direct I/O writes synchronously Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).