linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 00/10] gfs2 iomap write support
@ 2018-06-04 19:31 Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 01/10] iomap: inline data should be an iomap type, not a flag Andreas Gruenbacher
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Here's another update of my gfs2 iomap write patches, with support for
buffered writes as well as direct I/O reads and writes through iomap.


Changes since v7:

 * Drop patch that allows to unconditionally mark the inode dirty in
   __generic_write_end and mark the inode dirty in
   iomap_write_inline_data instead.

 * Call the page_write_end hook even for IOMAP_INLINE mappings.

 * Drop patch "iomap: Put struct iomap_ops into struct iomap".


Significant changes since v6:

 * Switch to Christoph's patch for splitting out __generic_write_end and 
   put the inode-dirtying change in a separate commit.

 * Switch froma mandatory write_end operation to an optional
   page_write_end hook.

 * Add a patch to pass struct iomap_ops in struct iomap at the end for
   demonstration purposes.


Significant changes since v5:

 * Initial gfs2 specific cleanups split off and posted separately.

 * New patch for generic iomap inline data handling.

 * Iomap write_begin operation removed.


A 4.17 based version of the patches can be found here:

  https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-write

Thanks,
Andreas

Andreas Gruenbacher (8):
  iomap: Mark newly allocated buffer heads as new
  iomap: Complete partial direct I/O writes synchronously
  iomap: Generic inline data handling
  iomap: Add page_write_end iomap hook
  gfs2: iomap buffered write support
  gfs2: gfs2_extent_length cleanup
  gfs2: iomap direct I/O support
  gfs2: Remove gfs2_write_{begin,end}

Christoph Hellwig (2):
  iomap: inline data should be an iomap type, not a flag
  fs: factor out a __generic_write_end helper

 fs/buffer.c           |  76 ++++-----
 fs/ext4/inline.c      |   4 +-
 fs/gfs2/aops.c        | 328 ++-----------------------------------
 fs/gfs2/aops.h        |  22 +++
 fs/gfs2/bmap.c        | 365 ++++++++++++++++++++++++++++++++++++------
 fs/gfs2/file.c        | 168 +++++++++++++++++--
 fs/internal.h         |   2 +
 fs/iomap.c            | 136 ++++++++++++----
 include/linux/iomap.h |  11 +-
 9 files changed, 672 insertions(+), 440 deletions(-)
 create mode 100644 fs/gfs2/aops.h

-- 
2.17.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v8 01/10] iomap: inline data should be an iomap type, not a flag
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 02/10] iomap: Mark newly allocated buffer heads as new Andreas Gruenbacher
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

From: Christoph Hellwig <hch@lst.de>

Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it
it should be an entirely separate iomap range type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/ext4/inline.c      | 4 ++--
 fs/gfs2/bmap.c        | 4 ++--
 fs/iomap.c            | 8 ++++----
 include/linux/iomap.h | 2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..e1f00891ef95 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1835,8 +1835,8 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
 	iomap->offset = 0;
 	iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
 			      i_size_read(inode));
-	iomap->type = 0;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 
 out:
 	up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index a7b586e02693..59a78056530a 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -767,8 +767,8 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
 	iomap->length = i_size_read(inode);
-	iomap->type = IOMAP_MAPPED;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 }
 
 /**
diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..54b693da3a35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -498,22 +498,22 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 	case IOMAP_HOLE:
 		/* skip holes */
 		return 0;
+	case IOMAP_MAPPED:
+		break;
 	case IOMAP_DELALLOC:
 		flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
 		break;
 	case IOMAP_UNWRITTEN:
 		flags |= FIEMAP_EXTENT_UNWRITTEN;
 		break;
-	case IOMAP_MAPPED:
-		break;
+	case IOMAP_INLINE:
+		flags |= FIEMAP_EXTENT_DATA_INLINE;
 	}
 
 	if (iomap->flags & IOMAP_F_MERGED)
 		flags |= FIEMAP_EXTENT_MERGED;
 	if (iomap->flags & IOMAP_F_SHARED)
 		flags |= FIEMAP_EXTENT_SHARED;
-	if (iomap->flags & IOMAP_F_DATA_INLINE)
-		flags |= FIEMAP_EXTENT_DATA_INLINE;
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
 			iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 19a07de28212..918f14075702 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -18,6 +18,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
 #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE	0x05	/* data inline in the inode */
 
 /*
  * Flags for all iomap mappings:
@@ -34,7 +35,6 @@ struct vm_fault;
  */
 #define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED		0x20	/* block shared with another file */
-#define IOMAP_F_DATA_INLINE	0x40	/* data inline in the inode */
 
 /*
  * Magic value for addr:
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 02/10] iomap: Mark newly allocated buffer heads as new
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 01/10] iomap: inline data should be an iomap type, not a flag Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set.  This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..0d29bb28b404 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1900,15 +1900,16 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 		break;
 	case IOMAP_UNWRITTEN:
 		/*
-		 * For unwritten regions, we always need to ensure that
-		 * sub-block writes cause the regions in the block we are not
-		 * writing to are zeroed. Set the buffer as new to ensure this.
+		 * For unwritten regions, we always need to ensure that regions
+		 * in the block we are not writing to are zeroed. Mark the
+		 * buffer as new to ensure this.
 		 */
 		set_buffer_new(bh);
 		set_buffer_unwritten(bh);
 		/* FALLTHRU */
 	case IOMAP_MAPPED:
-		if (offset >= i_size_read(inode))
+		if ((iomap->flags & IOMAP_F_NEW) ||
+		    offset >= i_size_read(inode))
 			set_buffer_new(bh);
 		bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
 				inode->i_blkbits;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 01/10] iomap: inline data should be an iomap type, not a flag Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 02/10] iomap: Mark newly allocated buffer heads as new Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-05 12:10   ` David Sterba
  2018-06-04 19:31 ` [PATCH v8 04/10] fs: factor out a __generic_write_end helper Andreas Gruenbacher
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write.  In that
case, wait for the short write to complete to allow gfs2 to recover.

This will make xfstest generic/240 work on gfs2.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 54b693da3a35..a0d3b7742060 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -696,6 +696,7 @@ struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	bool			wait_for_completion;
 
 	union {
 		/* used during submission and for synchronous completion: */
@@ -797,9 +798,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (is_sync_kiocb(dio->iocb)) {
+		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
-
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -990,13 +990,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
-	if (is_sync_kiocb(iocb)) {
-		dio->submit.waiter = current;
-		dio->submit.cookie = BLK_QC_T_NONE;
-		dio->submit.last_queue = NULL;
-	}
+	dio->submit.waiter = current;
+	dio->submit.cookie = BLK_QC_T_NONE;
+	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
 		if (pos >= dio->i_size)
@@ -1033,7 +1032,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1048,8 +1047,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK)
+			if (ret == -ENOTBLK) {
+				dio->wait_for_completion = true;
 				ret = 0;
+			}
 			break;
 		}
 		pos += ret;
@@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (ret < 0)
 		iomap_dio_set_error(dio, ret);
 
+	smp_mb__before_atomic();
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!is_sync_kiocb(iocb))
+		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 04/10] fs: factor out a __generic_write_end helper
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 05/10] iomap: Generic inline data handling Andreas Gruenbacher
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel

From: Christoph Hellwig <hch@lst.de>

Bits of the buffer.c based write_end implementations that don't know
about buffer_heads and can be reused by other implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c   | 67 +++++++++++++++++++++++++++------------------------
 fs/internal.h |  2 ++
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0d29bb28b404..cb5bbf22b1ce 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2077,6 +2077,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 }
 EXPORT_SYMBOL(block_write_begin);
 
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page)
+{
+	loff_t old_size = inode->i_size;
+	bool i_size_changed = false;
+
+	/*
+	 * No need to use i_size_read() here, the i_size cannot change under us
+	 * because we hold i_rwsem.
+	 *
+	 * But it's important to update i_size while still holding page lock:
+	 * page writeout could otherwise come in and zero beyond i_size.
+	 */
+	if (pos + copied > inode->i_size) {
+		i_size_write(inode, pos + copied);
+		i_size_changed = true;
+	}
+
+	unlock_page(page);
+	put_page(page);
+
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
+	/*
+	 * Don't mark the inode dirty under page lock. First, it unnecessarily
+	 * makes the holding time of page lock longer. Second, it forces lock
+	 * ordering of page lock and transaction start for journaling
+	 * filesystems.
+	 */
+	if (i_size_changed)
+		mark_inode_dirty(inode);
+	return copied;
+}
+
 int block_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
@@ -2117,39 +2151,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
 {
-	struct inode *inode = mapping->host;
-	loff_t old_size = inode->i_size;
-	int i_size_changed = 0;
-
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-
-	/*
-	 * No need to use i_size_read() here, the i_size
-	 * cannot change under us because we hold i_mutex.
-	 *
-	 * But it's important to update i_size while still holding page lock:
-	 * page writeout could otherwise come in and zero beyond i_size.
-	 */
-	if (pos+copied > inode->i_size) {
-		i_size_write(inode, pos+copied);
-		i_size_changed = 1;
-	}
-
-	unlock_page(page);
-	put_page(page);
-
-	if (old_size < pos)
-		pagecache_isize_extended(inode, old_size, pos);
-	/*
-	 * Don't mark the inode dirty under page lock. First, it unnecessarily
-	 * makes the holding time of page lock longer. Second, it forces lock
-	 * ordering of page lock and transaction start for journaling
-	 * filesystems.
-	 */
-	if (i_size_changed)
-		mark_inode_dirty(inode);
-
-	return copied;
+	return __generic_write_end(mapping->host, pos, copied, page);
 }
 EXPORT_SYMBOL(generic_write_end);
 
diff --git a/fs/internal.h b/fs/internal.h
index 980d005b21b4..4a18bdbd2214 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 extern void guard_bio_eod(int rw, struct bio *bio);
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page);
 
 /*
  * char_dev.c
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 05/10] iomap: Generic inline data handling
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (3 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 04/10] fs: factor out a __generic_write_end helper Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-05  5:34   ` Christoph Hellwig
  2018-06-04 19:31 ` [PATCH v8 06/10] iomap: Add page_write_end iomap hook Andreas Gruenbacher
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Add generic inline data handling by adding a pointer to the inline data
region to struct iomap.  When handling a buffered IOMAP_INLINE write,
iomap_write_begin will copy the current inline data from the inline data
region into the page cache, and iomap_write_end will copy the changes in
the page cache back to the inline data region.

This doesn't cover inline data reads and direct I/O yet because so far,
we have no users.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap.c            | 56 +++++++++++++++++++++++++++++++++++++++----
 include/linux/iomap.h |  1 +
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index a0d3b7742060..48cd67227811 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -108,6 +108,41 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
+static void
+iomap_read_inline_data(struct page *page, struct iomap *iomap, loff_t size)
+{
+	void *data = iomap->inline_data;
+	void *addr;
+
+	if (PageUptodate(page))
+		return;
+
+	BUG_ON(page->index);
+	BUG_ON(size > PAGE_SIZE - offset_in_page(data));
+
+	addr = kmap_atomic(page);
+	memcpy(addr, data, size);
+	memset(addr + size, 0, PAGE_SIZE - size);
+	kunmap_atomic(addr);
+	SetPageUptodate(page);
+}
+
+static void
+iomap_write_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap, off_t pos, unsigned copied)
+{
+	void *data = iomap->inline_data;
+	void *addr;
+
+	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(data));
+	WARN_ON_ONCE(!PageUptodate(page));
+
+	addr = kmap_atomic(page);
+	memcpy(data + pos, addr + pos, copied);
+	kunmap_atomic(addr);
+	mark_inode_dirty(inode);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
@@ -125,6 +160,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (!page)
 		return -ENOMEM;
 
+	if (iomap->type == IOMAP_INLINE) {
+		iomap_read_inline_data(page, iomap, inode->i_size);
+		goto out;
+	}
+
 	status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	if (unlikely(status)) {
 		unlock_page(page);
@@ -134,16 +174,23 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		iomap_write_failed(inode, pos, len);
 	}
 
+out:
 	*pagep = page;
 	return status;
 }
 
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+		unsigned copied, struct page *page, struct iomap *iomap)
 {
 	int ret;
 
+	if (iomap->type == IOMAP_INLINE) {
+		iomap_write_inline_data(inode, page, iomap, pos, copied);
+		__generic_write_end(inode, pos, copied, page);
+		return copied;
+	}
+
 	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
 			copied, page, NULL);
 	if (ret < len)
@@ -200,7 +247,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = iomap_write_end(inode, pos, bytes, copied, page,
+				iomap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -294,7 +342,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -346,7 +394,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 918f14075702..c61113c71a60 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -47,6 +47,7 @@ struct iomap {
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
+	void			*inline_data;  /* inline data buffer */
 	struct block_device	*bdev;	/* block device for I/O */
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 };
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 06/10] iomap: Add page_write_end iomap hook
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (4 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 05/10] iomap: Generic inline data handling Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-05  7:56   ` Andreas Grünbacher
  2018-06-05 12:07   ` David Sterba
  2018-06-04 19:31 ` [PATCH v8 07/10] gfs2: iomap buffered write support Andreas Gruenbacher
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Add a page_write_end hook called when done writing to a page, for
filesystems that implement data journaling: in that case, pages are
written to the journal before being written back to their proper on-disk
locations.  The new hook is bypassed for IOMAP_INLINE mappings.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap.c            | 58 ++++++++++++++++++++++++++++++++-----------
 include/linux/iomap.h |  8 ++++++
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 48cd67227811..3b34c957d2fd 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -181,16 +181,22 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page, struct iomap *iomap)
+		unsigned copied, struct page *page, struct iomap *iomap,
+		const struct iomap_ops *ops)
 {
+	typeof(ops->page_write_end) page_write_end = ops->page_write_end;
 	int ret;
 
 	if (iomap->type == IOMAP_INLINE) {
 		iomap_write_inline_data(inode, page, iomap, pos, copied);
 		__generic_write_end(inode, pos, copied, page);
+		if (page_write_end)
+			page_write_end(inode, pos, copied, page, iomap);
 		return copied;
 	}
 
+	if (page_write_end)
+		page_write_end(inode, pos, copied, page, iomap);
 	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
 			copied, page, NULL);
 	if (ret < len)
@@ -198,11 +204,17 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	return ret;
 }
 
+struct iomap_write_args {
+	const struct iomap_ops *ops;
+	struct iov_iter *iter;
+};
+
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
-	struct iov_iter *i = data;
+	struct iomap_write_args *args = data;
+	struct iov_iter *i = args->iter;
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = AOP_FLAG_NOFS;
@@ -248,7 +260,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		flush_dcache_page(page);
 
 		status = iomap_write_end(inode, pos, bytes, copied, page,
-				iomap);
+				iomap, args->ops);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -285,10 +297,14 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
 	loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+	struct iomap_write_args args = {
+		.ops = ops,
+		.iter = iter,
+	};
 
 	while (iov_iter_count(iter)) {
 		ret = iomap_apply(inode, pos, iov_iter_count(iter),
-				IOMAP_WRITE, ops, iter, iomap_write_actor);
+				IOMAP_WRITE, ops, &args, iomap_write_actor);
 		if (ret <= 0)
 			break;
 		pos += ret;
@@ -319,6 +335,7 @@ static loff_t
 iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
+	const struct iomap_ops *ops = data;
 	long status = 0;
 	ssize_t written = 0;
 
@@ -342,7 +359,8 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+				ops);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -368,8 +386,8 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 	loff_t ret;
 
 	while (len) {
-		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
-				iomap_dirty_actor);
+		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops,
+				(void *)ops, iomap_dirty_actor);
 		if (ret <= 0)
 			return ret;
 		pos += ret;
@@ -381,7 +399,8 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 EXPORT_SYMBOL_GPL(iomap_file_dirty);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap)
+		unsigned bytes, const struct iomap_ops *ops,
+		struct iomap *iomap)
 {
 	struct page *page;
 	int status;
@@ -394,7 +413,8 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+			ops);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -407,11 +427,16 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 			offset, bytes);
 }
 
+struct iomap_zero_range_args {
+	const struct iomap_ops *ops;
+	bool *did_zero;
+};
+
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		void *data, struct iomap *iomap)
 {
-	bool *did_zero = data;
+	struct iomap_zero_range_args *args = data;
 	loff_t written = 0;
 	int status;
 
@@ -428,15 +453,16 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		if (IS_DAX(inode))
 			status = iomap_dax_zero(pos, offset, bytes, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap);
+			status = iomap_zero(inode, pos, offset, bytes,
+					args->ops, iomap);
 		if (status < 0)
 			return status;
 
 		pos += bytes;
 		count -= bytes;
 		written += bytes;
-		if (did_zero)
-			*did_zero = true;
+		if (args->did_zero)
+			*args->did_zero = true;
 	} while (count > 0);
 
 	return written;
@@ -446,11 +472,15 @@ int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_zero_range_args args = {
+		.ops = ops,
+		.did_zero = did_zero,
+	};
 	loff_t ret;
 
 	while (len > 0) {
 		ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
-				ops, did_zero, iomap_zero_range_actor);
+				ops, &args, iomap_zero_range_actor);
 		if (ret <= 0)
 			return ret;
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c61113c71a60..ac7f1b2c1cbe 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -8,6 +8,7 @@ struct fiemap_extent_info;
 struct inode;
 struct iov_iter;
 struct kiocb;
+struct page;
 struct vm_area_struct;
 struct vm_fault;
 
@@ -71,6 +72,13 @@ struct iomap_ops {
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
 			unsigned flags, struct iomap *iomap);
 
+	/*
+	 * Called when done writing to a page (optional; skipped for
+	 * IOMAP_INLINE mappings).
+	 */
+	void (*page_write_end)(struct inode *inode, loff_t pos, unsigned copied,
+			struct page *page, struct iomap *iomap);
+
 	/*
 	 * Commit and/or unreserve space previous allocated using iomap_begin.
 	 * Written indicates the length of the successful write operation which
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 07/10] gfs2: iomap buffered write support
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (5 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 06/10] iomap: Add page_write_end iomap hook Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 08/10] gfs2: gfs2_extent_length cleanup Andreas Gruenbacher
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

With the traditional page-based writes, blocks are allocated separately
for each page written to.  With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage.  This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c |  20 +--
 fs/gfs2/aops.h |  22 ++++
 fs/gfs2/bmap.c | 326 ++++++++++++++++++++++++++++++++++++++++++++-----
 fs/gfs2/file.c |  46 +++++--
 4 files changed, 368 insertions(+), 46 deletions(-)
 create mode 100644 fs/gfs2/aops.h

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a1c6b5de9200..3d9633175aa8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -22,6 +22,7 @@
 #include <linux/backing-dev.h>
 #include <linux/uio.h>
 #include <trace/events/writeback.h>
+#include <linux/sched/signal.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -36,10 +37,11 @@
 #include "super.h"
 #include "util.h"
 #include "glops.h"
+#include "aops.h"
 
 
-static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
-				   unsigned int from, unsigned int len)
+void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+			    unsigned int from, unsigned int len)
 {
 	struct buffer_head *head = page_buffers(page);
 	unsigned int bsize = head->b_size;
@@ -462,7 +464,7 @@ static int gfs2_jdata_writepages(struct address_space *mapping,
  * Returns: errno
  */
 
-static int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
+int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
 {
 	struct buffer_head *dibh;
 	u64 dsize = i_size_read(&ip->i_inode);
@@ -773,7 +775,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
  * adjust_fs_space - Adjusts the free space available due to gfs2_grow
  * @inode: the rindex inode
  */
-static void adjust_fs_space(struct inode *inode)
+void adjust_fs_space(struct inode *inode)
 {
 	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
@@ -819,11 +821,11 @@ static void adjust_fs_space(struct inode *inode)
  * This copies the data from the page into the inode block after
  * the inode data structure itself.
  *
- * Returns: errno
+ * Returns: copied bytes or errno
  */
-static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
-				  loff_t pos, unsigned copied,
-				  struct page *page)
+int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+			   loff_t pos, unsigned copied,
+			   struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	u64 to = pos + copied;
@@ -862,7 +864,7 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
  * The main write_end function for GFS2. We just put our locking around the VFS
  * provided functions.
  *
- * Returns: errno
+ * Returns: copied bytes or errno
  */
 
 static int gfs2_write_end(struct file *file, struct address_space *mapping,
diff --git a/fs/gfs2/aops.h b/fs/gfs2/aops.h
new file mode 100644
index 000000000000..976bb32dd405
--- /dev/null
+++ b/fs/gfs2/aops.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2017 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License version 2.
+ */
+
+#ifndef __AOPS_DOT_H__
+#define __AOPS_DOT_H__
+
+#include "incore.h"
+
+extern int stuffed_readpage(struct gfs2_inode *ip, struct page *page);
+extern int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+				  loff_t pos, unsigned copied,
+				  struct page *page);
+extern void adjust_fs_space(struct inode *inode);
+extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+				   unsigned int from, unsigned int len);
+
+#endif /* __AOPS_DOT_H__ */
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 59a78056530a..236e159c56d9 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -28,6 +28,7 @@
 #include "trans.h"
 #include "dir.h"
 #include "util.h"
+#include "aops.h"
 #include "trace_gfs2.h"
 
 /* This doesn't need to be that large as max 64 bit pointers in a 4k
@@ -41,6 +42,8 @@ struct metapath {
 	int mp_aheight; /* actual height (lookup height) */
 };
 
+static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length);
+
 /**
  * gfs2_unstuffer_page - unstuff a stuffed inode into a block cached by a page
  * @ip: the inode
@@ -389,7 +392,7 @@ static int fillup_metapath(struct gfs2_inode *ip, struct metapath *mp, int h)
 	return mp->mp_aheight - x - 1;
 }
 
-static inline void release_metapath(struct metapath *mp)
+static void release_metapath(struct metapath *mp)
 {
 	int i;
 
@@ -397,6 +400,7 @@ static inline void release_metapath(struct metapath *mp)
 		if (mp->mp_bh[i] == NULL)
 			break;
 		brelse(mp->mp_bh[i]);
+		mp->mp_bh[i] = NULL;
 	}
 }
 
@@ -609,11 +613,13 @@ enum alloc_state {
  *  ii) Indirect blocks to fill in lower part of the metadata tree
  * iii) Data blocks
  *
- * The function is in two parts. The first part works out the total
- * number of blocks which we need. The second part does the actual
- * allocation asking for an extent at a time (if enough contiguous free
- * blocks are available, there will only be one request per bmap call)
- * and uses the state machine to initialise the blocks in order.
+ * This function is called after gfs2_iomap_get, which works out the
+ * total number of blocks which we need via gfs2_alloc_size.
+ *
+ * We then do the actual allocation asking for an extent at a time (if
+ * enough contiguous free blocks are available, there will only be one
+ * allocation request per call) and uses the state machine to initialise
+ * the blocks in order.
  *
  * Right now, this function will allocate at most one indirect block
  * worth of data -- with a default block size of 4K, that's slightly
@@ -633,39 +639,26 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	struct buffer_head *dibh = mp->mp_bh[0];
 	u64 bn;
 	unsigned n, i, blks, alloced = 0, iblks = 0, branch_start = 0;
-	unsigned dblks = 0;
-	unsigned ptrs_per_blk;
+	size_t dblks = iomap->length >> inode->i_blkbits;
 	const unsigned end_of_metadata = mp->mp_fheight - 1;
 	int ret;
 	enum alloc_state state;
 	__be64 *ptr;
 	__be64 zero_bn = 0;
-	size_t maxlen = iomap->length >> inode->i_blkbits;
 
 	BUG_ON(mp->mp_aheight < 1);
 	BUG_ON(dibh == NULL);
+	BUG_ON(dblks < 1);
 
 	gfs2_trans_add_meta(ip->i_gl, dibh);
 
 	down_write(&ip->i_rw_mutex);
 
 	if (mp->mp_fheight == mp->mp_aheight) {
-		struct buffer_head *bh;
-		int eob;
-
-		/* Bottom indirect block exists, find unalloced extent size */
-		ptr = metapointer(end_of_metadata, mp);
-		bh = mp->mp_bh[end_of_metadata];
-		dblks = gfs2_extent_length(bh->b_data, bh->b_size, ptr,
-					   maxlen, &eob);
-		BUG_ON(dblks < 1);
+		/* Bottom indirect block exists */
 		state = ALLOC_DATA;
 	} else {
 		/* Need to allocate indirect blocks */
-		ptrs_per_blk = mp->mp_fheight > 1 ? sdp->sd_inptrs :
-			sdp->sd_diptrs;
-		dblks = min(maxlen, (size_t)(ptrs_per_blk -
-					     mp->mp_list[end_of_metadata]));
 		if (mp->mp_fheight == ip->i_height) {
 			/* Writing into existing tree, extend tree down */
 			iblks = mp->mp_fheight - mp->mp_aheight;
@@ -750,6 +743,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 		}
 	} while (iomap->addr == IOMAP_NULL_ADDR);
 
+	iomap->type = IOMAP_MAPPED;
 	iomap->length = (u64)dblks << inode->i_blkbits;
 	ip->i_height = mp->mp_fheight;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
@@ -759,18 +753,63 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	return ret;
 }
 
-static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
+static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap,
+			       u64 length)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 
 	iomap->addr = (ip->i_no_addr << inode->i_blkbits) +
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
-	iomap->length = i_size_read(inode);
+	iomap->length = length;
 	iomap->type = IOMAP_INLINE;
 	iomap->flags = 0;
 }
 
+/**
+ * gfs2_alloc_size - Compute the maximum allocation size
+ * @inode: The inode
+ * @mp: The metapath
+ * @size: Requested size in blocks
+ *
+ * Compute the maximum size of the next allocation at @mp.
+ *
+ * Returns: size in blocks
+ */
+static u64 gfs2_alloc_size(struct inode *inode, struct metapath *mp, u64 size)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	const __be64 *first, *ptr, *end;
+
+	/*
+	 * For writes to stuffed files, this function is called twice via
+	 * gfs2_iomap_get, before and after unstuffing. The size we return the
+	 * first time needs to be large enough to get the reservation and
+	 * allocation sizes right.  The size we return the second time must
+	 * be exact or else gfs2_iomap_alloc won't do the right thing.
+	 */
+
+	if (gfs2_is_stuffed(ip) || mp->mp_fheight != mp->mp_aheight) {
+		unsigned maxsize = mp->mp_fheight > 1 ?
+			sdp->sd_inptrs : sdp->sd_diptrs;
+		maxsize -= mp->mp_list[mp->mp_fheight - 1];
+		if (size > maxsize)
+			size = maxsize;
+		return size;
+	}
+
+	first = metapointer(ip->i_height - 1, mp);
+	end = metaend(ip->i_height - 1, mp);
+	if (end - first > size)
+		end = first + size;
+	for (ptr = first; ptr < end; ptr++) {
+		if (*ptr)
+			break;
+	}
+	return ptr - first;
+}
+
 /**
  * gfs2_iomap_get - Map blocks from an inode to disk blocks
  * @inode: The inode
@@ -804,10 +843,15 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		if (flags & IOMAP_REPORT) {
 			if (pos >= i_size_read(inode))
 				return -ENOENT;
-			gfs2_stuffed_iomap(inode, iomap);
+			gfs2_stuffed_iomap(inode, iomap,
+					   i_size_read(inode));
+			return 0;
+		}
+		if (pos + length <= gfs2_max_stuffed_size(ip)) {
+			gfs2_stuffed_iomap(inode, iomap,
+					   gfs2_max_stuffed_size(ip));
 			return 0;
 		}
-		BUG_ON(!(flags & IOMAP_WRITE));
 	}
 	lblock = pos >> inode->i_blkbits;
 	iomap->offset = lblock << inode->i_blkbits;
@@ -867,10 +911,160 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 		else
 			iomap->length = size - pos;
+	} else if (flags & IOMAP_WRITE) {
+		u64 size;
+		size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
+		if (size < iomap->length)
+			iomap->length = size;
+	} else {
+		if (height == ip->i_height)
+			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 	}
 	goto out;
 }
 
+static int gfs2_write_lock(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	int error;
+
+	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
+	error = gfs2_glock_nq(&ip->i_gh);
+	if (error)
+		goto out_uninit;
+	if (&ip->i_inode == sdp->sd_rindex) {
+		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+		error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
+					   GL_NOCACHE, &m_ip->i_gh);
+		if (error)
+			goto out_unlock;
+	}
+	return 0;
+
+out_unlock:
+	gfs2_glock_dq(&ip->i_gh);
+out_uninit:
+	gfs2_holder_uninit(&ip->i_gh);
+	return error;
+}
+
+static void gfs2_write_unlock(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	if (&ip->i_inode == sdp->sd_rindex) {
+		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+		gfs2_glock_dq_uninit(&m_ip->i_gh);
+	}
+	gfs2_glock_dq_uninit(&ip->i_gh);
+}
+
+static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length,
+				  unsigned flags, struct iomap *iomap)
+{
+	struct metapath mp = { .mp_aheight = 1, };
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
+	bool unstuff, alloc_required;
+	struct buffer_head *dibh = NULL;
+	int ret;
+
+	ret = gfs2_write_lock(inode);
+	if (ret)
+		return ret;
+
+	unstuff = gfs2_is_stuffed(ip) &&
+		  pos + length > gfs2_max_stuffed_size(ip);
+
+	ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+	if (ret)
+		goto out_release;
+
+	if (iomap->type == IOMAP_INLINE) {
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (ret)
+			goto out_release;
+		iomap->inline_data = dibh->b_data + sizeof(struct gfs2_dinode);
+		/* dibh reference put in gfs2_iomap_end */
+	}
+
+	alloc_required = unstuff || iomap->type == IOMAP_HOLE;
+
+	if (alloc_required || gfs2_is_jdata(ip))
+		gfs2_write_calc_reserv(ip, iomap->length, &data_blocks, &ind_blocks);
+
+	if (alloc_required) {
+		struct gfs2_alloc_parms ap = { .target = data_blocks + ind_blocks };
+		ret = gfs2_quota_lock_check(ip, &ap);
+		if (ret)
+			goto out_release;
+
+		ret = gfs2_inplace_reserve(ip, &ap);
+		if (ret)
+			goto out_qunlock;
+	}
+
+	rblocks = RES_DINODE + ind_blocks;
+	if (gfs2_is_jdata(ip))
+		rblocks += data_blocks;
+	if (ind_blocks || data_blocks)
+		rblocks += RES_STATFS + RES_QUOTA;
+	if (inode == sdp->sd_rindex)
+		rblocks += 2 * RES_STATFS;
+	if (alloc_required)
+		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
+
+	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
+	if (ret)
+		goto out_trans_fail;
+
+	if (unstuff) {
+		ret = gfs2_unstuff_dinode(ip, NULL);
+		if (ret)
+			goto out_trans_end;
+		release_metapath(&mp);
+		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
+				     flags, iomap, &mp);
+		if (ret)
+			goto out_trans_end;
+	}
+
+	if (iomap->type == IOMAP_HOLE) {
+		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
+		if (ret) {
+			gfs2_trans_end(sdp);
+			if (alloc_required)
+				gfs2_inplace_release(ip);
+			punch_hole(ip, iomap->offset, iomap->length);
+			goto out_qunlock;
+		}
+	}
+	release_metapath(&mp);
+	return 0;
+
+out_trans_end:
+	gfs2_trans_end(sdp);
+out_trans_fail:
+	if (alloc_required)
+		gfs2_inplace_release(ip);
+out_qunlock:
+	if (alloc_required)
+		gfs2_quota_unlock(ip);
+out_release:
+	release_metapath(&mp);
+	if (dibh) {
+		brelse(dibh);
+		iomap->inline_data = NULL;
+	}
+	gfs2_write_unlock(inode);
+	return ret;
+}
+
 static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 			    unsigned flags, struct iomap *iomap)
 {
@@ -880,10 +1074,7 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	trace_gfs2_iomap_start(ip, pos, length, flags);
 	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
-		if (!ret && iomap->type == IOMAP_HOLE)
-			ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
-		release_metapath(&mp);
+		ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
 	} else {
 		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 		release_metapath(&mp);
@@ -892,8 +1083,83 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
+static void gfs2_iomap_page_write_end(struct inode *inode, loff_t pos,
+				      unsigned copied, struct page *page,
+				      struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+
+	if (gfs2_is_jdata(ip) && iomap->type != IOMAP_INLINE)
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+}
+
+static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+			  ssize_t written, unsigned flags, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	struct gfs2_trans *tr = current->journal_info;
+
+	if (!(flags & IOMAP_WRITE))
+		return 0;
+
+	if (iomap->type == IOMAP_INLINE) {
+		struct buffer_head *dibh;
+
+		dibh = gfs2_getbuf(ip->i_gl, ip->i_no_addr, NO_CREATE);
+		if (likely(dibh)) {
+			gfs2_trans_add_meta(ip->i_gl, dibh);
+			put_bh(dibh);
+			brelse(dibh);
+		}
+		iomap->inline_data = NULL;
+		goto out;
+	}
+
+	gfs2_ordered_add_inode(ip);
+
+	if (tr->tr_num_buf_new)
+		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	else {
+		struct buffer_head *dibh;
+
+		dibh = gfs2_getbuf(ip->i_gl, ip->i_no_addr, NO_CREATE);
+		if (likely(dibh)) {
+			gfs2_trans_add_meta(ip->i_gl, dibh);
+			brelse(dibh);
+		}
+	}
+
+out:
+	if (inode == sdp->sd_rindex) {
+		adjust_fs_space(inode);
+		sdp->sd_rindex_uptodate = 0;
+	}
+
+	gfs2_trans_end(sdp);
+	gfs2_inplace_release(ip);
+
+	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
+		/* Deallocate blocks that were just allocated. */
+		loff_t blockmask = i_blocksize(inode) - 1;
+		loff_t end = (pos + length) & ~blockmask;
+		pos = (pos + written + blockmask) & ~blockmask;
+		if (pos < end) {
+			truncate_pagecache_range(inode, pos, end - 1);
+			punch_hole(ip, pos, end - pos);
+		}
+	}
+
+	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
+		gfs2_quota_unlock(ip);
+	gfs2_write_unlock(inode);
+	return 0;
+}
+
 const struct iomap_ops gfs2_iomap_ops = {
 	.iomap_begin = gfs2_iomap_begin,
+	.page_write_end = gfs2_iomap_page_write_end,
+	.iomap_end = gfs2_iomap_end,
 };
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 7137db7b0119..16dd395479a5 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -26,10 +26,12 @@
 #include <linux/dlm.h>
 #include <linux/dlm_plock.h>
 #include <linux/delay.h>
+#include <linux/backing-dev.h>
 
 #include "gfs2.h"
 #include "incore.h"
 #include "bmap.h"
+#include "aops.h"
 #include "dir.h"
 #include "glock.h"
 #include "glops.h"
@@ -691,9 +693,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 /**
  * gfs2_file_write_iter - Perform a write to a file
  * @iocb: The io context
- * @iov: The data to write
- * @nr_segs: Number of @iov segments
- * @pos: The file position
+ * @from: The data to write
  *
  * We have to do a lock/unlock here to refresh the inode size for
  * O_APPEND writes, otherwise we can land up writing at the wrong
@@ -705,8 +705,9 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-	struct gfs2_inode *ip = GFS2_I(file_inode(file));
-	int ret;
+	struct inode *inode = file_inode(file);
+	struct gfs2_inode *ip = GFS2_I(inode);
+	ssize_t ret;
 
 	ret = gfs2_rsqa_alloc(ip);
 	if (ret)
@@ -723,7 +724,38 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		gfs2_glock_dq_uninit(&gh);
 	}
 
-	return generic_file_write_iter(iocb, from);
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return generic_file_write_iter(iocb, from);
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret <= 0)
+		goto out;
+
+	/* We can write back this queue in page reclaim */
+	current->backing_dev_info = inode_to_bdi(inode);
+
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out2;
+
+	ret = file_update_time(file);
+	if (ret)
+		goto out2;
+
+	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+
+out2:
+	current->backing_dev_info = NULL;
+out:
+	inode_unlock(inode);
+	if (likely(ret > 0)) {
+		iocb->ki_pos += ret;
+
+		/* Handle various SYNC-type writes */
+		ret = generic_write_sync(iocb, ret);
+	}
+	return ret;
 }
 
 static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -754,7 +786,7 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 		if (error)
 			goto out;
 		offset = iomap.offset + iomap.length;
-		if (iomap.type != IOMAP_HOLE)
+		if (!(iomap.flags & IOMAP_F_NEW))
 			continue;
 		error = sb_issue_zeroout(sb, iomap.addr >> inode->i_blkbits,
 					 iomap.length >> inode->i_blkbits,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 08/10] gfs2: gfs2_extent_length cleanup
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (6 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 07/10] gfs2: iomap buffered write support Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 09/10] gfs2: iomap direct I/O support Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 10/10] gfs2: Remove gfs2_write_{begin,end} Andreas Gruenbacher
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Now that gfs2_extent_length is no longer used for determining the size
of a hole and always with an upper size limit, the function can be
simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 236e159c56d9..be6b01041aaa 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -406,22 +406,17 @@ static void release_metapath(struct metapath *mp)
 
 /**
  * gfs2_extent_length - Returns length of an extent of blocks
- * @start: Start of the buffer
- * @len: Length of the buffer in bytes
- * @ptr: Current position in the buffer
- * @limit: Max extent length to return (0 = unlimited)
+ * @bh: The metadata block
+ * @ptr: Current position in @bh
+ * @limit: Max extent length to return
  * @eob: Set to 1 if we hit "end of block"
  *
- * If the first block is zero (unallocated) it will return the number of
- * unallocated blocks in the extent, otherwise it will return the number
- * of contiguous blocks in the extent.
- *
  * Returns: The length of the extent (minimum of one block)
  */
 
-static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __be64 *ptr, size_t limit, int *eob)
+static inline unsigned int gfs2_extent_length(struct buffer_head *bh, __be64 *ptr, size_t limit, int *eob)
 {
-	const __be64 *end = (start + len);
+	const __be64 *end = (__be64 *)(bh->b_data + bh->b_size);
 	const __be64 *first = ptr;
 	u64 d = be64_to_cpu(*ptr);
 
@@ -430,14 +425,11 @@ static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __b
 		ptr++;
 		if (ptr >= end)
 			break;
-		if (limit && --limit == 0)
-			break;
-		if (d)
-			d++;
+		d++;
 	} while(be64_to_cpu(*ptr) == d);
 	if (ptr >= end)
 		*eob = 1;
-	return (ptr - first);
+	return ptr - first;
 }
 
 typedef const __be64 *(*gfs2_metadata_walker)(
@@ -883,7 +875,7 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		goto do_alloc;
 
 	bh = mp->mp_bh[ip->i_height - 1];
-	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, len, &eob);
+	len = gfs2_extent_length(bh, ptr, len, &eob);
 
 	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
 	iomap->length = len << inode->i_blkbits;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 09/10] gfs2: iomap direct I/O support
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (7 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 08/10] gfs2: gfs2_extent_length cleanup Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  2018-06-04 19:31 ` [PATCH v8 10/10] gfs2: Remove gfs2_write_{begin,end} Andreas Gruenbacher
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 100 +----------------------------------
 fs/gfs2/bmap.c |  15 +++++-
 fs/gfs2/file.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 143 insertions(+), 110 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 3d9633175aa8..b0a219fae162 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -84,12 +84,6 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
 	return 0;
 }
 
-static int gfs2_get_block_direct(struct inode *inode, sector_t lblock,
-				 struct buffer_head *bh_result, int create)
-{
-	return gfs2_block_map(inode, lblock, bh_result, 0);
-}
-
 /**
  * gfs2_writepage_common - Common bits of writepage
  * @page: The page to be written
@@ -1021,96 +1015,6 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
 		try_to_release_page(page, 0);
 }
 
-/**
- * gfs2_ok_for_dio - check that dio is valid on this file
- * @ip: The inode
- * @offset: The offset at which we are reading or writing
- *
- * Returns: 0 (to ignore the i/o request and thus fall back to buffered i/o)
- *          1 (to accept the i/o request)
- */
-static int gfs2_ok_for_dio(struct gfs2_inode *ip, loff_t offset)
-{
-	/*
-	 * Should we return an error here? I can't see that O_DIRECT for
-	 * a stuffed file makes any sense. For now we'll silently fall
-	 * back to buffered I/O
-	 */
-	if (gfs2_is_stuffed(ip))
-		return 0;
-
-	if (offset >= i_size_read(&ip->i_inode))
-		return 0;
-	return 1;
-}
-
-
-
-static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct address_space *mapping = inode->i_mapping;
-	struct gfs2_inode *ip = GFS2_I(inode);
-	loff_t offset = iocb->ki_pos;
-	struct gfs2_holder gh;
-	int rv;
-
-	/*
-	 * Deferred lock, even if its a write, since we do no allocation
-	 * on this path. All we need change is atime, and this lock mode
-	 * ensures that other nodes have flushed their buffered read caches
-	 * (i.e. their page cache entries for this inode). We do not,
-	 * unfortunately have the option of only flushing a range like
-	 * the VFS does.
-	 */
-	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
-	rv = gfs2_glock_nq(&gh);
-	if (rv)
-		goto out_uninit;
-	rv = gfs2_ok_for_dio(ip, offset);
-	if (rv != 1)
-		goto out; /* dio not valid, fall back to buffered i/o */
-
-	/*
-	 * Now since we are holding a deferred (CW) lock at this point, you
-	 * might be wondering why this is ever needed. There is a case however
-	 * where we've granted a deferred local lock against a cached exclusive
-	 * glock. That is ok provided all granted local locks are deferred, but
-	 * it also means that it is possible to encounter pages which are
-	 * cached and possibly also mapped. So here we check for that and sort
-	 * them out ahead of the dio. The glock state machine will take care of
-	 * everything else.
-	 *
-	 * If in fact the cached glock state (gl->gl_state) is deferred (CW) in
-	 * the first place, mapping->nr_pages will always be zero.
-	 */
-	if (mapping->nrpages) {
-		loff_t lstart = offset & ~(PAGE_SIZE - 1);
-		loff_t len = iov_iter_count(iter);
-		loff_t end = PAGE_ALIGN(offset + len) - 1;
-
-		rv = 0;
-		if (len == 0)
-			goto out;
-		if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
-			unmap_shared_mapping_range(ip->i_inode.i_mapping, offset, len);
-		rv = filemap_write_and_wait_range(mapping, lstart, end);
-		if (rv)
-			goto out;
-		if (iov_iter_rw(iter) == WRITE)
-			truncate_inode_pages_range(mapping, lstart, end);
-	}
-
-	rv = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-				  gfs2_get_block_direct, NULL, NULL, 0);
-out:
-	gfs2_glock_dq(&gh);
-out_uninit:
-	gfs2_holder_uninit(&gh);
-	return rv;
-}
-
 /**
  * gfs2_releasepage - free the metadata associated with a page
  * @page: the page that's being released
@@ -1191,7 +1095,7 @@ static const struct address_space_operations gfs2_writeback_aops = {
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
-	.direct_IO = gfs2_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
@@ -1208,7 +1112,7 @@ static const struct address_space_operations gfs2_ordered_aops = {
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
-	.direct_IO = gfs2_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index be6b01041aaa..5a4958d275da 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -905,6 +905,10 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 			iomap->length = size - pos;
 	} else if (flags & IOMAP_WRITE) {
 		u64 size;
+
+		if (flags & IOMAP_DIRECT)
+			goto out;
+
 		size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
 		if (size < iomap->length)
 			iomap->length = size;
@@ -1066,7 +1070,14 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	trace_gfs2_iomap_start(ip, pos, length, flags);
 	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+		if (flags & IOMAP_DIRECT) {
+			ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+			release_metapath(&mp);
+			if (iomap->type != IOMAP_MAPPED)
+				ret = -ENOTBLK;
+		} else {
+			ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+		}
 	} else {
 		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 		release_metapath(&mp);
@@ -1092,7 +1103,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_trans *tr = current->journal_info;
 
-	if (!(flags & IOMAP_WRITE))
+	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		return 0;
 
 	if (iomap->type == IOMAP_INLINE) {
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 16dd395479a5..8de92708f18b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -690,6 +690,91 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 	return ret ? ret : ret1;
 }
 
+static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
+	size_t count = iov_iter_count(to);
+	struct gfs2_holder gh;
+	ssize_t ret;
+
+	if (!count)
+		return 0; /* skip atime */
+
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	/* fall back to buffered I/O for stuffed files */
+	ret = -ENOTBLK;
+	if (gfs2_is_stuffed(ip))
+		goto out;
+
+	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL);
+
+out:
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
+static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	size_t len = iov_iter_count(from);
+	loff_t offset = iocb->ki_pos;
+	struct gfs2_holder gh;
+	ssize_t ret;
+
+	/*
+	 * Deferred lock, even if its a write, since we do no allocation on
+	 * this path. All we need to change is the atime, and this lock mode
+	 * ensures that other nodes have flushed their buffered read caches
+	 * (i.e. their page cache entries for this inode). We do not,
+	 * unfortunately, have the option of only flushing a range like the
+	 * VFS does.
+	 */
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	/* Silently fall back to buffered I/O for stuffed files */
+	if (gfs2_is_stuffed(ip))
+		goto out;
+
+	/* Silently fall back to buffered I/O when writing beyond EOF */
+	if (offset + len > i_size_read(&ip->i_inode))
+		goto out;
+
+	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL);
+
+out:
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
+static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	ssize_t ret;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		ret = gfs2_file_direct_read(iocb, to);
+		if (likely(ret != -ENOTBLK))
+			goto out;
+		iocb->ki_flags &= ~IOCB_DIRECT;
+	}
+	ret = generic_file_read_iter(iocb, to);
+out:
+	return ret;
+}
+
 /**
  * gfs2_file_write_iter - Perform a write to a file
  * @iocb: The io context
@@ -707,7 +792,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct gfs2_inode *ip = GFS2_I(inode);
-	ssize_t ret;
+	ssize_t written = 0, ret;
 
 	ret = gfs2_rsqa_alloc(ip);
 	if (ret)
@@ -724,9 +809,6 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		gfs2_glock_dq_uninit(&gh);
 	}
 
-	if (iocb->ki_flags & IOCB_DIRECT)
-		return generic_file_write_iter(iocb, from);
-
 	inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret <= 0)
@@ -743,19 +825,55 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (ret)
 		goto out2;
 
-	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		struct address_space *mapping = file->f_mapping;
+		loff_t pos, endbyte;
+		ssize_t buffered;
+
+		written = gfs2_file_direct_write(iocb, from);
+		if (written < 0 || !iov_iter_count(from))
+			goto out2;
+
+		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+		if (unlikely(ret < 0))
+			goto out2;
+		buffered = ret;
+
+		/*
+		 * We need to ensure that the page cache pages are written to
+		 * disk and invalidated to preserve the expected O_DIRECT
+		 * semantics.
+		 */
+		pos = iocb->ki_pos;
+		endbyte = pos + buffered - 1;
+		ret = filemap_write_and_wait_range(mapping, pos, endbyte);
+		if (!ret) {
+			iocb->ki_pos += buffered;
+			written += buffered;
+			invalidate_mapping_pages(mapping,
+						 pos >> PAGE_SHIFT,
+						 endbyte >> PAGE_SHIFT);
+		} else {
+			/*
+			 * We don't know how much we wrote, so just return
+			 * the number of bytes which were direct-written
+			 */
+		}
+	} else {
+		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+		if (likely(ret > 0))
+			iocb->ki_pos += ret;
+	}
 
 out2:
 	current->backing_dev_info = NULL;
 out:
 	inode_unlock(inode);
 	if (likely(ret > 0)) {
-		iocb->ki_pos += ret;
-
 		/* Handle various SYNC-type writes */
 		ret = generic_write_sync(iocb, ret);
 	}
-	return ret;
+	return written ? written : ret;
 }
 
 static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -1157,7 +1275,7 @@ static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
 
 const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
-	.read_iter	= generic_file_read_iter,
+	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
@@ -1187,7 +1305,7 @@ const struct file_operations gfs2_dir_fops = {
 
 const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
-	.read_iter	= generic_file_read_iter,
+	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v8 10/10] gfs2: Remove gfs2_write_{begin,end}
  2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
                   ` (8 preceding siblings ...)
  2018-06-04 19:31 ` [PATCH v8 09/10] gfs2: iomap direct I/O support Andreas Gruenbacher
@ 2018-06-04 19:31 ` Andreas Gruenbacher
  9 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-04 19:31 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Now that generic_file_write_iter is no longer used, there are no
remaining users of these address space operations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 210 -------------------------------------------------
 1 file changed, 210 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index b0a219fae162..cc80fd71f3dd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -639,132 +639,6 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	return ret;
 }
 
-/**
- * gfs2_write_begin - Begin to write to a file
- * @file: The file to write to
- * @mapping: The mapping in which to write
- * @pos: The file offset at which to start writing
- * @len: Length of the write
- * @flags: Various flags
- * @pagep: Pointer to return the page
- * @fsdata: Pointer to return fs data (unused by GFS2)
- *
- * Returns: errno
- */
-
-static int gfs2_write_begin(struct file *file, struct address_space *mapping,
-			    loff_t pos, unsigned len, unsigned flags,
-			    struct page **pagep, void **fsdata)
-{
-	struct gfs2_inode *ip = GFS2_I(mapping->host);
-	struct gfs2_sbd *sdp = GFS2_SB(mapping->host);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
-	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
-	unsigned requested = 0;
-	int alloc_required;
-	int error = 0;
-	pgoff_t index = pos >> PAGE_SHIFT;
-	unsigned from = pos & (PAGE_SIZE - 1);
-	struct page *page;
-
-	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
-	error = gfs2_glock_nq(&ip->i_gh);
-	if (unlikely(error))
-		goto out_uninit;
-	if (&ip->i_inode == sdp->sd_rindex) {
-		error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
-					   GL_NOCACHE, &m_ip->i_gh);
-		if (unlikely(error)) {
-			gfs2_glock_dq(&ip->i_gh);
-			goto out_uninit;
-		}
-	}
-
-	alloc_required = gfs2_write_alloc_required(ip, pos, len);
-
-	if (alloc_required || gfs2_is_jdata(ip))
-		gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
-
-	if (alloc_required) {
-		struct gfs2_alloc_parms ap = { .aflags = 0, };
-		requested = data_blocks + ind_blocks;
-		ap.target = requested;
-		error = gfs2_quota_lock_check(ip, &ap);
-		if (error)
-			goto out_unlock;
-
-		error = gfs2_inplace_reserve(ip, &ap);
-		if (error)
-			goto out_qunlock;
-	}
-
-	rblocks = RES_DINODE + ind_blocks;
-	if (gfs2_is_jdata(ip))
-		rblocks += data_blocks ? data_blocks : 1;
-	if (ind_blocks || data_blocks)
-		rblocks += RES_STATFS + RES_QUOTA;
-	if (&ip->i_inode == sdp->sd_rindex)
-		rblocks += 2 * RES_STATFS;
-	if (alloc_required)
-		rblocks += gfs2_rg_blocks(ip, requested);
-
-	error = gfs2_trans_begin(sdp, rblocks,
-				 PAGE_SIZE/sdp->sd_sb.sb_bsize);
-	if (error)
-		goto out_trans_fail;
-
-	error = -ENOMEM;
-	flags |= AOP_FLAG_NOFS;
-	page = grab_cache_page_write_begin(mapping, index, flags);
-	*pagep = page;
-	if (unlikely(!page))
-		goto out_endtrans;
-
-	if (gfs2_is_stuffed(ip)) {
-		error = 0;
-		if (pos + len > gfs2_max_stuffed_size(ip)) {
-			error = gfs2_unstuff_dinode(ip, page);
-			if (error == 0)
-				goto prepare_write;
-		} else if (!PageUptodate(page)) {
-			error = stuffed_readpage(ip, page);
-		}
-		goto out;
-	}
-
-prepare_write:
-	error = __block_write_begin(page, from, len, gfs2_block_map);
-out:
-	if (error == 0)
-		return 0;
-
-	unlock_page(page);
-	put_page(page);
-
-	gfs2_trans_end(sdp);
-	if (pos + len > ip->i_inode.i_size)
-		gfs2_trim_blocks(&ip->i_inode);
-	goto out_trans_fail;
-
-out_endtrans:
-	gfs2_trans_end(sdp);
-out_trans_fail:
-	if (alloc_required) {
-		gfs2_inplace_release(ip);
-out_qunlock:
-		gfs2_quota_unlock(ip);
-	}
-out_unlock:
-	if (&ip->i_inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-out_uninit:
-	gfs2_holder_uninit(&ip->i_gh);
-	return error;
-}
-
 /**
  * adjust_fs_space - Adjusts the free space available due to gfs2_grow
  * @inode: the rindex inode
@@ -845,84 +719,6 @@ int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
 	return copied;
 }
 
-/**
- * gfs2_write_end
- * @file: The file to write to
- * @mapping: The address space to write to
- * @pos: The file position
- * @len: The length of the data
- * @copied: How much was actually copied by the VFS
- * @page: The page that has been written
- * @fsdata: The fsdata (unused in GFS2)
- *
- * The main write_end function for GFS2. We just put our locking around the VFS
- * provided functions.
- *
- * Returns: copied bytes or errno
- */
-
-static int gfs2_write_end(struct file *file, struct address_space *mapping,
-			  loff_t pos, unsigned len, unsigned copied,
-			  struct page *page, void *fsdata)
-{
-	struct inode *inode = page->mapping->host;
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
-	struct buffer_head *dibh;
-	int ret;
-	struct gfs2_trans *tr = current->journal_info;
-	BUG_ON(!tr);
-
-	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
-
-	ret = gfs2_meta_inode_buffer(ip, &dibh);
-	if (unlikely(ret))
-		goto out;
-
-	if (gfs2_is_stuffed(ip)) {
-		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
-		page = NULL;
-		goto out2;
-	}
-
-	if (gfs2_is_jdata(ip))
-		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
-	else
-		gfs2_ordered_add_inode(ip);
-
-	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
-	page = NULL;
-	if (tr->tr_num_buf_new)
-		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-	else
-		gfs2_trans_add_meta(ip->i_gl, dibh);
-
-out2:
-	if (inode == sdp->sd_rindex) {
-		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
-
-	brelse(dibh);
-out:
-	if (page) {
-		unlock_page(page);
-		put_page(page);
-	}
-	gfs2_trans_end(sdp);
-	gfs2_inplace_release(ip);
-	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
-		gfs2_quota_unlock(ip);
-	if (inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-	gfs2_holder_uninit(&ip->i_gh);
-	return ret;
-}
-
 /**
  * jdata_set_page_dirty - Page dirtying function
  * @page: The page to dirty
@@ -1090,8 +886,6 @@ static const struct address_space_operations gfs2_writeback_aops = {
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
@@ -1106,8 +900,6 @@ static const struct address_space_operations gfs2_ordered_aops = {
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.set_page_dirty = __set_page_dirty_buffers,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
@@ -1123,8 +915,6 @@ static const struct address_space_operations gfs2_jdata_aops = {
 	.writepages = gfs2_jdata_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.set_page_dirty = jdata_set_page_dirty,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 05/10] iomap: Generic inline data handling
  2018-06-04 19:31 ` [PATCH v8 05/10] iomap: Generic inline data handling Andreas Gruenbacher
@ 2018-06-05  5:34   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2018-06-05  5:34 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel

On Mon, Jun 04, 2018 at 09:31:18PM +0200, Andreas Gruenbacher wrote:
> Add generic inline data handling by adding a pointer to the inline data
> region to struct iomap.  When handling a buffered IOMAP_INLINE write,
> iomap_write_begin will copy the current inline data from the inline data
> region into the page cache, and iomap_write_end will copy the changes in
> the page cache back to the inline data region.
> 
> This doesn't cover inline data reads and direct I/O yet because so far,
> we have no users.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 06/10] iomap: Add page_write_end iomap hook
  2018-06-04 19:31 ` [PATCH v8 06/10] iomap: Add page_write_end iomap hook Andreas Gruenbacher
@ 2018-06-05  7:56   ` Andreas Grünbacher
  2018-06-05 12:07   ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: Andreas Grünbacher @ 2018-06-05  7:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: cluster-devel, Andreas Gruenbacher, Linux FS-devel Mailing List

2018-06-04 21:31 GMT+02:00 Andreas Gruenbacher <agruenba@redhat.com>:
> Add a page_write_end hook called when done writing to a page, for
> filesystems that implement data journaling: in that case, pages are
> written to the journal before being written back to their proper on-disk
> locations.

> The new hook is bypassed for IOMAP_INLINE mappings.

Oops, no longer true of course.

> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>  fs/iomap.c            | 58 ++++++++++++++++++++++++++++++++-----------
>  include/linux/iomap.h |  8 ++++++
>  2 files changed, 52 insertions(+), 14 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 48cd67227811..3b34c957d2fd 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -181,16 +181,22 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>
>  static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -               unsigned copied, struct page *page, struct iomap *iomap)
> +               unsigned copied, struct page *page, struct iomap *iomap,
> +               const struct iomap_ops *ops)
>  {
> +       typeof(ops->page_write_end) page_write_end = ops->page_write_end;
>         int ret;
>
>         if (iomap->type == IOMAP_INLINE) {
>                 iomap_write_inline_data(inode, page, iomap, pos, copied);
>                 __generic_write_end(inode, pos, copied, page);
> +               if (page_write_end)
> +                       page_write_end(inode, pos, copied, page, iomap);
>                 return copied;
>         }
>
> +       if (page_write_end)
> +               page_write_end(inode, pos, copied, page, iomap);
>         ret = generic_write_end(NULL, inode->i_mapping, pos, len,
>                         copied, page, NULL);
>         if (ret < len)
> @@ -198,11 +204,17 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>         return ret;
>  }
>
> +struct iomap_write_args {
> +       const struct iomap_ops *ops;
> +       struct iov_iter *iter;
> +};
> +
>  static loff_t
>  iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>                 struct iomap *iomap)
>  {
> -       struct iov_iter *i = data;
> +       struct iomap_write_args *args = data;
> +       struct iov_iter *i = args->iter;
>         long status = 0;
>         ssize_t written = 0;
>         unsigned int flags = AOP_FLAG_NOFS;
> @@ -248,7 +260,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>                 flush_dcache_page(page);
>
>                 status = iomap_write_end(inode, pos, bytes, copied, page,
> -                               iomap);
> +                               iomap, args->ops);
>                 if (unlikely(status < 0))
>                         break;
>                 copied = status;
> @@ -285,10 +297,14 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
>  {
>         struct inode *inode = iocb->ki_filp->f_mapping->host;
>         loff_t pos = iocb->ki_pos, ret = 0, written = 0;
> +       struct iomap_write_args args = {
> +               .ops = ops,
> +               .iter = iter,
> +       };
>
>         while (iov_iter_count(iter)) {
>                 ret = iomap_apply(inode, pos, iov_iter_count(iter),
> -                               IOMAP_WRITE, ops, iter, iomap_write_actor);
> +                               IOMAP_WRITE, ops, &args, iomap_write_actor);
>                 if (ret <= 0)
>                         break;
>                 pos += ret;
> @@ -319,6 +335,7 @@ static loff_t
>  iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>                 struct iomap *iomap)
>  {
> +       const struct iomap_ops *ops = data;
>         long status = 0;
>         ssize_t written = 0;
>
> @@ -342,7 +359,8 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
>                 WARN_ON_ONCE(!PageUptodate(page));
>
> -               status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +               status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> +                               ops);
>                 if (unlikely(status <= 0)) {
>                         if (WARN_ON_ONCE(status == 0))
>                                 return -EIO;
> @@ -368,8 +386,8 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>         loff_t ret;
>
>         while (len) {
> -               ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
> -                               iomap_dirty_actor);
> +               ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops,
> +                               (void *)ops, iomap_dirty_actor);
>                 if (ret <= 0)
>                         return ret;
>                 pos += ret;
> @@ -381,7 +399,8 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>  EXPORT_SYMBOL_GPL(iomap_file_dirty);
>
>  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -               unsigned bytes, struct iomap *iomap)
> +               unsigned bytes, const struct iomap_ops *ops,
> +               struct iomap *iomap)
>  {
>         struct page *page;
>         int status;
> @@ -394,7 +413,8 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>         zero_user(page, offset, bytes);
>         mark_page_accessed(page);
>
> -       return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +       return iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> +                       ops);
>  }
>
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -407,11 +427,16 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
>                         offset, bytes);
>  }
>
> +struct iomap_zero_range_args {
> +       const struct iomap_ops *ops;
> +       bool *did_zero;
> +};
> +
>  static loff_t
>  iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
>                 void *data, struct iomap *iomap)
>  {
> -       bool *did_zero = data;
> +       struct iomap_zero_range_args *args = data;
>         loff_t written = 0;
>         int status;
>
> @@ -428,15 +453,16 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
>                 if (IS_DAX(inode))
>                         status = iomap_dax_zero(pos, offset, bytes, iomap);
>                 else
> -                       status = iomap_zero(inode, pos, offset, bytes, iomap);
> +                       status = iomap_zero(inode, pos, offset, bytes,
> +                                       args->ops, iomap);
>                 if (status < 0)
>                         return status;
>
>                 pos += bytes;
>                 count -= bytes;
>                 written += bytes;
> -               if (did_zero)
> -                       *did_zero = true;
> +               if (args->did_zero)
> +                       *args->did_zero = true;
>         } while (count > 0);
>
>         return written;
> @@ -446,11 +472,15 @@ int
>  iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>                 const struct iomap_ops *ops)
>  {
> +       struct iomap_zero_range_args args = {
> +               .ops = ops,
> +               .did_zero = did_zero,
> +       };
>         loff_t ret;
>
>         while (len > 0) {
>                 ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
> -                               ops, did_zero, iomap_zero_range_actor);
> +                               ops, &args, iomap_zero_range_actor);
>                 if (ret <= 0)
>                         return ret;
>
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index c61113c71a60..ac7f1b2c1cbe 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -8,6 +8,7 @@ struct fiemap_extent_info;
>  struct inode;
>  struct iov_iter;
>  struct kiocb;
> +struct page;
>  struct vm_area_struct;
>  struct vm_fault;
>
> @@ -71,6 +72,13 @@ struct iomap_ops {
>         int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
>                         unsigned flags, struct iomap *iomap);
>
> +       /*
> +        * Called when done writing to a page (optional; skipped for
> +        * IOMAP_INLINE mappings).

The comment needs updating too, of course.

> +        */
> +       void (*page_write_end)(struct inode *inode, loff_t pos, unsigned copied,
> +                       struct page *page, struct iomap *iomap);
> +
>         /*
>          * Commit and/or unreserve space previous allocated using iomap_begin.
>          * Written indicates the length of the successful write operation which
> --
> 2.17.0
>

With those two changes, is this commit okay as well?

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 06/10] iomap: Add page_write_end iomap hook
  2018-06-04 19:31 ` [PATCH v8 06/10] iomap: Add page_write_end iomap hook Andreas Gruenbacher
  2018-06-05  7:56   ` Andreas Grünbacher
@ 2018-06-05 12:07   ` David Sterba
  2018-06-05 12:17     ` Andreas Grünbacher
  1 sibling, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-06-05 12:07 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel

On Mon, Jun 04, 2018 at 09:31:19PM +0200, Andreas Gruenbacher wrote:
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -181,16 +181,22 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  
>  static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page, struct iomap *iomap)
> +		unsigned copied, struct page *page, struct iomap *iomap,
> +		const struct iomap_ops *ops)
>  {
> +	typeof(ops->page_write_end) page_write_end = ops->page_write_end;

Is the reason to use typeof is to avoid repeating the type of
page_write_end? As it's only for a temporary variable with 2 uses,
ops->page_write_end does not hurt readability nor is too much typing.

I would not recommend using typeof outside of the justified contexts
like macros or without a good reason.

>  	int ret;
>  
>  	if (iomap->type == IOMAP_INLINE) {
>  		iomap_write_inline_data(inode, page, iomap, pos, copied);
>  		__generic_write_end(inode, pos, copied, page);
> +		if (page_write_end)
> +			page_write_end(inode, pos, copied, page, iomap);
>  		return copied;
>  	}
>  
> +	if (page_write_end)
> +		page_write_end(inode, pos, copied, page, iomap);
>  	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
>  			copied, page, NULL);
>  	if (ret < len)
> @@ -198,11 +204,17 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously
  2018-06-04 19:31 ` [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
@ 2018-06-05 12:10   ` David Sterba
  2018-06-05 12:32     ` Andreas Grünbacher
  2018-06-06 10:26     ` Christoph Hellwig
  0 siblings, 2 replies; 21+ messages in thread
From: David Sterba @ 2018-06-05 12:10 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel

On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
> @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (ret < 0)
>  		iomap_dio_set_error(dio, ret);
>  
> +	smp_mb__before_atomic();
>  	if (!atomic_dec_and_test(&dio->ref)) {

The barrier should be documented. I tried to do a quick look around the
code if it's clear why it's there but it's not. Thanks.

> -		if (!is_sync_kiocb(iocb))
> +		if (!dio->wait_for_completion)
>  			return -EIOCBQUEUED;
>  
>  		for (;;) {

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 06/10] iomap: Add page_write_end iomap hook
  2018-06-05 12:07   ` David Sterba
@ 2018-06-05 12:17     ` Andreas Grünbacher
  2018-06-05 12:29       ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Grünbacher @ 2018-06-05 12:17 UTC (permalink / raw)
  To: dsterba
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Linux FS-devel Mailing List

2018-06-05 14:07 GMT+02:00 David Sterba <dsterba@suse.cz>:
> On Mon, Jun 04, 2018 at 09:31:19PM +0200, Andreas Gruenbacher wrote:
>> --- a/fs/iomap.c
>> +++ b/fs/iomap.c
>> @@ -181,16 +181,22 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>>
>>  static int
>>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>> -             unsigned copied, struct page *page, struct iomap *iomap)
>> +             unsigned copied, struct page *page, struct iomap *iomap,
>> +             const struct iomap_ops *ops)
>>  {
>> +     typeof(ops->page_write_end) page_write_end = ops->page_write_end;
>
> Is the reason to use typeof is to avoid repeating the type of
> page_write_end?

Yes, the type is void (*)(struct inode *, loff_t, unsigned, struct
page *, struct iomap *), which is a bit bulky.

> As it's only for a temporary variable with 2 uses,
> ops->page_write_end does not hurt readability nor is too much typing.
> I would not recommend using typeof outside of the justified contexts
> like macros or without a good reason.

Andreas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 06/10] iomap: Add page_write_end iomap hook
  2018-06-05 12:17     ` Andreas Grünbacher
@ 2018-06-05 12:29       ` David Sterba
  2018-06-05 12:50         ` Andreas Grünbacher
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-06-05 12:29 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: dsterba, Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Linux FS-devel Mailing List

On Tue, Jun 05, 2018 at 02:17:38PM +0200, Andreas Gr�nbacher wrote:
> 2018-06-05 14:07 GMT+02:00 David Sterba <dsterba@suse.cz>:
> > On Mon, Jun 04, 2018 at 09:31:19PM +0200, Andreas Gruenbacher wrote:
> >> --- a/fs/iomap.c
> >> +++ b/fs/iomap.c
> >> @@ -181,16 +181,22 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >>
> >>  static int
> >>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> >> -             unsigned copied, struct page *page, struct iomap *iomap)
> >> +             unsigned copied, struct page *page, struct iomap *iomap,
> >> +             const struct iomap_ops *ops)
> >>  {
> >> +     typeof(ops->page_write_end) page_write_end = ops->page_write_end;
> >
> > Is the reason to use typeof is to avoid repeating the type of
> > page_write_end?
> 
> Yes, the type is void (*)(struct inode *, loff_t, unsigned, struct
> page *, struct iomap *), which is a bit bulky.

Agreed, so why don't you simply use ops->page_write_end ?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously
  2018-06-05 12:10   ` David Sterba
@ 2018-06-05 12:32     ` Andreas Grünbacher
  2018-06-06 10:26     ` Christoph Hellwig
  1 sibling, 0 replies; 21+ messages in thread
From: Andreas Grünbacher @ 2018-06-05 12:32 UTC (permalink / raw)
  To: dsterba
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Linux FS-devel Mailing List

2018-06-05 14:10 GMT+02:00 David Sterba <dsterba@suse.cz>:
> On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
>> @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>       if (ret < 0)
>>               iomap_dio_set_error(dio, ret);
>>

+       /*
+        * Make sure changes to dio are visible globally before the atomic
+        * counter decrement.
+        */

>> +     smp_mb__before_atomic();
>>       if (!atomic_dec_and_test(&dio->ref)) {
>
> The barrier should be documented. I tried to do a quick look around the
> code if it's clear why it's there but it's not. Thanks.
>
>> -             if (!is_sync_kiocb(iocb))
>> +             if (!dio->wait_for_completion)
>>                       return -EIOCBQUEUED;
>>
>>               for (;;) {

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 06/10] iomap: Add page_write_end iomap hook
  2018-06-05 12:29       ` David Sterba
@ 2018-06-05 12:50         ` Andreas Grünbacher
  0 siblings, 0 replies; 21+ messages in thread
From: Andreas Grünbacher @ 2018-06-05 12:50 UTC (permalink / raw)
  To: dsterba
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Linux FS-devel Mailing List

2018-06-05 14:29 GMT+02:00 David Sterba <dsterba@suse.cz>:
> On Tue, Jun 05, 2018 at 02:17:38PM +0200, Andreas Grünbacher wrote:
>> 2018-06-05 14:07 GMT+02:00 David Sterba <dsterba@suse.cz>:
>> > On Mon, Jun 04, 2018 at 09:31:19PM +0200, Andreas Gruenbacher wrote:
>> >> --- a/fs/iomap.c
>> >> +++ b/fs/iomap.c
>> >> @@ -181,16 +181,22 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>> >>
>> >>  static int
>> >>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>> >> -             unsigned copied, struct page *page, struct iomap *iomap)
>> >> +             unsigned copied, struct page *page, struct iomap *iomap,
>> >> +             const struct iomap_ops *ops)
>> >>  {
>> >> +     typeof(ops->page_write_end) page_write_end = ops->page_write_end;
>> >
>> > Is the reason to use typeof is to avoid repeating the type of
>> > page_write_end?
>>
>> Yes, the type is void (*)(struct inode *, loff_t, unsigned, struct
>> page *, struct iomap *), which is a bit bulky.
>
> Agreed, so why don't you simply use ops->page_write_end ?

Alright.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously
  2018-06-05 12:10   ` David Sterba
  2018-06-05 12:32     ` Andreas Grünbacher
@ 2018-06-06 10:26     ` Christoph Hellwig
  2018-06-06 11:44       ` Andreas Gruenbacher
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-06-06 10:26 UTC (permalink / raw)
  To: David Sterba
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig, linux-fsdevel

On Tue, Jun 05, 2018 at 02:10:24PM +0200, David Sterba wrote:
> On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
> > @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >  	if (ret < 0)
> >  		iomap_dio_set_error(dio, ret);
> >  
> > +	smp_mb__before_atomic();
> >  	if (!atomic_dec_and_test(&dio->ref)) {
> 
> The barrier should be documented. I tried to do a quick look around the
> code if it's clear why it's there but it's not. Thanks.

It doesn't really make sense.  smp_mb__before_atomic is for relaxed
atomic operations, which atomic_dec_and_test is not.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously
  2018-06-06 10:26     ` Christoph Hellwig
@ 2018-06-06 11:44       ` Andreas Gruenbacher
  0 siblings, 0 replies; 21+ messages in thread
From: Andreas Gruenbacher @ 2018-06-06 11:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Sterba, cluster-devel, linux-fsdevel

On 6 June 2018 at 12:26, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Jun 05, 2018 at 02:10:24PM +0200, David Sterba wrote:
>> On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
>> > @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> >     if (ret < 0)
>> >             iomap_dio_set_error(dio, ret);
>> >
>> > +   smp_mb__before_atomic();
>> >     if (!atomic_dec_and_test(&dio->ref)) {
>>
>> The barrier should be documented. I tried to do a quick look around the
>> code if it's clear why it's there but it's not. Thanks.
>
> It doesn't really make sense.  smp_mb__before_atomic is for relaxed
> atomic operations, which atomic_dec_and_test is not.

After re-reading Documentation/core-api/atomic_ops.rst again, I agree
it's not needed. There is an example in the documentation that was
confusing me.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-06-06 11:44 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-04 19:31 [PATCH v8 00/10] gfs2 iomap write support Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 01/10] iomap: inline data should be an iomap type, not a flag Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 02/10] iomap: Mark newly allocated buffer heads as new Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 03/10] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
2018-06-05 12:10   ` David Sterba
2018-06-05 12:32     ` Andreas Grünbacher
2018-06-06 10:26     ` Christoph Hellwig
2018-06-06 11:44       ` Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 04/10] fs: factor out a __generic_write_end helper Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 05/10] iomap: Generic inline data handling Andreas Gruenbacher
2018-06-05  5:34   ` Christoph Hellwig
2018-06-04 19:31 ` [PATCH v8 06/10] iomap: Add page_write_end iomap hook Andreas Gruenbacher
2018-06-05  7:56   ` Andreas Grünbacher
2018-06-05 12:07   ` David Sterba
2018-06-05 12:17     ` Andreas Grünbacher
2018-06-05 12:29       ` David Sterba
2018-06-05 12:50         ` Andreas Grünbacher
2018-06-04 19:31 ` [PATCH v8 07/10] gfs2: iomap buffered write support Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 08/10] gfs2: gfs2_extent_length cleanup Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 09/10] gfs2: iomap direct I/O support Andreas Gruenbacher
2018-06-04 19:31 ` [PATCH v8 10/10] gfs2: Remove gfs2_write_{begin,end} Andreas Gruenbacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).