All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/14] gfs2 iomap write support
@ 2018-05-30  9:48 ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Here's an update of my gfs2 iomap write patch queue, with support for
buffered writes as well as direct I/O reads and writes through iomap.
This update fixes a few bugs in the gfs2 specific parts.  The patches
are still meant for the upcoming merge window.

The first five patches are minor gfs2 specific cleanups and
improvements.

The actual fun starts with patch "iomap: Add write_{begin,end} iomap
operations" which adds callbacks into the filesystem for any per-page
processing.  Christoph has suggested that at least the stuffed inode
(aka. inline data) handline should be moved to the iomap_{begin,end}
operations.  As the last patch tries to demonstrate, this is a bit
tricky, so even though Christoph's approach is slightly cleaner
conceptually, it doesn't look like an improvement overall.

Patches "iomap: Mark newly allocated buffer heads as new" and "iomap:
Complete partial direct I/O writes synchronously" have been added as
discussed.

A 4.17-rc7 based version of the patches can be found here:

  https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-write

Thanks,
Andreas

Andreas Gruenbacher (13):
  gfs2: Update find_metapath comment
  gfs2: hole_size improvement
  gfs2: gfs2_stuffed_write_end cleanup
  gfs2: Remove ordered write mode handling from gfs2_trans_add_data
  gfs2: Iomap cleanups and improvements
  iomap: Add write_{begin,end} iomap operations
  iomap: Mark newly allocated buffer heads as new
  gfs2: iomap buffered write support
  gfs2: gfs2_extent_length cleanup
  iomap: Complete partial direct I/O writes synchronously
  gfs2: iomap direct I/O support
  gfs2: Remove gfs2_write_{begin,end}
  gfs2: Handle stuffed files in iomap_{begin,end}

Christoph Hellwig (1):
  iomap: inline data should be an iomap type, not a flag

 fs/buffer.c           |   8 +-
 fs/ext2/inode.c       |   2 +
 fs/ext4/inline.c      |   4 +-
 fs/ext4/inode.c       |   2 +
 fs/gfs2/aops.c        | 344 +------------------
 fs/gfs2/aops.h        |  22 ++
 fs/gfs2/bmap.c        | 772 ++++++++++++++++++++++++++++++++----------
 fs/gfs2/bmap.h        |   6 +-
 fs/gfs2/file.c        | 174 +++++++++-
 fs/gfs2/inode.c       |   4 -
 fs/gfs2/log.h         |   7 +-
 fs/gfs2/quota.c       |   5 +-
 fs/gfs2/trans.c       |  27 +-
 fs/iomap.c            | 109 +++---
 fs/xfs/xfs_iomap.c    |   2 +
 include/linux/iomap.h |  25 +-
 16 files changed, 920 insertions(+), 593 deletions(-)
 create mode 100644 fs/gfs2/aops.h

-- 
2.17.0

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 00/14] gfs2 iomap write support
@ 2018-05-30  9:48 ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Here's an update of my gfs2 iomap write patch queue, with support for
buffered writes as well as direct I/O reads and writes through iomap.
This update fixes a few bugs in the gfs2 specific parts.  The patches
are still meant for the upcoming merge window.

The first five patches are minor gfs2 specific cleanups and
improvements.

The actual fun starts with patch "iomap: Add write_{begin,end} iomap
operations" which adds callbacks into the filesystem for any per-page
processing.  Christoph has suggested that at least the stuffed inode
(aka. inline data) handline should be moved to the iomap_{begin,end}
operations.  As the last patch tries to demonstrate, this is a bit
tricky, so even though Christoph's approach is slightly cleaner
conceptually, it doesn't look like an improvement overall.

Patches "iomap: Mark newly allocated buffer heads as new" and "iomap:
Complete partial direct I/O writes synchronously" have been added as
discussed.

A 4.17-rc7 based version of the patches can be found here:

  https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-write

Thanks,
Andreas

Andreas Gruenbacher (13):
  gfs2: Update find_metapath comment
  gfs2: hole_size improvement
  gfs2: gfs2_stuffed_write_end cleanup
  gfs2: Remove ordered write mode handling from gfs2_trans_add_data
  gfs2: Iomap cleanups and improvements
  iomap: Add write_{begin,end} iomap operations
  iomap: Mark newly allocated buffer heads as new
  gfs2: iomap buffered write support
  gfs2: gfs2_extent_length cleanup
  iomap: Complete partial direct I/O writes synchronously
  gfs2: iomap direct I/O support
  gfs2: Remove gfs2_write_{begin,end}
  gfs2: Handle stuffed files in iomap_{begin,end}

Christoph Hellwig (1):
  iomap: inline data should be an iomap type, not a flag

 fs/buffer.c           |   8 +-
 fs/ext2/inode.c       |   2 +
 fs/ext4/inline.c      |   4 +-
 fs/ext4/inode.c       |   2 +
 fs/gfs2/aops.c        | 344 +------------------
 fs/gfs2/aops.h        |  22 ++
 fs/gfs2/bmap.c        | 772 ++++++++++++++++++++++++++++++++----------
 fs/gfs2/bmap.h        |   6 +-
 fs/gfs2/file.c        | 174 +++++++++-
 fs/gfs2/inode.c       |   4 -
 fs/gfs2/log.h         |   7 +-
 fs/gfs2/quota.c       |   5 +-
 fs/gfs2/trans.c       |  27 +-
 fs/iomap.c            | 109 +++---
 fs/xfs/xfs_iomap.c    |   2 +
 include/linux/iomap.h |  25 +-
 16 files changed, 920 insertions(+), 593 deletions(-)
 create mode 100644 fs/gfs2/aops.h

-- 
2.17.0



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v5 01/14] gfs2: Update find_metapath comment
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0590e93494f7..fcf2f7d166de 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -176,8 +176,8 @@ int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page)
 /**
  * find_metapath - Find path through the metadata tree
  * @sdp: The superblock
- * @mp: The metapath to return the result in
  * @block: The disk block to look up
+ * @mp: The metapath to return the result in
  * @height: The pre-calculated height of the metadata tree
  *
  *   This routine returns a struct metapath structure that defines a path
@@ -188,8 +188,7 @@ int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page)
  *   filesystem with a blocksize of 4096.
  *
  *   find_metapath() would return a struct metapath structure set to:
- *   mp_offset = 101342453, mp_height = 3, mp_list[0] = 0, mp_list[1] = 48,
- *   and mp_list[2] = 165.
+ *   mp_fheight = 3, mp_list[0] = 0, mp_list[1] = 48, and mp_list[2] = 165.
  *
  *   That means that in order to get to the block containing the byte at
  *   offset 101342453, we would load the indirect block pointed to by pointer
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 01/14] gfs2: Update find_metapath comment
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0590e93494f7..fcf2f7d166de 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -176,8 +176,8 @@ int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page)
 /**
  * find_metapath - Find path through the metadata tree
  * @sdp: The superblock
- * @mp: The metapath to return the result in
  * @block: The disk block to look up
+ * @mp: The metapath to return the result in
  * @height: The pre-calculated height of the metadata tree
  *
  *   This routine returns a struct metapath structure that defines a path
@@ -188,8 +188,7 @@ int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page)
  *   filesystem with a blocksize of 4096.
  *
  *   find_metapath() would return a struct metapath structure set to:
- *   mp_offset = 101342453, mp_height = 3, mp_list[0] = 0, mp_list[1] = 48,
- *   and mp_list[2] = 165.
+ *   mp_fheight = 3, mp_list[0] = 0, mp_list[1] = 48, and mp_list[2] = 165.
  *
  *   That means that in order to get to the block containing the byte at
  *   offset 101342453, we would load the indirect block pointed to by pointer
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 02/14] gfs2: hole_size improvement
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Reimplement function hole_size based on a generic function for walking
the metadata tree and rename hole_size to gfs2_hole_size.  While
previously, multiple invocations of hole_size were sometimes needed to
walk across the entire hole, the new implementation always returns the
entire hole at once (provided that the caller is interested in the total
size).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 210 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 153 insertions(+), 57 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index fcf2f7d166de..69f846418ad5 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -278,6 +278,21 @@ static inline __be64 *metapointer(unsigned int height, const struct metapath *mp
 	return p + mp->mp_list[height];
 }
 
+static inline const __be64 *metaend(unsigned int height, const struct metapath *mp)
+{
+	const struct buffer_head *bh = mp->mp_bh[height];
+	return (const __be64 *)(bh->b_data + bh->b_size);
+}
+
+static void clone_metapath(struct metapath *clone, struct metapath *mp)
+{
+	unsigned int hgt;
+
+	*clone = *mp;
+	for (hgt = 0; hgt < mp->mp_aheight; hgt++)
+		get_bh(clone->mp_bh[hgt]);
+}
+
 static void gfs2_metapath_ra(struct gfs2_glock *gl, __be64 *start, __be64 *end)
 {
 	const __be64 *t;
@@ -419,6 +434,142 @@ static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __b
 	return (ptr - first);
 }
 
+typedef const __be64 *(*gfs2_metadata_walker)(
+		struct metapath *mp,
+		const __be64 *start, const __be64 *end,
+		u64 factor, void *data);
+
+#define WALK_STOP ((__be64 *)0)
+#define WALK_NEXT ((__be64 *)1)
+
+static int gfs2_walk_metadata(struct inode *inode, sector_t lblock,
+		u64 len, struct metapath *mp, gfs2_metadata_walker walker,
+		void *data)
+{
+	struct metapath clone;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	const __be64 *start, *end, *ptr;
+	u64 factor = 1;
+	unsigned int hgt;
+	int ret = 0;
+
+	for (hgt = ip->i_height - 1; hgt >= mp->mp_aheight; hgt--)
+		factor *= sdp->sd_inptrs;
+
+	for (;;) {
+		u64 step;
+
+		/* Walk indirect block. */
+		start = metapointer(hgt, mp);
+		end = metaend(hgt, mp);
+
+		step = (end - start) * factor;
+		if (step > len)
+			end = start + DIV_ROUND_UP_ULL(len, factor);
+
+		ptr = walker(mp, start, end, factor, data);
+		if (ptr == WALK_STOP)
+			break;
+		if (step >= len)
+			break;
+		len -= step;
+		if (ptr != WALK_NEXT) {
+			BUG_ON(!*ptr);
+			mp->mp_list[hgt] += ptr - start;
+			goto fill_up_metapath;
+		}
+
+lower_metapath:
+		/* Decrease height of metapath. */
+		if (mp != &clone) {
+			clone_metapath(&clone, mp);
+			mp = &clone;
+		}
+		brelse(mp->mp_bh[hgt]);
+		mp->mp_bh[hgt] = NULL;
+		if (!hgt)
+			break;
+		hgt--;
+		factor *= sdp->sd_inptrs;
+
+		/* Advance in metadata tree. */
+		(mp->mp_list[hgt])++;
+		start = metapointer(hgt, mp);
+		end = metaend(hgt, mp);
+		if (start >= end) {
+			mp->mp_list[hgt] = 0;
+			if (!hgt)
+				break;
+			goto lower_metapath;
+		}
+
+fill_up_metapath:
+		/* Increase height of metapath. */
+		if (mp != &clone) {
+			clone_metapath(&clone, mp);
+			mp = &clone;
+		}
+		ret = fillup_metapath(ip, mp, ip->i_height - 1);
+		if (ret < 0)
+			break;
+		hgt += ret;
+		for (; ret; ret--)
+			do_div(factor, sdp->sd_inptrs);
+		mp->mp_aheight = hgt + 1;
+	}
+	if (mp == &clone)
+		release_metapath(mp);
+	return ret;
+}
+
+struct gfs2_hole_walker_args {
+	u64 blocks;
+};
+
+static const __be64 *gfs2_hole_walker(struct metapath *mp,
+		const __be64 *start, const __be64 *end,
+		u64 factor, void *data)
+{
+	struct gfs2_hole_walker_args *args = data;
+	const __be64 *ptr;
+
+	for (ptr = start; ptr < end; ptr++) {
+		if (*ptr) {
+			args->blocks += (ptr - start) * factor;
+			if (mp->mp_aheight == mp->mp_fheight)
+				return WALK_STOP;
+			return ptr;  /* increase height */
+		}
+	}
+	args->blocks += (end - start) * factor;
+	return WALK_NEXT;
+}
+
+/**
+ * gfs2_hole_size - figure out the size of a hole
+ * @inode: The inode
+ * @lblock: The logical starting block number
+ * @len: How far to look (in blocks)
+ * @mp: The metapath at lblock
+ * @iomap: The iomap to store the hole size in
+ *
+ * This function modifies @mp.
+ *
+ * Returns: errno on error
+ */
+static int gfs2_hole_size(struct inode *inode, sector_t lblock, u64 len,
+			  struct metapath *mp, struct iomap *iomap)
+{
+	struct gfs2_hole_walker_args args = { };
+	int ret = 0;
+
+	ret = gfs2_walk_metadata(inode, lblock, len, mp, gfs2_hole_walker, &args);
+	if (!ret)
+		iomap->length = args.blocks << inode->i_blkbits;
+	return ret;
+}
+
 static inline void bmap_lock(struct gfs2_inode *ip, int create)
 {
 	if (create)
@@ -615,62 +766,6 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	return 0;
 }
 
-/**
- * hole_size - figure out the size of a hole
- * @inode: The inode
- * @lblock: The logical starting block number
- * @mp: The metapath
- *
- * Returns: The hole size in bytes
- *
- */
-static u64 hole_size(struct inode *inode, sector_t lblock, struct metapath *mp)
-{
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct metapath mp_eof;
-	u64 factor = 1;
-	int hgt;
-	u64 holesz = 0;
-	const __be64 *first, *end, *ptr;
-	const struct buffer_head *bh;
-	u64 lblock_stop = (i_size_read(inode) - 1) >> inode->i_blkbits;
-	int zeroptrs;
-	bool done = false;
-
-	/* Get another metapath, to the very last byte */
-	find_metapath(sdp, lblock_stop, &mp_eof, ip->i_height);
-	for (hgt = ip->i_height - 1; hgt >= 0 && !done; hgt--) {
-		bh = mp->mp_bh[hgt];
-		if (bh) {
-			zeroptrs = 0;
-			first = metapointer(hgt, mp);
-			end = (const __be64 *)(bh->b_data + bh->b_size);
-
-			for (ptr = first; ptr < end; ptr++) {
-				if (*ptr) {
-					done = true;
-					break;
-				} else {
-					zeroptrs++;
-				}
-			}
-		} else {
-			zeroptrs = sdp->sd_inptrs;
-		}
-		if (factor * zeroptrs >= lblock_stop - lblock + 1) {
-			holesz = lblock_stop - lblock + 1;
-			break;
-		}
-		holesz += factor * zeroptrs;
-
-		factor *= sdp->sd_inptrs;
-		if (hgt && (mp->mp_list[hgt - 1] < mp_eof.mp_list[hgt - 1]))
-			(mp->mp_list[hgt - 1])++;
-	}
-	return holesz << inode->i_blkbits;
-}
-
 static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
@@ -726,6 +821,7 @@ int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	lblock = pos >> inode->i_blkbits;
 	lend = (pos + length + sdp->sd_sb.sb_bsize - 1) >> inode->i_blkbits;
+	len = lend - lblock;
 
 	iomap->offset = lblock << inode->i_blkbits;
 	iomap->addr = IOMAP_NULL_ADDR;
@@ -780,7 +876,7 @@ int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 		if (pos >= size)
 			ret = -ENOENT;
 		else if (height <= ip->i_height)
-			iomap->length = hole_size(inode, lblock, &mp);
+			ret = gfs2_hole_size(inode, lblock, len, &mp, iomap);
 		else
 			iomap->length = size - pos;
 	}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 02/14] gfs2: hole_size improvement
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Reimplement function hole_size based on a generic function for walking
the metadata tree and rename hole_size to gfs2_hole_size.  While
previously, multiple invocations of hole_size were sometimes needed to
walk across the entire hole, the new implementation always returns the
entire hole at once (provided that the caller is interested in the total
size).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 210 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 153 insertions(+), 57 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index fcf2f7d166de..69f846418ad5 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -278,6 +278,21 @@ static inline __be64 *metapointer(unsigned int height, const struct metapath *mp
 	return p + mp->mp_list[height];
 }
 
+static inline const __be64 *metaend(unsigned int height, const struct metapath *mp)
+{
+	const struct buffer_head *bh = mp->mp_bh[height];
+	return (const __be64 *)(bh->b_data + bh->b_size);
+}
+
+static void clone_metapath(struct metapath *clone, struct metapath *mp)
+{
+	unsigned int hgt;
+
+	*clone = *mp;
+	for (hgt = 0; hgt < mp->mp_aheight; hgt++)
+		get_bh(clone->mp_bh[hgt]);
+}
+
 static void gfs2_metapath_ra(struct gfs2_glock *gl, __be64 *start, __be64 *end)
 {
 	const __be64 *t;
@@ -419,6 +434,142 @@ static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __b
 	return (ptr - first);
 }
 
+typedef const __be64 *(*gfs2_metadata_walker)(
+		struct metapath *mp,
+		const __be64 *start, const __be64 *end,
+		u64 factor, void *data);
+
+#define WALK_STOP ((__be64 *)0)
+#define WALK_NEXT ((__be64 *)1)
+
+static int gfs2_walk_metadata(struct inode *inode, sector_t lblock,
+		u64 len, struct metapath *mp, gfs2_metadata_walker walker,
+		void *data)
+{
+	struct metapath clone;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	const __be64 *start, *end, *ptr;
+	u64 factor = 1;
+	unsigned int hgt;
+	int ret = 0;
+
+	for (hgt = ip->i_height - 1; hgt >= mp->mp_aheight; hgt--)
+		factor *= sdp->sd_inptrs;
+
+	for (;;) {
+		u64 step;
+
+		/* Walk indirect block. */
+		start = metapointer(hgt, mp);
+		end = metaend(hgt, mp);
+
+		step = (end - start) * factor;
+		if (step > len)
+			end = start + DIV_ROUND_UP_ULL(len, factor);
+
+		ptr = walker(mp, start, end, factor, data);
+		if (ptr == WALK_STOP)
+			break;
+		if (step >= len)
+			break;
+		len -= step;
+		if (ptr != WALK_NEXT) {
+			BUG_ON(!*ptr);
+			mp->mp_list[hgt] += ptr - start;
+			goto fill_up_metapath;
+		}
+
+lower_metapath:
+		/* Decrease height of metapath. */
+		if (mp != &clone) {
+			clone_metapath(&clone, mp);
+			mp = &clone;
+		}
+		brelse(mp->mp_bh[hgt]);
+		mp->mp_bh[hgt] = NULL;
+		if (!hgt)
+			break;
+		hgt--;
+		factor *= sdp->sd_inptrs;
+
+		/* Advance in metadata tree. */
+		(mp->mp_list[hgt])++;
+		start = metapointer(hgt, mp);
+		end = metaend(hgt, mp);
+		if (start >= end) {
+			mp->mp_list[hgt] = 0;
+			if (!hgt)
+				break;
+			goto lower_metapath;
+		}
+
+fill_up_metapath:
+		/* Increase height of metapath. */
+		if (mp != &clone) {
+			clone_metapath(&clone, mp);
+			mp = &clone;
+		}
+		ret = fillup_metapath(ip, mp, ip->i_height - 1);
+		if (ret < 0)
+			break;
+		hgt += ret;
+		for (; ret; ret--)
+			do_div(factor, sdp->sd_inptrs);
+		mp->mp_aheight = hgt + 1;
+	}
+	if (mp == &clone)
+		release_metapath(mp);
+	return ret;
+}
+
+struct gfs2_hole_walker_args {
+	u64 blocks;
+};
+
+static const __be64 *gfs2_hole_walker(struct metapath *mp,
+		const __be64 *start, const __be64 *end,
+		u64 factor, void *data)
+{
+	struct gfs2_hole_walker_args *args = data;
+	const __be64 *ptr;
+
+	for (ptr = start; ptr < end; ptr++) {
+		if (*ptr) {
+			args->blocks += (ptr - start) * factor;
+			if (mp->mp_aheight == mp->mp_fheight)
+				return WALK_STOP;
+			return ptr;  /* increase height */
+		}
+	}
+	args->blocks += (end - start) * factor;
+	return WALK_NEXT;
+}
+
+/**
+ * gfs2_hole_size - figure out the size of a hole
+ * @inode: The inode
+ * @lblock: The logical starting block number
+ * @len: How far to look (in blocks)
+ * @mp: The metapath at lblock
+ * @iomap: The iomap to store the hole size in
+ *
+ * This function modifies @mp.
+ *
+ * Returns: errno on error
+ */
+static int gfs2_hole_size(struct inode *inode, sector_t lblock, u64 len,
+			  struct metapath *mp, struct iomap *iomap)
+{
+	struct gfs2_hole_walker_args args = { };
+	int ret = 0;
+
+	ret = gfs2_walk_metadata(inode, lblock, len, mp, gfs2_hole_walker, &args);
+	if (!ret)
+		iomap->length = args.blocks << inode->i_blkbits;
+	return ret;
+}
+
 static inline void bmap_lock(struct gfs2_inode *ip, int create)
 {
 	if (create)
@@ -615,62 +766,6 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	return 0;
 }
 
-/**
- * hole_size - figure out the size of a hole
- * @inode: The inode
- * @lblock: The logical starting block number
- * @mp: The metapath
- *
- * Returns: The hole size in bytes
- *
- */
-static u64 hole_size(struct inode *inode, sector_t lblock, struct metapath *mp)
-{
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct metapath mp_eof;
-	u64 factor = 1;
-	int hgt;
-	u64 holesz = 0;
-	const __be64 *first, *end, *ptr;
-	const struct buffer_head *bh;
-	u64 lblock_stop = (i_size_read(inode) - 1) >> inode->i_blkbits;
-	int zeroptrs;
-	bool done = false;
-
-	/* Get another metapath, to the very last byte */
-	find_metapath(sdp, lblock_stop, &mp_eof, ip->i_height);
-	for (hgt = ip->i_height - 1; hgt >= 0 && !done; hgt--) {
-		bh = mp->mp_bh[hgt];
-		if (bh) {
-			zeroptrs = 0;
-			first = metapointer(hgt, mp);
-			end = (const __be64 *)(bh->b_data + bh->b_size);
-
-			for (ptr = first; ptr < end; ptr++) {
-				if (*ptr) {
-					done = true;
-					break;
-				} else {
-					zeroptrs++;
-				}
-			}
-		} else {
-			zeroptrs = sdp->sd_inptrs;
-		}
-		if (factor * zeroptrs >= lblock_stop - lblock + 1) {
-			holesz = lblock_stop - lblock + 1;
-			break;
-		}
-		holesz += factor * zeroptrs;
-
-		factor *= sdp->sd_inptrs;
-		if (hgt && (mp->mp_list[hgt - 1] < mp_eof.mp_list[hgt - 1]))
-			(mp->mp_list[hgt - 1])++;
-	}
-	return holesz << inode->i_blkbits;
-}
-
 static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
@@ -726,6 +821,7 @@ int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	lblock = pos >> inode->i_blkbits;
 	lend = (pos + length + sdp->sd_sb.sb_bsize - 1) >> inode->i_blkbits;
+	len = lend - lblock;
 
 	iomap->offset = lblock << inode->i_blkbits;
 	iomap->addr = IOMAP_NULL_ADDR;
@@ -780,7 +876,7 @@ int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 		if (pos >= size)
 			ret = -ENOENT;
 		else if (height <= ip->i_height)
-			iomap->length = hole_size(inode, lblock, &mp);
+			ret = gfs2_hole_size(inode, lblock, len, &mp, iomap);
 		else
 			iomap->length = size - pos;
 	}
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 03/14] gfs2: gfs2_stuffed_write_end cleanup
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

First, change the sanity check in gfs2_stuffed_write_end to check for
the actual write size instead of the requested write size.

Second, use the existing teardown code in gfs2_write_end instead of
duplicating it in gfs2_stuffed_write_end.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 49 ++++++++++++++++++-------------------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index f58716567972..e8c0387652ed 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -814,7 +814,6 @@ static void adjust_fs_space(struct inode *inode)
  * @inode: The inode
  * @dibh: The buffer_head containing the on-disk inode
  * @pos: The file position
- * @len: The length of the write
  * @copied: How much was actually copied by the VFS
  * @page: The page
  *
@@ -824,17 +823,15 @@ static void adjust_fs_space(struct inode *inode)
  * Returns: errno
  */
 static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
-				  loff_t pos, unsigned len, unsigned copied,
+				  loff_t pos, unsigned copied,
 				  struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 	u64 to = pos + copied;
 	void *kaddr;
 	unsigned char *buf = dibh->b_data + sizeof(struct gfs2_dinode);
 
-	BUG_ON(pos + len > gfs2_max_stuffed_size(ip));
+	BUG_ON(pos + copied > gfs2_max_stuffed_size(ip));
 
 	kaddr = kmap_atomic(page);
 	memcpy(buf + pos, kaddr + pos, copied);
@@ -850,20 +847,6 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
 			i_size_write(inode, to);
 		mark_inode_dirty(inode);
 	}
-
-	if (inode == sdp->sd_rindex) {
-		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
-
-	brelse(dibh);
-	gfs2_trans_end(sdp);
-	if (inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-	gfs2_holder_uninit(&ip->i_gh);
 	return copied;
 }
 
@@ -877,9 +860,8 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
  * @page: The page that has been written
  * @fsdata: The fsdata (unused in GFS2)
  *
- * The main write_end function for GFS2. We have a separate one for
- * stuffed files as they are slightly different, otherwise we just
- * put our locking around the VFS provided functions.
+ * The main write_end function for GFS2. We just put our locking around the VFS
+ * provided functions.
  *
  * Returns: errno
  */
@@ -900,32 +882,37 @@ static int gfs2_write_end(struct file *file, struct address_space *mapping,
 	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
 
 	ret = gfs2_meta_inode_buffer(ip, &dibh);
-	if (unlikely(ret)) {
-		unlock_page(page);
-		put_page(page);
-		goto failed;
-	}
+	if (unlikely(ret))
+		goto out;
 
-	if (gfs2_is_stuffed(ip))
-		return gfs2_stuffed_write_end(inode, dibh, pos, len, copied, page);
+	if (gfs2_is_stuffed(ip)) {
+		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
+		page = NULL;
+		goto out2;
+	}
 
 	if (!gfs2_is_writeback(ip))
 		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
 
 	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
+	page = NULL;
 	if (tr->tr_num_buf_new)
 		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 	else
 		gfs2_trans_add_meta(ip->i_gl, dibh);
 
-
+out2:
 	if (inode == sdp->sd_rindex) {
 		adjust_fs_space(inode);
 		sdp->sd_rindex_uptodate = 0;
 	}
 
 	brelse(dibh);
-failed:
+out:
+	if (page) {
+		unlock_page(page);
+		put_page(page);
+	}
 	gfs2_trans_end(sdp);
 	gfs2_inplace_release(ip);
 	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 03/14] gfs2: gfs2_stuffed_write_end cleanup
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

First, change the sanity check in gfs2_stuffed_write_end to check for
the actual write size instead of the requested write size.

Second, use the existing teardown code in gfs2_write_end instead of
duplicating it in gfs2_stuffed_write_end.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 49 ++++++++++++++++++-------------------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index f58716567972..e8c0387652ed 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -814,7 +814,6 @@ static void adjust_fs_space(struct inode *inode)
  * @inode: The inode
  * @dibh: The buffer_head containing the on-disk inode
  * @pos: The file position
- * @len: The length of the write
  * @copied: How much was actually copied by the VFS
  * @page: The page
  *
@@ -824,17 +823,15 @@ static void adjust_fs_space(struct inode *inode)
  * Returns: errno
  */
 static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
-				  loff_t pos, unsigned len, unsigned copied,
+				  loff_t pos, unsigned copied,
 				  struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 	u64 to = pos + copied;
 	void *kaddr;
 	unsigned char *buf = dibh->b_data + sizeof(struct gfs2_dinode);
 
-	BUG_ON(pos + len > gfs2_max_stuffed_size(ip));
+	BUG_ON(pos + copied > gfs2_max_stuffed_size(ip));
 
 	kaddr = kmap_atomic(page);
 	memcpy(buf + pos, kaddr + pos, copied);
@@ -850,20 +847,6 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
 			i_size_write(inode, to);
 		mark_inode_dirty(inode);
 	}
-
-	if (inode == sdp->sd_rindex) {
-		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
-
-	brelse(dibh);
-	gfs2_trans_end(sdp);
-	if (inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-	gfs2_holder_uninit(&ip->i_gh);
 	return copied;
 }
 
@@ -877,9 +860,8 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
  * @page: The page that has been written
  * @fsdata: The fsdata (unused in GFS2)
  *
- * The main write_end function for GFS2. We have a separate one for
- * stuffed files as they are slightly different, otherwise we just
- * put our locking around the VFS provided functions.
+ * The main write_end function for GFS2. We just put our locking around the VFS
+ * provided functions.
  *
  * Returns: errno
  */
@@ -900,32 +882,37 @@ static int gfs2_write_end(struct file *file, struct address_space *mapping,
 	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
 
 	ret = gfs2_meta_inode_buffer(ip, &dibh);
-	if (unlikely(ret)) {
-		unlock_page(page);
-		put_page(page);
-		goto failed;
-	}
+	if (unlikely(ret))
+		goto out;
 
-	if (gfs2_is_stuffed(ip))
-		return gfs2_stuffed_write_end(inode, dibh, pos, len, copied, page);
+	if (gfs2_is_stuffed(ip)) {
+		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
+		page = NULL;
+		goto out2;
+	}
 
 	if (!gfs2_is_writeback(ip))
 		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
 
 	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
+	page = NULL;
 	if (tr->tr_num_buf_new)
 		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 	else
 		gfs2_trans_add_meta(ip->i_gl, dibh);
 
-
+out2:
 	if (inode == sdp->sd_rindex) {
 		adjust_fs_space(inode);
 		sdp->sd_rindex_uptodate = 0;
 	}
 
 	brelse(dibh);
-failed:
+out:
+	if (page) {
+		unlock_page(page);
+		put_page(page);
+	}
 	gfs2_trans_end(sdp);
 	gfs2_inplace_release(ip);
 	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 04/14] gfs2: Remove ordered write mode handling from gfs2_trans_add_data
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

In journaled data mode, we need to add each buffer head to the current
transaction.  In ordered write mode, we only need to add the inode to
the ordered inode list.  So far, both cases are handled in
gfs2_trans_add_data.  This makes the code look misleading and is
inefficient for small block sizes as well.  Handle both cases separately
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c  |  7 ++++---
 fs/gfs2/bmap.c  | 12 ++++++++----
 fs/gfs2/log.h   |  7 ++++++-
 fs/gfs2/quota.c |  5 ++++-
 fs/gfs2/trans.c | 27 ++++++++-------------------
 5 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index e8c0387652ed..a1c6b5de9200 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -54,8 +54,7 @@ static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
 			continue;
 		if (start >= to)
 			break;
-		if (gfs2_is_jdata(ip))
-			set_buffer_uptodate(bh);
+		set_buffer_uptodate(bh);
 		gfs2_trans_add_data(ip->i_gl, bh);
 	}
 }
@@ -891,8 +890,10 @@ static int gfs2_write_end(struct file *file, struct address_space *mapping,
 		goto out2;
 	}
 
-	if (!gfs2_is_writeback(ip))
+	if (gfs2_is_jdata(ip))
 		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
+	else
+		gfs2_ordered_add_inode(ip);
 
 	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
 	page = NULL;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 69f846418ad5..5226c3bfbcf7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -89,10 +89,12 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,
 		map_bh(bh, inode->i_sb, block);
 
 	set_buffer_uptodate(bh);
-	if (!gfs2_is_jdata(ip))
-		mark_buffer_dirty(bh);
-	if (!gfs2_is_writeback(ip))
+	if (gfs2_is_jdata(ip))
 		gfs2_trans_add_data(ip->i_gl, bh);
+	else {
+		mark_buffer_dirty(bh);
+		gfs2_ordered_add_inode(ip);
+	}
 
 	if (release) {
 		unlock_page(page);
@@ -1028,8 +1030,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
 		err = 0;
 	}
 
-	if (!gfs2_is_writeback(ip))
+	if (gfs2_is_jdata(ip))
 		gfs2_trans_add_data(ip->i_gl, bh);
+	else
+		gfs2_ordered_add_inode(ip);
 
 	zero_user(page, offset, length);
 	mark_buffer_dirty(bh);
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 1862e310a067..20241436126d 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -14,6 +14,7 @@
 #include <linux/spinlock.h>
 #include <linux/writeback.h>
 #include "incore.h"
+#include "inode.h"
 
 /**
  * gfs2_log_lock - acquire the right to mess with the log manager
@@ -50,8 +51,12 @@ static inline void gfs2_log_pointers_init(struct gfs2_sbd *sdp,
 
 static inline void gfs2_ordered_add_inode(struct gfs2_inode *ip)
 {
-	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+	struct gfs2_sbd *sdp;
 
+	if (!gfs2_is_ordered(ip))
+		return;
+
+	sdp = GFS2_SB(&ip->i_inode);
 	if (!test_bit(GIF_ORDERED, &ip->i_flags)) {
 		spin_lock(&sdp->sd_ordered_lock);
 		if (!test_and_set_bit(GIF_ORDERED, &ip->i_flags))
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 7a98abd340ee..e8585dfd209f 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -735,7 +735,10 @@ static int gfs2_write_buf_to_page(struct gfs2_inode *ip, unsigned long index,
 			if (!buffer_uptodate(bh))
 				goto unlock_out;
 		}
-		gfs2_trans_add_data(ip->i_gl, bh);
+		if (gfs2_is_jdata(ip))
+			gfs2_trans_add_data(ip->i_gl, bh);
+		else
+			gfs2_ordered_add_inode(ip);
 
 		/* If we need to write to the next block as well */
 		if (to_write > (bsize - boff)) {
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index c75cacaa349b..064c9a0ef046 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -143,32 +143,21 @@ static struct gfs2_bufdata *gfs2_alloc_bufdata(struct gfs2_glock *gl,
  * @gl: The inode glock associated with the buffer
  * @bh: The buffer to add
  *
- * This is used in two distinct cases:
- * i) In ordered write mode
- *    We put the data buffer on a list so that we can ensure that it's
- *    synced to disk at the right time
- * ii) In journaled data mode
- *    We need to journal the data block in the same way as metadata in
- *    the functions above. The difference is that here we have a tag
- *    which is two __be64's being the block number (as per meta data)
- *    and a flag which says whether the data block needs escaping or
- *    not. This means we need a new log entry for each 251 or so data
- *    blocks, which isn't an enormous overhead but twice as much as
- *    for normal metadata blocks.
+ * This is used in journaled data mode.
+ * We need to journal the data block in the same way as metadata in
+ * the functions above. The difference is that here we have a tag
+ * which is two __be64's being the block number (as per meta data)
+ * and a flag which says whether the data block needs escaping or
+ * not. This means we need a new log entry for each 251 or so data
+ * blocks, which isn't an enormous overhead but twice as much as
+ * for normal metadata blocks.
  */
 void gfs2_trans_add_data(struct gfs2_glock *gl, struct buffer_head *bh)
 {
 	struct gfs2_trans *tr = current->journal_info;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
-	struct address_space *mapping = bh->b_page->mapping;
-	struct gfs2_inode *ip = GFS2_I(mapping->host);
 	struct gfs2_bufdata *bd;
 
-	if (!gfs2_is_jdata(ip)) {
-		gfs2_ordered_add_inode(ip);
-		return;
-	}
-
 	lock_buffer(bh);
 	if (buffer_pinned(bh)) {
 		set_bit(TR_TOUCHED, &tr->tr_flags);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 04/14] gfs2: Remove ordered write mode handling from gfs2_trans_add_data
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

In journaled data mode, we need to add each buffer head to the current
transaction.  In ordered write mode, we only need to add the inode to
the ordered inode list.  So far, both cases are handled in
gfs2_trans_add_data.  This makes the code look misleading and is
inefficient for small block sizes as well.  Handle both cases separately
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c  |  7 ++++---
 fs/gfs2/bmap.c  | 12 ++++++++----
 fs/gfs2/log.h   |  7 ++++++-
 fs/gfs2/quota.c |  5 ++++-
 fs/gfs2/trans.c | 27 ++++++++-------------------
 5 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index e8c0387652ed..a1c6b5de9200 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -54,8 +54,7 @@ static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
 			continue;
 		if (start >= to)
 			break;
-		if (gfs2_is_jdata(ip))
-			set_buffer_uptodate(bh);
+		set_buffer_uptodate(bh);
 		gfs2_trans_add_data(ip->i_gl, bh);
 	}
 }
@@ -891,8 +890,10 @@ static int gfs2_write_end(struct file *file, struct address_space *mapping,
 		goto out2;
 	}
 
-	if (!gfs2_is_writeback(ip))
+	if (gfs2_is_jdata(ip))
 		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
+	else
+		gfs2_ordered_add_inode(ip);
 
 	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
 	page = NULL;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 69f846418ad5..5226c3bfbcf7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -89,10 +89,12 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,
 		map_bh(bh, inode->i_sb, block);
 
 	set_buffer_uptodate(bh);
-	if (!gfs2_is_jdata(ip))
-		mark_buffer_dirty(bh);
-	if (!gfs2_is_writeback(ip))
+	if (gfs2_is_jdata(ip))
 		gfs2_trans_add_data(ip->i_gl, bh);
+	else {
+		mark_buffer_dirty(bh);
+		gfs2_ordered_add_inode(ip);
+	}
 
 	if (release) {
 		unlock_page(page);
@@ -1028,8 +1030,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
 		err = 0;
 	}
 
-	if (!gfs2_is_writeback(ip))
+	if (gfs2_is_jdata(ip))
 		gfs2_trans_add_data(ip->i_gl, bh);
+	else
+		gfs2_ordered_add_inode(ip);
 
 	zero_user(page, offset, length);
 	mark_buffer_dirty(bh);
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 1862e310a067..20241436126d 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -14,6 +14,7 @@
 #include <linux/spinlock.h>
 #include <linux/writeback.h>
 #include "incore.h"
+#include "inode.h"
 
 /**
  * gfs2_log_lock - acquire the right to mess with the log manager
@@ -50,8 +51,12 @@ static inline void gfs2_log_pointers_init(struct gfs2_sbd *sdp,
 
 static inline void gfs2_ordered_add_inode(struct gfs2_inode *ip)
 {
-	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+	struct gfs2_sbd *sdp;
 
+	if (!gfs2_is_ordered(ip))
+		return;
+
+	sdp = GFS2_SB(&ip->i_inode);
 	if (!test_bit(GIF_ORDERED, &ip->i_flags)) {
 		spin_lock(&sdp->sd_ordered_lock);
 		if (!test_and_set_bit(GIF_ORDERED, &ip->i_flags))
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 7a98abd340ee..e8585dfd209f 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -735,7 +735,10 @@ static int gfs2_write_buf_to_page(struct gfs2_inode *ip, unsigned long index,
 			if (!buffer_uptodate(bh))
 				goto unlock_out;
 		}
-		gfs2_trans_add_data(ip->i_gl, bh);
+		if (gfs2_is_jdata(ip))
+			gfs2_trans_add_data(ip->i_gl, bh);
+		else
+			gfs2_ordered_add_inode(ip);
 
 		/* If we need to write to the next block as well */
 		if (to_write > (bsize - boff)) {
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index c75cacaa349b..064c9a0ef046 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -143,32 +143,21 @@ static struct gfs2_bufdata *gfs2_alloc_bufdata(struct gfs2_glock *gl,
  * @gl: The inode glock associated with the buffer
  * @bh: The buffer to add
  *
- * This is used in two distinct cases:
- * i) In ordered write mode
- *    We put the data buffer on a list so that we can ensure that it's
- *    synced to disk at the right time
- * ii) In journaled data mode
- *    We need to journal the data block in the same way as metadata in
- *    the functions above. The difference is that here we have a tag
- *    which is two __be64's being the block number (as per meta data)
- *    and a flag which says whether the data block needs escaping or
- *    not. This means we need a new log entry for each 251 or so data
- *    blocks, which isn't an enormous overhead but twice as much as
- *    for normal metadata blocks.
+ * This is used in journaled data mode.
+ * We need to journal the data block in the same way as metadata in
+ * the functions above. The difference is that here we have a tag
+ * which is two __be64's being the block number (as per meta data)
+ * and a flag which says whether the data block needs escaping or
+ * not. This means we need a new log entry for each 251 or so data
+ * blocks, which isn't an enormous overhead but twice as much as
+ * for normal metadata blocks.
  */
 void gfs2_trans_add_data(struct gfs2_glock *gl, struct buffer_head *bh)
 {
 	struct gfs2_trans *tr = current->journal_info;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
-	struct address_space *mapping = bh->b_page->mapping;
-	struct gfs2_inode *ip = GFS2_I(mapping->host);
 	struct gfs2_bufdata *bd;
 
-	if (!gfs2_is_jdata(ip)) {
-		gfs2_ordered_add_inode(ip);
-		return;
-	}
-
 	lock_buffer(bh);
 	if (buffer_pinned(bh)) {
 		set_bit(TR_TOUCHED, &tr->tr_flags);
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 05/14] gfs2: Iomap cleanups and improvements
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Clean up gfs2_iomap_alloc and gfs2_iomap_get.  Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.

Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.

In fallocate_chunk, zero-initialize struct iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c  | 203 ++++++++++++++++++++++++++++--------------------
 fs/gfs2/bmap.h  |   6 +-
 fs/gfs2/file.c  |   6 +-
 fs/gfs2/inode.c |   4 -
 4 files changed, 125 insertions(+), 94 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5226c3bfbcf7..8ea65aea34da 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -572,22 +572,6 @@ static int gfs2_hole_size(struct inode *inode, sector_t lblock, u64 len,
 	return ret;
 }
 
-static inline void bmap_lock(struct gfs2_inode *ip, int create)
-{
-	if (create)
-		down_write(&ip->i_rw_mutex);
-	else
-		down_read(&ip->i_rw_mutex);
-}
-
-static inline void bmap_unlock(struct gfs2_inode *ip, int create)
-{
-	if (create)
-		up_write(&ip->i_rw_mutex);
-	else
-		up_read(&ip->i_rw_mutex);
-}
-
 static inline __be64 *gfs2_indirect_init(struct metapath *mp,
 					 struct gfs2_glock *gl, unsigned int i,
 					 unsigned offset, u64 bn)
@@ -614,15 +598,11 @@ enum alloc_state {
 };
 
 /**
- * gfs2_bmap_alloc - Build a metadata tree of the requested height
+ * gfs2_iomap_alloc - Build a metadata tree of the requested height
  * @inode: The GFS2 inode
- * @lblock: The logical starting block of the extent
- * @bh_map: This is used to return the mapping details
- * @zero_new: True if newly allocated blocks should be zeroed
+ * @iomap: The iomap structure
+ * @flags: iomap flags
  * @mp: The metapath, with proper height information calculated
- * @maxlen: The max number of data blocks to alloc
- * @dblock: Pointer to return the resulting new block
- * @dblks: Pointer to return the number of blocks allocated
  *
  * In this routine we may have to alloc:
  *   i) Indirect blocks to grow the metadata tree height
@@ -635,6 +615,13 @@ enum alloc_state {
  * blocks are available, there will only be one request per bmap call)
  * and uses the state machine to initialise the blocks in order.
  *
+ * Right now, this function will allocate at most one indirect block
+ * worth of data -- with a default block size of 4K, that's slightly
+ * less than 2M.  If this limitation is ever removed to allow huge
+ * allocations, we would probably still want to limit the iomap size we
+ * return to avoid stalling other tasks during huge writes; the next
+ * iomap iteration would then find the blocks already allocated.
+ *
  * Returns: errno on error
  */
 
@@ -649,6 +636,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	unsigned dblks = 0;
 	unsigned ptrs_per_blk;
 	const unsigned end_of_metadata = mp->mp_fheight - 1;
+	int ret;
 	enum alloc_state state;
 	__be64 *ptr;
 	__be64 zero_bn = 0;
@@ -659,6 +647,8 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 
 	gfs2_trans_add_meta(ip->i_gl, dibh);
 
+	down_write(&ip->i_rw_mutex);
+
 	if (mp->mp_fheight == mp->mp_aheight) {
 		struct buffer_head *bh;
 		int eob;
@@ -694,11 +684,10 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	blks = dblks + iblks;
 	i = mp->mp_aheight;
 	do {
-		int error;
 		n = blks - alloced;
-		error = gfs2_alloc_blocks(ip, &bn, &n, 0, NULL);
-		if (error)
-			return error;
+		ret = gfs2_alloc_blocks(ip, &bn, &n, 0, NULL);
+		if (ret)
+			goto out;
 		alloced += n;
 		if (state != ALLOC_DATA || gfs2_is_jdata(ip))
 			gfs2_trans_add_unrevoke(sdp, bn, n);
@@ -754,7 +743,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 			dblks = n;
 			ptr = metapointer(end_of_metadata, mp);
 			iomap->addr = bn << inode->i_blkbits;
-			iomap->flags |= IOMAP_F_NEW;
+			iomap->flags |= IOMAP_F_MERGED | IOMAP_F_NEW;
 			while (n-- > 0)
 				*ptr++ = cpu_to_be64(bn++);
 			break;
@@ -764,8 +753,10 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	iomap->length = (u64)dblks << inode->i_blkbits;
 	ip->i_height = mp->mp_fheight;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
-	gfs2_dinode_out(ip, mp->mp_bh[0]->b_data);
-	return 0;
+	gfs2_dinode_out(ip, dibh->b_data);
+out:
+	up_write(&ip->i_rw_mutex);
+	return ret;
 }
 
 static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
@@ -781,110 +772,130 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 }
 
 /**
- * gfs2_iomap_begin - Map blocks from an inode to disk blocks
+ * gfs2_iomap_get - Map blocks from an inode to disk blocks
  * @inode: The inode
  * @pos: Starting position in bytes
  * @length: Length to map, in bytes
  * @flags: iomap flags
  * @iomap: The iomap structure
+ * @mp: The metapath
  *
  * Returns: errno
  */
-int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-		     unsigned flags, struct iomap *iomap)
+static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
+			  unsigned flags, struct iomap *iomap,
+			  struct metapath *mp)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct metapath mp = { .mp_aheight = 1, };
 	__be64 *ptr;
 	sector_t lblock;
-	sector_t lend;
-	int ret = 0;
+	sector_t lblock_stop;
+	int ret;
 	int eob;
-	unsigned int len;
+	u64 len;
 	struct buffer_head *bh;
 	u8 height;
 
-	trace_gfs2_iomap_start(ip, pos, length, flags);
-	if (!length) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (!length)
+		return -EINVAL;
 
 	if (gfs2_is_stuffed(ip)) {
 		if (flags & IOMAP_REPORT) {
+			if (pos >= i_size_read(inode))
+				return -ENOENT;
 			gfs2_stuffed_iomap(inode, iomap);
-			if (pos >= iomap->length)
-				ret = -ENOENT;
-			goto out;
+			return 0;
 		}
 		BUG_ON(!(flags & IOMAP_WRITE));
 	}
-
 	lblock = pos >> inode->i_blkbits;
-	lend = (pos + length + sdp->sd_sb.sb_bsize - 1) >> inode->i_blkbits;
-	len = lend - lblock;
-
 	iomap->offset = lblock << inode->i_blkbits;
-	iomap->addr = IOMAP_NULL_ADDR;
-	iomap->type = IOMAP_HOLE;
-	iomap->length = (u64)(lend - lblock) << inode->i_blkbits;
-	iomap->flags = IOMAP_F_MERGED;
-	bmap_lock(ip, flags & IOMAP_WRITE);
+	lblock_stop = (pos + length - 1) >> inode->i_blkbits;
+	len = lblock_stop - lblock + 1;
+
+	down_read(&ip->i_rw_mutex);
 
-	ret = gfs2_meta_inode_buffer(ip, &mp.mp_bh[0]);
+	ret = gfs2_meta_inode_buffer(ip, &mp->mp_bh[0]);
 	if (ret)
-		goto out_release;
+		goto unlock;
 
 	height = ip->i_height;
 	while ((lblock + 1) * sdp->sd_sb.sb_bsize > sdp->sd_heightsize[height])
 		height++;
-	find_metapath(sdp, lblock, &mp, height);
+	find_metapath(sdp, lblock, mp, height);
 	if (height > ip->i_height || gfs2_is_stuffed(ip))
 		goto do_alloc;
 
-	ret = lookup_metapath(ip, &mp);
+	ret = lookup_metapath(ip, mp);
 	if (ret)
-		goto out_release;
+		goto unlock;
 
-	if (mp.mp_aheight != ip->i_height)
+	if (mp->mp_aheight != ip->i_height)
 		goto do_alloc;
 
-	ptr = metapointer(ip->i_height - 1, &mp);
+	ptr = metapointer(ip->i_height - 1, mp);
 	if (*ptr == 0)
 		goto do_alloc;
 
-	iomap->type = IOMAP_MAPPED;
-	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
+	bh = mp->mp_bh[ip->i_height - 1];
+	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, len, &eob);
 
-	bh = mp.mp_bh[ip->i_height - 1];
-	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, lend - lblock, &eob);
+	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
+	iomap->length = len << inode->i_blkbits;
+	iomap->type = IOMAP_MAPPED;
+	iomap->flags = IOMAP_F_MERGED;
 	if (eob)
 		iomap->flags |= IOMAP_F_BOUNDARY;
-	iomap->length = (u64)len << inode->i_blkbits;
 
-out_release:
-	release_metapath(&mp);
-	bmap_unlock(ip, flags & IOMAP_WRITE);
 out:
-	trace_gfs2_iomap_end(ip, iomap, ret);
+	iomap->bdev = inode->i_sb->s_bdev;
+unlock:
+	up_read(&ip->i_rw_mutex);
 	return ret;
 
 do_alloc:
-	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
-	} else if (flags & IOMAP_REPORT) {
+	iomap->addr = IOMAP_NULL_ADDR;
+	iomap->length = len << inode->i_blkbits;
+	iomap->type = IOMAP_HOLE;
+	iomap->flags = 0;
+	if (flags & IOMAP_REPORT) {
 		loff_t size = i_size_read(inode);
 		if (pos >= size)
 			ret = -ENOENT;
-		else if (height <= ip->i_height)
-			ret = gfs2_hole_size(inode, lblock, len, &mp, iomap);
+		else if (height == ip->i_height)
+			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 		else
 			iomap->length = size - pos;
 	}
-	goto out_release;
+	goto out;
 }
 
+static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
+			    unsigned flags, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct metapath mp = { .mp_aheight = 1, };
+	int ret;
+
+	trace_gfs2_iomap_start(ip, pos, length, flags);
+	if (flags & IOMAP_WRITE) {
+		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+		if (!ret && iomap->type == IOMAP_HOLE)
+			ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
+		release_metapath(&mp);
+	} else {
+		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+		release_metapath(&mp);
+	}
+	trace_gfs2_iomap_end(ip, iomap, ret);
+	return ret;
+}
+
+const struct iomap_ops gfs2_iomap_ops = {
+	.iomap_begin = gfs2_iomap_begin,
+};
+
 /**
  * gfs2_block_map - Map one or more blocks of an inode to a disk block
  * @inode: The inode
@@ -910,25 +921,34 @@ int gfs2_block_map(struct inode *inode, sector_t lblock,
 		   struct buffer_head *bh_map, int create)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
+	loff_t pos = (loff_t)lblock << inode->i_blkbits;
+	loff_t length = bh_map->b_size;
+	struct metapath mp = { .mp_aheight = 1, };
 	struct iomap iomap;
-	int ret, flags = 0;
+	int ret;
 
 	clear_buffer_mapped(bh_map);
 	clear_buffer_new(bh_map);
 	clear_buffer_boundary(bh_map);
 	trace_gfs2_bmap(ip, bh_map, lblock, create, 1);
 
-	if (create)
-		flags |= IOMAP_WRITE;
-	ret = gfs2_iomap_begin(inode, (loff_t)lblock << inode->i_blkbits,
-			       bh_map->b_size, flags, &iomap);
-	if (ret) {
-		if (!create && ret == -ENOENT) {
-			/* Return unmapped buffer beyond the end of file.  */
+	if (create) {
+		ret = gfs2_iomap_get(inode, pos, length, IOMAP_WRITE, &iomap, &mp);
+		if (!ret && iomap.type == IOMAP_HOLE)
+			ret = gfs2_iomap_alloc(inode, &iomap, IOMAP_WRITE, &mp);
+		release_metapath(&mp);
+	} else {
+		ret = gfs2_iomap_get(inode, pos, length, 0, &iomap, &mp);
+		release_metapath(&mp);
+
+		/* Return unmapped buffer beyond the end of file. */
+		if (ret == -ENOENT) {
 			ret = 0;
+			goto out;
 		}
-		goto out;
 	}
+	if (ret)
+		goto out;
 
 	if (iomap.length > bh_map->b_size) {
 		iomap.length = bh_map->b_size;
@@ -1143,6 +1163,19 @@ static int trunc_start(struct inode *inode, u64 newsize)
 	return error;
 }
 
+int gfs2_iomap_get_alloc(struct inode *inode, loff_t pos, loff_t length,
+			 struct iomap *iomap)
+{
+	struct metapath mp = { .mp_aheight = 1, };
+	int ret;
+
+	ret = gfs2_iomap_get(inode, pos, length, IOMAP_WRITE, iomap, &mp);
+	if (!ret && iomap->type == IOMAP_HOLE)
+		ret = gfs2_iomap_alloc(inode, iomap, IOMAP_WRITE, &mp);
+	release_metapath(&mp);
+	return ret;
+}
+
 /**
  * sweep_bh_for_rgrps - find an rgrp in a meta buffer and free blocks therein
  * @ip: inode
diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h
index c3402fe00653..6b18fb323f0a 100644
--- a/fs/gfs2/bmap.h
+++ b/fs/gfs2/bmap.h
@@ -46,11 +46,13 @@ static inline void gfs2_write_calc_reserv(const struct gfs2_inode *ip,
 	}
 }
 
+extern const struct iomap_ops gfs2_iomap_ops;
+
 extern int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
 extern int gfs2_block_map(struct inode *inode, sector_t lblock,
 			  struct buffer_head *bh, int create);
-extern int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-			    unsigned flags, struct iomap *iomap);
+extern int gfs2_iomap_get_alloc(struct inode *inode, loff_t pos, loff_t length,
+				struct iomap *iomap);
 extern int gfs2_extent_map(struct inode *inode, u64 lblock, int *new,
 			   u64 *dblock, unsigned *extlen);
 extern int gfs2_setattr_size(struct inode *inode, u64 size);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4b71f021a9e2..378ff1a1e9ae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -733,7 +733,7 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	loff_t end = offset + len;
 	struct buffer_head *dibh;
-	struct iomap iomap;
+	struct iomap iomap = { 0 };
 	int error;
 
 	error = gfs2_meta_inode_buffer(ip, &dibh);
@@ -749,8 +749,8 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 	}
 
 	while (offset < end) {
-		error = gfs2_iomap_begin(inode, offset, end - offset,
-					 IOMAP_WRITE, &iomap);
+		error = gfs2_iomap_get_alloc(inode, offset, end - offset,
+					     &iomap);
 		if (error)
 			goto out;
 		offset = iomap.offset + iomap.length;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 8700eb815638..feda55f67050 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2006,10 +2006,6 @@ static int gfs2_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
-const struct iomap_ops gfs2_iomap_ops = {
-	.iomap_begin = gfs2_iomap_begin,
-};
-
 static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		       u64 start, u64 len)
 {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 05/14] gfs2: Iomap cleanups and improvements
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Clean up gfs2_iomap_alloc and gfs2_iomap_get.  Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.

Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.

In fallocate_chunk, zero-initialize struct iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c  | 203 ++++++++++++++++++++++++++++--------------------
 fs/gfs2/bmap.h  |   6 +-
 fs/gfs2/file.c  |   6 +-
 fs/gfs2/inode.c |   4 -
 4 files changed, 125 insertions(+), 94 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5226c3bfbcf7..8ea65aea34da 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -572,22 +572,6 @@ static int gfs2_hole_size(struct inode *inode, sector_t lblock, u64 len,
 	return ret;
 }
 
-static inline void bmap_lock(struct gfs2_inode *ip, int create)
-{
-	if (create)
-		down_write(&ip->i_rw_mutex);
-	else
-		down_read(&ip->i_rw_mutex);
-}
-
-static inline void bmap_unlock(struct gfs2_inode *ip, int create)
-{
-	if (create)
-		up_write(&ip->i_rw_mutex);
-	else
-		up_read(&ip->i_rw_mutex);
-}
-
 static inline __be64 *gfs2_indirect_init(struct metapath *mp,
 					 struct gfs2_glock *gl, unsigned int i,
 					 unsigned offset, u64 bn)
@@ -614,15 +598,11 @@ enum alloc_state {
 };
 
 /**
- * gfs2_bmap_alloc - Build a metadata tree of the requested height
+ * gfs2_iomap_alloc - Build a metadata tree of the requested height
  * @inode: The GFS2 inode
- * @lblock: The logical starting block of the extent
- * @bh_map: This is used to return the mapping details
- * @zero_new: True if newly allocated blocks should be zeroed
+ * @iomap: The iomap structure
+ * @flags: iomap flags
  * @mp: The metapath, with proper height information calculated
- * @maxlen: The max number of data blocks to alloc
- * @dblock: Pointer to return the resulting new block
- * @dblks: Pointer to return the number of blocks allocated
  *
  * In this routine we may have to alloc:
  *   i) Indirect blocks to grow the metadata tree height
@@ -635,6 +615,13 @@ enum alloc_state {
  * blocks are available, there will only be one request per bmap call)
  * and uses the state machine to initialise the blocks in order.
  *
+ * Right now, this function will allocate at most one indirect block
+ * worth of data -- with a default block size of 4K, that's slightly
+ * less than 2M.  If this limitation is ever removed to allow huge
+ * allocations, we would probably still want to limit the iomap size we
+ * return to avoid stalling other tasks during huge writes; the next
+ * iomap iteration would then find the blocks already allocated.
+ *
  * Returns: errno on error
  */
 
@@ -649,6 +636,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	unsigned dblks = 0;
 	unsigned ptrs_per_blk;
 	const unsigned end_of_metadata = mp->mp_fheight - 1;
+	int ret;
 	enum alloc_state state;
 	__be64 *ptr;
 	__be64 zero_bn = 0;
@@ -659,6 +647,8 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 
 	gfs2_trans_add_meta(ip->i_gl, dibh);
 
+	down_write(&ip->i_rw_mutex);
+
 	if (mp->mp_fheight == mp->mp_aheight) {
 		struct buffer_head *bh;
 		int eob;
@@ -694,11 +684,10 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	blks = dblks + iblks;
 	i = mp->mp_aheight;
 	do {
-		int error;
 		n = blks - alloced;
-		error = gfs2_alloc_blocks(ip, &bn, &n, 0, NULL);
-		if (error)
-			return error;
+		ret = gfs2_alloc_blocks(ip, &bn, &n, 0, NULL);
+		if (ret)
+			goto out;
 		alloced += n;
 		if (state != ALLOC_DATA || gfs2_is_jdata(ip))
 			gfs2_trans_add_unrevoke(sdp, bn, n);
@@ -754,7 +743,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 			dblks = n;
 			ptr = metapointer(end_of_metadata, mp);
 			iomap->addr = bn << inode->i_blkbits;
-			iomap->flags |= IOMAP_F_NEW;
+			iomap->flags |= IOMAP_F_MERGED | IOMAP_F_NEW;
 			while (n-- > 0)
 				*ptr++ = cpu_to_be64(bn++);
 			break;
@@ -764,8 +753,10 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	iomap->length = (u64)dblks << inode->i_blkbits;
 	ip->i_height = mp->mp_fheight;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
-	gfs2_dinode_out(ip, mp->mp_bh[0]->b_data);
-	return 0;
+	gfs2_dinode_out(ip, dibh->b_data);
+out:
+	up_write(&ip->i_rw_mutex);
+	return ret;
 }
 
 static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
@@ -781,110 +772,130 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
 }
 
 /**
- * gfs2_iomap_begin - Map blocks from an inode to disk blocks
+ * gfs2_iomap_get - Map blocks from an inode to disk blocks
  * @inode: The inode
  * @pos: Starting position in bytes
  * @length: Length to map, in bytes
  * @flags: iomap flags
  * @iomap: The iomap structure
+ * @mp: The metapath
  *
  * Returns: errno
  */
-int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-		     unsigned flags, struct iomap *iomap)
+static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
+			  unsigned flags, struct iomap *iomap,
+			  struct metapath *mp)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct metapath mp = { .mp_aheight = 1, };
 	__be64 *ptr;
 	sector_t lblock;
-	sector_t lend;
-	int ret = 0;
+	sector_t lblock_stop;
+	int ret;
 	int eob;
-	unsigned int len;
+	u64 len;
 	struct buffer_head *bh;
 	u8 height;
 
-	trace_gfs2_iomap_start(ip, pos, length, flags);
-	if (!length) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (!length)
+		return -EINVAL;
 
 	if (gfs2_is_stuffed(ip)) {
 		if (flags & IOMAP_REPORT) {
+			if (pos >= i_size_read(inode))
+				return -ENOENT;
 			gfs2_stuffed_iomap(inode, iomap);
-			if (pos >= iomap->length)
-				ret = -ENOENT;
-			goto out;
+			return 0;
 		}
 		BUG_ON(!(flags & IOMAP_WRITE));
 	}
-
 	lblock = pos >> inode->i_blkbits;
-	lend = (pos + length + sdp->sd_sb.sb_bsize - 1) >> inode->i_blkbits;
-	len = lend - lblock;
-
 	iomap->offset = lblock << inode->i_blkbits;
-	iomap->addr = IOMAP_NULL_ADDR;
-	iomap->type = IOMAP_HOLE;
-	iomap->length = (u64)(lend - lblock) << inode->i_blkbits;
-	iomap->flags = IOMAP_F_MERGED;
-	bmap_lock(ip, flags & IOMAP_WRITE);
+	lblock_stop = (pos + length - 1) >> inode->i_blkbits;
+	len = lblock_stop - lblock + 1;
+
+	down_read(&ip->i_rw_mutex);
 
-	ret = gfs2_meta_inode_buffer(ip, &mp.mp_bh[0]);
+	ret = gfs2_meta_inode_buffer(ip, &mp->mp_bh[0]);
 	if (ret)
-		goto out_release;
+		goto unlock;
 
 	height = ip->i_height;
 	while ((lblock + 1) * sdp->sd_sb.sb_bsize > sdp->sd_heightsize[height])
 		height++;
-	find_metapath(sdp, lblock, &mp, height);
+	find_metapath(sdp, lblock, mp, height);
 	if (height > ip->i_height || gfs2_is_stuffed(ip))
 		goto do_alloc;
 
-	ret = lookup_metapath(ip, &mp);
+	ret = lookup_metapath(ip, mp);
 	if (ret)
-		goto out_release;
+		goto unlock;
 
-	if (mp.mp_aheight != ip->i_height)
+	if (mp->mp_aheight != ip->i_height)
 		goto do_alloc;
 
-	ptr = metapointer(ip->i_height - 1, &mp);
+	ptr = metapointer(ip->i_height - 1, mp);
 	if (*ptr == 0)
 		goto do_alloc;
 
-	iomap->type = IOMAP_MAPPED;
-	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
+	bh = mp->mp_bh[ip->i_height - 1];
+	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, len, &eob);
 
-	bh = mp.mp_bh[ip->i_height - 1];
-	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, lend - lblock, &eob);
+	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
+	iomap->length = len << inode->i_blkbits;
+	iomap->type = IOMAP_MAPPED;
+	iomap->flags = IOMAP_F_MERGED;
 	if (eob)
 		iomap->flags |= IOMAP_F_BOUNDARY;
-	iomap->length = (u64)len << inode->i_blkbits;
 
-out_release:
-	release_metapath(&mp);
-	bmap_unlock(ip, flags & IOMAP_WRITE);
 out:
-	trace_gfs2_iomap_end(ip, iomap, ret);
+	iomap->bdev = inode->i_sb->s_bdev;
+unlock:
+	up_read(&ip->i_rw_mutex);
 	return ret;
 
 do_alloc:
-	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
-	} else if (flags & IOMAP_REPORT) {
+	iomap->addr = IOMAP_NULL_ADDR;
+	iomap->length = len << inode->i_blkbits;
+	iomap->type = IOMAP_HOLE;
+	iomap->flags = 0;
+	if (flags & IOMAP_REPORT) {
 		loff_t size = i_size_read(inode);
 		if (pos >= size)
 			ret = -ENOENT;
-		else if (height <= ip->i_height)
-			ret = gfs2_hole_size(inode, lblock, len, &mp, iomap);
+		else if (height == ip->i_height)
+			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 		else
 			iomap->length = size - pos;
 	}
-	goto out_release;
+	goto out;
 }
 
+static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
+			    unsigned flags, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct metapath mp = { .mp_aheight = 1, };
+	int ret;
+
+	trace_gfs2_iomap_start(ip, pos, length, flags);
+	if (flags & IOMAP_WRITE) {
+		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+		if (!ret && iomap->type == IOMAP_HOLE)
+			ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
+		release_metapath(&mp);
+	} else {
+		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+		release_metapath(&mp);
+	}
+	trace_gfs2_iomap_end(ip, iomap, ret);
+	return ret;
+}
+
+const struct iomap_ops gfs2_iomap_ops = {
+	.iomap_begin = gfs2_iomap_begin,
+};
+
 /**
  * gfs2_block_map - Map one or more blocks of an inode to a disk block
  * @inode: The inode
@@ -910,25 +921,34 @@ int gfs2_block_map(struct inode *inode, sector_t lblock,
 		   struct buffer_head *bh_map, int create)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
+	loff_t pos = (loff_t)lblock << inode->i_blkbits;
+	loff_t length = bh_map->b_size;
+	struct metapath mp = { .mp_aheight = 1, };
 	struct iomap iomap;
-	int ret, flags = 0;
+	int ret;
 
 	clear_buffer_mapped(bh_map);
 	clear_buffer_new(bh_map);
 	clear_buffer_boundary(bh_map);
 	trace_gfs2_bmap(ip, bh_map, lblock, create, 1);
 
-	if (create)
-		flags |= IOMAP_WRITE;
-	ret = gfs2_iomap_begin(inode, (loff_t)lblock << inode->i_blkbits,
-			       bh_map->b_size, flags, &iomap);
-	if (ret) {
-		if (!create && ret == -ENOENT) {
-			/* Return unmapped buffer beyond the end of file.  */
+	if (create) {
+		ret = gfs2_iomap_get(inode, pos, length, IOMAP_WRITE, &iomap, &mp);
+		if (!ret && iomap.type == IOMAP_HOLE)
+			ret = gfs2_iomap_alloc(inode, &iomap, IOMAP_WRITE, &mp);
+		release_metapath(&mp);
+	} else {
+		ret = gfs2_iomap_get(inode, pos, length, 0, &iomap, &mp);
+		release_metapath(&mp);
+
+		/* Return unmapped buffer beyond the end of file. */
+		if (ret == -ENOENT) {
 			ret = 0;
+			goto out;
 		}
-		goto out;
 	}
+	if (ret)
+		goto out;
 
 	if (iomap.length > bh_map->b_size) {
 		iomap.length = bh_map->b_size;
@@ -1143,6 +1163,19 @@ static int trunc_start(struct inode *inode, u64 newsize)
 	return error;
 }
 
+int gfs2_iomap_get_alloc(struct inode *inode, loff_t pos, loff_t length,
+			 struct iomap *iomap)
+{
+	struct metapath mp = { .mp_aheight = 1, };
+	int ret;
+
+	ret = gfs2_iomap_get(inode, pos, length, IOMAP_WRITE, iomap, &mp);
+	if (!ret && iomap->type == IOMAP_HOLE)
+		ret = gfs2_iomap_alloc(inode, iomap, IOMAP_WRITE, &mp);
+	release_metapath(&mp);
+	return ret;
+}
+
 /**
  * sweep_bh_for_rgrps - find an rgrp in a meta buffer and free blocks therein
  * @ip: inode
diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h
index c3402fe00653..6b18fb323f0a 100644
--- a/fs/gfs2/bmap.h
+++ b/fs/gfs2/bmap.h
@@ -46,11 +46,13 @@ static inline void gfs2_write_calc_reserv(const struct gfs2_inode *ip,
 	}
 }
 
+extern const struct iomap_ops gfs2_iomap_ops;
+
 extern int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
 extern int gfs2_block_map(struct inode *inode, sector_t lblock,
 			  struct buffer_head *bh, int create);
-extern int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-			    unsigned flags, struct iomap *iomap);
+extern int gfs2_iomap_get_alloc(struct inode *inode, loff_t pos, loff_t length,
+				struct iomap *iomap);
 extern int gfs2_extent_map(struct inode *inode, u64 lblock, int *new,
 			   u64 *dblock, unsigned *extlen);
 extern int gfs2_setattr_size(struct inode *inode, u64 size);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4b71f021a9e2..378ff1a1e9ae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -733,7 +733,7 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	loff_t end = offset + len;
 	struct buffer_head *dibh;
-	struct iomap iomap;
+	struct iomap iomap = { 0 };
 	int error;
 
 	error = gfs2_meta_inode_buffer(ip, &dibh);
@@ -749,8 +749,8 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 	}
 
 	while (offset < end) {
-		error = gfs2_iomap_begin(inode, offset, end - offset,
-					 IOMAP_WRITE, &iomap);
+		error = gfs2_iomap_get_alloc(inode, offset, end - offset,
+					     &iomap);
 		if (error)
 			goto out;
 		offset = iomap.offset + iomap.length;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 8700eb815638..feda55f67050 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2006,10 +2006,6 @@ static int gfs2_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
-const struct iomap_ops gfs2_iomap_ops = {
-	.iomap_begin = gfs2_iomap_begin,
-};
-
 static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		       u64 start, u64 len)
 {
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 06/14] iomap: Add write_{begin,end} iomap operations
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig
  Cc: linux-fsdevel, Andreas Gruenbacher, Dave Chinner

Add write_begin and write_end operations to struct iomap_ops to provide
a way of overriding the default behavior of iomap_write_begin and
iomap_write_end.  This is needed for implementing data journaling: in
the data journaling case, pages are written into the journal before
being written back to their proper on-disk locations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
---
 fs/ext2/inode.c       |  2 ++
 fs/ext4/inode.c       |  2 ++
 fs/iomap.c            | 62 ++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_iomap.c    |  2 ++
 include/linux/iomap.h | 22 +++++++++++++++
 5 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 71635909df3b..3da481f5e36e 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -848,6 +848,8 @@ ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 
 const struct iomap_ops ext2_iomap_ops = {
 	.iomap_begin		= ext2_iomap_begin,
+	.write_begin		= iomap_write_begin,
+	.write_end		= iomap_write_end,
 	.iomap_end		= ext2_iomap_end,
 };
 #else
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5efae67..05c8e8015f13 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3604,6 +3604,8 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 
 const struct iomap_ops ext4_iomap_ops = {
 	.iomap_begin		= ext4_iomap_begin,
+	.write_begin		= iomap_write_begin,
+	.write_end		= iomap_write_end,
 	.iomap_end		= ext4_iomap_end,
 };
 
diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..27d97a290623 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -108,7 +108,7 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
-static int
+int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
@@ -137,8 +137,9 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	*pagep = page;
 	return status;
 }
+EXPORT_SYMBOL_GPL(iomap_write_begin);
 
-static int
+int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		unsigned copied, struct page *page)
 {
@@ -150,12 +151,19 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		iomap_write_failed(inode, pos, len);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iomap_write_end);
+
+struct iomap_write_args {
+	struct iov_iter *iter;
+	const struct iomap_ops *ops;
+};
 
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
-	struct iov_iter *i = data;
+	struct iomap_write_args *args = data;
+	struct iov_iter *i = args->iter;
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = AOP_FLAG_NOFS;
@@ -188,7 +196,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, flags, &page,
+		status = args->ops->write_begin(inode, pos, bytes, flags, &page,
 				iomap);
 		if (unlikely(status))
 			break;
@@ -200,7 +208,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = args->ops->write_end(inode, pos, bytes, copied, page);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -237,10 +245,14 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
 	loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+	struct iomap_write_args args = {
+		.iter = iter,
+		.ops = ops,
+	};
 
 	while (iov_iter_count(iter)) {
 		ret = iomap_apply(inode, pos, iov_iter_count(iter),
-				IOMAP_WRITE, ops, iter, iomap_write_actor);
+				IOMAP_WRITE, ops, &args, iomap_write_actor);
 		if (ret <= 0)
 			break;
 		pos += ret;
@@ -271,6 +283,7 @@ static loff_t
 iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
+	const struct iomap_ops *ops = data;
 	long status = 0;
 	ssize_t written = 0;
 
@@ -286,15 +299,15 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		if (IS_ERR(rpage))
 			return PTR_ERR(rpage);
 
-		status = iomap_write_begin(inode, pos, bytes,
-					   AOP_FLAG_NOFS, &page, iomap);
+		status = ops->write_begin(inode, pos, bytes,
+					  AOP_FLAG_NOFS, &page, iomap);
 		put_page(rpage);
 		if (unlikely(status))
 			return status;
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = ops->write_end(inode, pos, bytes, bytes, page);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -320,7 +333,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 	loff_t ret;
 
 	while (len) {
-		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
+		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, (void *)ops,
 				iomap_dirty_actor);
 		if (ret <= 0)
 			return ret;
@@ -333,20 +346,21 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 EXPORT_SYMBOL_GPL(iomap_file_dirty);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap)
+		unsigned bytes, const struct iomap_ops *ops,
+		struct iomap *iomap)
 {
 	struct page *page;
 	int status;
 
-	status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
-				   iomap);
+	status = ops->write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
+				  iomap);
 	if (status)
 		return status;
 
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return ops->write_end(inode, pos, bytes, bytes, page);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -359,11 +373,16 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 			offset, bytes);
 }
 
+struct iomap_zero_range_args {
+	const struct iomap_ops *ops;
+	bool *did_zero;
+};
+
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		void *data, struct iomap *iomap)
 {
-	bool *did_zero = data;
+	struct iomap_zero_range_args *args = data;
 	loff_t written = 0;
 	int status;
 
@@ -380,15 +399,16 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		if (IS_DAX(inode))
 			status = iomap_dax_zero(pos, offset, bytes, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap);
+			status = iomap_zero(inode, pos, offset, bytes,
+					(void *)args->ops, iomap);
 		if (status < 0)
 			return status;
 
 		pos += bytes;
 		count -= bytes;
 		written += bytes;
-		if (did_zero)
-			*did_zero = true;
+		if (args->did_zero)
+			*args->did_zero = true;
 	} while (count > 0);
 
 	return written;
@@ -398,11 +418,15 @@ int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_zero_range_args args = {
+		.ops = ops,
+		.did_zero = did_zero,
+	};
 	loff_t ret;
 
 	while (len > 0) {
 		ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
-				ops, did_zero, iomap_zero_range_actor);
+				ops, &args, iomap_zero_range_actor);
 		if (ret <= 0)
 			return ret;
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 046469fcc1b8..2eec725cf989 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1199,6 +1199,8 @@ xfs_file_iomap_end(
 
 const struct iomap_ops xfs_iomap_ops = {
 	.iomap_begin		= xfs_file_iomap_begin,
+	.write_begin		= iomap_write_begin,
+	.write_end		= iomap_write_end,
 	.iomap_end		= xfs_file_iomap_end,
 };
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 19a07de28212..423f7ecc1231 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -61,6 +61,8 @@ struct iomap {
 #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
 #define IOMAP_NOWAIT		(1 << 5) /* Don't wait for writeback */
 
+struct page;
+
 struct iomap_ops {
 	/*
 	 * Return the existing mapping at pos, or reserve space starting at
@@ -70,6 +72,21 @@ struct iomap_ops {
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
 			unsigned flags, struct iomap *iomap);
 
+	/*
+	 * Begin writing to a page, similar to block_write_begin but in a
+	 * mapping returned by iomap_begin.  Usually initialized to
+	 * iomap_write_begin.
+	 */
+	int (*write_begin)(struct inode *inode, loff_t pos, unsigned len,
+			   unsigned flags, struct page **pagep,
+			   struct iomap *iomap);
+
+	/*
+	 * End writing to a page.  Usually initialized to iomap_write_end.
+	 */
+	int (*write_end)(struct inode *inode, loff_t pos, unsigned len,
+			 unsigned copied, struct page *page);
+
 	/*
 	 * Commit and/or unreserve space previous allocated using iomap_begin.
 	 * Written indicates the length of the successful write operation which
@@ -80,6 +97,11 @@ struct iomap_ops {
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
 
+int iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		      unsigned flags, struct page **pagep, struct iomap *iomap);
+int iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+		    unsigned copied, struct page *page);
+
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 06/14] iomap: Add write_{begin, end} iomap operations
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Add write_begin and write_end operations to struct iomap_ops to provide
a way of overriding the default behavior of iomap_write_begin and
iomap_write_end.  This is needed for implementing data journaling: in
the data journaling case, pages are written into the journal before
being written back to their proper on-disk locations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
---
 fs/ext2/inode.c       |  2 ++
 fs/ext4/inode.c       |  2 ++
 fs/iomap.c            | 62 ++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_iomap.c    |  2 ++
 include/linux/iomap.h | 22 +++++++++++++++
 5 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 71635909df3b..3da481f5e36e 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -848,6 +848,8 @@ ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 
 const struct iomap_ops ext2_iomap_ops = {
 	.iomap_begin		= ext2_iomap_begin,
+	.write_begin		= iomap_write_begin,
+	.write_end		= iomap_write_end,
 	.iomap_end		= ext2_iomap_end,
 };
 #else
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5efae67..05c8e8015f13 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3604,6 +3604,8 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 
 const struct iomap_ops ext4_iomap_ops = {
 	.iomap_begin		= ext4_iomap_begin,
+	.write_begin		= iomap_write_begin,
+	.write_end		= iomap_write_end,
 	.iomap_end		= ext4_iomap_end,
 };
 
diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..27d97a290623 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -108,7 +108,7 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
-static int
+int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
@@ -137,8 +137,9 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	*pagep = page;
 	return status;
 }
+EXPORT_SYMBOL_GPL(iomap_write_begin);
 
-static int
+int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		unsigned copied, struct page *page)
 {
@@ -150,12 +151,19 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		iomap_write_failed(inode, pos, len);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iomap_write_end);
+
+struct iomap_write_args {
+	struct iov_iter *iter;
+	const struct iomap_ops *ops;
+};
 
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
-	struct iov_iter *i = data;
+	struct iomap_write_args *args = data;
+	struct iov_iter *i = args->iter;
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = AOP_FLAG_NOFS;
@@ -188,7 +196,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, flags, &page,
+		status = args->ops->write_begin(inode, pos, bytes, flags, &page,
 				iomap);
 		if (unlikely(status))
 			break;
@@ -200,7 +208,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = args->ops->write_end(inode, pos, bytes, copied, page);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -237,10 +245,14 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
 	loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+	struct iomap_write_args args = {
+		.iter = iter,
+		.ops = ops,
+	};
 
 	while (iov_iter_count(iter)) {
 		ret = iomap_apply(inode, pos, iov_iter_count(iter),
-				IOMAP_WRITE, ops, iter, iomap_write_actor);
+				IOMAP_WRITE, ops, &args, iomap_write_actor);
 		if (ret <= 0)
 			break;
 		pos += ret;
@@ -271,6 +283,7 @@ static loff_t
 iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
+	const struct iomap_ops *ops = data;
 	long status = 0;
 	ssize_t written = 0;
 
@@ -286,15 +299,15 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		if (IS_ERR(rpage))
 			return PTR_ERR(rpage);
 
-		status = iomap_write_begin(inode, pos, bytes,
-					   AOP_FLAG_NOFS, &page, iomap);
+		status = ops->write_begin(inode, pos, bytes,
+					  AOP_FLAG_NOFS, &page, iomap);
 		put_page(rpage);
 		if (unlikely(status))
 			return status;
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = ops->write_end(inode, pos, bytes, bytes, page);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -320,7 +333,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 	loff_t ret;
 
 	while (len) {
-		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
+		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, (void *)ops,
 				iomap_dirty_actor);
 		if (ret <= 0)
 			return ret;
@@ -333,20 +346,21 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 EXPORT_SYMBOL_GPL(iomap_file_dirty);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap)
+		unsigned bytes, const struct iomap_ops *ops,
+		struct iomap *iomap)
 {
 	struct page *page;
 	int status;
 
-	status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
-				   iomap);
+	status = ops->write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
+				  iomap);
 	if (status)
 		return status;
 
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return ops->write_end(inode, pos, bytes, bytes, page);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -359,11 +373,16 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 			offset, bytes);
 }
 
+struct iomap_zero_range_args {
+	const struct iomap_ops *ops;
+	bool *did_zero;
+};
+
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		void *data, struct iomap *iomap)
 {
-	bool *did_zero = data;
+	struct iomap_zero_range_args *args = data;
 	loff_t written = 0;
 	int status;
 
@@ -380,15 +399,16 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		if (IS_DAX(inode))
 			status = iomap_dax_zero(pos, offset, bytes, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap);
+			status = iomap_zero(inode, pos, offset, bytes,
+					(void *)args->ops, iomap);
 		if (status < 0)
 			return status;
 
 		pos += bytes;
 		count -= bytes;
 		written += bytes;
-		if (did_zero)
-			*did_zero = true;
+		if (args->did_zero)
+			*args->did_zero = true;
 	} while (count > 0);
 
 	return written;
@@ -398,11 +418,15 @@ int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_zero_range_args args = {
+		.ops = ops,
+		.did_zero = did_zero,
+	};
 	loff_t ret;
 
 	while (len > 0) {
 		ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
-				ops, did_zero, iomap_zero_range_actor);
+				ops, &args, iomap_zero_range_actor);
 		if (ret <= 0)
 			return ret;
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 046469fcc1b8..2eec725cf989 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1199,6 +1199,8 @@ xfs_file_iomap_end(
 
 const struct iomap_ops xfs_iomap_ops = {
 	.iomap_begin		= xfs_file_iomap_begin,
+	.write_begin		= iomap_write_begin,
+	.write_end		= iomap_write_end,
 	.iomap_end		= xfs_file_iomap_end,
 };
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 19a07de28212..423f7ecc1231 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -61,6 +61,8 @@ struct iomap {
 #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
 #define IOMAP_NOWAIT		(1 << 5) /* Don't wait for writeback */
 
+struct page;
+
 struct iomap_ops {
 	/*
 	 * Return the existing mapping@pos, or reserve space starting at
@@ -70,6 +72,21 @@ struct iomap_ops {
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
 			unsigned flags, struct iomap *iomap);
 
+	/*
+	 * Begin writing to a page, similar to block_write_begin but in a
+	 * mapping returned by iomap_begin.  Usually initialized to
+	 * iomap_write_begin.
+	 */
+	int (*write_begin)(struct inode *inode, loff_t pos, unsigned len,
+			   unsigned flags, struct page **pagep,
+			   struct iomap *iomap);
+
+	/*
+	 * End writing to a page.  Usually initialized to iomap_write_end.
+	 */
+	int (*write_end)(struct inode *inode, loff_t pos, unsigned len,
+			 unsigned copied, struct page *page);
+
 	/*
 	 * Commit and/or unreserve space previous allocated using iomap_begin.
 	 * Written indicates the length of the successful write operation which
@@ -80,6 +97,11 @@ struct iomap_ops {
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
 
+int iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		      unsigned flags, struct page **pagep, struct iomap *iomap);
+int iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+		    unsigned copied, struct page *page);
+
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 07/14] iomap: Mark newly allocated buffer heads as new
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set.  This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..5220d9efcd18 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1900,15 +1900,15 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 		break;
 	case IOMAP_UNWRITTEN:
 		/*
-		 * For unwritten regions, we always need to ensure that
-		 * sub-block writes cause the regions in the block we are not
-		 * writing to are zeroed. Set the buffer as new to ensure this.
+		 * For unwritten regions, we always need to ensure that regions
+		 * in the block we are not writing to are zeroed. Mark the
+		 * buffer as new to ensure this.
 		 */
 		set_buffer_new(bh);
 		set_buffer_unwritten(bh);
 		/* FALLTHRU */
 	case IOMAP_MAPPED:
-		if (offset >= i_size_read(inode))
+		if ((iomap->flags & IOMAP_F_NEW) || offset >= i_size_read(inode))
 			set_buffer_new(bh);
 		bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
 				inode->i_blkbits;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 07/14] iomap: Mark newly allocated buffer heads as new
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set.  This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..5220d9efcd18 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1900,15 +1900,15 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 		break;
 	case IOMAP_UNWRITTEN:
 		/*
-		 * For unwritten regions, we always need to ensure that
-		 * sub-block writes cause the regions in the block we are not
-		 * writing to are zeroed. Set the buffer as new to ensure this.
+		 * For unwritten regions, we always need to ensure that regions
+		 * in the block we are not writing to are zeroed. Mark the
+		 * buffer as new to ensure this.
 		 */
 		set_buffer_new(bh);
 		set_buffer_unwritten(bh);
 		/* FALLTHRU */
 	case IOMAP_MAPPED:
-		if (offset >= i_size_read(inode))
+		if ((iomap->flags & IOMAP_F_NEW) || offset >= i_size_read(inode))
 			set_buffer_new(bh);
 		bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
 				inode->i_blkbits;
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 08/14] gfs2: iomap buffered write support
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

With the traditional page-based writes, blocks are allocated separately
for each page written to.  With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage.  This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c |  20 +--
 fs/gfs2/aops.h |  22 ++++
 fs/gfs2/bmap.c | 346 ++++++++++++++++++++++++++++++++++++++++++++-----
 fs/gfs2/file.c |  46 ++++++-
 4 files changed, 388 insertions(+), 46 deletions(-)
 create mode 100644 fs/gfs2/aops.h

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a1c6b5de9200..3d9633175aa8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -22,6 +22,7 @@
 #include <linux/backing-dev.h>
 #include <linux/uio.h>
 #include <trace/events/writeback.h>
+#include <linux/sched/signal.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -36,10 +37,11 @@
 #include "super.h"
 #include "util.h"
 #include "glops.h"
+#include "aops.h"
 
 
-static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
-				   unsigned int from, unsigned int len)
+void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+			    unsigned int from, unsigned int len)
 {
 	struct buffer_head *head = page_buffers(page);
 	unsigned int bsize = head->b_size;
@@ -462,7 +464,7 @@ static int gfs2_jdata_writepages(struct address_space *mapping,
  * Returns: errno
  */
 
-static int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
+int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
 {
 	struct buffer_head *dibh;
 	u64 dsize = i_size_read(&ip->i_inode);
@@ -773,7 +775,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
  * adjust_fs_space - Adjusts the free space available due to gfs2_grow
  * @inode: the rindex inode
  */
-static void adjust_fs_space(struct inode *inode)
+void adjust_fs_space(struct inode *inode)
 {
 	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
@@ -819,11 +821,11 @@ static void adjust_fs_space(struct inode *inode)
  * This copies the data from the page into the inode block after
  * the inode data structure itself.
  *
- * Returns: errno
+ * Returns: copied bytes or errno
  */
-static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
-				  loff_t pos, unsigned copied,
-				  struct page *page)
+int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+			   loff_t pos, unsigned copied,
+			   struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	u64 to = pos + copied;
@@ -862,7 +864,7 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
  * The main write_end function for GFS2. We just put our locking around the VFS
  * provided functions.
  *
- * Returns: errno
+ * Returns: copied bytes or errno
  */
 
 static int gfs2_write_end(struct file *file, struct address_space *mapping,
diff --git a/fs/gfs2/aops.h b/fs/gfs2/aops.h
new file mode 100644
index 000000000000..976bb32dd405
--- /dev/null
+++ b/fs/gfs2/aops.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2017 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License version 2.
+ */
+
+#ifndef __AOPS_DOT_H__
+#define __AOPS_DOT_H__
+
+#include "incore.h"
+
+extern int stuffed_readpage(struct gfs2_inode *ip, struct page *page);
+extern int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+				  loff_t pos, unsigned copied,
+				  struct page *page);
+extern void adjust_fs_space(struct inode *inode);
+extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+				   unsigned int from, unsigned int len);
+
+#endif /* __AOPS_DOT_H__ */
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 8ea65aea34da..d70672a1a29c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -28,6 +28,7 @@
 #include "trans.h"
 #include "dir.h"
 #include "util.h"
+#include "aops.h"
 #include "trace_gfs2.h"
 
 /* This doesn't need to be that large as max 64 bit pointers in a 4k
@@ -41,6 +42,8 @@ struct metapath {
 	int mp_aheight; /* actual height (lookup height) */
 };
 
+static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length);
+
 /**
  * gfs2_unstuffer_page - unstuff a stuffed inode into a block cached by a page
  * @ip: the inode
@@ -389,7 +392,7 @@ static int fillup_metapath(struct gfs2_inode *ip, struct metapath *mp, int h)
 	return mp->mp_aheight - x - 1;
 }
 
-static inline void release_metapath(struct metapath *mp)
+static void release_metapath(struct metapath *mp)
 {
 	int i;
 
@@ -397,6 +400,7 @@ static inline void release_metapath(struct metapath *mp)
 		if (mp->mp_bh[i] == NULL)
 			break;
 		brelse(mp->mp_bh[i]);
+		mp->mp_bh[i] = NULL;
 	}
 }
 
@@ -609,11 +613,13 @@ enum alloc_state {
  *  ii) Indirect blocks to fill in lower part of the metadata tree
  * iii) Data blocks
  *
- * The function is in two parts. The first part works out the total
- * number of blocks which we need. The second part does the actual
- * allocation asking for an extent at a time (if enough contiguous free
- * blocks are available, there will only be one request per bmap call)
- * and uses the state machine to initialise the blocks in order.
+ * This function is called after gfs2_iomap_get, which works out the
+ * total number of blocks which we need via gfs2_alloc_size.
+ *
+ * We then do the actual allocation asking for an extent at a time (if
+ * enough contiguous free blocks are available, there will only be one
+ * allocation request per call) and uses the state machine to initialise
+ * the blocks in order.
  *
  * Right now, this function will allocate at most one indirect block
  * worth of data -- with a default block size of 4K, that's slightly
@@ -633,39 +639,26 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	struct buffer_head *dibh = mp->mp_bh[0];
 	u64 bn;
 	unsigned n, i, blks, alloced = 0, iblks = 0, branch_start = 0;
-	unsigned dblks = 0;
-	unsigned ptrs_per_blk;
+	size_t dblks = iomap->length >> inode->i_blkbits;
 	const unsigned end_of_metadata = mp->mp_fheight - 1;
 	int ret;
 	enum alloc_state state;
 	__be64 *ptr;
 	__be64 zero_bn = 0;
-	size_t maxlen = iomap->length >> inode->i_blkbits;
 
 	BUG_ON(mp->mp_aheight < 1);
 	BUG_ON(dibh == NULL);
+	BUG_ON(dblks < 1);
 
 	gfs2_trans_add_meta(ip->i_gl, dibh);
 
 	down_write(&ip->i_rw_mutex);
 
 	if (mp->mp_fheight == mp->mp_aheight) {
-		struct buffer_head *bh;
-		int eob;
-
-		/* Bottom indirect block exists, find unalloced extent size */
-		ptr = metapointer(end_of_metadata, mp);
-		bh = mp->mp_bh[end_of_metadata];
-		dblks = gfs2_extent_length(bh->b_data, bh->b_size, ptr,
-					   maxlen, &eob);
-		BUG_ON(dblks < 1);
+		/* Bottom indirect block exists */
 		state = ALLOC_DATA;
 	} else {
 		/* Need to allocate indirect blocks */
-		ptrs_per_blk = mp->mp_fheight > 1 ? sdp->sd_inptrs :
-			sdp->sd_diptrs;
-		dblks = min(maxlen, (size_t)(ptrs_per_blk -
-					     mp->mp_list[end_of_metadata]));
 		if (mp->mp_fheight == ip->i_height) {
 			/* Writing into existing tree, extend tree down */
 			iblks = mp->mp_fheight - mp->mp_aheight;
@@ -750,6 +743,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 		}
 	} while (iomap->addr == IOMAP_NULL_ADDR);
 
+	iomap->type = IOMAP_MAPPED;
 	iomap->length = (u64)dblks << inode->i_blkbits;
 	ip->i_height = mp->mp_fheight;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
@@ -759,18 +753,63 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	return ret;
 }
 
-static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
+static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap,
+			       u64 length)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 
 	iomap->addr = (ip->i_no_addr << inode->i_blkbits) +
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
-	iomap->length = i_size_read(inode);
+	iomap->length = length;
 	iomap->type = IOMAP_MAPPED;
 	iomap->flags = IOMAP_F_DATA_INLINE;
 }
 
+/**
+ * gfs2_alloc_size - Compute the maximum allocation size
+ * @inode: The inode
+ * @mp: The metapath
+ * @size: Requested size in blocks
+ *
+ * Compute the maximum size of the next allocation at @mp.
+ *
+ * Returns: size in blocks
+ */
+static u64 gfs2_alloc_size(struct inode *inode, struct metapath *mp, u64 size)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	const __be64 *first, *ptr, *end;
+
+	/*
+	 * For writes to stuffed files, this function is called twice via
+	 * gfs2_iomap_get, before and after unstuffing. The size we return the
+	 * first time needs to be large enough to get the reservation and
+	 * allocation sizes right.  The size we return the second time must
+	 * be exact or else gfs2_iomap_alloc won't do the right thing.
+	 */
+
+	if (gfs2_is_stuffed(ip) || mp->mp_fheight != mp->mp_aheight) {
+		unsigned maxsize = mp->mp_fheight > 1 ?
+			sdp->sd_inptrs : sdp->sd_diptrs;
+		maxsize -= mp->mp_list[mp->mp_fheight - 1];
+		if (size > maxsize)
+			size = maxsize;
+		return size;
+	}
+
+	first = metapointer(ip->i_height - 1, mp);
+	end = metaend(ip->i_height - 1, mp);
+	if (end - first > size)
+		end = first + size;
+	for (ptr = first; ptr < end; ptr++) {
+		if (*ptr)
+			break;
+	}
+	return ptr - first;
+}
+
 /**
  * gfs2_iomap_get - Map blocks from an inode to disk blocks
  * @inode: The inode
@@ -804,10 +843,15 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		if (flags & IOMAP_REPORT) {
 			if (pos >= i_size_read(inode))
 				return -ENOENT;
-			gfs2_stuffed_iomap(inode, iomap);
+			gfs2_stuffed_iomap(inode, iomap,
+					   i_size_read(inode));
+			return 0;
+		}
+		if (pos + length <= gfs2_max_stuffed_size(ip)) {
+			gfs2_stuffed_iomap(inode, iomap,
+					   gfs2_max_stuffed_size(ip));
 			return 0;
 		}
-		BUG_ON(!(flags & IOMAP_WRITE));
 	}
 	lblock = pos >> inode->i_blkbits;
 	iomap->offset = lblock << inode->i_blkbits;
@@ -867,10 +911,147 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 		else
 			iomap->length = size - pos;
+	} else if (flags & IOMAP_WRITE) {
+		u64 size;
+		size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
+		if (size < iomap->length)
+			iomap->length = size;
+	} else {
+		if (height == ip->i_height)
+			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 	}
 	goto out;
 }
 
+static int gfs2_write_lock(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	int error;
+
+	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
+	error = gfs2_glock_nq(&ip->i_gh);
+	if (error)
+		goto out_uninit;
+	if (&ip->i_inode == sdp->sd_rindex) {
+		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+		error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
+					   GL_NOCACHE, &m_ip->i_gh);
+		if (error)
+			goto out_unlock;
+	}
+	return 0;
+
+out_unlock:
+	gfs2_glock_dq(&ip->i_gh);
+out_uninit:
+	gfs2_holder_uninit(&ip->i_gh);
+	return error;
+}
+
+static void gfs2_write_unlock(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	if (&ip->i_inode == sdp->sd_rindex) {
+		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+		gfs2_glock_dq_uninit(&m_ip->i_gh);
+	}
+	gfs2_glock_dq_uninit(&ip->i_gh);
+}
+
+static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length,
+				  unsigned flags, struct iomap *iomap)
+{
+	struct metapath mp = { .mp_aheight = 1, };
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
+	bool unstuff, alloc_required;
+	int ret;
+
+	ret = gfs2_write_lock(inode);
+	if (ret)
+		return ret;
+
+	unstuff = gfs2_is_stuffed(ip) &&
+		  pos + length > gfs2_max_stuffed_size(ip);
+
+	ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+	if (ret)
+		goto out_release;
+
+	alloc_required = unstuff || iomap->type != IOMAP_MAPPED;
+
+	if (alloc_required || gfs2_is_jdata(ip))
+		gfs2_write_calc_reserv(ip, iomap->length, &data_blocks, &ind_blocks);
+
+	if (alloc_required) {
+		struct gfs2_alloc_parms ap = { .target = data_blocks + ind_blocks };
+		ret = gfs2_quota_lock_check(ip, &ap);
+		if (ret)
+			goto out_release;
+
+		ret = gfs2_inplace_reserve(ip, &ap);
+		if (ret)
+			goto out_qunlock;
+	}
+
+	rblocks = RES_DINODE + ind_blocks;
+	if (gfs2_is_jdata(ip))
+		rblocks += data_blocks;
+	if (ind_blocks || data_blocks)
+		rblocks += RES_STATFS + RES_QUOTA;
+	if (inode == sdp->sd_rindex)
+		rblocks += 2 * RES_STATFS;
+	if (alloc_required)
+		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
+
+	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
+	if (ret)
+		goto out_trans_fail;
+
+	if (unstuff) {
+		ret = gfs2_unstuff_dinode(ip, NULL);
+		if (ret)
+			goto out_trans_end;
+		release_metapath(&mp);
+		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
+				     flags, iomap, &mp);
+		if (ret)
+			goto out_trans_end;
+	}
+
+	if (iomap->type != IOMAP_MAPPED) {
+		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
+		if (ret) {
+			gfs2_trans_end(sdp);
+			if (alloc_required)
+				gfs2_inplace_release(ip);
+			punch_hole(ip, iomap->offset, iomap->length);
+			goto out_qunlock;
+		}
+	}
+	release_metapath(&mp);
+	return 0;
+
+out_trans_end:
+	gfs2_trans_end(sdp);
+out_trans_fail:
+	if (alloc_required)
+		gfs2_inplace_release(ip);
+out_qunlock:
+	if (alloc_required)
+		gfs2_quota_unlock(ip);
+out_release:
+	release_metapath(&mp);
+	gfs2_write_unlock(inode);
+	return ret;
+}
+
 static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 			    unsigned flags, struct iomap *iomap)
 {
@@ -880,10 +1061,7 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	trace_gfs2_iomap_start(ip, pos, length, flags);
 	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
-		if (!ret && iomap->type == IOMAP_HOLE)
-			ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
-		release_metapath(&mp);
+		ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
 	} else {
 		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 		release_metapath(&mp);
@@ -892,8 +1070,116 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
+static int
+gfs2_iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		       unsigned flags, struct page **pagep, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct page *page;
+	int ret;
+
+	if (gfs2_is_stuffed(ip)) {
+		BUG_ON(pos + len > gfs2_max_stuffed_size(ip));
+
+		page = grab_cache_page_write_begin(inode->i_mapping, 0, flags);
+		if (!page)
+			return -ENOMEM;
+
+		if (!PageUptodate(page)) {
+			ret = stuffed_readpage(ip, page);
+			if (ret) {
+				unlock_page(page);
+				put_page(page);
+				return ret;
+			}
+		}
+		*pagep = page;
+		return 0;
+	}
+
+	return iomap_write_begin(inode, pos, len, flags, pagep, iomap);
+}
+
+static int gfs2_iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+				unsigned copied, struct page *page)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct buffer_head *dibh;
+	int ret;
+
+	if (gfs2_is_stuffed(ip)) {
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (ret) {
+			unlock_page(page);
+			put_page(page);
+			return ret;
+		}
+		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
+		brelse(dibh);
+		return ret;
+	}
+
+	if (gfs2_is_jdata(ip))
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), len);
+
+	return iomap_write_end(inode, pos, len, copied, page);
+}
+
+static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+			  ssize_t written, unsigned flags, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	struct gfs2_trans *tr = current->journal_info;
+	int ret = 0;
+
+	if (!(flags & IOMAP_WRITE))
+		return 0;
+
+	gfs2_ordered_add_inode(ip);
+
+	if (tr->tr_num_buf_new)
+		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	else {
+		struct buffer_head *dibh;
+
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (likely(!ret)) {
+			gfs2_trans_add_meta(ip->i_gl, dibh);
+			brelse(dibh);
+		}
+	}
+
+	if (inode == sdp->sd_rindex) {
+		adjust_fs_space(inode);
+		sdp->sd_rindex_uptodate = 0;
+	}
+
+	gfs2_trans_end(sdp);
+	gfs2_inplace_release(ip);
+
+	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
+		/* Deallocate blocks that were just allocated. */
+		loff_t blockmask = i_blocksize(inode) - 1;
+		loff_t end = (pos + length) & ~blockmask;
+		pos = (pos + written + blockmask) & ~blockmask;
+		if (pos < end) {
+			truncate_pagecache_range(inode, pos, end - 1);
+			punch_hole(ip, pos, end - pos);
+		}
+	}
+
+	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
+		gfs2_quota_unlock(ip);
+	gfs2_write_unlock(inode);
+	return ret;
+}
+
 const struct iomap_ops gfs2_iomap_ops = {
 	.iomap_begin = gfs2_iomap_begin,
+	.write_begin = gfs2_iomap_write_begin,
+	.write_end = gfs2_iomap_write_end,
+	.iomap_end = gfs2_iomap_end,
 };
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 378ff1a1e9ae..ebc245536cb3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -26,10 +26,12 @@
 #include <linux/dlm.h>
 #include <linux/dlm_plock.h>
 #include <linux/delay.h>
+#include <linux/backing-dev.h>
 
 #include "gfs2.h"
 #include "incore.h"
 #include "bmap.h"
+#include "aops.h"
 #include "dir.h"
 #include "glock.h"
 #include "glops.h"
@@ -691,9 +693,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 /**
  * gfs2_file_write_iter - Perform a write to a file
  * @iocb: The io context
- * @iov: The data to write
- * @nr_segs: Number of @iov segments
- * @pos: The file position
+ * @from: The data to write
  *
  * We have to do a lock/unlock here to refresh the inode size for
  * O_APPEND writes, otherwise we can land up writing at the wrong
@@ -705,8 +705,9 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-	struct gfs2_inode *ip = GFS2_I(file_inode(file));
-	int ret;
+	struct inode *inode = file_inode(file);
+	struct gfs2_inode *ip = GFS2_I(inode);
+	ssize_t ret;
 
 	ret = gfs2_rsqa_alloc(ip);
 	if (ret)
@@ -723,7 +724,38 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		gfs2_glock_dq_uninit(&gh);
 	}
 
-	return generic_file_write_iter(iocb, from);
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return generic_file_write_iter(iocb, from);
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret <= 0)
+		goto out;
+
+	/* We can write back this queue in page reclaim */
+	current->backing_dev_info = inode_to_bdi(inode);
+
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out2;
+
+	ret = file_update_time(file);
+	if (ret)
+		goto out2;
+
+	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+
+out2:
+	current->backing_dev_info = NULL;
+out:
+	inode_unlock(inode);
+	if (likely(ret > 0)) {
+		iocb->ki_pos += ret;
+
+		/* Handle various SYNC-type writes */
+		ret = generic_write_sync(iocb, ret);
+	}
+	return ret;
 }
 
 static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -754,7 +786,7 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 		if (error)
 			goto out;
 		offset = iomap.offset + iomap.length;
-		if (iomap.type != IOMAP_HOLE)
+		if (!(iomap.flags & IOMAP_F_NEW))
 			continue;
 		error = sb_issue_zeroout(sb, iomap.addr >> inode->i_blkbits,
 					 iomap.length >> inode->i_blkbits,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 08/14] gfs2: iomap buffered write support
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

With the traditional page-based writes, blocks are allocated separately
for each page written to.  With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage.  This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c |  20 +--
 fs/gfs2/aops.h |  22 ++++
 fs/gfs2/bmap.c | 346 ++++++++++++++++++++++++++++++++++++++++++++-----
 fs/gfs2/file.c |  46 ++++++-
 4 files changed, 388 insertions(+), 46 deletions(-)
 create mode 100644 fs/gfs2/aops.h

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a1c6b5de9200..3d9633175aa8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -22,6 +22,7 @@
 #include <linux/backing-dev.h>
 #include <linux/uio.h>
 #include <trace/events/writeback.h>
+#include <linux/sched/signal.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -36,10 +37,11 @@
 #include "super.h"
 #include "util.h"
 #include "glops.h"
+#include "aops.h"
 
 
-static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
-				   unsigned int from, unsigned int len)
+void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+			    unsigned int from, unsigned int len)
 {
 	struct buffer_head *head = page_buffers(page);
 	unsigned int bsize = head->b_size;
@@ -462,7 +464,7 @@ static int gfs2_jdata_writepages(struct address_space *mapping,
  * Returns: errno
  */
 
-static int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
+int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
 {
 	struct buffer_head *dibh;
 	u64 dsize = i_size_read(&ip->i_inode);
@@ -773,7 +775,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
  * adjust_fs_space - Adjusts the free space available due to gfs2_grow
  * @inode: the rindex inode
  */
-static void adjust_fs_space(struct inode *inode)
+void adjust_fs_space(struct inode *inode)
 {
 	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
@@ -819,11 +821,11 @@ static void adjust_fs_space(struct inode *inode)
  * This copies the data from the page into the inode block after
  * the inode data structure itself.
  *
- * Returns: errno
+ * Returns: copied bytes or errno
  */
-static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
-				  loff_t pos, unsigned copied,
-				  struct page *page)
+int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+			   loff_t pos, unsigned copied,
+			   struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	u64 to = pos + copied;
@@ -862,7 +864,7 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
  * The main write_end function for GFS2. We just put our locking around the VFS
  * provided functions.
  *
- * Returns: errno
+ * Returns: copied bytes or errno
  */
 
 static int gfs2_write_end(struct file *file, struct address_space *mapping,
diff --git a/fs/gfs2/aops.h b/fs/gfs2/aops.h
new file mode 100644
index 000000000000..976bb32dd405
--- /dev/null
+++ b/fs/gfs2/aops.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2017 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License version 2.
+ */
+
+#ifndef __AOPS_DOT_H__
+#define __AOPS_DOT_H__
+
+#include "incore.h"
+
+extern int stuffed_readpage(struct gfs2_inode *ip, struct page *page);
+extern int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+				  loff_t pos, unsigned copied,
+				  struct page *page);
+extern void adjust_fs_space(struct inode *inode);
+extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+				   unsigned int from, unsigned int len);
+
+#endif /* __AOPS_DOT_H__ */
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 8ea65aea34da..d70672a1a29c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -28,6 +28,7 @@
 #include "trans.h"
 #include "dir.h"
 #include "util.h"
+#include "aops.h"
 #include "trace_gfs2.h"
 
 /* This doesn't need to be that large as max 64 bit pointers in a 4k
@@ -41,6 +42,8 @@ struct metapath {
 	int mp_aheight; /* actual height (lookup height) */
 };
 
+static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length);
+
 /**
  * gfs2_unstuffer_page - unstuff a stuffed inode into a block cached by a page
  * @ip: the inode
@@ -389,7 +392,7 @@ static int fillup_metapath(struct gfs2_inode *ip, struct metapath *mp, int h)
 	return mp->mp_aheight - x - 1;
 }
 
-static inline void release_metapath(struct metapath *mp)
+static void release_metapath(struct metapath *mp)
 {
 	int i;
 
@@ -397,6 +400,7 @@ static inline void release_metapath(struct metapath *mp)
 		if (mp->mp_bh[i] == NULL)
 			break;
 		brelse(mp->mp_bh[i]);
+		mp->mp_bh[i] = NULL;
 	}
 }
 
@@ -609,11 +613,13 @@ enum alloc_state {
  *  ii) Indirect blocks to fill in lower part of the metadata tree
  * iii) Data blocks
  *
- * The function is in two parts. The first part works out the total
- * number of blocks which we need. The second part does the actual
- * allocation asking for an extent at a time (if enough contiguous free
- * blocks are available, there will only be one request per bmap call)
- * and uses the state machine to initialise the blocks in order.
+ * This function is called after gfs2_iomap_get, which works out the
+ * total number of blocks which we need via gfs2_alloc_size.
+ *
+ * We then do the actual allocation asking for an extent at a time (if
+ * enough contiguous free blocks are available, there will only be one
+ * allocation request per call) and uses the state machine to initialise
+ * the blocks in order.
  *
  * Right now, this function will allocate at most one indirect block
  * worth of data -- with a default block size of 4K, that's slightly
@@ -633,39 +639,26 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	struct buffer_head *dibh = mp->mp_bh[0];
 	u64 bn;
 	unsigned n, i, blks, alloced = 0, iblks = 0, branch_start = 0;
-	unsigned dblks = 0;
-	unsigned ptrs_per_blk;
+	size_t dblks = iomap->length >> inode->i_blkbits;
 	const unsigned end_of_metadata = mp->mp_fheight - 1;
 	int ret;
 	enum alloc_state state;
 	__be64 *ptr;
 	__be64 zero_bn = 0;
-	size_t maxlen = iomap->length >> inode->i_blkbits;
 
 	BUG_ON(mp->mp_aheight < 1);
 	BUG_ON(dibh == NULL);
+	BUG_ON(dblks < 1);
 
 	gfs2_trans_add_meta(ip->i_gl, dibh);
 
 	down_write(&ip->i_rw_mutex);
 
 	if (mp->mp_fheight == mp->mp_aheight) {
-		struct buffer_head *bh;
-		int eob;
-
-		/* Bottom indirect block exists, find unalloced extent size */
-		ptr = metapointer(end_of_metadata, mp);
-		bh = mp->mp_bh[end_of_metadata];
-		dblks = gfs2_extent_length(bh->b_data, bh->b_size, ptr,
-					   maxlen, &eob);
-		BUG_ON(dblks < 1);
+		/* Bottom indirect block exists */
 		state = ALLOC_DATA;
 	} else {
 		/* Need to allocate indirect blocks */
-		ptrs_per_blk = mp->mp_fheight > 1 ? sdp->sd_inptrs :
-			sdp->sd_diptrs;
-		dblks = min(maxlen, (size_t)(ptrs_per_blk -
-					     mp->mp_list[end_of_metadata]));
 		if (mp->mp_fheight == ip->i_height) {
 			/* Writing into existing tree, extend tree down */
 			iblks = mp->mp_fheight - mp->mp_aheight;
@@ -750,6 +743,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 		}
 	} while (iomap->addr == IOMAP_NULL_ADDR);
 
+	iomap->type = IOMAP_MAPPED;
 	iomap->length = (u64)dblks << inode->i_blkbits;
 	ip->i_height = mp->mp_fheight;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
@@ -759,18 +753,63 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
 	return ret;
 }
 
-static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
+static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap,
+			       u64 length)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 
 	iomap->addr = (ip->i_no_addr << inode->i_blkbits) +
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
-	iomap->length = i_size_read(inode);
+	iomap->length = length;
 	iomap->type = IOMAP_MAPPED;
 	iomap->flags = IOMAP_F_DATA_INLINE;
 }
 
+/**
+ * gfs2_alloc_size - Compute the maximum allocation size
+ * @inode: The inode
+ * @mp: The metapath
+ * @size: Requested size in blocks
+ *
+ * Compute the maximum size of the next allocation at @mp.
+ *
+ * Returns: size in blocks
+ */
+static u64 gfs2_alloc_size(struct inode *inode, struct metapath *mp, u64 size)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	const __be64 *first, *ptr, *end;
+
+	/*
+	 * For writes to stuffed files, this function is called twice via
+	 * gfs2_iomap_get, before and after unstuffing. The size we return the
+	 * first time needs to be large enough to get the reservation and
+	 * allocation sizes right.  The size we return the second time must
+	 * be exact or else gfs2_iomap_alloc won't do the right thing.
+	 */
+
+	if (gfs2_is_stuffed(ip) || mp->mp_fheight != mp->mp_aheight) {
+		unsigned maxsize = mp->mp_fheight > 1 ?
+			sdp->sd_inptrs : sdp->sd_diptrs;
+		maxsize -= mp->mp_list[mp->mp_fheight - 1];
+		if (size > maxsize)
+			size = maxsize;
+		return size;
+	}
+
+	first = metapointer(ip->i_height - 1, mp);
+	end = metaend(ip->i_height - 1, mp);
+	if (end - first > size)
+		end = first + size;
+	for (ptr = first; ptr < end; ptr++) {
+		if (*ptr)
+			break;
+	}
+	return ptr - first;
+}
+
 /**
  * gfs2_iomap_get - Map blocks from an inode to disk blocks
  * @inode: The inode
@@ -804,10 +843,15 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		if (flags & IOMAP_REPORT) {
 			if (pos >= i_size_read(inode))
 				return -ENOENT;
-			gfs2_stuffed_iomap(inode, iomap);
+			gfs2_stuffed_iomap(inode, iomap,
+					   i_size_read(inode));
+			return 0;
+		}
+		if (pos + length <= gfs2_max_stuffed_size(ip)) {
+			gfs2_stuffed_iomap(inode, iomap,
+					   gfs2_max_stuffed_size(ip));
 			return 0;
 		}
-		BUG_ON(!(flags & IOMAP_WRITE));
 	}
 	lblock = pos >> inode->i_blkbits;
 	iomap->offset = lblock << inode->i_blkbits;
@@ -867,10 +911,147 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 		else
 			iomap->length = size - pos;
+	} else if (flags & IOMAP_WRITE) {
+		u64 size;
+		size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
+		if (size < iomap->length)
+			iomap->length = size;
+	} else {
+		if (height == ip->i_height)
+			ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
 	}
 	goto out;
 }
 
+static int gfs2_write_lock(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	int error;
+
+	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
+	error = gfs2_glock_nq(&ip->i_gh);
+	if (error)
+		goto out_uninit;
+	if (&ip->i_inode == sdp->sd_rindex) {
+		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+		error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
+					   GL_NOCACHE, &m_ip->i_gh);
+		if (error)
+			goto out_unlock;
+	}
+	return 0;
+
+out_unlock:
+	gfs2_glock_dq(&ip->i_gh);
+out_uninit:
+	gfs2_holder_uninit(&ip->i_gh);
+	return error;
+}
+
+static void gfs2_write_unlock(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	if (&ip->i_inode == sdp->sd_rindex) {
+		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+		gfs2_glock_dq_uninit(&m_ip->i_gh);
+	}
+	gfs2_glock_dq_uninit(&ip->i_gh);
+}
+
+static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length,
+				  unsigned flags, struct iomap *iomap)
+{
+	struct metapath mp = { .mp_aheight = 1, };
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
+	bool unstuff, alloc_required;
+	int ret;
+
+	ret = gfs2_write_lock(inode);
+	if (ret)
+		return ret;
+
+	unstuff = gfs2_is_stuffed(ip) &&
+		  pos + length > gfs2_max_stuffed_size(ip);
+
+	ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+	if (ret)
+		goto out_release;
+
+	alloc_required = unstuff || iomap->type != IOMAP_MAPPED;
+
+	if (alloc_required || gfs2_is_jdata(ip))
+		gfs2_write_calc_reserv(ip, iomap->length, &data_blocks, &ind_blocks);
+
+	if (alloc_required) {
+		struct gfs2_alloc_parms ap = { .target = data_blocks + ind_blocks };
+		ret = gfs2_quota_lock_check(ip, &ap);
+		if (ret)
+			goto out_release;
+
+		ret = gfs2_inplace_reserve(ip, &ap);
+		if (ret)
+			goto out_qunlock;
+	}
+
+	rblocks = RES_DINODE + ind_blocks;
+	if (gfs2_is_jdata(ip))
+		rblocks += data_blocks;
+	if (ind_blocks || data_blocks)
+		rblocks += RES_STATFS + RES_QUOTA;
+	if (inode == sdp->sd_rindex)
+		rblocks += 2 * RES_STATFS;
+	if (alloc_required)
+		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
+
+	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
+	if (ret)
+		goto out_trans_fail;
+
+	if (unstuff) {
+		ret = gfs2_unstuff_dinode(ip, NULL);
+		if (ret)
+			goto out_trans_end;
+		release_metapath(&mp);
+		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
+				     flags, iomap, &mp);
+		if (ret)
+			goto out_trans_end;
+	}
+
+	if (iomap->type != IOMAP_MAPPED) {
+		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
+		if (ret) {
+			gfs2_trans_end(sdp);
+			if (alloc_required)
+				gfs2_inplace_release(ip);
+			punch_hole(ip, iomap->offset, iomap->length);
+			goto out_qunlock;
+		}
+	}
+	release_metapath(&mp);
+	return 0;
+
+out_trans_end:
+	gfs2_trans_end(sdp);
+out_trans_fail:
+	if (alloc_required)
+		gfs2_inplace_release(ip);
+out_qunlock:
+	if (alloc_required)
+		gfs2_quota_unlock(ip);
+out_release:
+	release_metapath(&mp);
+	gfs2_write_unlock(inode);
+	return ret;
+}
+
 static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 			    unsigned flags, struct iomap *iomap)
 {
@@ -880,10 +1061,7 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	trace_gfs2_iomap_start(ip, pos, length, flags);
 	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
-		if (!ret && iomap->type == IOMAP_HOLE)
-			ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
-		release_metapath(&mp);
+		ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
 	} else {
 		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 		release_metapath(&mp);
@@ -892,8 +1070,116 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
+static int
+gfs2_iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		       unsigned flags, struct page **pagep, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct page *page;
+	int ret;
+
+	if (gfs2_is_stuffed(ip)) {
+		BUG_ON(pos + len > gfs2_max_stuffed_size(ip));
+
+		page = grab_cache_page_write_begin(inode->i_mapping, 0, flags);
+		if (!page)
+			return -ENOMEM;
+
+		if (!PageUptodate(page)) {
+			ret = stuffed_readpage(ip, page);
+			if (ret) {
+				unlock_page(page);
+				put_page(page);
+				return ret;
+			}
+		}
+		*pagep = page;
+		return 0;
+	}
+
+	return iomap_write_begin(inode, pos, len, flags, pagep, iomap);
+}
+
+static int gfs2_iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+				unsigned copied, struct page *page)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct buffer_head *dibh;
+	int ret;
+
+	if (gfs2_is_stuffed(ip)) {
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (ret) {
+			unlock_page(page);
+			put_page(page);
+			return ret;
+		}
+		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
+		brelse(dibh);
+		return ret;
+	}
+
+	if (gfs2_is_jdata(ip))
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), len);
+
+	return iomap_write_end(inode, pos, len, copied, page);
+}
+
+static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+			  ssize_t written, unsigned flags, struct iomap *iomap)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	struct gfs2_trans *tr = current->journal_info;
+	int ret = 0;
+
+	if (!(flags & IOMAP_WRITE))
+		return 0;
+
+	gfs2_ordered_add_inode(ip);
+
+	if (tr->tr_num_buf_new)
+		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	else {
+		struct buffer_head *dibh;
+
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (likely(!ret)) {
+			gfs2_trans_add_meta(ip->i_gl, dibh);
+			brelse(dibh);
+		}
+	}
+
+	if (inode == sdp->sd_rindex) {
+		adjust_fs_space(inode);
+		sdp->sd_rindex_uptodate = 0;
+	}
+
+	gfs2_trans_end(sdp);
+	gfs2_inplace_release(ip);
+
+	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
+		/* Deallocate blocks that were just allocated. */
+		loff_t blockmask = i_blocksize(inode) - 1;
+		loff_t end = (pos + length) & ~blockmask;
+		pos = (pos + written + blockmask) & ~blockmask;
+		if (pos < end) {
+			truncate_pagecache_range(inode, pos, end - 1);
+			punch_hole(ip, pos, end - pos);
+		}
+	}
+
+	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
+		gfs2_quota_unlock(ip);
+	gfs2_write_unlock(inode);
+	return ret;
+}
+
 const struct iomap_ops gfs2_iomap_ops = {
 	.iomap_begin = gfs2_iomap_begin,
+	.write_begin = gfs2_iomap_write_begin,
+	.write_end = gfs2_iomap_write_end,
+	.iomap_end = gfs2_iomap_end,
 };
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 378ff1a1e9ae..ebc245536cb3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -26,10 +26,12 @@
 #include <linux/dlm.h>
 #include <linux/dlm_plock.h>
 #include <linux/delay.h>
+#include <linux/backing-dev.h>
 
 #include "gfs2.h"
 #include "incore.h"
 #include "bmap.h"
+#include "aops.h"
 #include "dir.h"
 #include "glock.h"
 #include "glops.h"
@@ -691,9 +693,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 /**
  * gfs2_file_write_iter - Perform a write to a file
  * @iocb: The io context
- * @iov: The data to write
- * @nr_segs: Number of @iov segments
- * @pos: The file position
+ * @from: The data to write
  *
  * We have to do a lock/unlock here to refresh the inode size for
  * O_APPEND writes, otherwise we can land up writing at the wrong
@@ -705,8 +705,9 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-	struct gfs2_inode *ip = GFS2_I(file_inode(file));
-	int ret;
+	struct inode *inode = file_inode(file);
+	struct gfs2_inode *ip = GFS2_I(inode);
+	ssize_t ret;
 
 	ret = gfs2_rsqa_alloc(ip);
 	if (ret)
@@ -723,7 +724,38 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		gfs2_glock_dq_uninit(&gh);
 	}
 
-	return generic_file_write_iter(iocb, from);
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return generic_file_write_iter(iocb, from);
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret <= 0)
+		goto out;
+
+	/* We can write back this queue in page reclaim */
+	current->backing_dev_info = inode_to_bdi(inode);
+
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out2;
+
+	ret = file_update_time(file);
+	if (ret)
+		goto out2;
+
+	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+
+out2:
+	current->backing_dev_info = NULL;
+out:
+	inode_unlock(inode);
+	if (likely(ret > 0)) {
+		iocb->ki_pos += ret;
+
+		/* Handle various SYNC-type writes */
+		ret = generic_write_sync(iocb, ret);
+	}
+	return ret;
 }
 
 static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -754,7 +786,7 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 		if (error)
 			goto out;
 		offset = iomap.offset + iomap.length;
-		if (iomap.type != IOMAP_HOLE)
+		if (!(iomap.flags & IOMAP_F_NEW))
 			continue;
 		error = sb_issue_zeroout(sb, iomap.addr >> inode->i_blkbits,
 					 iomap.length >> inode->i_blkbits,
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 09/14] gfs2: gfs2_extent_length cleanup
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Now that gfs2_extent_length is no longer used for determining the size
of a hole and always with an upper size limit, the function can be
simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index d70672a1a29c..05c5486debd6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -406,22 +406,17 @@ static void release_metapath(struct metapath *mp)
 
 /**
  * gfs2_extent_length - Returns length of an extent of blocks
- * @start: Start of the buffer
- * @len: Length of the buffer in bytes
- * @ptr: Current position in the buffer
- * @limit: Max extent length to return (0 = unlimited)
+ * @bh: The metadata block
+ * @ptr: Current position in @bh
+ * @limit: Max extent length to return
  * @eob: Set to 1 if we hit "end of block"
  *
- * If the first block is zero (unallocated) it will return the number of
- * unallocated blocks in the extent, otherwise it will return the number
- * of contiguous blocks in the extent.
- *
  * Returns: The length of the extent (minimum of one block)
  */
 
-static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __be64 *ptr, size_t limit, int *eob)
+static inline unsigned int gfs2_extent_length(struct buffer_head *bh, __be64 *ptr, size_t limit, int *eob)
 {
-	const __be64 *end = (start + len);
+	const __be64 *end = (__be64 *)(bh->b_data + bh->b_size);
 	const __be64 *first = ptr;
 	u64 d = be64_to_cpu(*ptr);
 
@@ -430,14 +425,11 @@ static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __b
 		ptr++;
 		if (ptr >= end)
 			break;
-		if (limit && --limit == 0)
-			break;
-		if (d)
-			d++;
+		d++;
 	} while(be64_to_cpu(*ptr) == d);
 	if (ptr >= end)
 		*eob = 1;
-	return (ptr - first);
+	return ptr - first;
 }
 
 typedef const __be64 *(*gfs2_metadata_walker)(
@@ -883,7 +875,7 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		goto do_alloc;
 
 	bh = mp->mp_bh[ip->i_height - 1];
-	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, len, &eob);
+	len = gfs2_extent_length(bh, ptr, len, &eob);
 
 	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
 	iomap->length = len << inode->i_blkbits;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 09/14] gfs2: gfs2_extent_length cleanup
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Now that gfs2_extent_length is no longer used for determining the size
of a hole and always with an upper size limit, the function can be
simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index d70672a1a29c..05c5486debd6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -406,22 +406,17 @@ static void release_metapath(struct metapath *mp)
 
 /**
  * gfs2_extent_length - Returns length of an extent of blocks
- * @start: Start of the buffer
- * @len: Length of the buffer in bytes
- * @ptr: Current position in the buffer
- * @limit: Max extent length to return (0 = unlimited)
+ * @bh: The metadata block
+ * @ptr: Current position in @bh
+ * @limit: Max extent length to return
  * @eob: Set to 1 if we hit "end of block"
  *
- * If the first block is zero (unallocated) it will return the number of
- * unallocated blocks in the extent, otherwise it will return the number
- * of contiguous blocks in the extent.
- *
  * Returns: The length of the extent (minimum of one block)
  */
 
-static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __be64 *ptr, size_t limit, int *eob)
+static inline unsigned int gfs2_extent_length(struct buffer_head *bh, __be64 *ptr, size_t limit, int *eob)
 {
-	const __be64 *end = (start + len);
+	const __be64 *end = (__be64 *)(bh->b_data + bh->b_size);
 	const __be64 *first = ptr;
 	u64 d = be64_to_cpu(*ptr);
 
@@ -430,14 +425,11 @@ static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __b
 		ptr++;
 		if (ptr >= end)
 			break;
-		if (limit && --limit == 0)
-			break;
-		if (d)
-			d++;
+		d++;
 	} while(be64_to_cpu(*ptr) == d);
 	if (ptr >= end)
 		*eob = 1;
-	return (ptr - first);
+	return ptr - first;
 }
 
 typedef const __be64 *(*gfs2_metadata_walker)(
@@ -883,7 +875,7 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		goto do_alloc;
 
 	bh = mp->mp_bh[ip->i_height - 1];
-	len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, len, &eob);
+	len = gfs2_extent_length(bh, ptr, len, &eob);
 
 	iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
 	iomap->length = len << inode->i_blkbits;
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 10/14] iomap: Complete partial direct I/O writes synchronously
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig
  Cc: linux-fsdevel, Andreas Gruenbacher, Dave Chinner

According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write using buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is used by gfs2, which doesn't allow
block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that ends in a hole will result in a short write.  When
that happens, we want to complete the remaining write as buffered I/O.

To allow that, change iomap_dio_rw to wait for short direct I/O writes
like __blockdev_direct_IO does instead of returning -EIOCBQUEUED.

This makes xfstest generic/240 work on gfs2.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
---
 fs/iomap.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 27d97a290623..fcd3e26aadb7 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -720,6 +720,7 @@ struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	bool			wait_for_completion;
 
 	union {
 		/* used during submission and for synchronous completion: */
@@ -821,9 +822,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (is_sync_kiocb(dio->iocb)) {
+		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
-
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -1014,13 +1014,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
-	if (is_sync_kiocb(iocb)) {
-		dio->submit.waiter = current;
-		dio->submit.cookie = BLK_QC_T_NONE;
-		dio->submit.last_queue = NULL;
-	}
+	dio->submit.waiter = current;
+	dio->submit.cookie = BLK_QC_T_NONE;
+	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
 		if (pos >= dio->i_size)
@@ -1057,7 +1056,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1072,8 +1071,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK)
+			if (ret == -ENOTBLK) {
+				dio->wait_for_completion = true;
 				ret = 0;
+			}
 			break;
 		}
 		pos += ret;
@@ -1086,8 +1087,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (ret < 0)
 		iomap_dio_set_error(dio, ret);
 
+	smp_mb__before_atomic();
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!is_sync_kiocb(iocb))
+		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 10/14] iomap: Complete partial direct I/O writes synchronously
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write using buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is used by gfs2, which doesn't allow
block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that ends in a hole will result in a short write.  When
that happens, we want to complete the remaining write as buffered I/O.

To allow that, change iomap_dio_rw to wait for short direct I/O writes
like __blockdev_direct_IO does instead of returning -EIOCBQUEUED.

This makes xfstest generic/240 work on gfs2.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
---
 fs/iomap.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 27d97a290623..fcd3e26aadb7 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -720,6 +720,7 @@ struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	bool			wait_for_completion;
 
 	union {
 		/* used during submission and for synchronous completion: */
@@ -821,9 +822,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (is_sync_kiocb(dio->iocb)) {
+		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
-
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -1014,13 +1014,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
-	if (is_sync_kiocb(iocb)) {
-		dio->submit.waiter = current;
-		dio->submit.cookie = BLK_QC_T_NONE;
-		dio->submit.last_queue = NULL;
-	}
+	dio->submit.waiter = current;
+	dio->submit.cookie = BLK_QC_T_NONE;
+	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
 		if (pos >= dio->i_size)
@@ -1057,7 +1056,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1072,8 +1071,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK)
+			if (ret == -ENOTBLK) {
+				dio->wait_for_completion = true;
 				ret = 0;
+			}
 			break;
 		}
 		pos += ret;
@@ -1086,8 +1087,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (ret < 0)
 		iomap_dio_set_error(dio, ret);
 
+	smp_mb__before_atomic();
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!is_sync_kiocb(iocb))
+		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 11/14] gfs2: iomap direct I/O support
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 100 +----------------------------------
 fs/gfs2/bmap.c |  15 +++++-
 fs/gfs2/file.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 143 insertions(+), 110 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 3d9633175aa8..b0a219fae162 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -84,12 +84,6 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
 	return 0;
 }
 
-static int gfs2_get_block_direct(struct inode *inode, sector_t lblock,
-				 struct buffer_head *bh_result, int create)
-{
-	return gfs2_block_map(inode, lblock, bh_result, 0);
-}
-
 /**
  * gfs2_writepage_common - Common bits of writepage
  * @page: The page to be written
@@ -1021,96 +1015,6 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
 		try_to_release_page(page, 0);
 }
 
-/**
- * gfs2_ok_for_dio - check that dio is valid on this file
- * @ip: The inode
- * @offset: The offset at which we are reading or writing
- *
- * Returns: 0 (to ignore the i/o request and thus fall back to buffered i/o)
- *          1 (to accept the i/o request)
- */
-static int gfs2_ok_for_dio(struct gfs2_inode *ip, loff_t offset)
-{
-	/*
-	 * Should we return an error here? I can't see that O_DIRECT for
-	 * a stuffed file makes any sense. For now we'll silently fall
-	 * back to buffered I/O
-	 */
-	if (gfs2_is_stuffed(ip))
-		return 0;
-
-	if (offset >= i_size_read(&ip->i_inode))
-		return 0;
-	return 1;
-}
-
-
-
-static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct address_space *mapping = inode->i_mapping;
-	struct gfs2_inode *ip = GFS2_I(inode);
-	loff_t offset = iocb->ki_pos;
-	struct gfs2_holder gh;
-	int rv;
-
-	/*
-	 * Deferred lock, even if its a write, since we do no allocation
-	 * on this path. All we need change is atime, and this lock mode
-	 * ensures that other nodes have flushed their buffered read caches
-	 * (i.e. their page cache entries for this inode). We do not,
-	 * unfortunately have the option of only flushing a range like
-	 * the VFS does.
-	 */
-	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
-	rv = gfs2_glock_nq(&gh);
-	if (rv)
-		goto out_uninit;
-	rv = gfs2_ok_for_dio(ip, offset);
-	if (rv != 1)
-		goto out; /* dio not valid, fall back to buffered i/o */
-
-	/*
-	 * Now since we are holding a deferred (CW) lock at this point, you
-	 * might be wondering why this is ever needed. There is a case however
-	 * where we've granted a deferred local lock against a cached exclusive
-	 * glock. That is ok provided all granted local locks are deferred, but
-	 * it also means that it is possible to encounter pages which are
-	 * cached and possibly also mapped. So here we check for that and sort
-	 * them out ahead of the dio. The glock state machine will take care of
-	 * everything else.
-	 *
-	 * If in fact the cached glock state (gl->gl_state) is deferred (CW) in
-	 * the first place, mapping->nr_pages will always be zero.
-	 */
-	if (mapping->nrpages) {
-		loff_t lstart = offset & ~(PAGE_SIZE - 1);
-		loff_t len = iov_iter_count(iter);
-		loff_t end = PAGE_ALIGN(offset + len) - 1;
-
-		rv = 0;
-		if (len == 0)
-			goto out;
-		if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
-			unmap_shared_mapping_range(ip->i_inode.i_mapping, offset, len);
-		rv = filemap_write_and_wait_range(mapping, lstart, end);
-		if (rv)
-			goto out;
-		if (iov_iter_rw(iter) == WRITE)
-			truncate_inode_pages_range(mapping, lstart, end);
-	}
-
-	rv = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-				  gfs2_get_block_direct, NULL, NULL, 0);
-out:
-	gfs2_glock_dq(&gh);
-out_uninit:
-	gfs2_holder_uninit(&gh);
-	return rv;
-}
-
 /**
  * gfs2_releasepage - free the metadata associated with a page
  * @page: the page that's being released
@@ -1191,7 +1095,7 @@ static const struct address_space_operations gfs2_writeback_aops = {
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
-	.direct_IO = gfs2_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
@@ -1208,7 +1112,7 @@ static const struct address_space_operations gfs2_ordered_aops = {
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
-	.direct_IO = gfs2_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 05c5486debd6..69d87a38a209 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -905,6 +905,10 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 			iomap->length = size - pos;
 	} else if (flags & IOMAP_WRITE) {
 		u64 size;
+
+		if (flags & IOMAP_DIRECT)
+			goto out;
+
 		size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
 		if (size < iomap->length)
 			iomap->length = size;
@@ -1053,7 +1057,14 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	trace_gfs2_iomap_start(ip, pos, length, flags);
 	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+		if (flags & IOMAP_DIRECT) {
+			ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+			release_metapath(&mp);
+			if (iomap->type != IOMAP_MAPPED)
+				ret = -ENOTBLK;
+		} else {
+			ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+		}
 	} else {
 		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 		release_metapath(&mp);
@@ -1125,7 +1136,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	struct gfs2_trans *tr = current->journal_info;
 	int ret = 0;
 
-	if (!(flags & IOMAP_WRITE))
+	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		return 0;
 
 	gfs2_ordered_add_inode(ip);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index ebc245536cb3..35ba70331665 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -690,6 +690,91 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 	return ret ? ret : ret1;
 }
 
+static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
+	size_t count = iov_iter_count(to);
+	struct gfs2_holder gh;
+	ssize_t ret;
+
+	if (!count)
+		return 0; /* skip atime */
+
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	/* fall back to buffered I/O for stuffed files */
+	ret = -ENOTBLK;
+	if (gfs2_is_stuffed(ip))
+		goto out;
+
+	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL);
+
+out:
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
+static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	size_t len = iov_iter_count(from);
+	loff_t offset = iocb->ki_pos;
+	struct gfs2_holder gh;
+	ssize_t ret;
+
+	/*
+	 * Deferred lock, even if its a write, since we do no allocation on
+	 * this path. All we need to change is the atime, and this lock mode
+	 * ensures that other nodes have flushed their buffered read caches
+	 * (i.e. their page cache entries for this inode). We do not,
+	 * unfortunately, have the option of only flushing a range like the
+	 * VFS does.
+	 */
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	/* Silently fall back to buffered I/O for stuffed files */
+	if (gfs2_is_stuffed(ip))
+		goto out;
+
+	/* Silently fall back to buffered I/O when writing beyond EOF */
+	if (offset + len > i_size_read(&ip->i_inode))
+		goto out;
+
+	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL);
+
+out:
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
+static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	ssize_t ret;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		ret = gfs2_file_direct_read(iocb, to);
+		if (likely(ret != -ENOTBLK))
+			goto out;
+		iocb->ki_flags &= ~IOCB_DIRECT;
+	}
+	ret = generic_file_read_iter(iocb, to);
+out:
+	return ret;
+}
+
 /**
  * gfs2_file_write_iter - Perform a write to a file
  * @iocb: The io context
@@ -707,7 +792,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct gfs2_inode *ip = GFS2_I(inode);
-	ssize_t ret;
+	ssize_t written = 0, ret;
 
 	ret = gfs2_rsqa_alloc(ip);
 	if (ret)
@@ -724,9 +809,6 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		gfs2_glock_dq_uninit(&gh);
 	}
 
-	if (iocb->ki_flags & IOCB_DIRECT)
-		return generic_file_write_iter(iocb, from);
-
 	inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret <= 0)
@@ -743,19 +825,55 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (ret)
 		goto out2;
 
-	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		struct address_space *mapping = file->f_mapping;
+		loff_t pos, endbyte;
+		ssize_t buffered;
+
+		written = gfs2_file_direct_write(iocb, from);
+		if (written < 0 || !iov_iter_count(from))
+			goto out2;
+
+		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+		if (unlikely(ret < 0))
+			goto out2;
+		buffered = ret;
+
+		/*
+		 * We need to ensure that the page cache pages are written to
+		 * disk and invalidated to preserve the expected O_DIRECT
+		 * semantics.
+		 */
+		pos = iocb->ki_pos;
+		endbyte = pos + buffered - 1;
+		ret = filemap_write_and_wait_range(mapping, pos, endbyte);
+		if (!ret) {
+			iocb->ki_pos += buffered;
+			written += buffered;
+			invalidate_mapping_pages(mapping,
+						 pos >> PAGE_SHIFT,
+						 endbyte >> PAGE_SHIFT);
+		} else {
+			/*
+			 * We don't know how much we wrote, so just return
+			 * the number of bytes which were direct-written
+			 */
+		}
+	} else {
+		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+		if (likely(ret > 0))
+			iocb->ki_pos += ret;
+	}
 
 out2:
 	current->backing_dev_info = NULL;
 out:
 	inode_unlock(inode);
 	if (likely(ret > 0)) {
-		iocb->ki_pos += ret;
-
 		/* Handle various SYNC-type writes */
 		ret = generic_write_sync(iocb, ret);
 	}
-	return ret;
+	return written ? written : ret;
 }
 
 static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -1157,7 +1275,7 @@ static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
 
 const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
-	.read_iter	= generic_file_read_iter,
+	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
@@ -1187,7 +1305,7 @@ const struct file_operations gfs2_dir_fops = {
 
 const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
-	.read_iter	= generic_file_read_iter,
+	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 11/14] gfs2: iomap direct I/O support
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 100 +----------------------------------
 fs/gfs2/bmap.c |  15 +++++-
 fs/gfs2/file.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 143 insertions(+), 110 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 3d9633175aa8..b0a219fae162 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -84,12 +84,6 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
 	return 0;
 }
 
-static int gfs2_get_block_direct(struct inode *inode, sector_t lblock,
-				 struct buffer_head *bh_result, int create)
-{
-	return gfs2_block_map(inode, lblock, bh_result, 0);
-}
-
 /**
  * gfs2_writepage_common - Common bits of writepage
  * @page: The page to be written
@@ -1021,96 +1015,6 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
 		try_to_release_page(page, 0);
 }
 
-/**
- * gfs2_ok_for_dio - check that dio is valid on this file
- * @ip: The inode
- * @offset: The offset at which we are reading or writing
- *
- * Returns: 0 (to ignore the i/o request and thus fall back to buffered i/o)
- *          1 (to accept the i/o request)
- */
-static int gfs2_ok_for_dio(struct gfs2_inode *ip, loff_t offset)
-{
-	/*
-	 * Should we return an error here? I can't see that O_DIRECT for
-	 * a stuffed file makes any sense. For now we'll silently fall
-	 * back to buffered I/O
-	 */
-	if (gfs2_is_stuffed(ip))
-		return 0;
-
-	if (offset >= i_size_read(&ip->i_inode))
-		return 0;
-	return 1;
-}
-
-
-
-static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct address_space *mapping = inode->i_mapping;
-	struct gfs2_inode *ip = GFS2_I(inode);
-	loff_t offset = iocb->ki_pos;
-	struct gfs2_holder gh;
-	int rv;
-
-	/*
-	 * Deferred lock, even if its a write, since we do no allocation
-	 * on this path. All we need change is atime, and this lock mode
-	 * ensures that other nodes have flushed their buffered read caches
-	 * (i.e. their page cache entries for this inode). We do not,
-	 * unfortunately have the option of only flushing a range like
-	 * the VFS does.
-	 */
-	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
-	rv = gfs2_glock_nq(&gh);
-	if (rv)
-		goto out_uninit;
-	rv = gfs2_ok_for_dio(ip, offset);
-	if (rv != 1)
-		goto out; /* dio not valid, fall back to buffered i/o */
-
-	/*
-	 * Now since we are holding a deferred (CW) lock at this point, you
-	 * might be wondering why this is ever needed. There is a case however
-	 * where we've granted a deferred local lock against a cached exclusive
-	 * glock. That is ok provided all granted local locks are deferred, but
-	 * it also means that it is possible to encounter pages which are
-	 * cached and possibly also mapped. So here we check for that and sort
-	 * them out ahead of the dio. The glock state machine will take care of
-	 * everything else.
-	 *
-	 * If in fact the cached glock state (gl->gl_state) is deferred (CW) in
-	 * the first place, mapping->nr_pages will always be zero.
-	 */
-	if (mapping->nrpages) {
-		loff_t lstart = offset & ~(PAGE_SIZE - 1);
-		loff_t len = iov_iter_count(iter);
-		loff_t end = PAGE_ALIGN(offset + len) - 1;
-
-		rv = 0;
-		if (len == 0)
-			goto out;
-		if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
-			unmap_shared_mapping_range(ip->i_inode.i_mapping, offset, len);
-		rv = filemap_write_and_wait_range(mapping, lstart, end);
-		if (rv)
-			goto out;
-		if (iov_iter_rw(iter) == WRITE)
-			truncate_inode_pages_range(mapping, lstart, end);
-	}
-
-	rv = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-				  gfs2_get_block_direct, NULL, NULL, 0);
-out:
-	gfs2_glock_dq(&gh);
-out_uninit:
-	gfs2_holder_uninit(&gh);
-	return rv;
-}
-
 /**
  * gfs2_releasepage - free the metadata associated with a page
  * @page: the page that's being released
@@ -1191,7 +1095,7 @@ static const struct address_space_operations gfs2_writeback_aops = {
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
-	.direct_IO = gfs2_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
@@ -1208,7 +1112,7 @@ static const struct address_space_operations gfs2_ordered_aops = {
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
-	.direct_IO = gfs2_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 05c5486debd6..69d87a38a209 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -905,6 +905,10 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 			iomap->length = size - pos;
 	} else if (flags & IOMAP_WRITE) {
 		u64 size;
+
+		if (flags & IOMAP_DIRECT)
+			goto out;
+
 		size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
 		if (size < iomap->length)
 			iomap->length = size;
@@ -1053,7 +1057,14 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	trace_gfs2_iomap_start(ip, pos, length, flags);
 	if (flags & IOMAP_WRITE) {
-		ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+		if (flags & IOMAP_DIRECT) {
+			ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+			release_metapath(&mp);
+			if (iomap->type != IOMAP_MAPPED)
+				ret = -ENOTBLK;
+		} else {
+			ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+		}
 	} else {
 		ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 		release_metapath(&mp);
@@ -1125,7 +1136,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	struct gfs2_trans *tr = current->journal_info;
 	int ret = 0;
 
-	if (!(flags & IOMAP_WRITE))
+	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		return 0;
 
 	gfs2_ordered_add_inode(ip);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index ebc245536cb3..35ba70331665 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -690,6 +690,91 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 	return ret ? ret : ret1;
 }
 
+static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
+	size_t count = iov_iter_count(to);
+	struct gfs2_holder gh;
+	ssize_t ret;
+
+	if (!count)
+		return 0; /* skip atime */
+
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	/* fall back to buffered I/O for stuffed files */
+	ret = -ENOTBLK;
+	if (gfs2_is_stuffed(ip))
+		goto out;
+
+	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL);
+
+out:
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
+static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	size_t len = iov_iter_count(from);
+	loff_t offset = iocb->ki_pos;
+	struct gfs2_holder gh;
+	ssize_t ret;
+
+	/*
+	 * Deferred lock, even if its a write, since we do no allocation on
+	 * this path. All we need to change is the atime, and this lock mode
+	 * ensures that other nodes have flushed their buffered read caches
+	 * (i.e. their page cache entries for this inode). We do not,
+	 * unfortunately, have the option of only flushing a range like the
+	 * VFS does.
+	 */
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	/* Silently fall back to buffered I/O for stuffed files */
+	if (gfs2_is_stuffed(ip))
+		goto out;
+
+	/* Silently fall back to buffered I/O when writing beyond EOF */
+	if (offset + len > i_size_read(&ip->i_inode))
+		goto out;
+
+	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL);
+
+out:
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
+static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	ssize_t ret;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		ret = gfs2_file_direct_read(iocb, to);
+		if (likely(ret != -ENOTBLK))
+			goto out;
+		iocb->ki_flags &= ~IOCB_DIRECT;
+	}
+	ret = generic_file_read_iter(iocb, to);
+out:
+	return ret;
+}
+
 /**
  * gfs2_file_write_iter - Perform a write to a file
  * @iocb: The io context
@@ -707,7 +792,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct gfs2_inode *ip = GFS2_I(inode);
-	ssize_t ret;
+	ssize_t written = 0, ret;
 
 	ret = gfs2_rsqa_alloc(ip);
 	if (ret)
@@ -724,9 +809,6 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		gfs2_glock_dq_uninit(&gh);
 	}
 
-	if (iocb->ki_flags & IOCB_DIRECT)
-		return generic_file_write_iter(iocb, from);
-
 	inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret <= 0)
@@ -743,19 +825,55 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (ret)
 		goto out2;
 
-	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		struct address_space *mapping = file->f_mapping;
+		loff_t pos, endbyte;
+		ssize_t buffered;
+
+		written = gfs2_file_direct_write(iocb, from);
+		if (written < 0 || !iov_iter_count(from))
+			goto out2;
+
+		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+		if (unlikely(ret < 0))
+			goto out2;
+		buffered = ret;
+
+		/*
+		 * We need to ensure that the page cache pages are written to
+		 * disk and invalidated to preserve the expected O_DIRECT
+		 * semantics.
+		 */
+		pos = iocb->ki_pos;
+		endbyte = pos + buffered - 1;
+		ret = filemap_write_and_wait_range(mapping, pos, endbyte);
+		if (!ret) {
+			iocb->ki_pos += buffered;
+			written += buffered;
+			invalidate_mapping_pages(mapping,
+						 pos >> PAGE_SHIFT,
+						 endbyte >> PAGE_SHIFT);
+		} else {
+			/*
+			 * We don't know how much we wrote, so just return
+			 * the number of bytes which were direct-written
+			 */
+		}
+	} else {
+		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+		if (likely(ret > 0))
+			iocb->ki_pos += ret;
+	}
 
 out2:
 	current->backing_dev_info = NULL;
 out:
 	inode_unlock(inode);
 	if (likely(ret > 0)) {
-		iocb->ki_pos += ret;
-
 		/* Handle various SYNC-type writes */
 		ret = generic_write_sync(iocb, ret);
 	}
-	return ret;
+	return written ? written : ret;
 }
 
 static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -1157,7 +1275,7 @@ static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
 
 const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
-	.read_iter	= generic_file_read_iter,
+	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
@@ -1187,7 +1305,7 @@ const struct file_operations gfs2_dir_fops = {
 
 const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
-	.read_iter	= generic_file_read_iter,
+	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 12/14] gfs2: Remove gfs2_write_{begin,end}
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Now that generic_file_write_iter is no longer used, there are no
remaining users of these address space operations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 210 -------------------------------------------------
 1 file changed, 210 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index b0a219fae162..cc80fd71f3dd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -639,132 +639,6 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	return ret;
 }
 
-/**
- * gfs2_write_begin - Begin to write to a file
- * @file: The file to write to
- * @mapping: The mapping in which to write
- * @pos: The file offset at which to start writing
- * @len: Length of the write
- * @flags: Various flags
- * @pagep: Pointer to return the page
- * @fsdata: Pointer to return fs data (unused by GFS2)
- *
- * Returns: errno
- */
-
-static int gfs2_write_begin(struct file *file, struct address_space *mapping,
-			    loff_t pos, unsigned len, unsigned flags,
-			    struct page **pagep, void **fsdata)
-{
-	struct gfs2_inode *ip = GFS2_I(mapping->host);
-	struct gfs2_sbd *sdp = GFS2_SB(mapping->host);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
-	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
-	unsigned requested = 0;
-	int alloc_required;
-	int error = 0;
-	pgoff_t index = pos >> PAGE_SHIFT;
-	unsigned from = pos & (PAGE_SIZE - 1);
-	struct page *page;
-
-	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
-	error = gfs2_glock_nq(&ip->i_gh);
-	if (unlikely(error))
-		goto out_uninit;
-	if (&ip->i_inode == sdp->sd_rindex) {
-		error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
-					   GL_NOCACHE, &m_ip->i_gh);
-		if (unlikely(error)) {
-			gfs2_glock_dq(&ip->i_gh);
-			goto out_uninit;
-		}
-	}
-
-	alloc_required = gfs2_write_alloc_required(ip, pos, len);
-
-	if (alloc_required || gfs2_is_jdata(ip))
-		gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
-
-	if (alloc_required) {
-		struct gfs2_alloc_parms ap = { .aflags = 0, };
-		requested = data_blocks + ind_blocks;
-		ap.target = requested;
-		error = gfs2_quota_lock_check(ip, &ap);
-		if (error)
-			goto out_unlock;
-
-		error = gfs2_inplace_reserve(ip, &ap);
-		if (error)
-			goto out_qunlock;
-	}
-
-	rblocks = RES_DINODE + ind_blocks;
-	if (gfs2_is_jdata(ip))
-		rblocks += data_blocks ? data_blocks : 1;
-	if (ind_blocks || data_blocks)
-		rblocks += RES_STATFS + RES_QUOTA;
-	if (&ip->i_inode == sdp->sd_rindex)
-		rblocks += 2 * RES_STATFS;
-	if (alloc_required)
-		rblocks += gfs2_rg_blocks(ip, requested);
-
-	error = gfs2_trans_begin(sdp, rblocks,
-				 PAGE_SIZE/sdp->sd_sb.sb_bsize);
-	if (error)
-		goto out_trans_fail;
-
-	error = -ENOMEM;
-	flags |= AOP_FLAG_NOFS;
-	page = grab_cache_page_write_begin(mapping, index, flags);
-	*pagep = page;
-	if (unlikely(!page))
-		goto out_endtrans;
-
-	if (gfs2_is_stuffed(ip)) {
-		error = 0;
-		if (pos + len > gfs2_max_stuffed_size(ip)) {
-			error = gfs2_unstuff_dinode(ip, page);
-			if (error == 0)
-				goto prepare_write;
-		} else if (!PageUptodate(page)) {
-			error = stuffed_readpage(ip, page);
-		}
-		goto out;
-	}
-
-prepare_write:
-	error = __block_write_begin(page, from, len, gfs2_block_map);
-out:
-	if (error == 0)
-		return 0;
-
-	unlock_page(page);
-	put_page(page);
-
-	gfs2_trans_end(sdp);
-	if (pos + len > ip->i_inode.i_size)
-		gfs2_trim_blocks(&ip->i_inode);
-	goto out_trans_fail;
-
-out_endtrans:
-	gfs2_trans_end(sdp);
-out_trans_fail:
-	if (alloc_required) {
-		gfs2_inplace_release(ip);
-out_qunlock:
-		gfs2_quota_unlock(ip);
-	}
-out_unlock:
-	if (&ip->i_inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-out_uninit:
-	gfs2_holder_uninit(&ip->i_gh);
-	return error;
-}
-
 /**
  * adjust_fs_space - Adjusts the free space available due to gfs2_grow
  * @inode: the rindex inode
@@ -845,84 +719,6 @@ int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
 	return copied;
 }
 
-/**
- * gfs2_write_end
- * @file: The file to write to
- * @mapping: The address space to write to
- * @pos: The file position
- * @len: The length of the data
- * @copied: How much was actually copied by the VFS
- * @page: The page that has been written
- * @fsdata: The fsdata (unused in GFS2)
- *
- * The main write_end function for GFS2. We just put our locking around the VFS
- * provided functions.
- *
- * Returns: copied bytes or errno
- */
-
-static int gfs2_write_end(struct file *file, struct address_space *mapping,
-			  loff_t pos, unsigned len, unsigned copied,
-			  struct page *page, void *fsdata)
-{
-	struct inode *inode = page->mapping->host;
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
-	struct buffer_head *dibh;
-	int ret;
-	struct gfs2_trans *tr = current->journal_info;
-	BUG_ON(!tr);
-
-	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
-
-	ret = gfs2_meta_inode_buffer(ip, &dibh);
-	if (unlikely(ret))
-		goto out;
-
-	if (gfs2_is_stuffed(ip)) {
-		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
-		page = NULL;
-		goto out2;
-	}
-
-	if (gfs2_is_jdata(ip))
-		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
-	else
-		gfs2_ordered_add_inode(ip);
-
-	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
-	page = NULL;
-	if (tr->tr_num_buf_new)
-		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-	else
-		gfs2_trans_add_meta(ip->i_gl, dibh);
-
-out2:
-	if (inode == sdp->sd_rindex) {
-		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
-
-	brelse(dibh);
-out:
-	if (page) {
-		unlock_page(page);
-		put_page(page);
-	}
-	gfs2_trans_end(sdp);
-	gfs2_inplace_release(ip);
-	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
-		gfs2_quota_unlock(ip);
-	if (inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-	gfs2_holder_uninit(&ip->i_gh);
-	return ret;
-}
-
 /**
  * jdata_set_page_dirty - Page dirtying function
  * @page: The page to dirty
@@ -1090,8 +886,6 @@ static const struct address_space_operations gfs2_writeback_aops = {
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
@@ -1106,8 +900,6 @@ static const struct address_space_operations gfs2_ordered_aops = {
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.set_page_dirty = __set_page_dirty_buffers,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
@@ -1123,8 +915,6 @@ static const struct address_space_operations gfs2_jdata_aops = {
 	.writepages = gfs2_jdata_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.set_page_dirty = jdata_set_page_dirty,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 12/14] gfs2: Remove gfs2_write_{begin, end}
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Now that generic_file_write_iter is no longer used, there are no
remaining users of these address space operations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 210 -------------------------------------------------
 1 file changed, 210 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index b0a219fae162..cc80fd71f3dd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -639,132 +639,6 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	return ret;
 }
 
-/**
- * gfs2_write_begin - Begin to write to a file
- * @file: The file to write to
- * @mapping: The mapping in which to write
- * @pos: The file offset at which to start writing
- * @len: Length of the write
- * @flags: Various flags
- * @pagep: Pointer to return the page
- * @fsdata: Pointer to return fs data (unused by GFS2)
- *
- * Returns: errno
- */
-
-static int gfs2_write_begin(struct file *file, struct address_space *mapping,
-			    loff_t pos, unsigned len, unsigned flags,
-			    struct page **pagep, void **fsdata)
-{
-	struct gfs2_inode *ip = GFS2_I(mapping->host);
-	struct gfs2_sbd *sdp = GFS2_SB(mapping->host);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
-	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
-	unsigned requested = 0;
-	int alloc_required;
-	int error = 0;
-	pgoff_t index = pos >> PAGE_SHIFT;
-	unsigned from = pos & (PAGE_SIZE - 1);
-	struct page *page;
-
-	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
-	error = gfs2_glock_nq(&ip->i_gh);
-	if (unlikely(error))
-		goto out_uninit;
-	if (&ip->i_inode == sdp->sd_rindex) {
-		error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
-					   GL_NOCACHE, &m_ip->i_gh);
-		if (unlikely(error)) {
-			gfs2_glock_dq(&ip->i_gh);
-			goto out_uninit;
-		}
-	}
-
-	alloc_required = gfs2_write_alloc_required(ip, pos, len);
-
-	if (alloc_required || gfs2_is_jdata(ip))
-		gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
-
-	if (alloc_required) {
-		struct gfs2_alloc_parms ap = { .aflags = 0, };
-		requested = data_blocks + ind_blocks;
-		ap.target = requested;
-		error = gfs2_quota_lock_check(ip, &ap);
-		if (error)
-			goto out_unlock;
-
-		error = gfs2_inplace_reserve(ip, &ap);
-		if (error)
-			goto out_qunlock;
-	}
-
-	rblocks = RES_DINODE + ind_blocks;
-	if (gfs2_is_jdata(ip))
-		rblocks += data_blocks ? data_blocks : 1;
-	if (ind_blocks || data_blocks)
-		rblocks += RES_STATFS + RES_QUOTA;
-	if (&ip->i_inode == sdp->sd_rindex)
-		rblocks += 2 * RES_STATFS;
-	if (alloc_required)
-		rblocks += gfs2_rg_blocks(ip, requested);
-
-	error = gfs2_trans_begin(sdp, rblocks,
-				 PAGE_SIZE/sdp->sd_sb.sb_bsize);
-	if (error)
-		goto out_trans_fail;
-
-	error = -ENOMEM;
-	flags |= AOP_FLAG_NOFS;
-	page = grab_cache_page_write_begin(mapping, index, flags);
-	*pagep = page;
-	if (unlikely(!page))
-		goto out_endtrans;
-
-	if (gfs2_is_stuffed(ip)) {
-		error = 0;
-		if (pos + len > gfs2_max_stuffed_size(ip)) {
-			error = gfs2_unstuff_dinode(ip, page);
-			if (error == 0)
-				goto prepare_write;
-		} else if (!PageUptodate(page)) {
-			error = stuffed_readpage(ip, page);
-		}
-		goto out;
-	}
-
-prepare_write:
-	error = __block_write_begin(page, from, len, gfs2_block_map);
-out:
-	if (error == 0)
-		return 0;
-
-	unlock_page(page);
-	put_page(page);
-
-	gfs2_trans_end(sdp);
-	if (pos + len > ip->i_inode.i_size)
-		gfs2_trim_blocks(&ip->i_inode);
-	goto out_trans_fail;
-
-out_endtrans:
-	gfs2_trans_end(sdp);
-out_trans_fail:
-	if (alloc_required) {
-		gfs2_inplace_release(ip);
-out_qunlock:
-		gfs2_quota_unlock(ip);
-	}
-out_unlock:
-	if (&ip->i_inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-out_uninit:
-	gfs2_holder_uninit(&ip->i_gh);
-	return error;
-}
-
 /**
  * adjust_fs_space - Adjusts the free space available due to gfs2_grow
  * @inode: the rindex inode
@@ -845,84 +719,6 @@ int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
 	return copied;
 }
 
-/**
- * gfs2_write_end
- * @file: The file to write to
- * @mapping: The address space to write to
- * @pos: The file position
- * @len: The length of the data
- * @copied: How much was actually copied by the VFS
- * @page: The page that has been written
- * @fsdata: The fsdata (unused in GFS2)
- *
- * The main write_end function for GFS2. We just put our locking around the VFS
- * provided functions.
- *
- * Returns: copied bytes or errno
- */
-
-static int gfs2_write_end(struct file *file, struct address_space *mapping,
-			  loff_t pos, unsigned len, unsigned copied,
-			  struct page *page, void *fsdata)
-{
-	struct inode *inode = page->mapping->host;
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
-	struct buffer_head *dibh;
-	int ret;
-	struct gfs2_trans *tr = current->journal_info;
-	BUG_ON(!tr);
-
-	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
-
-	ret = gfs2_meta_inode_buffer(ip, &dibh);
-	if (unlikely(ret))
-		goto out;
-
-	if (gfs2_is_stuffed(ip)) {
-		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
-		page = NULL;
-		goto out2;
-	}
-
-	if (gfs2_is_jdata(ip))
-		gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
-	else
-		gfs2_ordered_add_inode(ip);
-
-	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
-	page = NULL;
-	if (tr->tr_num_buf_new)
-		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-	else
-		gfs2_trans_add_meta(ip->i_gl, dibh);
-
-out2:
-	if (inode == sdp->sd_rindex) {
-		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
-
-	brelse(dibh);
-out:
-	if (page) {
-		unlock_page(page);
-		put_page(page);
-	}
-	gfs2_trans_end(sdp);
-	gfs2_inplace_release(ip);
-	if (ip->i_qadata && ip->i_qadata->qa_qd_num)
-		gfs2_quota_unlock(ip);
-	if (inode == sdp->sd_rindex) {
-		gfs2_glock_dq(&m_ip->i_gh);
-		gfs2_holder_uninit(&m_ip->i_gh);
-	}
-	gfs2_glock_dq(&ip->i_gh);
-	gfs2_holder_uninit(&ip->i_gh);
-	return ret;
-}
-
 /**
  * jdata_set_page_dirty - Page dirtying function
  * @page: The page to dirty
@@ -1090,8 +886,6 @@ static const struct address_space_operations gfs2_writeback_aops = {
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
@@ -1106,8 +900,6 @@ static const struct address_space_operations gfs2_ordered_aops = {
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.set_page_dirty = __set_page_dirty_buffers,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
@@ -1123,8 +915,6 @@ static const struct address_space_operations gfs2_jdata_aops = {
 	.writepages = gfs2_jdata_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
-	.write_begin = gfs2_write_begin,
-	.write_end = gfs2_write_end,
 	.set_page_dirty = jdata_set_page_dirty,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 13/14] iomap: inline data should be an iomap type, not a flag
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel

From: Christoph Hellwig <hch@lst.de>

Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it
it should be an entirely separate iomap range type.

[Minimally adjusted to fit into the gfs2 iomap-write patch queue.]

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/inline.c      | 4 ++--
 fs/gfs2/bmap.c        | 8 ++++----
 fs/iomap.c            | 8 ++++----
 include/linux/iomap.h | 2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..e1f00891ef95 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1835,8 +1835,8 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
 	iomap->offset = 0;
 	iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
 			      i_size_read(inode));
-	iomap->type = 0;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 
 out:
 	up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 69d87a38a209..c157af31dc56 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -754,8 +754,8 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap,
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
 	iomap->length = length;
-	iomap->type = IOMAP_MAPPED;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 }
 
 /**
@@ -980,7 +980,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 	if (ret)
 		goto out_release;
 
-	alloc_required = unstuff || iomap->type != IOMAP_MAPPED;
+	alloc_required = unstuff || iomap->type == IOMAP_HOLE;
 
 	if (alloc_required || gfs2_is_jdata(ip))
 		gfs2_write_calc_reserv(ip, iomap->length, &data_blocks, &ind_blocks);
@@ -1021,7 +1021,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 			goto out_trans_end;
 	}
 
-	if (iomap->type != IOMAP_MAPPED) {
+	if (iomap->type == IOMAP_HOLE) {
 		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
 		if (ret) {
 			gfs2_trans_end(sdp);
diff --git a/fs/iomap.c b/fs/iomap.c
index fcd3e26aadb7..819b6c56ecfe 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -522,22 +522,22 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 	case IOMAP_HOLE:
 		/* skip holes */
 		return 0;
+	case IOMAP_MAPPED:
+		break;
 	case IOMAP_DELALLOC:
 		flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
 		break;
 	case IOMAP_UNWRITTEN:
 		flags |= FIEMAP_EXTENT_UNWRITTEN;
 		break;
-	case IOMAP_MAPPED:
-		break;
+	case IOMAP_INLINE:
+		flags |= FIEMAP_EXTENT_DATA_INLINE;
 	}
 
 	if (iomap->flags & IOMAP_F_MERGED)
 		flags |= FIEMAP_EXTENT_MERGED;
 	if (iomap->flags & IOMAP_F_SHARED)
 		flags |= FIEMAP_EXTENT_SHARED;
-	if (iomap->flags & IOMAP_F_DATA_INLINE)
-		flags |= FIEMAP_EXTENT_DATA_INLINE;
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
 			iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 423f7ecc1231..0a018a58d84e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -18,6 +18,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
 #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE	0x05
 
 /*
  * Flags for all iomap mappings:
@@ -34,7 +35,6 @@ struct vm_fault;
  */
 #define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED		0x20	/* block shared with another file */
-#define IOMAP_F_DATA_INLINE	0x40	/* data inline in the inode */
 
 /*
  * Magic value for addr:
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 13/14] iomap: inline data should be an iomap type, not a flag
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Christoph Hellwig <hch@lst.de>

Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it
it should be an entirely separate iomap range type.

[Minimally adjusted to fit into the gfs2 iomap-write patch queue.]

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/inline.c      | 4 ++--
 fs/gfs2/bmap.c        | 8 ++++----
 fs/iomap.c            | 8 ++++----
 include/linux/iomap.h | 2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..e1f00891ef95 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1835,8 +1835,8 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
 	iomap->offset = 0;
 	iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
 			      i_size_read(inode));
-	iomap->type = 0;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 
 out:
 	up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 69d87a38a209..c157af31dc56 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -754,8 +754,8 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap,
 		      sizeof(struct gfs2_dinode);
 	iomap->offset = 0;
 	iomap->length = length;
-	iomap->type = IOMAP_MAPPED;
-	iomap->flags = IOMAP_F_DATA_INLINE;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
 }
 
 /**
@@ -980,7 +980,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 	if (ret)
 		goto out_release;
 
-	alloc_required = unstuff || iomap->type != IOMAP_MAPPED;
+	alloc_required = unstuff || iomap->type == IOMAP_HOLE;
 
 	if (alloc_required || gfs2_is_jdata(ip))
 		gfs2_write_calc_reserv(ip, iomap->length, &data_blocks, &ind_blocks);
@@ -1021,7 +1021,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 			goto out_trans_end;
 	}
 
-	if (iomap->type != IOMAP_MAPPED) {
+	if (iomap->type == IOMAP_HOLE) {
 		ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
 		if (ret) {
 			gfs2_trans_end(sdp);
diff --git a/fs/iomap.c b/fs/iomap.c
index fcd3e26aadb7..819b6c56ecfe 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -522,22 +522,22 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 	case IOMAP_HOLE:
 		/* skip holes */
 		return 0;
+	case IOMAP_MAPPED:
+		break;
 	case IOMAP_DELALLOC:
 		flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
 		break;
 	case IOMAP_UNWRITTEN:
 		flags |= FIEMAP_EXTENT_UNWRITTEN;
 		break;
-	case IOMAP_MAPPED:
-		break;
+	case IOMAP_INLINE:
+		flags |= FIEMAP_EXTENT_DATA_INLINE;
 	}
 
 	if (iomap->flags & IOMAP_F_MERGED)
 		flags |= FIEMAP_EXTENT_MERGED;
 	if (iomap->flags & IOMAP_F_SHARED)
 		flags |= FIEMAP_EXTENT_SHARED;
-	if (iomap->flags & IOMAP_F_DATA_INLINE)
-		flags |= FIEMAP_EXTENT_DATA_INLINE;
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
 			iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 423f7ecc1231..0a018a58d84e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -18,6 +18,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
 #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE	0x05
 
 /*
  * Flags for all iomap mappings:
@@ -34,7 +35,6 @@ struct vm_fault;
  */
 #define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED		0x20	/* block shared with another file */
-#define IOMAP_F_DATA_INLINE	0x40	/* data inline in the inode */
 
 /*
  * Magic value for addr:
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin,end}
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher

Proof of concept: move the gfs2 stuffed file handling from the
iomap_write_{begin,end} handlers to the iomap_{begin,end} handlers.
With this, the page written to is locked before faulting in the page to
read from, so when both refer to the same page, we end up with a
deadlock as demonstrated by xfstest generic/248.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c        | 91 ++++++++++++++++++++-----------------------
 fs/iomap.c            | 21 ++++++----
 include/linux/iomap.h |  1 +
 3 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index c157af31dc56..05b599a70de6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -966,15 +966,30 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
-	bool unstuff, alloc_required;
+	bool unstuff = false, alloc_required;
 	int ret;
 
 	ret = gfs2_write_lock(inode);
 	if (ret)
 		return ret;
 
-	unstuff = gfs2_is_stuffed(ip) &&
-		  pos + length > gfs2_max_stuffed_size(ip);
+	if (gfs2_is_stuffed(ip)) {
+		unstuff = pos + length > gfs2_max_stuffed_size(ip);
+
+		if (!unstuff) {
+			iomap->page = grab_cache_page_write_begin(
+					inode->i_mapping, 0, flags);
+			ret = -ENOMEM;
+			if (!iomap->page)
+				goto out;
+
+			if (!PageUptodate(iomap->page)) {
+				ret = stuffed_readpage(ip, iomap->page);
+				if (ret)
+					goto out;
+			}
+		}
+	}
 
 	ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 	if (ret)
@@ -1044,6 +1059,12 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 		gfs2_quota_unlock(ip);
 out_release:
 	release_metapath(&mp);
+out:
+	if (iomap->page) {
+		unlock_page(iomap->page);
+		put_page(iomap->page);
+		iomap->page = NULL;
+	}
 	gfs2_write_unlock(inode);
 	return ret;
 }
@@ -1073,54 +1094,10 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
-static int
-gfs2_iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
-		       unsigned flags, struct page **pagep, struct iomap *iomap)
-{
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct page *page;
-	int ret;
-
-	if (gfs2_is_stuffed(ip)) {
-		BUG_ON(pos + len > gfs2_max_stuffed_size(ip));
-
-		page = grab_cache_page_write_begin(inode->i_mapping, 0, flags);
-		if (!page)
-			return -ENOMEM;
-
-		if (!PageUptodate(page)) {
-			ret = stuffed_readpage(ip, page);
-			if (ret) {
-				unlock_page(page);
-				put_page(page);
-				return ret;
-			}
-		}
-		*pagep = page;
-		return 0;
-	}
-
-	return iomap_write_begin(inode, pos, len, flags, pagep, iomap);
-}
-
 static int gfs2_iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 				unsigned copied, struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct buffer_head *dibh;
-	int ret;
-
-	if (gfs2_is_stuffed(ip)) {
-		ret = gfs2_meta_inode_buffer(ip, &dibh);
-		if (ret) {
-			unlock_page(page);
-			put_page(page);
-			return ret;
-		}
-		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
-		brelse(dibh);
-		return ret;
-	}
 
 	if (gfs2_is_jdata(ip))
 		gfs2_page_add_databufs(ip, page, offset_in_page(pos), len);
@@ -1139,6 +1116,23 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		return 0;
 
+	if (iomap->page) {
+		struct buffer_head *dibh;
+
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (ret) {
+			unlock_page(iomap->page);
+			put_page(iomap->page);
+			iomap->page = NULL;
+			goto out;
+		}
+		ret = gfs2_stuffed_write_end(inode, dibh, pos, written,
+					     iomap->page);
+		iomap->page = NULL;
+		brelse(dibh);
+		goto out;
+	}
+
 	gfs2_ordered_add_inode(ip);
 
 	if (tr->tr_num_buf_new)
@@ -1153,6 +1147,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 		}
 	}
 
+out:
 	if (inode == sdp->sd_rindex) {
 		adjust_fs_space(inode);
 		sdp->sd_rindex_uptodate = 0;
@@ -1180,7 +1175,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 
 const struct iomap_ops gfs2_iomap_ops = {
 	.iomap_begin = gfs2_iomap_begin,
-	.write_begin = gfs2_iomap_write_begin,
+	.write_begin = iomap_write_begin,
 	.write_end = gfs2_iomap_write_end,
 	.iomap_end = gfs2_iomap_end,
 };
diff --git a/fs/iomap.c b/fs/iomap.c
index 819b6c56ecfe..2a1d78976775 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -196,10 +196,13 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = args->ops->write_begin(inode, pos, bytes, flags, &page,
-				iomap);
-		if (unlikely(status))
-			break;
+		page = iomap->page;
+		if (!iomap->page) {
+			status = args->ops->write_begin(inode, pos, bytes, flags, &page,
+					iomap);
+			if (unlikely(status))
+				break;
+		}
 
 		if (mapping_writably_mapped(inode->i_mapping))
 			flush_dcache_page(page);
@@ -208,10 +211,12 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = args->ops->write_end(inode, pos, bytes, copied, page);
-		if (unlikely(status < 0))
-			break;
-		copied = status;
+		if (!iomap->page) {
+			status = args->ops->write_end(inode, pos, bytes, copied, page);
+			if (unlikely(status < 0))
+				break;
+			copied = status;
+		}
 
 		cond_resched();
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0a018a58d84e..db807157c154 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -47,6 +47,7 @@ struct iomap {
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
+	struct page		*page;
 	struct block_device	*bdev;	/* block device for I/O */
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 };
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin, end}
@ 2018-05-30  9:48   ` Andreas Gruenbacher
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Gruenbacher @ 2018-05-30  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Proof of concept: move the gfs2 stuffed file handling from the
iomap_write_{begin,end} handlers to the iomap_{begin,end} handlers.
With this, the page written to is locked before faulting in the page to
read from, so when both refer to the same page, we end up with a
deadlock as demonstrated by xfstest generic/248.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c        | 91 ++++++++++++++++++++-----------------------
 fs/iomap.c            | 21 ++++++----
 include/linux/iomap.h |  1 +
 3 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index c157af31dc56..05b599a70de6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -966,15 +966,30 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
-	bool unstuff, alloc_required;
+	bool unstuff = false, alloc_required;
 	int ret;
 
 	ret = gfs2_write_lock(inode);
 	if (ret)
 		return ret;
 
-	unstuff = gfs2_is_stuffed(ip) &&
-		  pos + length > gfs2_max_stuffed_size(ip);
+	if (gfs2_is_stuffed(ip)) {
+		unstuff = pos + length > gfs2_max_stuffed_size(ip);
+
+		if (!unstuff) {
+			iomap->page = grab_cache_page_write_begin(
+					inode->i_mapping, 0, flags);
+			ret = -ENOMEM;
+			if (!iomap->page)
+				goto out;
+
+			if (!PageUptodate(iomap->page)) {
+				ret = stuffed_readpage(ip, iomap->page);
+				if (ret)
+					goto out;
+			}
+		}
+	}
 
 	ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
 	if (ret)
@@ -1044,6 +1059,12 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length
 		gfs2_quota_unlock(ip);
 out_release:
 	release_metapath(&mp);
+out:
+	if (iomap->page) {
+		unlock_page(iomap->page);
+		put_page(iomap->page);
+		iomap->page = NULL;
+	}
 	gfs2_write_unlock(inode);
 	return ret;
 }
@@ -1073,54 +1094,10 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
-static int
-gfs2_iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
-		       unsigned flags, struct page **pagep, struct iomap *iomap)
-{
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct page *page;
-	int ret;
-
-	if (gfs2_is_stuffed(ip)) {
-		BUG_ON(pos + len > gfs2_max_stuffed_size(ip));
-
-		page = grab_cache_page_write_begin(inode->i_mapping, 0, flags);
-		if (!page)
-			return -ENOMEM;
-
-		if (!PageUptodate(page)) {
-			ret = stuffed_readpage(ip, page);
-			if (ret) {
-				unlock_page(page);
-				put_page(page);
-				return ret;
-			}
-		}
-		*pagep = page;
-		return 0;
-	}
-
-	return iomap_write_begin(inode, pos, len, flags, pagep, iomap);
-}
-
 static int gfs2_iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 				unsigned copied, struct page *page)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct buffer_head *dibh;
-	int ret;
-
-	if (gfs2_is_stuffed(ip)) {
-		ret = gfs2_meta_inode_buffer(ip, &dibh);
-		if (ret) {
-			unlock_page(page);
-			put_page(page);
-			return ret;
-		}
-		ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
-		brelse(dibh);
-		return ret;
-	}
 
 	if (gfs2_is_jdata(ip))
 		gfs2_page_add_databufs(ip, page, offset_in_page(pos), len);
@@ -1139,6 +1116,23 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		return 0;
 
+	if (iomap->page) {
+		struct buffer_head *dibh;
+
+		ret = gfs2_meta_inode_buffer(ip, &dibh);
+		if (ret) {
+			unlock_page(iomap->page);
+			put_page(iomap->page);
+			iomap->page = NULL;
+			goto out;
+		}
+		ret = gfs2_stuffed_write_end(inode, dibh, pos, written,
+					     iomap->page);
+		iomap->page = NULL;
+		brelse(dibh);
+		goto out;
+	}
+
 	gfs2_ordered_add_inode(ip);
 
 	if (tr->tr_num_buf_new)
@@ -1153,6 +1147,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 		}
 	}
 
+out:
 	if (inode == sdp->sd_rindex) {
 		adjust_fs_space(inode);
 		sdp->sd_rindex_uptodate = 0;
@@ -1180,7 +1175,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 
 const struct iomap_ops gfs2_iomap_ops = {
 	.iomap_begin = gfs2_iomap_begin,
-	.write_begin = gfs2_iomap_write_begin,
+	.write_begin = iomap_write_begin,
 	.write_end = gfs2_iomap_write_end,
 	.iomap_end = gfs2_iomap_end,
 };
diff --git a/fs/iomap.c b/fs/iomap.c
index 819b6c56ecfe..2a1d78976775 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -196,10 +196,13 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = args->ops->write_begin(inode, pos, bytes, flags, &page,
-				iomap);
-		if (unlikely(status))
-			break;
+		page = iomap->page;
+		if (!iomap->page) {
+			status = args->ops->write_begin(inode, pos, bytes, flags, &page,
+					iomap);
+			if (unlikely(status))
+				break;
+		}
 
 		if (mapping_writably_mapped(inode->i_mapping))
 			flush_dcache_page(page);
@@ -208,10 +211,12 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = args->ops->write_end(inode, pos, bytes, copied, page);
-		if (unlikely(status < 0))
-			break;
-		copied = status;
+		if (!iomap->page) {
+			status = args->ops->write_end(inode, pos, bytes, copied, page);
+			if (unlikely(status < 0))
+				break;
+			copied = status;
+		}
 
 		cond_resched();
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0a018a58d84e..db807157c154 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -47,6 +47,7 @@ struct iomap {
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
+	struct page		*page;
 	struct block_device	*bdev;	/* block device for I/O */
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 };
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin,end}
  2018-05-30  9:48   ` [Cluster-devel] [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin, end} Andreas Gruenbacher
@ 2018-05-30  9:57     ` Christoph Hellwig
  -1 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:57 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel

> +	if (gfs2_is_stuffed(ip)) {
> +		unstuff = pos + length > gfs2_max_stuffed_size(ip);
> +
> +		if (!unstuff) {
> +			iomap->page = grab_cache_page_write_begin(
> +					inode->i_mapping, 0, flags);
> +			ret = -ENOMEM;
> +			if (!iomap->page)
> +				goto out;
> +
> +			if (!PageUptodate(iomap->page)) {
> +				ret = stuffed_readpage(ip, iomap->page);
> +				if (ret)
> +					goto out;
> +			}
> +		}
> +	}

grab_cache_page_write_begin needs to remain in iomap.  Please just
set up a pointer for later copying here.

This will be important to support e.g. iomap_readpage(s) and to easily
support direct I/O from inline data (just copy to/from the stuffed
data pointer in iomap).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin, end}
@ 2018-05-30  9:57     ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> +	if (gfs2_is_stuffed(ip)) {
> +		unstuff = pos + length > gfs2_max_stuffed_size(ip);
> +
> +		if (!unstuff) {
> +			iomap->page = grab_cache_page_write_begin(
> +					inode->i_mapping, 0, flags);
> +			ret = -ENOMEM;
> +			if (!iomap->page)
> +				goto out;
> +
> +			if (!PageUptodate(iomap->page)) {
> +				ret = stuffed_readpage(ip, iomap->page);
> +				if (ret)
> +					goto out;
> +			}
> +		}
> +	}

grab_cache_page_write_begin needs to remain in iomap.  Please just
set up a pointer for later copying here.

This will be important to support e.g. iomap_readpage(s) and to easily
support direct I/O from inline data (just copy to/from the stuffed
data pointer in iomap).



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Cluster-devel] [PATCH v5 00/14] gfs2 iomap write support
  2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-05-30 15:48   ` Bob Peterson
  -1 siblings, 0 replies; 34+ messages in thread
From: Bob Peterson @ 2018-05-30 15:48 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel

Hi,

----- Original Message -----
> Here's an update of my gfs2 iomap write patch queue, with support for
> buffered writes as well as direct I/O reads and writes through iomap.
> This update fixes a few bugs in the gfs2 specific parts.  The patches
> are still meant for the upcoming merge window.
> 
> The first five patches are minor gfs2 specific cleanups and
> improvements.

If this is true, I propose that we treat the first 5 patches in the
series as separate individual patches and push them accordingly.
Then we can treat the subsequent 9 patches as the gfs2 iomap-write set.
Do you see any reason why we can't treat them as separate patches?

Regards,

Bob Peterson
Red Hat File Systems

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Cluster-devel] [PATCH v5 00/14] gfs2 iomap write support
@ 2018-05-30 15:48   ` Bob Peterson
  0 siblings, 0 replies; 34+ messages in thread
From: Bob Peterson @ 2018-05-30 15:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

----- Original Message -----
> Here's an update of my gfs2 iomap write patch queue, with support for
> buffered writes as well as direct I/O reads and writes through iomap.
> This update fixes a few bugs in the gfs2 specific parts.  The patches
> are still meant for the upcoming merge window.
> 
> The first five patches are minor gfs2 specific cleanups and
> improvements.

If this is true, I propose that we treat the first 5 patches in the
series as separate individual patches and push them accordingly.
Then we can treat the subsequent 9 patches as the gfs2 iomap-write set.
Do you see any reason why we can't treat them as separate patches?

Regards,

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-05-30 15:48 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30  9:48 [PATCH v5 00/14] gfs2 iomap write support Andreas Gruenbacher
2018-05-30  9:48 ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 01/14] gfs2: Update find_metapath comment Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 02/14] gfs2: hole_size improvement Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 03/14] gfs2: gfs2_stuffed_write_end cleanup Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 04/14] gfs2: Remove ordered write mode handling from gfs2_trans_add_data Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 05/14] gfs2: Iomap cleanups and improvements Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 06/14] iomap: Add write_{begin,end} iomap operations Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] [PATCH v5 06/14] iomap: Add write_{begin, end} " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 07/14] iomap: Mark newly allocated buffer heads as new Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 08/14] gfs2: iomap buffered write support Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 09/14] gfs2: gfs2_extent_length cleanup Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 10/14] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 11/14] gfs2: iomap direct I/O support Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 12/14] gfs2: Remove gfs2_write_{begin,end} Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] [PATCH v5 12/14] gfs2: Remove gfs2_write_{begin, end} Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 13/14] iomap: inline data should be an iomap type, not a flag Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] " Andreas Gruenbacher
2018-05-30  9:48 ` [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin,end} Andreas Gruenbacher
2018-05-30  9:48   ` [Cluster-devel] [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin, end} Andreas Gruenbacher
2018-05-30  9:57   ` [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin,end} Christoph Hellwig
2018-05-30  9:57     ` [Cluster-devel] [PATCH v5 14/14] gfs2: Handle stuffed files in iomap_{begin, end} Christoph Hellwig
2018-05-30 15:48 ` [Cluster-devel] [PATCH v5 00/14] gfs2 iomap write support Bob Peterson
2018-05-30 15:48   ` Bob Peterson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.