* COW improvements and always_cow support V3
@ 2018-12-03 22:24 Christoph Hellwig
  2018-12-03 22:24 ` [PATCH 01/11] xfs: remove xfs_trim_extent_eof Christoph Hellwig
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

Hi all,

this series adds the always_cow mode support after improving our COW
write support a little bit first.

The always_cow mode stresses the COW path a lot, but with a few xfstests
fixups it generally looks good.  The exceptions are a few tests that
complain about fragmentation, which is rather inherent in this mode, and
xfs/326, which inserts error tags into the COW path and does not get the
expected result.

Changes since v2:
 - add a patch to remove xfs_trim_extent_eof
 - add a patch to remove the separate io_type and rely on existing state
   in the writeback path
 - rework the truncate race handling in the writeback path a little more

Changes since v1:
 - make delalloc and unwritten extent conversions simpler and more robust
 - add a few additional cleanups
 - support all fallocate modes but actual preallocation
 - rebase on top of a fix from Brian (which is included as first patch
   to make the patch set more usable)


* [PATCH 01/11] xfs: remove xfs_trim_extent_eof
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 21:45   ` Darrick J. Wong
  2018-12-03 22:24 ` [PATCH 02/11] xfs: remove the io_type field from the writeback context and ioend Christoph Hellwig
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

Open-coding this function in its only caller makes it blindingly obvious
what is going on, instead of having to look at two files for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 11 -----------
 fs/xfs/libxfs/xfs_bmap.h |  1 -
 fs/xfs/xfs_aops.c        |  2 +-
 3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 19e921d1586f..f16d42abc500 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3685,17 +3685,6 @@ xfs_trim_extent(
 	}
 }
 
-/* trim extent to within eof */
-void
-xfs_trim_extent_eof(
-	struct xfs_bmbt_irec	*irec,
-	struct xfs_inode	*ip)
-
-{
-	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
-					      i_size_read(VFS_I(ip))));
-}
-
 /*
  * Trim the returned map to the required bounds
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 488dc8860fd7..f9a925caa70e 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -181,7 +181,6 @@ static inline bool xfs_bmap_is_real_extent(struct xfs_bmbt_irec *irec)
 
 void	xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
 		xfs_filblks_t len);
-void	xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *);
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 int	xfs_bmap_set_attrforkoff(struct xfs_inode *ip, int size, int *version);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 338b9d9984e0..d7275075878e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -329,7 +329,7 @@ xfs_map_blocks(
 	 * mechanism to protect us from arbitrary extent modifying contexts, not
 	 * just eofblocks.
 	 */
-	xfs_trim_extent_eof(&wpc->imap, ip);
+	xfs_trim_extent(&wpc->imap, 0, XFS_B_TO_FSB(mp, i_size_read(inode)));
 
 	/*
 	 * COW fork blocks can overlap data fork blocks even if the blocks
-- 
2.19.1
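
[Editorial illustration, not part of the posted patch.]  The one-line
replacement in xfs_map_blocks clips the cached mapping to the block range
[0, EOF).  A minimal user-space sketch of that idea follows.  The
toy_irec struct and all toy_* helpers are hypothetical stand-ins for
struct xfs_bmbt_irec, XFS_B_TO_FSB and xfs_trim_extent; start-block
handling is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct xfs_bmbt_irec; units are file blocks. */
struct toy_irec {
	uint64_t startoff;	/* first file block covered by the extent */
	uint64_t blockcount;	/* number of blocks in the extent */
};

/* Round a byte count up to whole blocks, in the spirit of XFS_B_TO_FSB. */
static uint64_t toy_b_to_fsb(uint64_t bytes, unsigned int blocklog)
{
	uint64_t mask = ((uint64_t)1 << blocklog) - 1;

	return (bytes + mask) >> blocklog;
}

/* Clip the extent to the file block range [bno, bno + len). */
static void toy_trim_extent(struct toy_irec *irec, uint64_t bno, uint64_t len)
{
	uint64_t end = bno + len;
	uint64_t irec_end;

	assert(irec);
	irec_end = irec->startoff + irec->blockcount;

	/* No overlap at all: the extent becomes empty. */
	if (irec_end <= bno || irec->startoff >= end) {
		irec->blockcount = 0;
		return;
	}

	/* Drop the part in front of the range. */
	if (irec->startoff < bno) {
		irec->blockcount -= bno - irec->startoff;
		irec->startoff = bno;
	}

	/* Drop the part behind the range. */
	if (end < irec->startoff + irec->blockcount)
		irec->blockcount = end - irec->startoff;
}

/* The open-coded form: trim to everything below EOF. */
static void toy_trim_to_eof(struct toy_irec *irec, uint64_t isize,
		unsigned int blocklog)
{
	toy_trim_extent(irec, 0, toy_b_to_fsb(isize, blocklog));
}
```

With 4096-byte blocks (blocklog 12) and a 10000-byte file, EOF rounds up
to block 3, so an extent covering blocks 2..5 is cut down to block 2
only, and an extent starting at block 5 ends up empty.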


* [PATCH 02/11] xfs: remove the io_type field from the writeback context and ioend
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
  2018-12-03 22:24 ` [PATCH 01/11] xfs: remove xfs_trim_extent_eof Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 21:45   ` Darrick J. Wong
  2018-12-03 22:24 ` [PATCH 03/11] xfs: remove the s_maxbytes checks in xfs_map_blocks Christoph Hellwig
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

The io_type field contains what is basically a summary of information
from the inode fork and the imap.  But we can just as easily use that
information directly, simplifying a few bits here and there and
improving the trace points.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c    | 91 ++++++++++++++++++++------------------------
 fs/xfs/xfs_aops.h    | 21 +---------
 fs/xfs/xfs_iomap.c   |  8 ++--
 fs/xfs/xfs_reflink.c |  2 +-
 fs/xfs/xfs_trace.h   | 28 +++++++-------
 5 files changed, 62 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index d7275075878e..8fec6fd4c632 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -28,7 +28,7 @@
  */
 struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
-	unsigned int		io_type;
+	int			fork;
 	unsigned int		cow_seq;
 	struct xfs_ioend	*ioend;
 };
@@ -255,30 +255,20 @@ xfs_end_io(
 	 */
 	error = blk_status_to_errno(ioend->io_bio->bi_status);
 	if (unlikely(error)) {
-		switch (ioend->io_type) {
-		case XFS_IO_COW:
+		if (ioend->io_fork == XFS_COW_FORK)
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
-			break;
-		}
-
 		goto done;
 	}
 
 	/*
-	 * Success:  commit the COW or unwritten blocks if needed.
+	 * Success: commit the COW or unwritten blocks if needed.
 	 */
-	switch (ioend->io_type) {
-	case XFS_IO_COW:
+	if (ioend->io_fork == XFS_COW_FORK)
 		error = xfs_reflink_end_cow(ip, offset, size);
-		break;
-	case XFS_IO_UNWRITTEN:
-		/* writeback should never update isize */
+	else if (ioend->io_state == XFS_EXT_UNWRITTEN)
 		error = xfs_iomap_write_unwritten(ip, offset, size, false);
-		break;
-	default:
+	else
 		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans);
-		break;
-	}
 
 done:
 	if (ioend->io_append_trans)
@@ -293,7 +283,8 @@ xfs_end_bio(
 	struct xfs_ioend	*ioend = bio->bi_private;
 	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
 
-	if (ioend->io_type == XFS_IO_UNWRITTEN || ioend->io_type == XFS_IO_COW)
+	if (ioend->io_fork == XFS_COW_FORK ||
+	    ioend->io_state == XFS_EXT_UNWRITTEN)
 		queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
 	else if (ioend->io_append_trans)
 		queue_work(mp->m_data_workqueue, &ioend->io_work);
@@ -313,7 +304,6 @@ xfs_map_blocks(
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
 	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
 	struct xfs_bmbt_irec	imap;
-	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
 	bool			imap_valid;
 	int			error = 0;
@@ -350,7 +340,7 @@ xfs_map_blocks(
 		     offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
 	if (imap_valid &&
 	    (!xfs_inode_has_cow_data(ip) ||
-	     wpc->io_type == XFS_IO_COW ||
+	     wpc->fork == XFS_COW_FORK ||
 	     wpc->cow_seq == READ_ONCE(ip->i_cowfp->if_seq)))
 		return 0;
 
@@ -382,6 +372,9 @@ xfs_map_blocks(
 	if (cow_fsb != NULLFILEOFF && cow_fsb <= offset_fsb) {
 		wpc->cow_seq = READ_ONCE(ip->i_cowfp->if_seq);
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+		wpc->fork = XFS_COW_FORK;
+
 		/*
 		 * Truncate can race with writeback since writeback doesn't
 		 * take the iolock and truncate decreases the file size before
@@ -394,11 +387,13 @@ xfs_map_blocks(
 		 * will kill the contents anyway.
 		 */
 		if (offset > i_size_read(inode)) {
-			wpc->io_type = XFS_IO_HOLE;
+			wpc->imap.br_blockcount = end_fsb - offset_fsb;
+			wpc->imap.br_startoff = offset_fsb;
+			wpc->imap.br_startblock = HOLESTARTBLOCK;
+			wpc->imap.br_state = XFS_EXT_NORM;
 			return 0;
 		}
-		whichfork = XFS_COW_FORK;
-		wpc->io_type = XFS_IO_COW;
+
 		goto allocate_blocks;
 	}
 
@@ -419,12 +414,14 @@ xfs_map_blocks(
 		imap.br_startoff = end_fsb;	/* fake a hole past EOF */
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
+	wpc->fork = XFS_DATA_FORK;
+
 	if (imap.br_startoff > offset_fsb) {
 		/* landed in a hole or beyond EOF */
 		imap.br_blockcount = imap.br_startoff - offset_fsb;
 		imap.br_startoff = offset_fsb;
 		imap.br_startblock = HOLESTARTBLOCK;
-		wpc->io_type = XFS_IO_HOLE;
+		imap.br_state = XFS_EXT_NORM;
 	} else {
 		/*
 		 * Truncate to the next COW extent if there is one.  This is the
@@ -436,30 +433,23 @@ xfs_map_blocks(
 		    cow_fsb < imap.br_startoff + imap.br_blockcount)
 			imap.br_blockcount = cow_fsb - imap.br_startoff;
 
-		if (isnullstartblock(imap.br_startblock)) {
-			/* got a delalloc extent */
-			wpc->io_type = XFS_IO_DELALLOC;
+		/* got a delalloc extent? */
+		if (isnullstartblock(imap.br_startblock))
 			goto allocate_blocks;
-		}
-
-		if (imap.br_state == XFS_EXT_UNWRITTEN)
-			wpc->io_type = XFS_IO_UNWRITTEN;
-		else
-			wpc->io_type = XFS_IO_OVERWRITE;
 	}
 
 	wpc->imap = imap;
-	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
+	trace_xfs_map_blocks_found(ip, offset, count, wpc->fork, &imap);
 	return 0;
 allocate_blocks:
-	error = xfs_iomap_write_allocate(ip, whichfork, offset, &imap,
+	error = xfs_iomap_write_allocate(ip, wpc->fork, offset, &imap,
 			&wpc->cow_seq);
 	if (error)
 		return error;
-	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
+	ASSERT(wpc->fork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
 	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
 	wpc->imap = imap;
-	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
+	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->fork, &imap);
 	return 0;
 }
 
@@ -484,7 +474,7 @@ xfs_submit_ioend(
 	int			status)
 {
 	/* Convert CoW extents to regular */
-	if (!status && ioend->io_type == XFS_IO_COW) {
+	if (!status && ioend->io_fork == XFS_COW_FORK) {
 		/*
 		 * Yuk. This can do memory allocation, but is not a
 		 * transactional operation so everything is done in GFP_KERNEL
@@ -502,7 +492,8 @@ xfs_submit_ioend(
 
 	/* Reserve log space if we might write beyond the on-disk inode size. */
 	if (!status &&
-	    ioend->io_type != XFS_IO_UNWRITTEN &&
+	    (ioend->io_fork == XFS_COW_FORK ||
+	     ioend->io_state != XFS_EXT_UNWRITTEN) &&
 	    xfs_ioend_is_append(ioend) &&
 	    !ioend->io_append_trans)
 		status = xfs_setfilesize_trans_alloc(ioend);
@@ -531,7 +522,8 @@ xfs_submit_ioend(
 static struct xfs_ioend *
 xfs_alloc_ioend(
 	struct inode		*inode,
-	unsigned int		type,
+	int			fork,
+	xfs_exntst_t		state,
 	xfs_off_t		offset,
 	struct block_device	*bdev,
 	sector_t		sector)
@@ -545,7 +537,8 @@ xfs_alloc_ioend(
 
 	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
 	INIT_LIST_HEAD(&ioend->io_list);
-	ioend->io_type = type;
+	ioend->io_fork = fork;
+	ioend->io_state = state;
 	ioend->io_inode = inode;
 	ioend->io_size = 0;
 	ioend->io_offset = offset;
@@ -606,13 +599,15 @@ xfs_add_to_ioend(
 	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
 		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
 
-	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
+	if (!wpc->ioend ||
+	    wpc->fork != wpc->ioend->io_fork ||
+	    wpc->imap.br_state != wpc->ioend->io_state ||
 	    sector != bio_end_sector(wpc->ioend->io_bio) ||
 	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
-		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
-				bdev, sector);
+		wpc->ioend = xfs_alloc_ioend(inode, wpc->fork,
+				wpc->imap.br_state, offset, bdev, sector);
 	}
 
 	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
@@ -721,7 +716,7 @@ xfs_writepage_map(
 		error = xfs_map_blocks(wpc, inode, file_offset);
 		if (error)
 			break;
-		if (wpc->io_type == XFS_IO_HOLE)
+		if (wpc->imap.br_startblock == HOLESTARTBLOCK)
 			continue;
 		xfs_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
 				 &submit_list);
@@ -916,9 +911,7 @@ xfs_vm_writepage(
 	struct page		*page,
 	struct writeback_control *wbc)
 {
-	struct xfs_writepage_ctx wpc = {
-		.io_type = XFS_IO_HOLE,
-	};
+	struct xfs_writepage_ctx wpc = { };
 	int			ret;
 
 	ret = xfs_do_writepage(page, wbc, &wpc);
@@ -932,9 +925,7 @@ xfs_vm_writepages(
 	struct address_space	*mapping,
 	struct writeback_control *wbc)
 {
-	struct xfs_writepage_ctx wpc = {
-		.io_type = XFS_IO_HOLE,
-	};
+	struct xfs_writepage_ctx wpc = { };
 	int			ret;
 
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 494b4338446e..6c2615b83c5d 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -8,30 +8,13 @@
 
 extern struct bio_set xfs_ioend_bioset;
 
-/*
- * Types of I/O for bmap clustering and I/O completion tracking.
- */
-enum {
-	XFS_IO_HOLE,		/* covers region without any block allocation */
-	XFS_IO_DELALLOC,	/* covers delalloc region */
-	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
-	XFS_IO_OVERWRITE,	/* covers already allocated extent */
-	XFS_IO_COW,		/* covers copy-on-write extent */
-};
-
-#define XFS_IO_TYPES \
-	{ XFS_IO_HOLE,			"hole" },	\
-	{ XFS_IO_DELALLOC,		"delalloc" },	\
-	{ XFS_IO_UNWRITTEN,		"unwritten" },	\
-	{ XFS_IO_OVERWRITE,		"overwrite" },	\
-	{ XFS_IO_COW,			"CoW" }
-
 /*
  * Structure for buffered I/O completions.
  */
 struct xfs_ioend {
 	struct list_head	io_list;	/* next ioend in chain */
-	unsigned int		io_type;	/* delalloc / unwritten */
+	int			io_fork;	/* inode fork written back */
+	xfs_exntst_t		io_state;	/* extent state */
 	struct inode		*io_inode;	/* file being written to */
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 27c93b5f029d..32a7c169e096 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -575,7 +575,7 @@ xfs_file_iomap_begin_delay(
 				goto out_unlock;
 		}
 
-		trace_xfs_iomap_found(ip, offset, count, 0, &got);
+		trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK, &got);
 		goto done;
 	}
 
@@ -647,7 +647,7 @@ xfs_file_iomap_begin_delay(
 	 * them out if the write happens to fail.
 	 */
 	iomap->flags |= IOMAP_F_NEW;
-	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
+	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
 		got.br_startblock = DELAYSTARTBLOCK;
@@ -1139,7 +1139,7 @@ xfs_file_iomap_begin(
 		return error;
 
 	iomap->flags |= IOMAP_F_NEW;
-	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
+	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
 
 out_finish:
 	if (xfs_ipincount(ip) && (ip->i_itemp->ili_fsync_fields
@@ -1155,7 +1155,7 @@ xfs_file_iomap_begin(
 out_found:
 	ASSERT(nimaps);
 	xfs_iunlock(ip, lockmode);
-	trace_xfs_iomap_found(ip, offset, length, 0, &imap);
+	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
 	goto out_finish;
 
 out_unlock:
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 322a852ce284..a8c32632090c 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1148,7 +1148,7 @@ xfs_reflink_remap_blocks(
 			break;
 		ASSERT(nimaps == 1);
 
-		trace_xfs_reflink_remap_imap(src, srcoff, len, XFS_IO_OVERWRITE,
+		trace_xfs_reflink_remap_imap(src, srcoff, len, XFS_DATA_FORK,
 				&imap);
 
 		/* Translate imap into the destination file. */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 8a6532aae779..870865913bd8 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1210,15 +1210,15 @@ DEFINE_READPAGE_EVENT(xfs_vm_readpages);
 
 DECLARE_EVENT_CLASS(xfs_imap_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
-		 int type, struct xfs_bmbt_irec *irec),
-	TP_ARGS(ip, offset, count, type, irec),
+		 int whichfork, struct xfs_bmbt_irec *irec),
+	TP_ARGS(ip, offset, count, whichfork, irec),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(loff_t, size)
 		__field(loff_t, offset)
 		__field(size_t, count)
-		__field(int, type)
+		__field(int, whichfork)
 		__field(xfs_fileoff_t, startoff)
 		__field(xfs_fsblock_t, startblock)
 		__field(xfs_filblks_t, blockcount)
@@ -1229,33 +1229,33 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
 		__entry->size = ip->i_d.di_size;
 		__entry->offset = offset;
 		__entry->count = count;
-		__entry->type = type;
+		__entry->whichfork = whichfork;
 		__entry->startoff = irec ? irec->br_startoff : 0;
 		__entry->startblock = irec ? irec->br_startblock : 0;
 		__entry->blockcount = irec ? irec->br_blockcount : 0;
 	),
 	TP_printk("dev %d:%d ino 0x%llx size 0x%llx offset 0x%llx count %zd "
-		  "type %s startoff 0x%llx startblock %lld blockcount 0x%llx",
+		  "fork %s startoff 0x%llx startblock %lld blockcount 0x%llx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->size,
 		  __entry->offset,
 		  __entry->count,
-		  __print_symbolic(__entry->type, XFS_IO_TYPES),
+		  __entry->whichfork == XFS_COW_FORK ? "cow" : "data",
 		  __entry->startoff,
 		  (int64_t)__entry->startblock,
 		  __entry->blockcount)
 )
 
-#define DEFINE_IOMAP_EVENT(name)	\
+#define DEFINE_IMAP_EVENT(name)	\
 DEFINE_EVENT(xfs_imap_class, name,	\
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,	\
-		 int type, struct xfs_bmbt_irec *irec),		\
-	TP_ARGS(ip, offset, count, type, irec))
-DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
-DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_iomap_alloc);
-DEFINE_IOMAP_EVENT(xfs_iomap_found);
+		 int whichfork, struct xfs_bmbt_irec *irec),		\
+	TP_ARGS(ip, offset, count, whichfork, irec))
+DEFINE_IMAP_EVENT(xfs_map_blocks_found);
+DEFINE_IMAP_EVENT(xfs_map_blocks_alloc);
+DEFINE_IMAP_EVENT(xfs_iomap_alloc);
+DEFINE_IMAP_EVENT(xfs_iomap_found);
 
 DECLARE_EVENT_CLASS(xfs_simple_io_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
@@ -3055,7 +3055,7 @@ DEFINE_EVENT(xfs_inode_irec_class, name, \
 DEFINE_INODE_EVENT(xfs_reflink_set_inode_flag);
 DEFINE_INODE_EVENT(xfs_reflink_unset_inode_flag);
 DEFINE_ITRUNC_EVENT(xfs_reflink_update_inode_size);
-DEFINE_IOMAP_EVENT(xfs_reflink_remap_imap);
+DEFINE_IMAP_EVENT(xfs_reflink_remap_imap);
 TRACE_EVENT(xfs_reflink_remap_blocks_loop,
 	TP_PROTO(struct xfs_inode *src, xfs_fileoff_t soffset,
 		 xfs_filblks_t len, struct xfs_inode *dest,
-- 
2.19.1
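
[Editorial illustration, not part of the posted patch.]  The patch
replaces the io_type summary with the (fork, extent state) pair it was
derived from.  The sketch below models the two decisions that used to
key off io_type: what to do on successful completion, and whether a
block may join the current ioend.  All toy_* names are hypothetical;
the sector and offset contiguity checks from xfs_add_to_ioend are
deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fork { TOY_DATA_FORK, TOY_COW_FORK };
enum toy_state { TOY_EXT_NORM, TOY_EXT_UNWRITTEN };
enum toy_action { TOY_NOTHING, TOY_END_COW, TOY_CONVERT_UNWRITTEN };

/* Only the two fields that replace io_type. */
struct toy_ioend {
	enum toy_fork	fork;	/* inode fork written back */
	enum toy_state	state;	/* extent state */
};

/*
 * Follows the success branch of xfs_end_io in the patch: the COW fork
 * wins over the extent state, and plain overwrites need no conversion.
 */
static enum toy_action toy_completion_action(const struct toy_ioend *ioend)
{
	assert(ioend);
	if (ioend->fork == TOY_COW_FORK)
		return TOY_END_COW;
	if (ioend->state == TOY_EXT_UNWRITTEN)
		return TOY_CONVERT_UNWRITTEN;
	return TOY_NOTHING;
}

/* A block can only extend an ioend with the same fork and state. */
static bool toy_can_merge(const struct toy_ioend *ioend,
		enum toy_fork fork, enum toy_state state)
{
	assert(ioend);
	return ioend->fork == fork && ioend->state == state;
}
```

Note that the fork is checked first, so an unwritten extent in the COW
fork is still completed through the COW path.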


* [PATCH 03/11] xfs: remove the s_maxbytes checks in xfs_map_blocks
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
  2018-12-03 22:24 ` [PATCH 01/11] xfs: remove xfs_trim_extent_eof Christoph Hellwig
  2018-12-03 22:24 ` [PATCH 02/11] xfs: remove the io_type field from the writeback context and ioend Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 22:31   ` Darrick J. Wong
  2018-12-03 22:24 ` [PATCH 04/11] xfs: rework the truncate race handling in the writeback path Christoph Hellwig
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

We already ensure all data fits into s_maxbytes in the write / fault
path.  The only reason we have them here is that they were copy and
pasted from xfs_bmapi_read when we stopped using that function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8fec6fd4c632..5b6fab283316 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -301,7 +301,8 @@ xfs_map_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
 	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
 	struct xfs_bmbt_irec	imap;
 	struct xfs_iext_cursor	icur;
@@ -356,11 +357,6 @@ xfs_map_blocks(
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
-	ASSERT(offset <= mp->m_super->s_maxbytes);
-
-	if (offset > mp->m_super->s_maxbytes - count)
-		count = mp->m_super->s_maxbytes - offset;
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 
 	/*
 	 * Check if this is offset is covered by a COW extents, and if yes use
-- 
2.19.1
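
[Editorial illustration, not part of the posted patch.]  With the
s_maxbytes clamp gone, the block range is computed once at declaration
time: the start rounds down and the end rounds up.  A small sketch of
that arithmetic, assuming a power-of-two block size given as blocklog;
toy_range and toy_block_range are hypothetical names, not kernel
interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Block range covering one writeback chunk. */
struct toy_range {
	uint64_t offset_fsb;	/* first block, rounded down */
	uint64_t end_fsb;	/* one past the last block, rounded up */
};

/* Convert a byte range [offset, offset + count) to file blocks. */
static struct toy_range toy_block_range(uint64_t offset, uint64_t count,
		unsigned int blocklog)
{
	struct toy_range r;
	uint64_t mask;

	assert(blocklog < 64);
	mask = ((uint64_t)1 << blocklog) - 1;

	r.offset_fsb = offset >> blocklog;
	r.end_fsb = (offset + count + mask) >> blocklog;
	return r;
}
```

For a block-aligned offset and a count of exactly one block, end_fsb is
simply offset_fsb + 1, which is the case xfs_map_blocks sees when count
is i_blocksize(inode).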


* [PATCH 04/11] xfs: rework the truncate race handling in the writeback path
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-12-03 22:24 ` [PATCH 03/11] xfs: remove the s_maxbytes checks in xfs_map_blocks Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 23:03   ` Darrick J. Wong
  2018-12-03 22:24 ` [PATCH 05/11] xfs: make xfs_bmbt_to_iomap more useful Christoph Hellwig
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

We currently try to handle the races with truncate and COW to data fork
conversion rather ad-hoc in a few places in the writeback path:

 - xfs_map_blocks contains an i_size check for the COW fork only, and
   returns an imap containing a hole to make the writeback code skip
   the rest of the page
 - xfs_iomap_write_allocate does another i_size check under ilock, and
   does an extent tree lookup to find the last extent to skip everything
   beyond that, returning -EAGAIN if either is invalid to make the
   writeback code exit early
 - xfs_bmapi_write can ignore holes for delalloc conversions, but only
   does so if called for the COW fork

Replace this with a coherent scheme:

 - check i_size first in xfs_map_blocks, and skip any processing if we
   are already beyond i_size by presenting a hole until the end of the
   file to the caller
 - in xfs_iomap_write_allocate check i_size again, and return -EAGAIN
   if we are beyond it now that we've taken ilock
 - skip holes for all delalloc conversions in xfs_bmapi_write instead
   of doing a separate lookup before calling it
 - in xfs_map_blocks retry the case where xfs_iomap_write_allocate
   could not perform a conversion one single time if we were on a COW
   fork to handle the race where an extent moved from the COW to the
   data fork, and else return a hole to skip writeback as we must
   have raced with truncate

Overall this greatly simplifies the code, makes it more robust, and also
properly handles the COW to data fork race that we did not handle
previously.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c |  27 ++++-----
 fs/xfs/xfs_aops.c        |  61 +++++++++++++-------
 fs/xfs/xfs_iomap.c       | 121 ++++++++++++---------------------------
 3 files changed, 87 insertions(+), 122 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f16d42abc500..1992ed8a60b0 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4305,28 +4305,21 @@ xfs_bmapi_write(
 		/* in hole or beyond EOF? */
 		if (eof || bma.got.br_startoff > bno) {
 			/*
-			 * CoW fork conversions should /never/ hit EOF or
-			 * holes.  There should always be something for us
-			 * to work on.
+			 * It is possible that the extents have changed since
+			 * we did the read call as we dropped the ilock for a
+			 * while.  We have to be careful about truncates or hole
+			 * punches here - we are not allowed to allocate
+			 * non-delalloc blocks here.
+			 *
+			 * The only protection against truncation is the pages
+			 * for the range we are being asked to convert are
+			 * locked and hence a truncate will block on them
+			 * first.
 			 */
 			ASSERT(!((flags & XFS_BMAPI_CONVERT) &&
 			         (flags & XFS_BMAPI_COWFORK)));
 
 			if (flags & XFS_BMAPI_DELALLOC) {
-				/*
-				 * For the COW fork we can reasonably get a
-				 * request for converting an extent that races
-				 * with other threads already having converted
-				 * part of it, as there converting COW to
-				 * regular blocks is not protected using the
-				 * IOLOCK.
-				 */
-				ASSERT(flags & XFS_BMAPI_COWFORK);
-				if (!(flags & XFS_BMAPI_COWFORK)) {
-					error = -EIO;
-					goto error0;
-				}
-
 				if (eof || bno >= end)
 					break;
 			} else {
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5b6fab283316..124b8de37115 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -300,6 +300,7 @@ xfs_map_blocks(
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
+	loff_t			isize = i_size_read(inode);
 	ssize_t			count = i_blocksize(inode);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
@@ -308,6 +309,15 @@ xfs_map_blocks(
 	struct xfs_iext_cursor	icur;
 	bool			imap_valid;
 	int			error = 0;
+	int			retries = 0;
+
+	/*
+	 * If the offset is beyond the inode size we know that we raced with
+	 * truncate and are done now.  Note that we'll recheck this again
+	 * under the ilock later before doing delalloc conversions.
+	 */
+	if (offset > isize)
+		goto eof;
 
 	/*
 	 * We have to make sure the cached mapping is within EOF to protect
@@ -320,7 +330,7 @@ xfs_map_blocks(
 	 * mechanism to protect us from arbitrary extent modifying contexts, not
 	 * just eofblocks.
 	 */
-	xfs_trim_extent(&wpc->imap, 0, XFS_B_TO_FSB(mp, i_size_read(inode)));
+	xfs_trim_extent(&wpc->imap, 0, XFS_B_TO_FSB(mp, isize));
 
 	/*
 	 * COW fork blocks can overlap data fork blocks even if the blocks
@@ -354,6 +364,7 @@ xfs_map_blocks(
 	 * into real extents.  If we return without a valid map, it means we
 	 * landed in a hole and we skip the block.
 	 */
+retry:
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
@@ -370,26 +381,6 @@ xfs_map_blocks(
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 		wpc->fork = XFS_COW_FORK;
-
-		/*
-		 * Truncate can race with writeback since writeback doesn't
-		 * take the iolock and truncate decreases the file size before
-		 * it starts truncating the pages between new_size and old_size.
-		 * Therefore, we can end up in the situation where writeback
-		 * gets a CoW fork mapping but the truncate makes the mapping
-		 * invalid and we end up in here trying to get a new mapping.
-		 * bail out here so that we simply never get a valid mapping
-		 * and so we drop the write altogether.  The page truncation
-		 * will kill the contents anyway.
-		 */
-		if (offset > i_size_read(inode)) {
-			wpc->imap.br_blockcount = end_fsb - offset_fsb;
-			wpc->imap.br_startoff = offset_fsb;
-			wpc->imap.br_startblock = HOLESTARTBLOCK;
-			wpc->imap.br_state = XFS_EXT_NORM;
-			return 0;
-		}
-
 		goto allocate_blocks;
 	}
 
@@ -440,13 +431,39 @@ xfs_map_blocks(
 allocate_blocks:
 	error = xfs_iomap_write_allocate(ip, wpc->fork, offset, &imap,
 			&wpc->cow_seq);
-	if (error)
+	if (error) {
+		if (error == -EAGAIN)
+			goto truncate_race;
 		return error;
+	}
 	ASSERT(wpc->fork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
 	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
 	wpc->imap = imap;
 	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->fork, &imap);
 	return 0;
+
+truncate_race:
+	/*
+	 * If we failed to find the extent in the COW fork we might have raced
+	 * with a COW to data fork conversion or truncate.  Restart the lookup
+	 * to catch the extent in the data fork for the former case, but prevent
+	 * additional retries to avoid looping forever for the latter case.
+	 */
+	if (wpc->fork == XFS_COW_FORK && !retries++) {
+		imap_valid = false;
+		goto retry;
+	}
+eof:
+	/*
+	 * If we raced with truncate there might be no data left at this offset.
+	 * In that case we need to return a hole so that the writeback code
+	 * skips writeback for the rest of the file.
+	 */
+	wpc->imap.br_startoff = offset_fsb;
+	wpc->imap.br_blockcount = end_fsb - offset_fsb;
+	wpc->imap.br_startblock = HOLESTARTBLOCK;
+	wpc->imap.br_state = XFS_EXT_NORM;
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 32a7c169e096..6acfed2ae858 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -685,14 +685,13 @@ xfs_iomap_write_allocate(
 {
 	xfs_mount_t	*mp = ip->i_mount;
 	struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
-	xfs_fileoff_t	offset_fsb, last_block;
+	xfs_fileoff_t	offset_fsb;
 	xfs_fileoff_t	end_fsb, map_start_fsb;
 	xfs_filblks_t	count_fsb;
 	xfs_trans_t	*tp;
 	int		nimaps;
 	int		error = 0;
 	int		flags = XFS_BMAPI_DELALLOC;
-	int		nres;
 
 	if (whichfork == XFS_COW_FORK)
 		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
@@ -712,95 +711,51 @@ xfs_iomap_write_allocate(
 
 	while (count_fsb != 0) {
 		/*
-		 * Set up a transaction with which to allocate the
-		 * backing store for the file.  Do allocations in a
-		 * loop until we get some space in the range we are
-		 * interested in.  The other space that might be allocated
-		 * is in the delayed allocation extent on which we sit
-		 * but before our buffer starts.
+		 * We have already reserved space for the extent and any
+		 * indirect blocks when creating the delalloc extent, there
+		 * is no need to reserve space in this transaction again.
 		 */
-		nimaps = 0;
-		while (nimaps == 0) {
-			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
-			/*
-			 * We have already reserved space for the extent and any
-			 * indirect blocks when creating the delalloc extent,
-			 * there is no need to reserve space in this transaction
-			 * again.
-			 */
-			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
-					0, XFS_TRANS_RESERVE, &tp);
-			if (error)
-				return error;
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
+				0, XFS_TRANS_RESERVE, &tp);
+		if (error)
+			return error;
 
-			xfs_ilock(ip, XFS_ILOCK_EXCL);
-			xfs_trans_ijoin(tp, ip, 0);
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
 
-			/*
-			 * it is possible that the extents have changed since
-			 * we did the read call as we dropped the ilock for a
-			 * while. We have to be careful about truncates or hole
-			 * punchs here - we are not allowed to allocate
-			 * non-delalloc blocks here.
-			 *
-			 * The only protection against truncation is the pages
-			 * for the range we are being asked to convert are
-			 * locked and hence a truncate will block on them
-			 * first.
-			 *
-			 * As a result, if we go beyond the range we really
-			 * need and hit an delalloc extent boundary followed by
-			 * a hole while we have excess blocks in the map, we
-			 * will fill the hole incorrectly and overrun the
-			 * transaction reservation.
-			 *
-			 * Using a single map prevents this as we are forced to
-			 * check each map we look for overlap with the desired
-			 * range and abort as soon as we find it. Also, given
-			 * that we only return a single map, having one beyond
-			 * what we can return is probably a bit silly.
-			 *
-			 * We also need to check that we don't go beyond EOF;
-			 * this is a truncate optimisation as a truncate sets
-			 * the new file size before block on the pages we
-			 * currently have locked under writeback. Because they
-			 * are about to be tossed, we don't need to write them
-			 * back....
-			 */
-			nimaps = 1;
-			end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
-			error = xfs_bmap_last_offset(ip, &last_block,
-							XFS_DATA_FORK);
-			if (error)
+		/*
+		 * We need to check that we don't go beyond EOF; this is a
+		 * truncate optimisation as a truncate sets the new file size
+		 * before block on the pages we currently have locked under
+		 * writeback.  Because they are about to be tossed, we don't
+		 * need to write them back....
+		 */
+		end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
+		if (map_start_fsb + count_fsb > end_fsb) {
+			count_fsb = end_fsb - map_start_fsb;
+			if (count_fsb == 0) {
+				error = -EAGAIN;
 				goto trans_cancel;
-
-			last_block = XFS_FILEOFF_MAX(last_block, end_fsb);
-			if ((map_start_fsb + count_fsb) > last_block) {
-				count_fsb = last_block - map_start_fsb;
-				if (count_fsb == 0) {
-					error = -EAGAIN;
-					goto trans_cancel;
-				}
 			}
+		}
 
-			/*
-			 * From this point onwards we overwrite the imap
-			 * pointer that the caller gave to us.
-			 */
-			error = xfs_bmapi_write(tp, ip, map_start_fsb,
-						count_fsb, flags, nres, imap,
-						&nimaps);
-			if (error)
-				goto trans_cancel;
+		nimaps = 1;
+		xfs_trans_ijoin(tp, ip, 0);
+		error = xfs_bmapi_write(tp, ip, map_start_fsb, count_fsb, flags,
+				XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK),
+				imap, &nimaps);
+		if (error)
+			goto trans_cancel;
 
-			error = xfs_trans_commit(tp);
-			if (error)
-				goto error0;
+		error = xfs_trans_commit(tp);
+		if (error)
+			goto error0;
 
-			if (whichfork == XFS_COW_FORK)
-				*cow_seq = READ_ONCE(ifp->if_seq);
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		}
+		if (whichfork == XFS_COW_FORK)
+			*cow_seq = READ_ONCE(ifp->if_seq);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+		if (nimaps == 0)
+			return -EAGAIN;
 
 		/*
 		 * See if we were able to allocate an extent that
-- 
2.19.1


* [PATCH 05/11] xfs: make xfs_bmbt_to_iomap more useful
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-12-03 22:24 ` [PATCH 04/11] xfs: rework the truncate race handling in the writeback path Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 21:46   ` Darrick J. Wong
  2018-12-03 22:24 ` [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints Christoph Hellwig
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

Move checking for invalid zero blocks and setting of various iomap flags
into this helper.  Also make it deal with "raw" delalloc extents to
avoid clutter in the callers.
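
As a rough illustration of why the order of the checks in the helper
matters, the standalone sketch below models the start block
classification.  The MODEL_* names, the sentinel values and the 17-bit
shift are assumptions made for this example, not the kernel definitions.
The point is only that a hole sentinel can also pass an
isnullstartblock()-style test, so the hole check has to come first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XFS start block sentinels (assumed values). */
typedef uint64_t fsblock_t;

#define MODEL_DELAYSTARTBLOCK	((fsblock_t)-1LL)
#define MODEL_HOLESTARTBLOCK	((fsblock_t)-2LL)
/* Assumed layout: all upper bits set marks a "null" (unallocated) block. */
#define MODEL_STARTBLOCKMASK	(~(fsblock_t)0 << 17)

enum model_type { MODEL_CORRUPT, MODEL_HOLE, MODEL_DELALLOC, MODEL_MAPPED };

/* True for any start block whose upper bits are all set. */
static int model_isnullstartblock(fsblock_t b)
{
	return (b & MODEL_STARTBLOCKMASK) == MODEL_STARTBLOCKMASK;
}

/* Classify a start block in the same order as the reworked helper. */
static enum model_type model_classify(fsblock_t startblock, int realtime)
{
	/* Block zero is never valid file data on the non-realtime device. */
	if (startblock == 0 && !realtime)
		return MODEL_CORRUPT;
	/* Must be tested before the null check, which it would also pass. */
	if (startblock == MODEL_HOLESTARTBLOCK)
		return MODEL_HOLE;
	/* Either the canonical sentinel or a "raw" delalloc encoding. */
	if (startblock == MODEL_DELAYSTARTBLOCK ||
	    model_isnullstartblock(startblock))
		return MODEL_DELALLOC;
	return MODEL_MAPPED;
}
```

With this model a caller no longer needs to rewrite a raw delalloc
start block to the canonical sentinel before converting the mapping.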

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 84 +++++++++++++++++++++-------------------------
 fs/xfs/xfs_iomap.h |  4 +--
 fs/xfs/xfs_pnfs.c  |  2 +-
 3 files changed, 41 insertions(+), 49 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6acfed2ae858..9f1fd224bb06 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -35,18 +35,40 @@
 #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
 						<< mp->m_writeio_log)
 
-void
+static int
+xfs_alert_fsblock_zero(
+	xfs_inode_t	*ip,
+	xfs_bmbt_irec_t	*imap)
+{
+	xfs_alert_tag(ip->i_mount, XFS_PTAG_FSBLOCK_ZERO,
+			"Access to block zero in inode %llu "
+			"start_block: %llx start_off: %llx "
+			"blkcnt: %llx extent-state: %x",
+		(unsigned long long)ip->i_ino,
+		(unsigned long long)imap->br_startblock,
+		(unsigned long long)imap->br_startoff,
+		(unsigned long long)imap->br_blockcount,
+		imap->br_state);
+	return -EFSCORRUPTED;
+}
+
+int
 xfs_bmbt_to_iomap(
 	struct xfs_inode	*ip,
 	struct iomap		*iomap,
-	struct xfs_bmbt_irec	*imap)
+	struct xfs_bmbt_irec	*imap,
+	bool			shared)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 
+	if (unlikely(!imap->br_startblock && !XFS_IS_REALTIME_INODE(ip)))
+		return xfs_alert_fsblock_zero(ip, imap);
+
 	if (imap->br_startblock == HOLESTARTBLOCK) {
 		iomap->addr = IOMAP_NULL_ADDR;
 		iomap->type = IOMAP_HOLE;
-	} else if (imap->br_startblock == DELAYSTARTBLOCK) {
+	} else if (imap->br_startblock == DELAYSTARTBLOCK ||
+		   isnullstartblock(imap->br_startblock)) {
 		iomap->addr = IOMAP_NULL_ADDR;
 		iomap->type = IOMAP_DELALLOC;
 	} else {
@@ -60,6 +82,13 @@ xfs_bmbt_to_iomap(
 	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
 	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
 	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
+
+	if (xfs_ipincount(ip) &&
+	    (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
+		iomap->flags |= IOMAP_F_DIRTY;
+	if (shared)
+		iomap->flags |= IOMAP_F_SHARED;
+	return 0;
 }
 
 static void
@@ -138,23 +167,6 @@ xfs_iomap_eof_align_last_fsb(
 	return 0;
 }
 
-STATIC int
-xfs_alert_fsblock_zero(
-	xfs_inode_t	*ip,
-	xfs_bmbt_irec_t	*imap)
-{
-	xfs_alert_tag(ip->i_mount, XFS_PTAG_FSBLOCK_ZERO,
-			"Access to block zero in inode %llu "
-			"start_block: %llx start_off: %llx "
-			"blkcnt: %llx extent-state: %x",
-		(unsigned long long)ip->i_ino,
-		(unsigned long long)imap->br_startblock,
-		(unsigned long long)imap->br_startoff,
-		(unsigned long long)imap->br_blockcount,
-		imap->br_state);
-	return -EFSCORRUPTED;
-}
-
 int
 xfs_iomap_write_direct(
 	xfs_inode_t	*ip,
@@ -649,17 +661,7 @@ xfs_file_iomap_begin_delay(
 	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
 done:
-	if (isnullstartblock(got.br_startblock))
-		got.br_startblock = DELAYSTARTBLOCK;
-
-	if (!got.br_startblock) {
-		error = xfs_alert_fsblock_zero(ip, &got);
-		if (error)
-			goto out_unlock;
-	}
-
-	xfs_bmbt_to_iomap(ip, iomap, &got);
-
+	error = xfs_bmbt_to_iomap(ip, iomap, &got, false);
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
@@ -1097,15 +1099,7 @@ xfs_file_iomap_begin(
 	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
 
 out_finish:
-	if (xfs_ipincount(ip) && (ip->i_itemp->ili_fsync_fields
-				& ~XFS_ILOG_TIMESTAMP))
-		iomap->flags |= IOMAP_F_DIRTY;
-
-	xfs_bmbt_to_iomap(ip, iomap, &imap);
-
-	if (shared)
-		iomap->flags |= IOMAP_F_SHARED;
-	return 0;
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, shared);
 
 out_found:
 	ASSERT(nimaps);
@@ -1228,12 +1222,10 @@ xfs_xattr_iomap_begin(
 out_unlock:
 	xfs_iunlock(ip, lockmode);
 
-	if (!error) {
-		ASSERT(nimaps);
-		xfs_bmbt_to_iomap(ip, iomap, &imap);
-	}
-
-	return error;
+	if (error)
+		return error;
+	ASSERT(nimaps);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, false);
 }
 
 const struct iomap_ops xfs_xattr_iomap_ops = {
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index c6170548831b..ed27e41b687c 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -17,8 +17,8 @@ int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t,
 			struct xfs_bmbt_irec *, unsigned int *);
 int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
 
-void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
-		struct xfs_bmbt_irec *);
+int xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
+		struct xfs_bmbt_irec *, bool shared);
 xfs_extlen_t xfs_eof_alignment(struct xfs_inode *ip, xfs_extlen_t extsize);
 
 static inline xfs_filblks_t
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index f44c3599527d..bde2c9f56a46 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -185,7 +185,7 @@ xfs_fs_map_blocks(
 	}
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
-	xfs_bmbt_to_iomap(ip, iomap, &imap);
+	error = xfs_bmbt_to_iomap(ip, iomap, &imap, false);
 	*device_generation = mp->m_generation;
 	return error;
 out_unlock:
-- 
2.19.1


* [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-12-03 22:24 ` [PATCH 05/11] xfs: make xfs_bmbt_to_iomap more useful Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 21:44   ` Darrick J. Wong
  2018-12-03 22:24 ` [PATCH 07/11] xfs: also truncate holes covered by COW blocks Christoph Hellwig
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

While using delalloc for extsize hints is generally a good idea, the
current code that does so only for COW doesn't help us much and creates
a lot of special cases.  Switch it to use real allocations like we
do for direct I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c   | 28 +++++++++++++++++-----------
 fs/xfs/xfs_reflink.c |  5 ++++-
 fs/xfs/xfs_reflink.h |  5 ++---
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 9f1fd224bb06..d851abac16a9 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1039,22 +1039,28 @@ xfs_file_iomap_begin(
 	 * been done up front, so we don't need to do them here.
 	 */
 	if (xfs_is_reflink_inode(ip)) {
+		struct xfs_bmbt_irec	orig = imap;
+
 		/* if zeroing doesn't need COW allocation, then we are done. */
 		if ((flags & IOMAP_ZERO) &&
 		    !needs_cow_for_zeroing(&imap, nimaps))
 			goto out_found;
 
-		if (flags & IOMAP_DIRECT) {
-			/* may drop and re-acquire the ilock */
-			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
-					&lockmode);
-			if (error)
-				goto out_unlock;
-		} else {
-			error = xfs_reflink_reserve_cow(ip, &imap);
-			if (error)
-				goto out_unlock;
-		}
+		error = xfs_reflink_allocate_cow(ip, &imap, &shared, &lockmode,
+						 flags);
+		if (error)
+			goto out_unlock;
+
+		/*
+		 * For buffered writes we need to report the address of the
+		 * previous block (if there was any) so that the higher level
+		 * write code can perform read-modify-write operations.  For
+		 * direct I/O code, which must be block aligned we need to
+		 * report the newly allocated address.
+		 */
+		if (!(flags & IOMAP_DIRECT) &&
+		    orig.br_startblock != HOLESTARTBLOCK)
+			imap = orig;
 
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index a8c32632090c..bdbaff1b3fb7 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -397,7 +397,8 @@ xfs_reflink_allocate_cow(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
 	bool			*shared,
-	uint			*lockmode)
+	uint			*lockmode,
+	unsigned		flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = imap->br_startoff;
@@ -471,6 +472,8 @@ xfs_reflink_allocate_cow(
 	if (nimaps == 0)
 		return -ENOSPC;
 convert:
+	if (!(flags & IOMAP_DIRECT))
+		return 0;
 	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
 
 out_unreserve:
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 6d73daef1f13..d76fc520cac8 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -12,10 +12,9 @@ extern int xfs_reflink_find_shared(struct xfs_mount *mp, struct xfs_trans *tp,
 extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *irec, bool *shared);
 
-extern int xfs_reflink_reserve_cow(struct xfs_inode *ip,
-		struct xfs_bmbt_irec *imap);
 extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
-		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
+		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
+		unsigned flags);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
 
-- 
2.19.1


* [PATCH 07/11] xfs: also truncate holes covered by COW blocks
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-12-03 22:24 ` [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints Christoph Hellwig
@ 2018-12-03 22:24 ` Christoph Hellwig
  2018-12-18 23:39   ` Darrick J. Wong
  2018-12-03 22:25 ` [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay Christoph Hellwig
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:24 UTC (permalink / raw)
  To: linux-xfs

This only matters if we want to write data through the COW fork that is
not actually an overwrite of existing data.  Reasons for that are
speculative COW fork allocations using the cowextsize, or a mode where
we always write through the COW fork.  Currently neither can actually
happen, but I plan to enable them.
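
A minimal standalone sketch of the resulting mapping logic, using
made-up types (struct model_extent, MODEL_NULLFILEOFF) rather than the
real xfs_bmbt_irec, could look like the following.  It assumes that
cow_off, when set, is the start of the next COW extent strictly beyond
offset, since a COW extent covering offset would have been handled
earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "no COW extent" marker for this example. */
#define MODEL_NULLFILEOFF	((uint64_t)-1)

/* Simplified stand-in for an extent record. */
struct model_extent {
	uint64_t	startoff;	/* first file block covered */
	uint64_t	blockcount;	/* number of blocks covered */
	int		is_hole;	/* nonzero if no blocks are mapped */
};

/*
 * Build the mapping for 'offset' from the data fork extent found by the
 * lookup, then cap it at the next COW extent.  The cap is applied to
 * holes and real extents alike.
 */
static struct model_extent
model_map_blocks(struct model_extent data, uint64_t offset, uint64_t cow_off)
{
	struct model_extent map = data;

	/* Landed in a hole or beyond EOF: synthesize a hole mapping. */
	if (map.startoff > offset) {
		map.blockcount = map.startoff - offset;
		map.startoff = offset;
		map.is_hole = 1;
	}

	/* Truncate to the next COW extent, if there is one in range. */
	if (cow_off != MODEL_NULLFILEOFF &&
	    cow_off < map.startoff + map.blockcount)
		map.blockcount = cow_off - map.startoff;

	return map;
}
```

For a data extent starting at block 10, a lookup at block 4 with a COW
extent at block 7 yields a three block hole instead of a six block one.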

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 124b8de37115..7d95a84064e7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -403,28 +403,29 @@ xfs_map_blocks(
 
 	wpc->fork = XFS_DATA_FORK;
 
+	/* landed in a hole or beyond EOF? */
 	if (imap.br_startoff > offset_fsb) {
-		/* landed in a hole or beyond EOF */
 		imap.br_blockcount = imap.br_startoff - offset_fsb;
 		imap.br_startoff = offset_fsb;
 		imap.br_startblock = HOLESTARTBLOCK;
 		imap.br_state = XFS_EXT_NORM;
-	} else {
-		/*
-		 * Truncate to the next COW extent if there is one.  This is the
-		 * only opportunity to do this because we can skip COW fork
-		 * lookups for the subsequent blocks in the mapping; however,
-		 * the requirement to treat the COW range separately remains.
-		 */
-		if (cow_fsb != NULLFILEOFF &&
-		    cow_fsb < imap.br_startoff + imap.br_blockcount)
-			imap.br_blockcount = cow_fsb - imap.br_startoff;
-
-		/* got a delalloc extent? */
-		if (isnullstartblock(imap.br_startblock))
-			goto allocate_blocks;
 	}
 
+	/*
+	 * Truncate to the next COW extent if there is one.  This is the only
+	 * opportunity to do this because we can skip COW fork lookups for the
+	 * subsequent blocks in the mapping; however, the requirement to treat
+	 * the COW range separately remains.
+	 */
+	if (cow_fsb != NULLFILEOFF &&
+	    cow_fsb < imap.br_startoff + imap.br_blockcount)
+		imap.br_blockcount = cow_fsb - imap.br_startoff;
+
+	/* got a delalloc extent? */
+	if (imap.br_startblock != HOLESTARTBLOCK &&
+	    isnullstartblock(imap.br_startblock))
+		goto allocate_blocks;
+
 	wpc->imap = imap;
 	trace_xfs_map_blocks_found(ip, offset, count, wpc->fork, &imap);
 	return 0;
-- 
2.19.1


* [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-12-03 22:24 ` [PATCH 07/11] xfs: also truncate holes covered by COW blocks Christoph Hellwig
@ 2018-12-03 22:25 ` Christoph Hellwig
  2018-12-18 23:36   ` Darrick J. Wong
  2018-12-03 22:25 ` [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay Christoph Hellwig
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:25 UTC (permalink / raw)
  To: linux-xfs

Besides simplifying the code a bit, this allows us to actually implement
the behavior of using COW preallocation for non-COW data mentioned
in the current comments.

Note that this breaks the current version of xfs/420, but that is
because the test is broken.  A separate fix will be sent for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c   | 132 ++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_reflink.c |  67 ----------------------
 fs/xfs/xfs_trace.h   |   1 -
 3 files changed, 93 insertions(+), 107 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d851abac16a9..d19f99e5476a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -534,15 +534,16 @@ xfs_file_iomap_begin_delay(
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		maxbytes_fsb =
 		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	xfs_fileoff_t		end_fsb;
-	int			error = 0, eof = 0;
-	struct xfs_bmbt_irec	got;
-	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	imap, cmap;
+	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
+	bool			eof = false, cow_eof = false, shared;
+	int			whichfork = XFS_DATA_FORK;
+	int			error = 0;
 
 	ASSERT(!XFS_IS_REALTIME_INODE(ip));
 	ASSERT(!xfs_get_extsz_hint(ip));
@@ -560,7 +561,7 @@ xfs_file_iomap_begin_delay(
 
 	XFS_STATS_INC(mp, xs_blk_mapw);
 
-	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
 		if (error)
 			goto out_unlock;
@@ -568,51 +569,92 @@ xfs_file_iomap_begin_delay(
 
 	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
 
-	eof = !xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got);
+	/*
+	 * Search the data fork first to look up our source mapping.  We
+	 * always need the data fork map, as we have to return it to the
+	 * iomap code so that the higher level write code can read data in to
+	 * perform read-modify-write cycles for unaligned writes.
+	 */
+	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
 	if (eof)
-		got.br_startoff = end_fsb; /* fake hole until the end */
+		imap.br_startoff = end_fsb; /* fake hole until the end */
 
-	if (got.br_startoff <= offset_fsb) {
+	/* We never need to allocate blocks for zeroing a hole. */
+	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
+		goto out_unlock;
+	}
+
+	/*
+	 * Search the COW fork extent list even if we did not find a data fork
+	 * extent.  This serves two purposes: first this implements the
+	 * speculative preallocation using cowextsize, so that we also unshare
+	 * blocks adjacent to shared blocks instead of just the shared blocks
+	 * themselves.  Second the lookup in the extent list is generally faster
+	 * than going out to the shared extent tree.
+	 */
+	if (xfs_is_reflink_inode(ip)) {
+		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
+				&ccur, &cmap);
+		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
+			trace_xfs_reflink_cow_found(ip, &cmap);
+			whichfork = XFS_COW_FORK;
+			goto done;
+		}
+	}
+
+	if (imap.br_startoff <= offset_fsb) {
 		/*
 		 * For reflink files we may need a delalloc reservation when
 		 * overwriting shared extents.   This includes zeroing of
 		 * existing extents that contain data.
 		 */
-		if (xfs_is_reflink_inode(ip) &&
-		    ((flags & IOMAP_WRITE) ||
-		     got.br_state != XFS_EXT_UNWRITTEN)) {
-			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
-			error = xfs_reflink_reserve_cow(ip, &got);
-			if (error)
-				goto out_unlock;
+		if (!xfs_is_reflink_inode(ip) ||
+		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
+			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+					&imap);
+			goto done;
 		}
 
-		trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK, &got);
-		goto done;
-	}
+		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 
-	if (flags & IOMAP_ZERO) {
-		xfs_hole_to_iomap(ip, iomap, offset_fsb, got.br_startoff);
-		goto out_unlock;
+		/* Trim the mapping to the nearest shared extent boundary. */
+		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
+		if (error)
+			goto out_unlock;
+
+		/* Not shared?  Just report the (potentially capped) extent. */
+		if (!shared) {
+			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+					&imap);
+			goto done;
+		}
+
+		/*
+		 * Fork all the shared blocks from our write offset until the
+		 * end of the extent.
+		 */
+		whichfork = XFS_COW_FORK;
+		end_fsb = imap.br_startoff + imap.br_blockcount;
+	} else {
+		/*
+		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
+		 * pages to keep the chunks of work done here somewhat
+		 * symmetric with the work writeback does.  This is a completely
+		 * arbitrary number pulled out of thin air.
+		 *
+		 * Note that the value needs to be less than 32-bits wide until
+		 * the lower level functions are updated.
+		 */
+		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
+		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
 	}
 
 	error = xfs_qm_dqattach_locked(ip, false);
 	if (error)
 		goto out_unlock;
 
-	/*
-	 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES pages
-	 * to keep the chunks of work done where somewhat symmetric with the
-	 * work writeback does. This is a completely arbitrary number pulled
-	 * out of thin air as a best guess for initial testing.
-	 *
-	 * Note that the values needs to be less than 32-bits wide until
-	 * the lower level functions are updated.
-	 */
-	count = min_t(loff_t, count, 1024 * PAGE_SIZE);
-	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
-
-	if (eof) {
+	if (eof && whichfork == XFS_DATA_FORK) {
 		prealloc_blocks = xfs_iomap_prealloc_size(ip, offset, count,
 				&icur);
 		if (prealloc_blocks) {
@@ -635,9 +677,11 @@ xfs_file_iomap_begin_delay(
 	}
 
 retry:
-	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
-			end_fsb - offset_fsb, prealloc_blocks, &got, &icur,
-			eof);
+	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+			end_fsb - offset_fsb, prealloc_blocks,
+			whichfork == XFS_DATA_FORK ? &imap : &cmap,
+			whichfork == XFS_DATA_FORK ? &icur : &ccur,
+			whichfork == XFS_DATA_FORK ? eof : cow_eof);
 	switch (error) {
 	case 0:
 		break;
@@ -659,9 +703,19 @@ xfs_file_iomap_begin_delay(
 	 * them out if the write happens to fail.
 	 */
 	iomap->flags |= IOMAP_F_NEW;
-	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
+	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
 done:
-	error = xfs_bmbt_to_iomap(ip, iomap, &got, false);
+	if (whichfork == XFS_COW_FORK) {
+		if (imap.br_startoff > offset_fsb) {
+			xfs_trim_extent(&cmap, offset_fsb,
+					imap.br_startoff - offset_fsb);
+			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, false);
+			goto out_unlock;
+		}
+		/* ensure we only report blocks we have a reservation for */
+		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
+	}
+	error = xfs_bmbt_to_iomap(ip, iomap, &imap, false);
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index bdbaff1b3fb7..d59b556d42cb 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -234,73 +234,6 @@ xfs_reflink_trim_around_shared(
 	}
 }
 
-/*
- * Trim the passed in imap to the next shared/unshared extent boundary, and
- * if imap->br_startoff points to a shared extent reserve space for it in the
- * COW fork.
- *
- * Note that imap will always contain the block numbers for the existing blocks
- * in the data fork, as the upper layers need them for read-modify-write
- * operations.
- */
-int
-xfs_reflink_reserve_cow(
-	struct xfs_inode	*ip,
-	struct xfs_bmbt_irec	*imap)
-{
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	struct xfs_bmbt_irec	got;
-	int			error = 0;
-	bool			eof = false;
-	struct xfs_iext_cursor	icur;
-	bool			shared;
-
-	/*
-	 * Search the COW fork extent list first.  This serves two purposes:
-	 * first this implement the speculative preallocation using cowextisze,
-	 * so that we also unshared block adjacent to shared blocks instead
-	 * of just the shared blocks themselves.  Second the lookup in the
-	 * extent list is generally faster than going out to the shared extent
-	 * tree.
-	 */
-
-	if (!xfs_iext_lookup_extent(ip, ifp, imap->br_startoff, &icur, &got))
-		eof = true;
-	if (!eof && got.br_startoff <= imap->br_startoff) {
-		trace_xfs_reflink_cow_found(ip, imap);
-		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
-		return 0;
-	}
-
-	/* Trim the mapping to the nearest shared extent boundary. */
-	error = xfs_reflink_trim_around_shared(ip, imap, &shared);
-	if (error)
-		return error;
-
-	/* Not shared?  Just report the (potentially capped) extent. */
-	if (!shared)
-		return 0;
-
-	/*
-	 * Fork all the shared blocks from our write offset until the end of
-	 * the extent.
-	 */
-	error = xfs_qm_dqattach_locked(ip, false);
-	if (error)
-		return error;
-
-	error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
-			imap->br_blockcount, 0, &got, &icur, eof);
-	if (error == -ENOSPC || error == -EDQUOT)
-		trace_xfs_reflink_cow_enospc(ip, imap);
-	if (error)
-		return error;
-
-	xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
-	trace_xfs_reflink_cow_alloc(ip, &got);
-	return 0;
-}
-
 /* Convert part of an unwritten CoW extent to a real one. */
 STATIC int
 xfs_reflink_convert_cow_extent(
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 870865913bd8..36e74fc90700 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3179,7 +3179,6 @@ DEFINE_INODE_ERROR_EVENT(xfs_reflink_unshare_error);
 
 /* copy on write */
 DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_around_shared);
-DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_alloc);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_found);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_enospc);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
-- 
2.19.1


* [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-12-03 22:25 ` [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-12-03 22:25 ` Christoph Hellwig
  2018-12-18 23:38   ` Darrick J. Wong
  2018-12-03 22:25 ` [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust Christoph Hellwig
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:25 UTC (permalink / raw)
  To: linux-xfs

There is no user of this flag in the iomap code at the moment, but we
should not actively report wrong information if we can trivially get it
right.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d19f99e5476a..bbc5d2e06b06 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -541,7 +541,7 @@ xfs_file_iomap_begin_delay(
 	struct xfs_bmbt_irec	imap, cmap;
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
-	bool			eof = false, cow_eof = false, shared;
+	bool			eof = false, cow_eof = false, shared = false;
 	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 
@@ -709,13 +709,14 @@ xfs_file_iomap_begin_delay(
 		if (imap.br_startoff > offset_fsb) {
 			xfs_trim_extent(&cmap, offset_fsb,
 					imap.br_startoff - offset_fsb);
-			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, false);
+			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, true);
 			goto out_unlock;
 		}
 		/* ensure we only report blocks we have a reservation for */
 		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
+		shared = true;
 	}
-	error = xfs_bmbt_to_iomap(ip, iomap, &imap, false);
+	error = xfs_bmbt_to_iomap(ip, iomap, &imap, shared);
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
-- 
2.19.1


* [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-12-03 22:25 ` [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-12-03 22:25 ` Christoph Hellwig
  2018-12-18 22:22   ` Darrick J. Wong
  2018-12-03 22:25 ` [PATCH 11/11] xfs: introduce an always_cow mode Christoph Hellwig
  2018-12-06  1:05 ` COW improvements and always_cow support V3 Darrick J. Wong
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:25 UTC (permalink / raw)
  To: linux-xfs

If buffered and direct I/O race, COW fork extents under writeback can
have been moved to the data fork by the time we call
xfs_reflink_convert_cow from xfs_submit_ioend.  This would be mostly
harmless as the block numbers don't change by this move, except for
the fact that xfs_bmapi_write will crash or trigger asserts when
not finding existing extents, despite trying to paper over this
with the XFS_BMAPI_CONVERT_ONLY flag.

Instead of special casing non-transaction conversions in the already
way too complicated xfs_bmapi_write, just add a new helper for the much
simpler non-transactional COW fork case, which simply ignores extents
that are not found.
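
The behaviour the new helper aims for can be sketched with a toy
per-block model of the COW fork.  Everything here (enum model_block,
model_convert_cow) is hypothetical and only illustrates the idea: walk
the requested range, flip unwritten blocks to normal, and silently skip
blocks that no longer have a COW fork extent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-block state of the COW fork (hypothetical model). */
enum model_block {
	MODEL_NONE,		/* no COW extent, e.g. already moved away */
	MODEL_UNWRITTEN,	/* allocated but not yet marked as written */
	MODEL_NORM		/* allocated and written */
};

/*
 * Convert unwritten blocks in [off, off + count) to normal ones.
 * Missing blocks are not an error; they are simply skipped.
 * Returns the number of blocks that were converted.
 */
static size_t
model_convert_cow(enum model_block *fork, size_t nblocks,
		  size_t off, size_t count)
{
	size_t end = off + count;
	size_t converted = 0;
	size_t i;

	/* Clamp the range to the size of the modelled fork. */
	if (end > nblocks)
		end = nblocks;

	for (i = off; i < end; i++) {
		if (fork[i] != MODEL_UNWRITTEN)
			continue;	/* nothing to convert here */
		fork[i] = MODEL_NORM;
		converted++;
	}
	return converted;
}
```

A range that contains no COW blocks at all just returns zero
conversions, which mirrors the "not found is fine" semantics described
above.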

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 12 ++------
 fs/xfs/libxfs/xfs_bmap.h |  8 +++---
 fs/xfs/xfs_reflink.c     | 61 +++++++++++++++++++++++++---------------
 3 files changed, 45 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1992ed8a60b0..fbed7ed34a7f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2029,7 +2029,7 @@ xfs_bmap_add_extent_delay_real(
 /*
  * Convert an unwritten allocation to a real allocation or vice versa.
  */
-STATIC int				/* error */
+int					/* error */
 xfs_bmap_add_extent_unwritten_real(
 	struct xfs_trans	*tp,
 	xfs_inode_t		*ip,	/* incore inode pointer */
@@ -4236,9 +4236,7 @@ xfs_bmapi_write(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
-	ASSERT(tp != NULL ||
-	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
-			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
+	ASSERT(tp != NULL);
 	ASSERT(len > 0);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
@@ -4316,9 +4314,6 @@ xfs_bmapi_write(
 			 * locked and hence a truncate will block on them
 			 * first.
 			 */
-			ASSERT(!((flags & XFS_BMAPI_CONVERT) &&
-			         (flags & XFS_BMAPI_COWFORK)));
-
 			if (flags & XFS_BMAPI_DELALLOC) {
 				if (eof || bno >= end)
 					break;
@@ -4333,8 +4328,7 @@ xfs_bmapi_write(
 		 * First, deal with the hole before the allocated space
 		 * that we found, if any.
 		 */
-		if ((need_alloc || wasdelay) &&
-		    !(flags & XFS_BMAPI_CONVERT_ONLY)) {
+		if (need_alloc || wasdelay) {
 			bma.eof = eof;
 			bma.conv = !!(flags & XFS_BMAPI_CONVERT);
 			bma.wasdel = wasdelay;
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index f9a925caa70e..ee3848680684 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -98,9 +98,6 @@ struct xfs_extent_free_item
 /* Only convert delalloc space, don't allocate entirely new extents */
 #define XFS_BMAPI_DELALLOC	0x400
 
-/* Only convert unwritten extents, don't allocate new blocks */
-#define XFS_BMAPI_CONVERT_ONLY	0x800
-
 /* Skip online discard of freed extents */
 #define XFS_BMAPI_NODISCARD	0x1000
 
@@ -118,7 +115,6 @@ struct xfs_extent_free_item
 	{ XFS_BMAPI_REMAP,	"REMAP" }, \
 	{ XFS_BMAPI_COWFORK,	"COWFORK" }, \
 	{ XFS_BMAPI_DELALLOC,	"DELALLOC" }, \
-	{ XFS_BMAPI_CONVERT_ONLY, "CONVERT_ONLY" }, \
 	{ XFS_BMAPI_NODISCARD,	"NODISCARD" }, \
 	{ XFS_BMAPI_NORMAP,	"NORMAP" }
 
@@ -227,6 +223,10 @@ int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
 		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
 		struct xfs_bmbt_irec *got, struct xfs_iext_cursor *cur,
 		int eof);
+int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
+		struct xfs_inode *ip, int whichfork,
+		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
+		struct xfs_bmbt_irec *new, int *logflagsp);
 
 static inline void
 xfs_bmap_add_free(
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index d59b556d42cb..0cf13cb1b2fe 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -234,26 +234,42 @@ xfs_reflink_trim_around_shared(
 	}
 }
 
-/* Convert part of an unwritten CoW extent to a real one. */
-STATIC int
-xfs_reflink_convert_cow_extent(
-	struct xfs_inode		*ip,
-	struct xfs_bmbt_irec		*imap,
-	xfs_fileoff_t			offset_fsb,
-	xfs_filblks_t			count_fsb)
+static int
+xfs_reflink_convert_cow_locked(
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		offset_fsb,
+	xfs_filblks_t		count_fsb)
 {
-	int				nimaps = 1;
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	struct xfs_btree_cur	*dummy_cur = NULL;
+	int			dummy_logflags;
+	int			error = 0;
 
-	if (imap->br_state == XFS_EXT_NORM)
+	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got))
 		return 0;
 
-	xfs_trim_extent(imap, offset_fsb, count_fsb);
-	trace_xfs_reflink_convert_cow(ip, imap);
-	if (imap->br_blockcount == 0)
-		return 0;
-	return xfs_bmapi_write(NULL, ip, imap->br_startoff, imap->br_blockcount,
-			XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT, 0, imap,
-			&nimaps);
+	do {
+		if (got.br_startoff >= offset_fsb + count_fsb)
+			break;
+		if (got.br_state == XFS_EXT_NORM)
+			continue;
+		if (WARN_ON_ONCE(isnullstartblock(got.br_startblock)))
+			return -EIO;
+
+		xfs_trim_extent(&got, offset_fsb, count_fsb);
+		if (!got.br_blockcount)
+			continue;
+
+		got.br_state = XFS_EXT_NORM;
+		error = xfs_bmap_add_extent_unwritten_real(NULL, ip,
+				XFS_COW_FORK, &icur, &dummy_cur, &got,
+				&dummy_logflags);
+		if (error)
+			return error;
+	} while (xfs_iext_next_extent(ip->i_cowfp, &icur, &got));
+
+	return error;
 }
 
 /* Convert all of the unwritten CoW extents in a file's range to real ones. */
@@ -267,15 +283,12 @@ xfs_reflink_convert_cow(
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
 	xfs_filblks_t		count_fsb = end_fsb - offset_fsb;
-	struct xfs_bmbt_irec	imap;
-	int			nimaps = 1, error = 0;
+	int			error;
 
 	ASSERT(count != 0);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	error = xfs_bmapi_write(NULL, ip, offset_fsb, count_fsb,
-			XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT |
-			XFS_BMAPI_CONVERT_ONLY, 0, &imap, &nimaps);
+	error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
@@ -405,9 +418,11 @@ xfs_reflink_allocate_cow(
 	if (nimaps == 0)
 		return -ENOSPC;
 convert:
-	if (!(flags & IOMAP_DIRECT))
+	xfs_trim_extent(imap, offset_fsb, count_fsb);
+	if (!(flags & IOMAP_DIRECT) || imap->br_state == XFS_EXT_NORM)
 		return 0;
-	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
+	trace_xfs_reflink_convert_cow(ip, imap);
+	return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
 
 out_unreserve:
 	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
-- 
2.19.1


* [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-12-03 22:25 ` [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust Christoph Hellwig
@ 2018-12-03 22:25 ` Christoph Hellwig
  2018-12-18 23:24   ` Darrick J. Wong
  2018-12-06  1:05 ` COW improvements and always_cow support V3 Darrick J. Wong
  11 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 22:25 UTC (permalink / raw)
  To: linux-xfs

Add a mode where XFS never overwrites existing blocks in place.  This
is to aid debugging our COW code, and also puts infrastructure in place
for things like possible future support for zoned block devices, which
can't support overwrites.

This mode is enabled globally by doing a:

    echo 1 > /sys/fs/xfs/debug/always_cow

Note that the parameter is global to allow running all tests in xfstests
easily in this mode, which would not easily be possible with a per-fs
sysfs file.
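
For test harnesses that would rather not shell out, the same toggle can be
sketched in C.  This is only an illustration under stated assumptions: the
helper name set_always_cow() is made up here, the path is passed in so the
sketch can be tried against a scratch file, and the real knob is presumably
only present on kernels built with XFS debugging and this patch applied.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of this patch): write "1" or "0" to the
 * always_cow knob at knob_path.  On a kernel carrying this patch the path
 * would be /sys/fs/xfs/debug/always_cow.
 * Returns 0 on success or a negative errno value.
 */
static int set_always_cow(const char *knob_path, int enable)
{
	FILE *f;
	int err = 0;

	assert(knob_path != NULL);

	f = fopen(knob_path, "w");
	if (!f)
		return -errno;
	if (fputs(enable ? "1\n" : "0\n", f) == EOF)
		err = -EIO;
	/* A rejected sysfs write may only be reported when stdio flushes. */
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}
```

A caller would use set_always_cow("/sys/fs/xfs/debug/always_cow", 1) before
mounting the test filesystems and pass 0 afterwards to restore the default.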

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
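The xfs_file_fallocate() hunk in this patch returns -EOPNOTSUPP in always_cow
mode only when the request is a plain preallocation, i.e. when no mode bit
other than FALLOC_FL_KEEP_SIZE is set.  The userspace sketch below restates
that mode test so it can be checked in isolation; the function names are
invented for illustration and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/falloc.h>

/*
 * Same expression as the check added to xfs_file_fallocate():
 * true when the mode is 0 or just FALLOC_FL_KEEP_SIZE.
 * Hypothetical name, userspace only.
 */
static bool always_cow_rejects_falloc_mode(int mode)
{
	return (mode & ~FALLOC_FL_KEEP_SIZE) == 0;
}

/* Usage sketch: which modes the check would refuse. */
static void falloc_mode_examples(void)
{
	/* Plain preallocation, with or without KEEP_SIZE, is refused. */
	assert(always_cow_rejects_falloc_mode(0));
	assert(always_cow_rejects_falloc_mode(FALLOC_FL_KEEP_SIZE));

	/* Modes carrying any other bit pass this particular check. */
	assert(!always_cow_rejects_falloc_mode(FALLOC_FL_PUNCH_HOLE |
					       FALLOC_FL_KEEP_SIZE));
	assert(!always_cow_rejects_falloc_mode(FALLOC_FL_ZERO_RANGE));
}
```

Passing this check only means the always_cow test does not reject the
request; the rest of xfs_file_fallocate() still applies its usual validation.
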
 fs/xfs/xfs_aops.c    |  2 +-
 fs/xfs/xfs_file.c    | 11 ++++++++++-
 fs/xfs/xfs_iomap.c   | 28 ++++++++++++++++++----------
 fs/xfs/xfs_reflink.c | 28 ++++++++++++++++++++++++----
 fs/xfs/xfs_reflink.h | 13 +++++++++++++
 fs/xfs/xfs_super.c   | 13 +++++++++----
 fs/xfs/xfs_sysctl.h  |  1 +
 fs/xfs/xfs_sysfs.c   | 24 ++++++++++++++++++++++++
 8 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 7d95a84064e7..a900924f16e1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -986,7 +986,7 @@ xfs_vm_bmap(
 	 * Since we don't pass back blockdev info, we can't return bmap
 	 * information for rt files either.
 	 */
-	if (xfs_is_reflink_inode(ip) || XFS_IS_REALTIME_INODE(ip))
+	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
 		return 0;
 	return iomap_bmap(mapping, block, &xfs_iomap_ops);
 }
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e47425071e65..8d2be043590a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -507,7 +507,7 @@ xfs_file_dio_aio_write(
 		 * We can't properly handle unaligned direct I/O to reflink
 		 * files yet, as we can't unshare a partial block.
 		 */
-		if (xfs_is_reflink_inode(ip)) {
+		if (xfs_is_cow_inode(ip)) {
 			trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count);
 			return -EREMCHG;
 		}
@@ -806,6 +806,15 @@ xfs_file_fallocate(
 		return -EOPNOTSUPP;
 
 	xfs_ilock(ip, iolock);
+	/*
+	 * In always_cow mode we can't use preallocations and thus should not
+	 * allow creating them.
+	 */
+	if (xfs_is_always_cow_inode(ip) && (mode & ~FALLOC_FL_KEEP_SIZE) == 0) {
+		error = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
 	error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
 	if (error)
 		goto out_unlock;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bbc5d2e06b06..244ea0007c09 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -395,12 +395,13 @@ xfs_quota_calc_throttle(
 STATIC xfs_fsblock_t
 xfs_iomap_prealloc_size(
 	struct xfs_inode	*ip,
+	int			whichfork,
 	loff_t			offset,
 	loff_t			count,
 	struct xfs_iext_cursor	*icur)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	struct xfs_bmbt_irec	prev;
 	int			shift = 0;
@@ -593,7 +594,11 @@ xfs_file_iomap_begin_delay(
 	 * themselves.  Second the lookup in the extent list is generally faster
 	 * than going out to the shared extent tree.
 	 */
-	if (xfs_is_reflink_inode(ip)) {
+	if (xfs_is_cow_inode(ip)) {
+		if (!ip->i_cowfp) {
+			ASSERT(!xfs_is_reflink_inode(ip));
+			xfs_ifork_init_cow(ip);
+		}
 		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
 				&ccur, &cmap);
 		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
@@ -609,7 +614,7 @@ xfs_file_iomap_begin_delay(
 		 * overwriting shared extents.   This includes zeroing of
 		 * existing extents that contain data.
 		 */
-		if (!xfs_is_reflink_inode(ip) ||
+		if (!xfs_is_cow_inode(ip) ||
 		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
 			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
 					&imap);
@@ -619,7 +624,7 @@ xfs_file_iomap_begin_delay(
 		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 
 		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
+		error = xfs_inode_need_cow(ip, &imap, &shared);
 		if (error)
 			goto out_unlock;
 
@@ -648,15 +653,18 @@ xfs_file_iomap_begin_delay(
 		 */
 		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
 		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
+
+		if (xfs_is_always_cow_inode(ip))
+			whichfork = XFS_COW_FORK;
 	}
 
 	error = xfs_qm_dqattach_locked(ip, false);
 	if (error)
 		goto out_unlock;
 
-	if (eof && whichfork == XFS_DATA_FORK) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, offset, count,
-				&icur);
+	if (eof) {
+		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+				count, &icur);
 		if (prealloc_blocks) {
 			xfs_extlen_t	align;
 			xfs_off_t	end_offset;
@@ -987,7 +995,7 @@ xfs_ilock_for_iomap(
 	 * COW writes may allocate delalloc space or convert unwritten COW
 	 * extents, so we need to make sure to take the lock exclusively here.
 	 */
-	if (xfs_is_reflink_inode(ip) && is_write) {
+	if (xfs_is_cow_inode(ip) && is_write) {
 		/*
 		 * FIXME: It could still overwrite on unshared extents and not
 		 * need allocation.
@@ -1021,7 +1029,7 @@ xfs_ilock_for_iomap(
 	 * check, so if we got ILOCK_SHARED for a write and but we're now a
 	 * reflink inode we have to switch to ILOCK_EXCL and relock.
 	 */
-	if (mode == XFS_ILOCK_SHARED && is_write && xfs_is_reflink_inode(ip)) {
+	if (mode == XFS_ILOCK_SHARED && is_write && xfs_is_cow_inode(ip)) {
 		xfs_iunlock(ip, mode);
 		mode = XFS_ILOCK_EXCL;
 		goto relock;
@@ -1093,7 +1101,7 @@ xfs_file_iomap_begin(
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
 	 */
-	if (xfs_is_reflink_inode(ip)) {
+	if (xfs_is_cow_inode(ip)) {
 		struct xfs_bmbt_irec	orig = imap;
 
 		/* if zeroing doesn't need COW allocation, then we are done. */
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0cf13cb1b2fe..1da46899c215 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -192,7 +192,7 @@ xfs_reflink_trim_around_shared(
 	int			error = 0;
 
 	/* Holes, unwritten, and delalloc extents cannot be shared */
-	if (!xfs_is_reflink_inode(ip) || !xfs_bmap_is_real_extent(irec)) {
+	if (!xfs_is_cow_inode(ip) || !xfs_bmap_is_real_extent(irec)) {
 		*shared = false;
 		return 0;
 	}
@@ -234,6 +234,23 @@ xfs_reflink_trim_around_shared(
 	}
 }
 
+bool
+xfs_inode_need_cow(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	bool			*shared)
+{
+	/* We can't update any real extents in always COW mode. */
+	if (xfs_is_always_cow_inode(ip) &&
+	    !isnullstartblock(imap->br_startblock)) {
+		*shared = true;
+		return 0;
+	}
+
+	/* Trim the mapping to the nearest shared extent boundary. */
+	return xfs_reflink_trim_around_shared(ip, imap, shared);
+}
+
 static int
 xfs_reflink_convert_cow_locked(
 	struct xfs_inode	*ip,
@@ -321,7 +338,7 @@ xfs_find_trim_cow_extent(
 	if (got.br_startoff > offset_fsb) {
 		xfs_trim_extent(imap, imap->br_startoff,
 				got.br_startoff - imap->br_startoff);
-		return xfs_reflink_trim_around_shared(ip, imap, shared);
+		return xfs_inode_need_cow(ip, imap, shared);
 	}
 
 	*shared = true;
@@ -356,7 +373,10 @@ xfs_reflink_allocate_cow(
 	xfs_extlen_t		resblks = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	ASSERT(xfs_is_reflink_inode(ip));
+	if (!ip->i_cowfp) {
+		ASSERT(!xfs_is_reflink_inode(ip));
+		xfs_ifork_init_cow(ip);
+	}
 
 	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
 	if (error || !*shared)
@@ -537,7 +557,7 @@ xfs_reflink_cancel_cow_range(
 	int			error;
 
 	trace_xfs_reflink_cancel_cow_range(ip, offset, count);
-	ASSERT(xfs_is_reflink_inode(ip));
+	ASSERT(ip->i_cowfp);
 
 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
 	if (count == NULLFILEOFF)
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index d76fc520cac8..f6505ae37626 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -6,11 +6,24 @@
 #ifndef __XFS_REFLINK_H
 #define __XFS_REFLINK_H 1
 
+static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
+{
+	return xfs_globals.always_cow &&
+		xfs_sb_version_hasreflink(&ip->i_mount->m_sb);
+}
+
+static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
+{
+	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
+}
+
 extern int xfs_reflink_find_shared(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t aglen,
 		xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_maximal);
 extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *irec, bool *shared);
+bool xfs_inode_need_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
+		bool *shared);
 
 extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d3e6cd063688..f4d34749505e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1728,11 +1728,16 @@ xfs_fs_fill_super(
 		}
 	}
 
-	if (xfs_sb_version_hasreflink(&mp->m_sb) && mp->m_sb.sb_rblocks) {
-		xfs_alert(mp,
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		if (mp->m_sb.sb_rblocks) {
+			xfs_alert(mp,
 	"reflink not compatible with realtime device!");
-		error = -EINVAL;
-		goto out_filestream_unmount;
+			error = -EINVAL;
+			goto out_filestream_unmount;
+		}
+
+		if (xfs_globals.always_cow)
+			xfs_info(mp, "using DEBUG-only always_cow mode.");
 	}
 
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb) && mp->m_sb.sb_rblocks) {
diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
index 168488130a19..ad7f9be13087 100644
--- a/fs/xfs/xfs_sysctl.h
+++ b/fs/xfs/xfs_sysctl.h
@@ -85,6 +85,7 @@ struct xfs_globals {
 	int	log_recovery_delay;	/* log recovery delay (secs) */
 	int	mount_delay;		/* mount setup delay (secs) */
 	bool	bug_on_assert;		/* BUG() the kernel on assert failure */
+	bool	always_cow;		/* use COW fork for all overwrites */
 };
 extern struct xfs_globals	xfs_globals;
 
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index cd6a994a7250..cabda13f3c64 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -183,10 +183,34 @@ mount_delay_show(
 }
 XFS_SYSFS_ATTR_RW(mount_delay);
 
+static ssize_t
+always_cow_store(
+	struct kobject	*kobject,
+	const char	*buf,
+	size_t		count)
+{
+	ssize_t		ret;
+
+	ret = kstrtobool(buf, &xfs_globals.always_cow);
+	if (ret < 0)
+		return ret;
+	return count;
+}
+
+static ssize_t
+always_cow_show(
+	struct kobject	*kobject,
+	char		*buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.always_cow);
+}
+XFS_SYSFS_ATTR_RW(always_cow);
+
 static struct attribute *xfs_dbg_attrs[] = {
 	ATTR_LIST(bug_on_assert),
 	ATTR_LIST(log_recovery_delay),
 	ATTR_LIST(mount_delay),
+	ATTR_LIST(always_cow),
 	NULL,
 };
 
-- 
2.19.1


* Re: COW improvements and always_cow support V3
  2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-12-03 22:25 ` [PATCH 11/11] xfs: introduce an always_cow mode Christoph Hellwig
@ 2018-12-06  1:05 ` Darrick J. Wong
  2018-12-06  4:16   ` Christoph Hellwig
  2018-12-06 20:09   ` Christoph Hellwig
  11 siblings, 2 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-06  1:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:52PM -0500, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds the always_cow mode support after improving our COW
> write support a little bit first.
> 
> The always_cow mode stresses the COW path a lot, but with a few xfstests
> fixups it generally looks good, except for a few tests that complain about
> fragmentation, which is rather inherent in this mode, and xfs/326 which
> inserts error tags into the COW path not getting the expected result.
> 
> Changes since v2:
>  - add a patch to remove xfs_trim_extent_eof
>  - add a patch to remove the separate io_type and rely on existing state
>    in the writeback path
>  - rework the truncate race handling in the writeback path a little more
> 
> Changes since v1:
>  - make delalloc and unwritten extent conversions simpler and more robust
>  - add a few additional cleanups
>  - support all fallocate modes but actual preallocation
>  - rebase on top of a fix from Brian (which is included as first patch
>    to make the patch set more usable)

Hmm, so I tried running xfstests with quota enabled and -g all and
always_cow=1, and saw a bunch of test failures.  Are these expected?
An always_cow=0 run seems to have run fine.

(generic/050 might just be broken so don't spend too much energy on
that one)

--D

--- /tmp/xfstests/tests/generic/050.out	2017-11-27 09:17:11.023736523 -0800
+++ /var/tmp/xfstests//generic/050.out.bad	2018-12-03 22:05:57.391636614 -0800
@@ -2,9 +2,11 @@
 setting device read-only
 mounting read-only block device:
 mount: device write-protected, mounting read-only
+mount: permission denied
 touching file on read-only filesystem (should fail)
 touch: cannot touch 'SCRATCH_MNT/foo': Read-only file system
 unmounting read-only filesystem
+umount: SCRATCH_DEV: not mounted
 setting device read-write
 mounting read-write block device:
 touch files
@@ -18,7 +20,9 @@
 umount: SCRATCH_DEV: not mounted
 mounting filesystem with -o norecovery on a read-only device:
 mount: device write-protected, mounting read-only
+mount: permission denied
 unmounting read-only filesystem
+umount: SCRATCH_DEV: not mounted
 setting device read-write
 mounting filesystem that needs recovery with -o ro:
 *** done
--- /tmp/xfstests/tests/generic/075.out	2017-02-28 09:23:56.049065953 -0800
+++ /var/tmp/xfstests//generic/075.out.bad	2018-12-03 22:11:12.392566688 -0800
@@ -8,11 +8,6 @@
 -----------------------------------------------
 fsx.1 : -d -N numops -S 0 -x
 -----------------------------------------------
-
------------------------------------------------
-fsx.2 : -d -N numops -l filelen -S 0
------------------------------------------------
-
------------------------------------------------
-fsx.3 : -d -N numops -l filelen -S 0 -x
------------------------------------------------
+    fsx (-d -N 1000 -S 0 -x) failed, 0 - compare /var/tmp/xfstests/generic/075.1.{good,bad,fsxlog}
+mv: cannot stat '/tmp/xfstests/075.1.fsxlog': No such file or directory
+od: /tmp/xfstests/075.1.fsxgood: No such file or directory
--- /tmp/xfstests/tests/generic/112.out	2017-02-28 09:23:56.052066035 -0800
+++ /var/tmp/xfstests//generic/112.out.bad	2018-12-03 22:15:41.225044797 -0800
@@ -8,11 +8,5 @@
 -----------------------------------------------
 fsx.1 : -A -d -N numops -S 0 -x
 -----------------------------------------------
-
------------------------------------------------
-fsx.2 : -A -d -N numops -l filelen -S 0
------------------------------------------------
-
------------------------------------------------
-fsx.3 : -A -d -N numops -l filelen -S 0 -x
------------------------------------------------
+    fsx (-A -d -N 1000 -S 0 -x) returned 0 - see 112.1.full
+mv: cannot stat '/tmp/xfstests/112.1.fsxlog': No such file or directory
--- /tmp/xfstests/tests/generic/392.out	2017-02-28 09:23:56.076066685 -0800
+++ /var/tmp/xfstests//generic/392.out.bad	2018-12-03 23:24:31.217086985 -0800
@@ -3,9 +3,13 @@
 ==== i_size 4096 test with fsync ====
 ==== i_time test with fsync ====
 ==== fpunch 1024 test with fsync ====
+Before:  "b: 8464 s: 4202496 a: 2018-12-03 23:24:28.977047177 -0800 m: 2018-12-03 23:24:28.989047390 -0800 c: 2018-12-03 23:24:28.989047390 -0800"
+After :  "b: 8208 s: 4202496 a: 2018-12-03 23:24:28.977047177 -0800 m: 2018-12-03 23:24:28.989047390 -0800 c: 2018-12-03 23:24:28.989047390 -0800"
 ==== fpunch 4096 test with fsync ====
 ==== i_size 1024 test with fdatasync ====
 ==== i_size 4096 test with fdatasync ====
 ==== i_time test with fdatasync ====
 ==== fpunch 1024 test with fdatasync ====
+Before:  "b: 8464 s: 4202496"
+After :  "b: 8208 s: 4202496"
 ==== fpunch 4096 test with fdatasync ====
--- /tmp/xfstests/tests/generic/451.out	2017-09-03 08:56:14.795707491 -0700
+++ /var/tmp/xfstests//generic/451.out.bad	2018-12-03 23:43:53.829680018 -0800
@@ -1,2 +1,9 @@
 QA output created by 451
+get stale data from buffer read
+00000000  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
+*
+0001c000  55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55  UUUUUUUUUUUUUUUU
+*
+0003c000  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
+000a0000
 Silence is golden
--- /dev/null	2018-12-03 21:37:59.088000000 -0800
+++ /var/tmp/xfstests//generic/476.dmesg	2018-12-03 23:51:33.429850088 -0800
@@ -0,0 +1,104 @@
+[ 7804.425766] run fstests generic/476 at 2018-12-03 23:47:40
+[ 7804.828467] XFS (pmem3): using DEBUG-only always_cow mode.
+[ 7804.830272] XFS (pmem3): Mounting V5 Filesystem
+[ 7804.836758] XFS (pmem3): Ending clean mount
+[ 7805.207900] XFS (pmem4): using DEBUG-only always_cow mode.
+[ 7805.210817] XFS (pmem4): Mounting V5 Filesystem
+[ 7805.214250] XFS (pmem4): Ending clean mount
+[ 7805.216764] XFS (pmem4): Quotacheck needed: Please wait.
+[ 7805.220707] XFS (pmem4): Quotacheck: Done.
+[ 7806.345711] XFS (pmem4): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
+[ 7806.345712] XFS (pmem4): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
+
+[ 7892.155964] ======================================================
+[ 7892.158181] WARNING: possible circular locking dependency detected
+[ 7892.159864] 4.20.0-rc1-djw #8 Not tainted
+[ 7892.161865] ------------------------------------------------------
+[ 7892.164655] kswapd0/54 is trying to acquire lock:
+[ 7892.166931] 00000000f0bcc36e (sb_internal){++++}, at: xfs_trans_alloc+0x1a5/0x220 [xfs]
+[ 7892.171000] 
+               but task is already holding lock:
+[ 7892.173813] 000000005352e33b (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
+[ 7892.203207] 
+               which lock already depends on the new lock.
+
+[ 7892.205884] 
+               the existing dependency chain (in reverse order) is:
+[ 7892.207859] 
+               -> #1 (fs_reclaim){+.+.}:
+[ 7892.209537]        kmem_cache_alloc+0x29/0x290
+[ 7892.211472]        kmem_zone_alloc+0x83/0x100 [xfs]
+[ 7892.213626]        xfs_trans_alloc+0x45/0x220 [xfs]
+[ 7892.215745]        xfs_sync_sb+0x35/0x70 [xfs]
+[ 7892.217677]        xfs_quiesce_attr+0x53/0x90 [xfs]
+[ 7892.219820]        xfs_fs_freeze+0x25/0x40 [xfs]
+[ 7892.222046]        freeze_super+0xc8/0x180
+[ 7892.223993]        do_vfs_ioctl+0x5b8/0x750
+[ 7892.225946]        ksys_ioctl+0x36/0x60
+[ 7892.227707]        __x64_sys_ioctl+0x16/0x20
+[ 7892.230376]        do_syscall_64+0x50/0x170
+[ 7892.244271]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
+[ 7892.246233] 
+               -> #0 (sb_internal){++++}:
+[ 7892.248518]        __sb_start_write+0xb0/0x1f0
+[ 7892.251785]        xfs_trans_alloc+0x1a5/0x220 [xfs]
+[ 7892.253483]        xfs_free_eofblocks+0x14e/0x230 [xfs]
+[ 7892.255249]        xfs_fs_destroy_inode+0xd6/0x300 [xfs]
+[ 7892.256814]        dispose_list+0x48/0x70
+[ 7892.258418]        prune_icache_sb+0x52/0x70
+[ 7892.260494]        super_cache_scan+0x13e/0x190
+[ 7892.262403]        shrink_slab.constprop.86+0x1ed/0x560
+[ 7892.264742]        shrink_node+0x99/0x310
+[ 7892.266323]        kswapd+0x33d/0x860
+[ 7892.268272]        kthread+0x106/0x140
+[ 7892.269798]        ret_from_fork+0x3a/0x50
+[ 7892.271957] 
+               other info that might help us debug this:
+
+[ 7892.275294]  Possible unsafe locking scenario:
+
+[ 7892.277820]        CPU0                    CPU1
+[ 7892.279797]        ----                    ----
+[ 7892.281817]   lock(fs_reclaim);
+[ 7892.283167]                                lock(sb_internal);
+[ 7892.285511]                                lock(fs_reclaim);
+[ 7892.287806]   lock(sb_internal);
+[ 7892.289174] 
+                *** DEADLOCK ***
+
+[ 7892.291593] 3 locks held by kswapd0/54:
+[ 7892.293239]  #0: 000000005352e33b (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
+[ 7892.296470]  #1: 00000000d57c66e4 (shrinker_rwsem){++++}, at: shrink_slab.constprop.86+0x5f/0x560
+[ 7892.300215]  #2: 000000007add902d (&type->s_umount_key#33){++++}, at: trylock_super+0x16/0x50
+[ 7892.303874] 
+               stack backtrace:
+[ 7892.306308] CPU: 0 PID: 54 Comm: kswapd0 Not tainted 4.20.0-rc1-djw #8
+[ 7892.309597] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
+[ 7892.313494] Call Trace:
+[ 7892.321691]  dump_stack+0x5e/0x8b
+[ 7892.329352]  print_circular_bug.isra.35+0x212/0x21f
+[ 7892.339719]  __lock_acquire+0x1196/0x1580
+[ 7892.348877]  ? kvm_clock_read+0x14/0x30
+[ 7892.357090]  ? lock_acquire+0x9d/0x1a0
+[ 7892.364959]  lock_acquire+0x9d/0x1a0
+[ 7892.369784]  ? xfs_trans_alloc+0x1a5/0x220 [xfs]
+[ 7892.372053]  __sb_start_write+0xb0/0x1f0
+[ 7892.374286]  ? xfs_trans_alloc+0x1a5/0x220 [xfs]
+[ 7892.376305]  xfs_trans_alloc+0x1a5/0x220 [xfs]
+[ 7892.378462]  xfs_free_eofblocks+0x14e/0x230 [xfs]
+[ 7892.381112]  xfs_fs_destroy_inode+0xd6/0x300 [xfs]
+[ 7892.383713]  dispose_list+0x48/0x70
+[ 7892.385831]  prune_icache_sb+0x52/0x70
+[ 7892.387484]  super_cache_scan+0x13e/0x190
+[ 7892.389412]  shrink_slab.constprop.86+0x1ed/0x560
+[ 7892.391531]  shrink_node+0x99/0x310
+[ 7892.393055]  kswapd+0x33d/0x860
+[ 7892.394507]  ? node_reclaim+0x270/0x270
+[ 7892.396437]  kthread+0x106/0x140
+[ 7892.397902]  ? kthread_cancel_delayed_work_sync+0x10/0x10
+[ 7892.400332]  ret_from_fork+0x3a/0x50
+[ 7977.067500] XFS (pmem3): Unmounting Filesystem
+[ 8036.080093] XFS (pmem4): Unmounting Filesystem
+[ 8037.706746] XFS (pmem4): using DEBUG-only always_cow mode.
+[ 8037.708416] XFS (pmem4): Mounting V5 Filesystem
+[ 8037.718914] XFS (pmem4): Ending clean mount
--- /tmp/xfstests/tests/xfs/026.out	2018-11-01 15:51:51.833340753 -0700
+++ /var/tmp/xfstests//xfs/026.out.bad	2018-12-04 00:15:40.163563143 -0800
@@ -27,6 +27,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4516 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/027.out	2018-11-01 15:51:51.833340753 -0700
+++ /var/tmp/xfstests//xfs/027.out.bad	2018-12-04 00:15:52.707774592 -0800
@@ -18,6 +18,8 @@
 xfsrestore: session id: ID
 xfsrestore: media ID: ID
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4516 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 39 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/046.out	2018-11-01 15:51:51.838340643 -0700
+++ /var/tmp/xfstests//xfs/046.out.bad	2018-12-04 00:18:44.602685223 -0800
@@ -27,6 +27,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4236 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 10 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/056.out	2018-11-01 15:51:51.840340598 -0700
+++ /var/tmp/xfstests//xfs/056.out.bad	2018-12-04 00:20:00.859983661 -0800
@@ -27,6 +27,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4488 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4246 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 7 directories and 11 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/060.out	2018-11-01 15:51:51.841340576 -0700
+++ /var/tmp/xfstests//xfs/060.out.bad	2018-12-04 00:20:58.672970719 -0800
@@ -41,6 +41,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4546 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 41 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/061.out	2018-11-01 15:51:51.841340576 -0700
+++ /var/tmp/xfstests//xfs/061.out.bad	2018-12-04 00:21:10.273169035 -0800
@@ -18,6 +18,8 @@
 xfsrestore: session id: ID
 xfsrestore: media ID: ID
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4488 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4216 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 7 directories and 11 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/063.out	2018-11-01 15:51:51.841340576 -0700
+++ /var/tmp/xfstests//xfs/063.out.bad	2018-12-04 00:21:32.261545187 -0800
@@ -37,6 +37,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4320 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4346 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 4 directories and 21 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/064.out	2018-11-01 15:51:51.842340554 -0700
+++ /var/tmp/xfstests//xfs/064.out.bad	2018-12-04 00:22:25.158451306 -0800
@@ -313,6 +313,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27467/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4236 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27467/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -394,6 +396,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27493/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27493/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -427,6 +431,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27519/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27519/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -460,6 +466,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27545/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27545/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -493,6 +501,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27571/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27571/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -526,6 +536,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27597/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27597/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -559,6 +571,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27623/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27623/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -592,6 +606,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27649/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27649/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -625,6 +641,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27675/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27675/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -658,6 +676,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27701/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/xfsrestorehousekeepingdir.27701/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -692,6 +712,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4236 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -773,6 +795,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -854,6 +877,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -935,6 +959,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -1016,6 +1041,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -1097,6 +1123,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -1178,6 +1205,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -1259,6 +1287,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -1340,6 +1369,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
@@ -1421,6 +1451,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 55 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/066.out	2018-11-01 15:51:51.843340532 -0700
+++ /var/tmp/xfstests//xfs/066.out.bad	2018-12-04 00:22:57.991014568 -0800
@@ -31,6 +31,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 2 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/068.out	2018-11-01 15:51:51.844340510 -0700
+++ /var/tmp/xfstests//xfs/068.out.bad	2018-12-04 00:23:12.487263477 -0800
@@ -21,8 +21,10 @@
 xfsrestore: session id: ID
 xfsrestore: media ID: ID
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 25488 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 17556 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
-xfsrestore: 383 directories and 1335 entries processed
+xfsrestore: 382 directories and 1343 entries processed
 xfsrestore: directory post-processing
 xfsrestore: restoring non-directory files
 xfsrestore: restore complete: SECS seconds elapsed
--- /dev/null	2018-12-03 21:37:59.088000000 -0800
+++ /var/tmp/xfstests//xfs/079.dmesg	2018-12-04 00:24:01.116099243 -0800
@@ -0,0 +1,51 @@
+[ 9983.940112] XFS (pmem3): Unmounting Filesystem
+[ 9984.709782] XFS (pmem4): Unmounting Filesystem
+[ 9984.868053] XFS (pmem4): Mounting V5 Filesystem
+[ 9984.871502] XFS (pmem4): Ending clean mount
+[ 9984.873227] XFS (pmem4): Quotacheck needed: Please wait.
+[ 9985.273714] XFS (pmem4): Quotacheck: Done.
+[ 9985.275187] XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: /raid/home/djwong/cdev/work/linux-djw/fs/xfs/libxfs/xfs_ag_resv.c, line: 319
+[ 9985.349822] WARNING: CPU: 2 PID: 11265 at /raid/home/djwong/cdev/work/linux-djw/fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
+[ 9985.352759] Modules linked in: ext2 ocfs2_stack_o2cb ocfs2_dlm ocfs2 ocfs2_nodemanager configfs quota_tree ocfs2_stackglue dm_thin_pool dm_persistent_data dm_bio_prison mq_deadline deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c bfq dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
+[ 9985.359330] CPU: 2 PID: 11265 Comm: mount Not tainted 4.20.0-rc1-djw #8
+[ 9985.360846] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
+[ 9985.362961] RIP: 0010:assfail+0x25/0x30 [xfs]
+[ 9985.363999] Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 58 48 1e a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d 55 49 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
+[ 9985.368107] RSP: 0000:ffffc90000b63cf0 EFLAGS: 00010246
+[ 9985.369323] RAX: 0000000000000000 RBX: ffff88006dc77000 RCX: 0000000000000000
+[ 9985.370933] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01d5cf3
+[ 9985.372570] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
+[ 9985.374188] R10: 0000000000000001 R11: f000000000000000 R12: ffff88006767e000
+[ 9985.375809] R13: 0000000000000000 R14: ffff88006767e000 R15: 0000000000000000
+[ 9985.377433] FS:  00007f1507eb7840(0000) GS:ffff88007e200000(0000) knlGS:0000000000000000
+[ 9985.379257] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+[ 9985.380590] CR2: 00000000023c70d8 CR3: 0000000078828000 CR4: 00000000000006a0
+[ 9985.382212] Call Trace:
+[ 9985.382866]  xfs_ag_resv_init+0x1bd/0x1d0 [xfs]
+[ 9985.383977]  xfs_fs_reserve_ag_blocks+0x37/0xa0 [xfs]
+[ 9985.385206]  xfs_mountfs+0x89f/0x970 [xfs]
+[ 9985.386220]  xfs_fs_fill_super+0x511/0x6e0 [xfs]
+[ 9985.387343]  ? xfs_test_remount_options+0x60/0x60 [xfs]
+[ 9985.388594]  mount_bdev+0x17a/0x1b0
+[ 9985.389451]  mount_fs+0x15/0x80
+[ 9985.390233]  vfs_kern_mount+0x62/0x160
+[ 9985.391145]  do_mount+0x1dc/0xd70
+[ 9985.391964]  ? copy_mount_options+0x2d/0x180
+[ 9985.392991]  ksys_mount+0x7e/0xd0
+[ 9985.393801]  __x64_sys_mount+0x21/0x30
+[ 9985.394713]  do_syscall_64+0x50/0x170
+[ 9985.395614]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
+[ 9985.396814] RIP: 0033:0x7f1507798b9a
+[ 9985.397681] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
+[ 9985.401810] RSP: 002b:00007ffcdcdc6458 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
+[ 9985.403525] RAX: ffffffffffffffda RBX: 000000000182a060 RCX: 00007f1507798b9a
+[ 9985.405155] RDX: 000000000182a2d0 RSI: 000000000182a310 RDI: 000000000182a2f0
+[ 9985.406781] RBP: 0000000000000000 R08: 000000000182a270 R09: 0000000000000012
+[ 9985.408442] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000000000182a2f0
+[ 9985.410077] R13: 000000000182a2d0 R14: 0000000000000000 R15: 0000000000000005
+[ 9985.411704] irq event stamp: 0
+[ 9985.412456] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
+[ 9985.417384] hardirqs last disabled at (0): [<ffffffff81056678>] copy_process+0x6e8/0x2190
+[ 9985.419283] softirqs last  enabled at (0): [<ffffffff81056678>] copy_process+0x6e8/0x2190
+[ 9985.421144] softirqs last disabled at (0): [<0000000000000000>]           (null)
+[ 9985.422816] ---[ end trace f0b195b925b8de5a ]---
--- /tmp/xfstests/tests/xfs/080.out	2018-11-01 15:51:51.846340466 -0700
+++ /var/tmp/xfstests//xfs/080.out.bad	2018-12-04 00:24:02.892129791 -0800
@@ -1,3 +1,6 @@
 QA output created by 080
 
-Completed rwtest pass 1 successfully.
+iogen:  Could not xfsctl(XFS_IOC_RESVSP) 104185344 bytes in file rwtest.file: Operation not supported (95)
+iogen warning:  Couldn't create file rwtest.file of 104185344 bytes
+iogen:  Could not create, or gather info for any test files
+rwtest.sh : iogen reported errors (r=2)
--- /tmp/xfstests/tests/xfs/114.out	2018-11-01 15:51:51.863340090 -0700
+++ /var/tmp/xfstests//xfs/114.out.bad	2018-12-04 00:25:42.117839178 -0800
@@ -1,6 +1,8 @@
 QA output created by 114
 Format and mount
 Create some files
+fallocate: Operation not supported
+fallocate: Operation not supported
 Insert and write file range
 Remount
 Collapse file
--- /tmp/xfstests/tests/xfs/170.out	2018-11-01 15:51:51.879339736 -0700
+++ /var/tmp/xfstests//xfs/170.out.bad	2018-12-04 00:29:25.817709966 -0800
@@ -3,19 +3,5 @@
 # streaming
 # sync AGs...
 # checking stream AGs...
-+ passed, streams are in seperate AGs
-# testing 8 22 4 8 3 1 0 ....
-# streaming
-# sync AGs...
-# checking stream AGs...
-+ passed, streams are in seperate AGs
-# testing 8 22 4 8 3 0 1 ....
-# streaming
-# sync AGs...
-# checking stream AGs...
-+ passed, streams are in seperate AGs
-# testing 8 22 4 8 3 1 1 ....
-# streaming
-# sync AGs...
-# checking stream AGs...
-+ passed, streams are in seperate AGs
+- failed, 2 streams with matching AGs
+(see /var/tmp/xfstests/xfs/170.full for details)
--- /tmp/xfstests/tests/xfs/171.out	2018-11-01 15:51:51.879339736 -0700
+++ /var/tmp/xfstests//xfs/171.out.bad	2018-12-04 00:29:41.669985068 -0800
@@ -3,19 +3,5 @@
 # streaming
 # sync AGs...
 # checking stream AGs...
-+ passed, streams are in seperate AGs
-# testing 64 16 8 100 1 1 1 ....
-# streaming
-# sync AGs...
-# checking stream AGs...
-+ passed, streams are in seperate AGs
-# testing 64 16 8 100 1 0 0 ....
-# streaming
-# sync AGs...
-# checking stream AGs...
-+ passed, streams are in seperate AGs
-# testing 64 16 8 100 1 0 1 ....
-# streaming
-# sync AGs...
-# checking stream AGs...
-+ passed, streams are in seperate AGs
+- failed, 7 streams with matching AGs
+(see /var/tmp/xfstests/xfs/171.full for details)
--- /tmp/xfstests/tests/xfs/173.out	2018-11-01 15:51:51.879339736 -0700
+++ /var/tmp/xfstests//xfs/173.out.bad	2018-12-04 00:30:17.422605872 -0800
@@ -18,4 +18,5 @@
 # streaming
 # sync AGs...
 # checking stream AGs...
-+ passed, streams are in seperate AGs
+- failed, 28 streams with matching AGs
+(see /var/tmp/xfstests/xfs/173.full for details)
--- /tmp/xfstests/tests/xfs/180.out	2018-11-01 15:51:51.880339714 -0700
+++ /var/tmp/xfstests//xfs/180.out.bad	2018-12-04 00:30:38.254967826 -0800
@@ -9,3 +9,4 @@
 2909feb63a37b0e95fe5cfb7f274f7b1  SCRATCH_MNT/test-180/file1
 d41f6527bc8320364e12ea7076140b8b  SCRATCH_MNT/test-180/file2
 Check extent counts
+file2 badly fragmented
--- /tmp/xfstests/tests/xfs/182.out	2018-11-01 15:51:51.880339714 -0700
+++ /var/tmp/xfstests//xfs/182.out.bad	2018-12-04 00:30:52.171209702 -0800
@@ -10,3 +10,4 @@
 2909feb63a37b0e95fe5cfb7f274f7b1  SCRATCH_MNT/test-182/file1
 c6ba35da9f73ced20d7781a448cc11d4  SCRATCH_MNT/test-182/file2
 Check extent counts
+file2 badly fragmented
--- /tmp/xfstests/tests/xfs/192.out	2018-11-01 15:51:51.883339647 -0700
+++ /var/tmp/xfstests//xfs/192.out.bad	2018-12-04 00:32:19.796734292 -0800
@@ -8,3 +8,4 @@
 Compare files
 2909feb63a37b0e95fe5cfb7f274f7b1  SCRATCH_MNT/test-192/file1
 Check extent counts
+file2 badly fragmented
--- /tmp/xfstests/tests/xfs/198.out	2018-11-01 15:51:51.884339625 -0700
+++ /var/tmp/xfstests//xfs/198.out.bad	2018-12-04 00:32:29.484903017 -0800
@@ -8,3 +8,4 @@
 Compare files
 2909feb63a37b0e95fe5cfb7f274f7b1  SCRATCH_MNT/test-198/file1
 Check extent counts
+file2 badly fragmented
--- /tmp/xfstests/tests/xfs/204.out	2018-11-01 15:51:51.885339603 -0700
+++ /var/tmp/xfstests//xfs/204.out.bad	2018-12-04 00:32:39.781082365 -0800
@@ -8,3 +8,4 @@
 Compare files
 2909feb63a37b0e95fe5cfb7f274f7b1  SCRATCH_MNT/test-204/file1
 Check extent counts
+file2 badly fragmented
--- /tmp/xfstests/tests/xfs/205.out	2018-11-01 15:51:51.885339603 -0700
+++ /var/tmp/xfstests//xfs/205.out.bad	2018-12-04 00:32:43.641149612 -0800
@@ -1,4 +1,5 @@
 QA output created by 205
 *** one file
+   !!! disk full (expected)
 *** one file, a few bytes at a time
 *** done
--- /tmp/xfstests/tests/xfs/208.out	2018-11-01 15:51:51.885339603 -0700
+++ /var/tmp/xfstests//xfs/208.out.bad	2018-12-04 00:32:49.745255963 -0800
@@ -11,3 +11,5 @@
 d41f6527bc8320364e12ea7076140b8b  SCRATCH_MNT/test-208/file2
 d41f6527bc8320364e12ea7076140b8b  SCRATCH_MNT/test-208/file3
 Check extent counts
+file2 badly fragmented
+file3 badly fragmented
--- /tmp/xfstests/tests/xfs/252.out	2018-11-01 15:51:51.892339448 -0700
+++ /var/tmp/xfstests//xfs/252.out.bad	2018-12-04 00:43:57.620854199 -0800
@@ -7,9 +7,7 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	3. into unwritten space
-0: [0..127]: unwritten
-1: [128..383]: hole
-2: [384..639]: unwritten
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	4. hole -> data
 0: [0..383]: hole
@@ -17,43 +15,39 @@
 2: [512..639]: hole
 286aad7ca07b2256f0f2bb8e608ff63d
 	5. hole -> unwritten
-0: [0..383]: hole
-1: [384..511]: unwritten
-2: [512..639]: hole
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	6. data -> hole
 0: [0..127]: data
 1: [128..639]: hole
 3976e5cc0b8a47c4cdc9e0211635f568
 	7. data -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
-1: [128..383]: hole
-2: [384..511]: unwritten
-3: [512..639]: hole
+1: [128..639]: hole
 3976e5cc0b8a47c4cdc9e0211635f568
 	8. unwritten -> hole
-0: [0..127]: unwritten
-1: [128..639]: hole
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	9. unwritten -> data
-0: [0..127]: unwritten
-1: [128..383]: hole
-2: [384..511]: data
-3: [512..639]: hole
+fallocate: Operation not supported
+0: [0..383]: hole
+1: [384..511]: data
+2: [512..639]: hole
 286aad7ca07b2256f0f2bb8e608ff63d
 	10. hole -> data -> hole
 1aca77e2188f52a62674fe8a873bdaba
 	11. data -> hole -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	12. unwritten -> data -> unwritten
-0: [0..127]: unwritten
-1: [128..511]: hole
-2: [512..639]: unwritten
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	13. data -> unwritten -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
@@ -86,9 +80,7 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	3. into unwritten space
-0: [0..127]: unwritten
-1: [128..383]: hole
-2: [384..639]: unwritten
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	4. hole -> data
 0: [0..383]: hole
@@ -96,43 +88,39 @@
 2: [512..639]: hole
 286aad7ca07b2256f0f2bb8e608ff63d
 	5. hole -> unwritten
-0: [0..383]: hole
-1: [384..511]: unwritten
-2: [512..639]: hole
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	6. data -> hole
 0: [0..127]: data
 1: [128..639]: hole
 3976e5cc0b8a47c4cdc9e0211635f568
 	7. data -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
-1: [128..383]: hole
-2: [384..511]: unwritten
-3: [512..639]: hole
+1: [128..639]: hole
 3976e5cc0b8a47c4cdc9e0211635f568
 	8. unwritten -> hole
-0: [0..127]: unwritten
-1: [128..639]: hole
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	9. unwritten -> data
-0: [0..127]: unwritten
-1: [128..383]: hole
-2: [384..511]: data
-3: [512..639]: hole
+fallocate: Operation not supported
+0: [0..383]: hole
+1: [384..511]: data
+2: [512..639]: hole
 286aad7ca07b2256f0f2bb8e608ff63d
 	10. hole -> data -> hole
 1aca77e2188f52a62674fe8a873bdaba
 	11. data -> hole -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	12. unwritten -> data -> unwritten
-0: [0..127]: unwritten
-1: [128..511]: hole
-2: [512..639]: unwritten
+fallocate: Operation not supported
 1aca77e2188f52a62674fe8a873bdaba
 	13. data -> unwritten -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
@@ -165,6 +153,7 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	3. into unwritten space
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
@@ -175,6 +164,7 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	5. hole -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
@@ -185,16 +175,19 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	7. data -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	8. unwritten -> hole
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	9. unwritten -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
@@ -205,16 +198,19 @@
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	11. data -> hole -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	12. unwritten -> data -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	13. data -> unwritten -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
@@ -247,6 +243,7 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	3. into unwritten space
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
@@ -257,6 +254,7 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	5. hole -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
@@ -267,16 +265,19 @@
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	7. data -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	8. unwritten -> hole
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
 2f7a72b9ca9923b610514a11a45a80c9
 	9. unwritten -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..383]: hole
 2: [384..639]: data
@@ -287,16 +288,19 @@
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	11. data -> hole -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	12. unwritten -> data -> unwritten
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
 0bcfc7652751f8fe46381240ccadd9d7
 	13. data -> unwritten -> data
+fallocate: Operation not supported
 0: [0..127]: data
 1: [128..511]: hole
 2: [512..639]: data
--- /tmp/xfstests/tests/xfs/266.out	2018-11-01 15:51:51.894339404 -0700
+++ /var/tmp/xfstests//xfs/266.out.bad	2018-12-04 00:44:52.097802263 -0800
@@ -58,6 +58,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4516 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
@@ -82,6 +84,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: NOTE: dump is not self-contained, orphaned files expected if base dump(s) was not applied
 xfsrestore: 1 directories and 4 entries processed
--- /tmp/xfstests/tests/xfs/281.out	2018-11-01 15:51:51.897339338 -0700
+++ /var/tmp/xfstests//xfs/281.out.bad	2018-12-04 00:45:55.718910065 -0800
@@ -31,6 +31,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4516 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/282.out	2018-11-01 15:51:51.897339338 -0700
+++ /var/tmp/xfstests//xfs/282.out.bad	2018-12-04 00:46:10.147161386 -0800
@@ -61,6 +61,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4516 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
@@ -86,6 +88,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/283.out	2018-11-01 15:51:51.897339338 -0700
+++ /var/tmp/xfstests//xfs/283.out.bad	2018-12-04 00:46:24.679414556 -0800
@@ -61,6 +61,8 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4516 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
@@ -86,6 +88,7 @@
 xfsrestore: media ID: ID
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
+xfsrestore: NOTE: attempt to reserve 4264 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 3 directories and 38 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/296.out	2018-11-01 15:51:51.899339294 -0700
+++ /var/tmp/xfstests//xfs/296.out.bad	2018-12-04 00:48:48.253917474 -0800
@@ -34,6 +34,8 @@
 xfsrestore: using online session inventory
 xfsrestore: searching media for directory dump
 xfsrestore: examining media file 0
+xfsrestore: NOTE: attempt to reserve 4208 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/dirattr using XFS_IOC_RESVSP64 failed: Operation not supported (95)
+xfsrestore: NOTE: attempt to reserve 4156 bytes for SCRATCH_MNT/restoredir/xfsrestorehousekeepingdir/namreg using XFS_IOC_RESVSP64 failed: Operation not supported (95)
 xfsrestore: reading directories
 xfsrestore: 2 directories and 2 entries processed
 xfsrestore: directory post-processing
--- /tmp/xfstests/tests/xfs/326.out	2018-11-01 15:51:51.904339183 -0700
+++ /var/tmp/xfstests//xfs/326.out.bad	2018-12-04 00:51:30.084742147 -0800
@@ -13,6 +13,6 @@
 FS should be online, touch should succeed
 Check files again
 2a4f043bf9730a9e8882c9264b9797b3  SCRATCH_MNT/file1
-1e108771fba35e2f2961d1ad23efbff7  SCRATCH_MNT/file2
+610b3ba989e5e55e90271c2d92e597ca  SCRATCH_MNT/file2
 153498e22f8ff52d7f60b466a5e65285  SCRATCH_MNT/file3
 Done
--- /tmp/xfstests/tests/xfs/329.out	2018-11-01 15:51:51.904339183 -0700
+++ /var/tmp/xfstests//xfs/329.out.bad	2018-12-04 00:51:34.236814667 -0800
@@ -4,7 +4,6 @@
 Inject error
 Defrag the file
 FS should be shut down, touch will fail
-touch: cannot touch 'SCRATCH_MNT/badfs': Input/output error
 Remount to replay log
 Check extent count
 FS should be online, touch should succeed
--- /tmp/xfstests/tests/xfs/331.out	2018-11-01 15:51:51.904339183 -0700
+++ /var/tmp/xfstests//xfs/331.out.bad	2018-12-04 00:51:41.396939729 -0800
@@ -2,3 +2,5 @@
 + create scratch fs
 + mount fs image
 + make some files
+fallocate: Operation not supported
+fallocate: Operation not supported
--- /tmp/xfstests/tests/xfs/420.out	2018-11-01 15:51:51.915338940 -0700
+++ /var/tmp/xfstests//xfs/420.out.bad	2018-12-04 00:52:12.569484283 -0800
@@ -14,8 +14,6 @@
 Whence	Result
 DATA	0
 HOLE	131072
-DATA	196608
-HOLE	262144
 Compare files
 c2803804acc9936eef8aab42c119bfac  SCRATCH_MNT/test-420/file1
 017c08a9320aad844ce86aa9631afb98  SCRATCH_MNT/test-420/file2
--- /tmp/xfstests/tests/xfs/442.out	2018-11-01 15:51:51.918338874 -0700
+++ /var/tmp/xfstests//xfs/442.out.bad	2018-12-04 00:53:21.438687804 -0800
@@ -1,6 +1,7 @@
 QA output created by 442
 Format and fsstress
 Check quota before remount
+project quota 1485116 blocks does not match du 1485216 blocks?
 Check quota after remount
 Comparing user usage
 Comparing group usage

* Re: COW improvements and always_cow support V3
  2018-12-06  1:05 ` COW improvements and always_cow support V3 Darrick J. Wong
@ 2018-12-06  4:16   ` Christoph Hellwig
  2018-12-06 16:32     ` Darrick J. Wong
  2018-12-06 20:09   ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-06  4:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Wed, Dec 05, 2018 at 05:05:50PM -0800, Darrick J. Wong wrote:
> Hmm, so I tried running xfstests with quota enabled and -g all and
> always_cow=1, and saw a bunch of test failures.  Are these expected?
> An always_cow=0 run seems to have run fine.
> 
> (generic/050 might just be broken so don't spend too much energy on
> that one)

What xfstests options are you using for this "quota enabled" run?

* Re: COW improvements and always_cow support V3
  2018-12-06  4:16   ` Christoph Hellwig
@ 2018-12-06 16:32     ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-06 16:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Dec 06, 2018 at 05:16:06AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 05, 2018 at 05:05:50PM -0800, Darrick J. Wong wrote:
> > Hmm, so I tried running xfstests with quota enabled and -g all and
> > always_cow=1, and saw a bunch of test failures.  Are these expected?
> > An always_cow=0 run seems to have run fine.
> > 
> > (generic/050 might just be broken so don't spend too much energy on
> > that one)
> 
> What xfstests options are you using for this "quota enabled" run?

Nothing particularly special:

MKFS_OPTIONS='-f -m reflink=1,rmapbt=1 -i sparse=1 /dev/sdb'
MOUNT_OPTIONS='-o usrquota,grpquota,prjquota /dev/sdb /opt'

--D

* Re: COW improvements and always_cow support V3
  2018-12-06  1:05 ` COW improvements and always_cow support V3 Darrick J. Wong
  2018-12-06  4:16   ` Christoph Hellwig
@ 2018-12-06 20:09   ` Christoph Hellwig
  2018-12-17 17:59     ` Darrick J. Wong
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-06 20:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

I see the 050 one, and it looks odd.

> --- /tmp/xfstests/tests/generic/075.out	2017-02-28 09:23:56.049065953 -0800
> +++ /var/tmp/xfstests//generic/075.out.bad	2018-12-03 22:11:12.392566688 -0800

This and a few other fsx tests assume you can always fallocate
on XFS.  I sent a series for this:

https://www.spinics.net/lists/linux-xfs/msg23433.html

But I need to rework some of the patches a little more based on the
review feedback.

Similarly the badly fragmented warnings are simply expected, that
is the nature of an always out of place write fs.  But the quota
run turned up a few more failures than I expected for the always_cow
mode, I can look into them.  But so far 95% of the always_cow bugs
were either issues with the tests, or genuine XFS COW path bugs not
otherwise uncovered.

* Re: COW improvements and always_cow support V3
  2018-12-06 20:09   ` Christoph Hellwig
@ 2018-12-17 17:59     ` Darrick J. Wong
  2018-12-18 18:05       ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-17 17:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Dec 06, 2018 at 09:09:30PM +0100, Christoph Hellwig wrote:
> I see the 050 one, and it looks odd.
> 
> > --- /tmp/xfstests/tests/generic/075.out	2017-02-28 09:23:56.049065953 -0800
> > +++ /var/tmp/xfstests//generic/075.out.bad	2018-12-03 22:11:12.392566688 -0800
> 
> This and a few other fsx tests assume you can always fallocate
> on XFS.  I sent a series for this:
> 
> https://www.spinics.net/lists/linux-xfs/msg23433.html
> 
> But I need to rework some of the patches a little more based on the
> review feedback.

"the patches"... as in the fstests patches, or the always_cow series?

> Similarly the badly fragmented warnings are simply expected, that
> is the nature of an always out of place write fs.  But the quota
> run turned up a few more failures than I expected for the always_cow
> mode, I can look into them.  But so far 95% of the always_cow bugs
> were either issues with the tests, or genuine XFS COW path bugs not
> otherwise uncovered.

<nod> generic/311 and generic/476 regularly trigger assertion warnings
about negative quota counts, which I have yet to diagnose.  FWIW
fstests' post-run quota checking never complains about problems, which
means we're probably just moving quota accounting in the wrong order or
something.

--D

* Re: COW improvements and always_cow support V3
  2018-12-17 17:59     ` Darrick J. Wong
@ 2018-12-18 18:05       ` Christoph Hellwig
  2018-12-19  0:44         ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-18 18:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Mon, Dec 17, 2018 at 09:59:22AM -0800, Darrick J. Wong wrote:
> > This and a few other fsx tests assume you can always fallocate
> > on XFS.  I sent a series for this:
> > 
> > https://www.spinics.net/lists/linux-xfs/msg23433.html
> > 
> > But I need to rework some of the patches a little more based on the
> > review feedback.
> 
> "the patches"... as in the fstests patches, or the always_cow series?

The fstests patches.

* Re: [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints
  2018-12-03 22:24 ` [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints Christoph Hellwig
@ 2018-12-18 21:44   ` Darrick J. Wong
  2018-12-19 19:29     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 21:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:58PM -0500, Christoph Hellwig wrote:
> While using delalloc for extsize hints is generally a good idea, the
> current code that does so only for COW doesn't help us much and creates
> a lot of special cases.  Switch it to use real allocations like we
> do for direct I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c   | 28 +++++++++++++++++-----------
>  fs/xfs/xfs_reflink.c |  5 ++++-
>  fs/xfs/xfs_reflink.h |  5 ++---
>  3 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 9f1fd224bb06..d851abac16a9 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1039,22 +1039,28 @@ xfs_file_iomap_begin(
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_reflink_inode(ip)) {
> +		struct xfs_bmbt_irec	orig = imap;
> +
>  		/* if zeroing doesn't need COW allocation, then we are done. */
>  		if ((flags & IOMAP_ZERO) &&
>  		    !needs_cow_for_zeroing(&imap, nimaps))
>  			goto out_found;
>  
> -		if (flags & IOMAP_DIRECT) {
> -			/* may drop and re-acquire the ilock */
> -			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
> -					&lockmode);
> -			if (error)
> -				goto out_unlock;
> -		} else {
> -			error = xfs_reflink_reserve_cow(ip, &imap);
> -			if (error)
> -				goto out_unlock;
> -		}
> +		error = xfs_reflink_allocate_cow(ip, &imap, &shared, &lockmode,
> +						 flags);
> +		if (error)
> +			goto out_unlock;
> +
> +		/*
> +		 * For buffered writes we need to report the address of the
> +		 * previous block (if there was any) so that the higher level
> +		 * write code can perform read-modify-write operations.  For
> +		 * direct I/O code, which must be block aligned we need to
> +		 * report the newly allocated address.
> +		 */
> +		if (!(flags & IOMAP_DIRECT) &&
> +		    orig.br_startblock != HOLESTARTBLOCK)
> +			imap = orig;
>  
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index a8c32632090c..bdbaff1b3fb7 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -397,7 +397,8 @@ xfs_reflink_allocate_cow(
>  	struct xfs_inode	*ip,
>  	struct xfs_bmbt_irec	*imap,
>  	bool			*shared,
> -	uint			*lockmode)
> +	uint			*lockmode,
> +	unsigned		flags)

I'm not thrilled with passing iomap flags into the reflink code here...

>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = imap->br_startoff;
> @@ -471,6 +472,8 @@ xfs_reflink_allocate_cow(
>  	if (nimaps == 0)
>  		return -ENOSPC;
>  convert:
> +	if (!(flags & IOMAP_DIRECT))

...because I feel that it's easy to miss the subtlety here that for
buffered writes we don't care if the cow extent is unwritten or written,
but for directio we very /much/ care that the cow extent is written,
because we're writing to it immediately.  Can this grow a comment to
reinforce why we skip the conversion?

Also, can we call this 'iomap_flags' to make it clearer which flags
we're talking about?

/*
 * COW fork extents are supposed to remain unwritten until we're ready
 * to initiate a disk write.  For directio we /are/ going to write the
 * data and need the conversion, but for buffered writes we're done.
 */
if (!(iomap_flags & IOMAP_DIRECT))
	return 0;
return xfs_reflink_convert_cow_extent(...);
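
For illustration only, here is a rough userspace sketch of that control
flow.  It is not kernel code: every name prefixed with sketch_ and the
flag value are hypothetical stand-ins, and the conversion helper merely
flips a state field where the real code would call
xfs_reflink_convert_cow_extent().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IOMAP_DIRECT; the bit value is illustrative. */
#define SKETCH_IOMAP_DIRECT	(1u << 0)

/* Hypothetical stand-ins for the unwritten/written extent states. */
enum sketch_ext_state {
	SKETCH_EXT_UNWRITTEN,
	SKETCH_EXT_NORM,
};

struct sketch_cow_extent {
	enum sketch_ext_state	state;
};

/* Stand-in for the real conversion: mark the COW extent as written. */
int
sketch_convert_cow_extent(
	struct sketch_cow_extent	*ext)
{
	assert(ext != NULL);
	ext->state = SKETCH_EXT_NORM;
	return 0;
}

/*
 * COW fork extents stay unwritten until a disk write is about to be
 * issued.  Direct I/O writes the data immediately and so needs the
 * conversion here; buffered writes leave the extent unwritten.
 */
int
sketch_finish_cow_alloc(
	struct sketch_cow_extent	*ext,
	unsigned int			iomap_flags)
{
	if (!(iomap_flags & SKETCH_IOMAP_DIRECT))
		return 0;
	return sketch_convert_cow_extent(ext);
}
```

Calling sketch_finish_cow_alloc() with no flags leaves the extent
unwritten, while passing SKETCH_IOMAP_DIRECT converts it.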

--D

> +		return 0;
>  	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
>  
>  out_unreserve:
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 6d73daef1f13..d76fc520cac8 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -12,10 +12,9 @@ extern int xfs_reflink_find_shared(struct xfs_mount *mp, struct xfs_trans *tp,
>  extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *irec, bool *shared);
>  
> -extern int xfs_reflink_reserve_cow(struct xfs_inode *ip,
> -		struct xfs_bmbt_irec *imap);
>  extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
> -		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
> +		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
> +		unsigned flags);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
>  
> -- 
> 2.19.1
> 

* Re: [PATCH 01/11] xfs: remove xfs_trim_extent_eof
  2018-12-03 22:24 ` [PATCH 01/11] xfs: remove xfs_trim_extent_eof Christoph Hellwig
@ 2018-12-18 21:45   ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 21:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:53PM -0500, Christoph Hellwig wrote:
> Opencoding this function in the only caller makes it blindly obvious
> what is going on instead of having to look at two files for that.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 11 -----------
>  fs/xfs/libxfs/xfs_bmap.h |  1 -
>  fs/xfs/xfs_aops.c        |  2 +-
>  3 files changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 19e921d1586f..f16d42abc500 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3685,17 +3685,6 @@ xfs_trim_extent(
>  	}
>  }
>  
> -/* trim extent to within eof */
> -void
> -xfs_trim_extent_eof(
> -	struct xfs_bmbt_irec	*irec,
> -	struct xfs_inode	*ip)
> -
> -{
> -	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
> -					      i_size_read(VFS_I(ip))));
> -}
> -
>  /*
>   * Trim the returned map to the required bounds
>   */
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 488dc8860fd7..f9a925caa70e 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -181,7 +181,6 @@ static inline bool xfs_bmap_is_real_extent(struct xfs_bmbt_irec *irec)
>  
>  void	xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
>  		xfs_filblks_t len);
> -void	xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *);
>  int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
>  int	xfs_bmap_set_attrforkoff(struct xfs_inode *ip, int size, int *version);
>  void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 338b9d9984e0..d7275075878e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -329,7 +329,7 @@ xfs_map_blocks(
>  	 * mechanism to protect us from arbitrary extent modifying contexts, not
>  	 * just eofblocks.
>  	 */
> -	xfs_trim_extent_eof(&wpc->imap, ip);
> +	xfs_trim_extent(&wpc->imap, 0, XFS_B_TO_FSB(mp, i_size_read(inode)));
>  
>  	/*
>  	 * COW fork blocks can overlap data fork blocks even if the blocks
> -- 
> 2.19.1
> 
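
The open-coded call in the xfs_map_blocks hunk above trims the cached
mapping so that it does not extend past the block containing EOF.  A
minimal userspace sketch of that arithmetic follows, assuming a
simplified extent with only an offset and a length; the sketch_ names
are hypothetical and the real xfs_trim_extent() also handles a nonzero
lower bound and adjusts the start block.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical extent: file offset and length in blocks. */
struct sketch_extent {
	uint64_t	startoff;
	uint64_t	blockcount;
};

/* Round a byte count up to whole blocks, as XFS_B_TO_FSB() does. */
uint64_t
sketch_bytes_to_blocks(
	uint64_t	bytes,
	unsigned int	blocklog)
{
	assert(blocklog < 64);
	return (bytes + ((UINT64_C(1) << blocklog) - 1)) >> blocklog;
}

/*
 * Trim the extent to the range [0, eof_fsb).  An extent that starts at
 * or beyond eof_fsb ends up empty.
 */
void
sketch_trim_to_eof(
	struct sketch_extent	*ext,
	uint64_t		eof_fsb)
{
	if (ext->startoff >= eof_fsb) {
		ext->blockcount = 0;
		return;
	}
	if (ext->startoff + ext->blockcount > eof_fsb)
		ext->blockcount = eof_fsb - ext->startoff;
}
```

With 4096-byte blocks (blocklog 12), a 4097-byte file rounds up to two
blocks, so an eight-block extent starting at offset 0 is cut down to two
blocks.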

* Re: [PATCH 02/11] xfs: remove the io_type field from the writeback context and ioend
  2018-12-03 22:24 ` [PATCH 02/11] xfs: remove the io_type field from the writeback context and ioend Christoph Hellwig
@ 2018-12-18 21:45   ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 21:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:54PM -0500, Christoph Hellwig wrote:
> The io_type field contains what is basically a summary of information
> from the inode fork and the imap.  But we can just as easily use that
> information directly, simplifying a few bits here and there and
> improving the trace points.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c    | 91 ++++++++++++++++++++------------------------
>  fs/xfs/xfs_aops.h    | 21 +---------
>  fs/xfs/xfs_iomap.c   |  8 ++--
>  fs/xfs/xfs_reflink.c |  2 +-
>  fs/xfs/xfs_trace.h   | 28 +++++++-------
>  5 files changed, 62 insertions(+), 88 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index d7275075878e..8fec6fd4c632 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -28,7 +28,7 @@
>   */
>  struct xfs_writepage_ctx {
>  	struct xfs_bmbt_irec    imap;
> -	unsigned int		io_type;
> +	int			fork;
>  	unsigned int		cow_seq;
>  	struct xfs_ioend	*ioend;
>  };
> @@ -255,30 +255,20 @@ xfs_end_io(
>  	 */
>  	error = blk_status_to_errno(ioend->io_bio->bi_status);
>  	if (unlikely(error)) {
> -		switch (ioend->io_type) {
> -		case XFS_IO_COW:
> +		if (ioend->io_fork == XFS_COW_FORK)
>  			xfs_reflink_cancel_cow_range(ip, offset, size, true);
> -			break;
> -		}
> -
>  		goto done;
>  	}
>  
>  	/*
> -	 * Success:  commit the COW or unwritten blocks if needed.
> +	 * Success: commit the COW or unwritten blocks if needed.
>  	 */
> -	switch (ioend->io_type) {
> -	case XFS_IO_COW:
> +	if (ioend->io_fork == XFS_COW_FORK)
>  		error = xfs_reflink_end_cow(ip, offset, size);
> -		break;
> -	case XFS_IO_UNWRITTEN:
> -		/* writeback should never update isize */
> +	else if (ioend->io_state == XFS_EXT_UNWRITTEN)
>  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> -		break;
> -	default:
> +	else
>  		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans);
> -		break;
> -	}
>  
>  done:
>  	if (ioend->io_append_trans)
> @@ -293,7 +283,8 @@ xfs_end_bio(
>  	struct xfs_ioend	*ioend = bio->bi_private;
>  	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
>  
> -	if (ioend->io_type == XFS_IO_UNWRITTEN || ioend->io_type == XFS_IO_COW)
> +	if (ioend->io_fork == XFS_COW_FORK ||
> +	    ioend->io_state == XFS_EXT_UNWRITTEN)
>  		queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
>  	else if (ioend->io_append_trans)
>  		queue_work(mp->m_data_workqueue, &ioend->io_work);
> @@ -313,7 +304,6 @@ xfs_map_blocks(
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
>  	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
>  	struct xfs_bmbt_irec	imap;
> -	int			whichfork = XFS_DATA_FORK;
>  	struct xfs_iext_cursor	icur;
>  	bool			imap_valid;
>  	int			error = 0;
> @@ -350,7 +340,7 @@ xfs_map_blocks(
>  		     offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
>  	if (imap_valid &&
>  	    (!xfs_inode_has_cow_data(ip) ||
> -	     wpc->io_type == XFS_IO_COW ||
> +	     wpc->fork == XFS_COW_FORK ||
>  	     wpc->cow_seq == READ_ONCE(ip->i_cowfp->if_seq)))
>  		return 0;
>  
> @@ -382,6 +372,9 @@ xfs_map_blocks(
>  	if (cow_fsb != NULLFILEOFF && cow_fsb <= offset_fsb) {
>  		wpc->cow_seq = READ_ONCE(ip->i_cowfp->if_seq);
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +
> +		wpc->fork = XFS_COW_FORK;
> +
>  		/*
>  		 * Truncate can race with writeback since writeback doesn't
>  		 * take the iolock and truncate decreases the file size before
> @@ -394,11 +387,13 @@ xfs_map_blocks(
>  		 * will kill the contents anyway.
>  		 */
>  		if (offset > i_size_read(inode)) {
> -			wpc->io_type = XFS_IO_HOLE;
> +			wpc->imap.br_blockcount = end_fsb - offset_fsb;
> +			wpc->imap.br_startoff = offset_fsb;
> +			wpc->imap.br_startblock = HOLESTARTBLOCK;
> +			wpc->imap.br_state = XFS_EXT_NORM;
>  			return 0;
>  		}
> -		whichfork = XFS_COW_FORK;
> -		wpc->io_type = XFS_IO_COW;
> +
>  		goto allocate_blocks;
>  	}
>  
> @@ -419,12 +414,14 @@ xfs_map_blocks(
>  		imap.br_startoff = end_fsb;	/* fake a hole past EOF */
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
> +	wpc->fork = XFS_DATA_FORK;
> +
>  	if (imap.br_startoff > offset_fsb) {
>  		/* landed in a hole or beyond EOF */
>  		imap.br_blockcount = imap.br_startoff - offset_fsb;
>  		imap.br_startoff = offset_fsb;
>  		imap.br_startblock = HOLESTARTBLOCK;
> -		wpc->io_type = XFS_IO_HOLE;
> +		imap.br_state = XFS_EXT_NORM;
>  	} else {
>  		/*
>  		 * Truncate to the next COW extent if there is one.  This is the
> @@ -436,30 +433,23 @@ xfs_map_blocks(
>  		    cow_fsb < imap.br_startoff + imap.br_blockcount)
>  			imap.br_blockcount = cow_fsb - imap.br_startoff;
>  
> -		if (isnullstartblock(imap.br_startblock)) {
> -			/* got a delalloc extent */
> -			wpc->io_type = XFS_IO_DELALLOC;
> +		/* got a delalloc extent? */
> +		if (isnullstartblock(imap.br_startblock))
>  			goto allocate_blocks;
> -		}
> -
> -		if (imap.br_state == XFS_EXT_UNWRITTEN)
> -			wpc->io_type = XFS_IO_UNWRITTEN;
> -		else
> -			wpc->io_type = XFS_IO_OVERWRITE;
>  	}
>  
>  	wpc->imap = imap;
> -	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
> +	trace_xfs_map_blocks_found(ip, offset, count, wpc->fork, &imap);
>  	return 0;
>  allocate_blocks:
> -	error = xfs_iomap_write_allocate(ip, whichfork, offset, &imap,
> +	error = xfs_iomap_write_allocate(ip, wpc->fork, offset, &imap,
>  			&wpc->cow_seq);
>  	if (error)
>  		return error;
> -	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
> +	ASSERT(wpc->fork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
>  	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
>  	wpc->imap = imap;
> -	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
> +	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->fork, &imap);
>  	return 0;
>  }
>  
> @@ -484,7 +474,7 @@ xfs_submit_ioend(
>  	int			status)
>  {
>  	/* Convert CoW extents to regular */
> -	if (!status && ioend->io_type == XFS_IO_COW) {
> +	if (!status && ioend->io_fork == XFS_COW_FORK) {
>  		/*
>  		 * Yuk. This can do memory allocation, but is not a
>  		 * transactional operation so everything is done in GFP_KERNEL
> @@ -502,7 +492,8 @@ xfs_submit_ioend(
>  
>  	/* Reserve log space if we might write beyond the on-disk inode size. */
>  	if (!status &&
> -	    ioend->io_type != XFS_IO_UNWRITTEN &&
> +	    (ioend->io_fork == XFS_COW_FORK ||
> +	     ioend->io_state != XFS_EXT_UNWRITTEN) &&
>  	    xfs_ioend_is_append(ioend) &&
>  	    !ioend->io_append_trans)
>  		status = xfs_setfilesize_trans_alloc(ioend);
> @@ -531,7 +522,8 @@ xfs_submit_ioend(
>  static struct xfs_ioend *
>  xfs_alloc_ioend(
>  	struct inode		*inode,
> -	unsigned int		type,
> +	int			fork,
> +	xfs_exntst_t		state,
>  	xfs_off_t		offset,
>  	struct block_device	*bdev,
>  	sector_t		sector)
> @@ -545,7 +537,8 @@ xfs_alloc_ioend(
>  
>  	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
>  	INIT_LIST_HEAD(&ioend->io_list);
> -	ioend->io_type = type;
> +	ioend->io_fork = fork;
> +	ioend->io_state = state;
>  	ioend->io_inode = inode;
>  	ioend->io_size = 0;
>  	ioend->io_offset = offset;
> @@ -606,13 +599,15 @@ xfs_add_to_ioend(
>  	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
>  		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
>  
> -	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> +	if (!wpc->ioend ||
> +	    wpc->fork != wpc->ioend->io_fork ||
> +	    wpc->imap.br_state != wpc->ioend->io_state ||
>  	    sector != bio_end_sector(wpc->ioend->io_bio) ||
>  	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
>  		if (wpc->ioend)
>  			list_add(&wpc->ioend->io_list, iolist);
> -		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
> -				bdev, sector);
> +		wpc->ioend = xfs_alloc_ioend(inode, wpc->fork,
> +				wpc->imap.br_state, offset, bdev, sector);
>  	}
>  
>  	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
> @@ -721,7 +716,7 @@ xfs_writepage_map(
>  		error = xfs_map_blocks(wpc, inode, file_offset);
>  		if (error)
>  			break;
> -		if (wpc->io_type == XFS_IO_HOLE)
> +		if (wpc->imap.br_startblock == HOLESTARTBLOCK)
>  			continue;
>  		xfs_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
>  				 &submit_list);
> @@ -916,9 +911,7 @@ xfs_vm_writepage(
>  	struct page		*page,
>  	struct writeback_control *wbc)
>  {
> -	struct xfs_writepage_ctx wpc = {
> -		.io_type = XFS_IO_HOLE,
> -	};
> +	struct xfs_writepage_ctx wpc = { };
>  	int			ret;
>  
>  	ret = xfs_do_writepage(page, wbc, &wpc);
> @@ -932,9 +925,7 @@ xfs_vm_writepages(
>  	struct address_space	*mapping,
>  	struct writeback_control *wbc)
>  {
> -	struct xfs_writepage_ctx wpc = {
> -		.io_type = XFS_IO_HOLE,
> -	};
> +	struct xfs_writepage_ctx wpc = { };
>  	int			ret;
>  
>  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 494b4338446e..6c2615b83c5d 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -8,30 +8,13 @@
>  
>  extern struct bio_set xfs_ioend_bioset;
>  
> -/*
> - * Types of I/O for bmap clustering and I/O completion tracking.
> - */
> -enum {
> -	XFS_IO_HOLE,		/* covers region without any block allocation */
> -	XFS_IO_DELALLOC,	/* covers delalloc region */
> -	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
> -	XFS_IO_OVERWRITE,	/* covers already allocated extent */
> -	XFS_IO_COW,		/* covers copy-on-write extent */
> -};
> -
> -#define XFS_IO_TYPES \
> -	{ XFS_IO_HOLE,			"hole" },	\
> -	{ XFS_IO_DELALLOC,		"delalloc" },	\
> -	{ XFS_IO_UNWRITTEN,		"unwritten" },	\
> -	{ XFS_IO_OVERWRITE,		"overwrite" },	\
> -	{ XFS_IO_COW,			"CoW" }
> -
>  /*
>   * Structure for buffered I/O completions.
>   */
>  struct xfs_ioend {
>  	struct list_head	io_list;	/* next ioend in chain */
> -	unsigned int		io_type;	/* delalloc / unwritten */
> +	int			io_fork;	/* inode fork written back */
> +	xfs_exntst_t		io_state;	/* extent state */
>  	struct inode		*io_inode;	/* file being written to */
>  	size_t			io_size;	/* size of the extent */
>  	xfs_off_t		io_offset;	/* offset in the file */
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 27c93b5f029d..32a7c169e096 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -575,7 +575,7 @@ xfs_file_iomap_begin_delay(
>  				goto out_unlock;
>  		}
>  
> -		trace_xfs_iomap_found(ip, offset, count, 0, &got);
> +		trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK, &got);
>  		goto done;
>  	}
>  
> @@ -647,7 +647,7 @@ xfs_file_iomap_begin_delay(
>  	 * them out if the write happens to fail.
>  	 */
>  	iomap->flags |= IOMAP_F_NEW;
> -	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
> +	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
>  done:
>  	if (isnullstartblock(got.br_startblock))
>  		got.br_startblock = DELAYSTARTBLOCK;
> @@ -1139,7 +1139,7 @@ xfs_file_iomap_begin(
>  		return error;
>  
>  	iomap->flags |= IOMAP_F_NEW;
> -	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
> +	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
>  
>  out_finish:
>  	if (xfs_ipincount(ip) && (ip->i_itemp->ili_fsync_fields
> @@ -1155,7 +1155,7 @@ xfs_file_iomap_begin(
>  out_found:
>  	ASSERT(nimaps);
>  	xfs_iunlock(ip, lockmode);
> -	trace_xfs_iomap_found(ip, offset, length, 0, &imap);
> +	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
>  	goto out_finish;
>  
>  out_unlock:
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 322a852ce284..a8c32632090c 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1148,7 +1148,7 @@ xfs_reflink_remap_blocks(
>  			break;
>  		ASSERT(nimaps == 1);
>  
> -		trace_xfs_reflink_remap_imap(src, srcoff, len, XFS_IO_OVERWRITE,
> +		trace_xfs_reflink_remap_imap(src, srcoff, len, XFS_DATA_FORK,
>  				&imap);
>  
>  		/* Translate imap into the destination file. */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 8a6532aae779..870865913bd8 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1210,15 +1210,15 @@ DEFINE_READPAGE_EVENT(xfs_vm_readpages);
>  
>  DECLARE_EVENT_CLASS(xfs_imap_class,
>  	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
> -		 int type, struct xfs_bmbt_irec *irec),
> -	TP_ARGS(ip, offset, count, type, irec),
> +		 int whichfork, struct xfs_bmbt_irec *irec),
> +	TP_ARGS(ip, offset, count, whichfork, irec),
>  	TP_STRUCT__entry(
>  		__field(dev_t, dev)
>  		__field(xfs_ino_t, ino)
>  		__field(loff_t, size)
>  		__field(loff_t, offset)
>  		__field(size_t, count)
> -		__field(int, type)
> +		__field(int, whichfork)
>  		__field(xfs_fileoff_t, startoff)
>  		__field(xfs_fsblock_t, startblock)
>  		__field(xfs_filblks_t, blockcount)
> @@ -1229,33 +1229,33 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
>  		__entry->size = ip->i_d.di_size;
>  		__entry->offset = offset;
>  		__entry->count = count;
> -		__entry->type = type;
> +		__entry->whichfork = whichfork;
>  		__entry->startoff = irec ? irec->br_startoff : 0;
>  		__entry->startblock = irec ? irec->br_startblock : 0;
>  		__entry->blockcount = irec ? irec->br_blockcount : 0;
>  	),
>  	TP_printk("dev %d:%d ino 0x%llx size 0x%llx offset 0x%llx count %zd "
> -		  "type %s startoff 0x%llx startblock %lld blockcount 0x%llx",
> +		  "fork %s startoff 0x%llx startblock %lld blockcount 0x%llx",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  __entry->ino,
>  		  __entry->size,
>  		  __entry->offset,
>  		  __entry->count,
> -		  __print_symbolic(__entry->type, XFS_IO_TYPES),
> +		  __entry->whichfork == XFS_COW_FORK ? "cow" : "data",
>  		  __entry->startoff,
>  		  (int64_t)__entry->startblock,
>  		  __entry->blockcount)
>  )
>  
> -#define DEFINE_IOMAP_EVENT(name)	\
> +#define DEFINE_IMAP_EVENT(name)	\
>  DEFINE_EVENT(xfs_imap_class, name,	\
>  	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,	\
> -		 int type, struct xfs_bmbt_irec *irec),		\
> -	TP_ARGS(ip, offset, count, type, irec))
> -DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
> -DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
> -DEFINE_IOMAP_EVENT(xfs_iomap_alloc);
> -DEFINE_IOMAP_EVENT(xfs_iomap_found);
> +		 int whichfork, struct xfs_bmbt_irec *irec),		\
> +	TP_ARGS(ip, offset, count, whichfork, irec))
> +DEFINE_IMAP_EVENT(xfs_map_blocks_found);
> +DEFINE_IMAP_EVENT(xfs_map_blocks_alloc);
> +DEFINE_IMAP_EVENT(xfs_iomap_alloc);
> +DEFINE_IMAP_EVENT(xfs_iomap_found);
>  
>  DECLARE_EVENT_CLASS(xfs_simple_io_class,
>  	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
> @@ -3055,7 +3055,7 @@ DEFINE_EVENT(xfs_inode_irec_class, name, \
>  DEFINE_INODE_EVENT(xfs_reflink_set_inode_flag);
>  DEFINE_INODE_EVENT(xfs_reflink_unset_inode_flag);
>  DEFINE_ITRUNC_EVENT(xfs_reflink_update_inode_size);
> -DEFINE_IOMAP_EVENT(xfs_reflink_remap_imap);
> +DEFINE_IMAP_EVENT(xfs_reflink_remap_imap);
>  TRACE_EVENT(xfs_reflink_remap_blocks_loop,
>  	TP_PROTO(struct xfs_inode *src, xfs_fileoff_t soffset,
>  		 xfs_filblks_t len, struct xfs_inode *dest,
> -- 
> 2.19.1
> 
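
The xfs_end_io and xfs_end_bio hunks above replace the io_type switch
with decisions derived from the fork and the extent state.  A minimal
userspace sketch of that dispatch follows; all sketch_ names are
hypothetical stand-ins for XFS_DATA_FORK/XFS_COW_FORK and
XFS_EXT_NORM/XFS_EXT_UNWRITTEN, and the io_append_trans case is left
out.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_fork {
	SKETCH_DATA_FORK,
	SKETCH_COW_FORK,
};

enum sketch_state {
	SKETCH_EXT_NORM,
	SKETCH_EXT_UNWRITTEN,
};

enum sketch_action {
	SKETCH_END_COW,			/* remap COW blocks into the data fork */
	SKETCH_CONVERT_UNWRITTEN,	/* convert unwritten extent to written */
	SKETCH_NO_CONVERSION,		/* plain overwrite, nothing to convert */
};

/* Successful-completion dispatch: the fork is checked before the state. */
enum sketch_action
sketch_end_io_action(
	enum sketch_fork	fork,
	enum sketch_state	state)
{
	assert(fork == SKETCH_DATA_FORK || fork == SKETCH_COW_FORK);

	if (fork == SKETCH_COW_FORK)
		return SKETCH_END_COW;
	if (state == SKETCH_EXT_UNWRITTEN)
		return SKETCH_CONVERT_UNWRITTEN;
	return SKETCH_NO_CONVERSION;
}

/* Models the first branch of the bio completion handler. */
bool
sketch_needs_unwritten_workqueue(
	enum sketch_fork	fork,
	enum sketch_state	state)
{
	return fork == SKETCH_COW_FORK || state == SKETCH_EXT_UNWRITTEN;
}
```

Because the fork test comes first, a COW fork ioend always takes the
end-COW path regardless of the extent state recorded in the ioend.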

* Re: [PATCH 05/11] xfs: make xfs_bmbt_to_iomap more useful
  2018-12-03 22:24 ` [PATCH 05/11] xfs: make xfs_bmbt_to_iomap more useful Christoph Hellwig
@ 2018-12-18 21:46   ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 21:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:57PM -0500, Christoph Hellwig wrote:
> Move checking for invalid zero blocks and setting of various iomap flags
> into this helper.  Also make it deal with "raw" delalloc extents to
> avoid clutter in the callers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 84 +++++++++++++++++++++-------------------------
>  fs/xfs/xfs_iomap.h |  4 +--
>  fs/xfs/xfs_pnfs.c  |  2 +-
>  3 files changed, 41 insertions(+), 49 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6acfed2ae858..9f1fd224bb06 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -35,18 +35,40 @@
>  #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
>  						<< mp->m_writeio_log)
>  
> -void
> +static int
> +xfs_alert_fsblock_zero(
> +	xfs_inode_t	*ip,
> +	xfs_bmbt_irec_t	*imap)
> +{
> +	xfs_alert_tag(ip->i_mount, XFS_PTAG_FSBLOCK_ZERO,
> +			"Access to block zero in inode %llu "
> +			"start_block: %llx start_off: %llx "
> +			"blkcnt: %llx extent-state: %x",
> +		(unsigned long long)ip->i_ino,
> +		(unsigned long long)imap->br_startblock,
> +		(unsigned long long)imap->br_startoff,
> +		(unsigned long long)imap->br_blockcount,
> +		imap->br_state);
> +	return -EFSCORRUPTED;
> +}
> +
> +int
>  xfs_bmbt_to_iomap(
>  	struct xfs_inode	*ip,
>  	struct iomap		*iomap,
> -	struct xfs_bmbt_irec	*imap)
> +	struct xfs_bmbt_irec	*imap,
> +	bool			shared)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  
> +	if (unlikely(!imap->br_startblock && !XFS_IS_REALTIME_INODE(ip)))
> +		return xfs_alert_fsblock_zero(ip, imap);
> +
>  	if (imap->br_startblock == HOLESTARTBLOCK) {
>  		iomap->addr = IOMAP_NULL_ADDR;
>  		iomap->type = IOMAP_HOLE;
> -	} else if (imap->br_startblock == DELAYSTARTBLOCK) {
> +	} else if (imap->br_startblock == DELAYSTARTBLOCK ||
> +		   isnullstartblock(imap->br_startblock)) {
>  		iomap->addr = IOMAP_NULL_ADDR;
>  		iomap->type = IOMAP_DELALLOC;
>  	} else {
> @@ -60,6 +82,13 @@ xfs_bmbt_to_iomap(
>  	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
>  	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
>  	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
> +
> +	if (xfs_ipincount(ip) &&
> +	    (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> +		iomap->flags |= IOMAP_F_DIRTY;
> +	if (shared)
> +		iomap->flags |= IOMAP_F_SHARED;
> +	return 0;
>  }
>  
>  static void
> @@ -138,23 +167,6 @@ xfs_iomap_eof_align_last_fsb(
>  	return 0;
>  }
>  
> -STATIC int
> -xfs_alert_fsblock_zero(
> -	xfs_inode_t	*ip,
> -	xfs_bmbt_irec_t	*imap)
> -{
> -	xfs_alert_tag(ip->i_mount, XFS_PTAG_FSBLOCK_ZERO,
> -			"Access to block zero in inode %llu "
> -			"start_block: %llx start_off: %llx "
> -			"blkcnt: %llx extent-state: %x",
> -		(unsigned long long)ip->i_ino,
> -		(unsigned long long)imap->br_startblock,
> -		(unsigned long long)imap->br_startoff,
> -		(unsigned long long)imap->br_blockcount,
> -		imap->br_state);
> -	return -EFSCORRUPTED;
> -}
> -
>  int
>  xfs_iomap_write_direct(
>  	xfs_inode_t	*ip,
> @@ -649,17 +661,7 @@ xfs_file_iomap_begin_delay(
>  	iomap->flags |= IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
>  done:
> -	if (isnullstartblock(got.br_startblock))
> -		got.br_startblock = DELAYSTARTBLOCK;
> -
> -	if (!got.br_startblock) {
> -		error = xfs_alert_fsblock_zero(ip, &got);
> -		if (error)
> -			goto out_unlock;
> -	}
> -
> -	xfs_bmbt_to_iomap(ip, iomap, &got);
> -
> +	error = xfs_bmbt_to_iomap(ip, iomap, &got, false);
>  out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
> @@ -1097,15 +1099,7 @@ xfs_file_iomap_begin(
>  	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
>  
>  out_finish:
> -	if (xfs_ipincount(ip) && (ip->i_itemp->ili_fsync_fields
> -				& ~XFS_ILOG_TIMESTAMP))
> -		iomap->flags |= IOMAP_F_DIRTY;
> -
> -	xfs_bmbt_to_iomap(ip, iomap, &imap);
> -
> -	if (shared)
> -		iomap->flags |= IOMAP_F_SHARED;
> -	return 0;
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, shared);
>  
>  out_found:
>  	ASSERT(nimaps);
> @@ -1228,12 +1222,10 @@ xfs_xattr_iomap_begin(
>  out_unlock:
>  	xfs_iunlock(ip, lockmode);
>  
> -	if (!error) {
> -		ASSERT(nimaps);
> -		xfs_bmbt_to_iomap(ip, iomap, &imap);
> -	}
> -
> -	return error;
> +	if (error)
> +		return error;
> +	ASSERT(nimaps);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, false);
>  }
>  
>  const struct iomap_ops xfs_xattr_iomap_ops = {
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index c6170548831b..ed27e41b687c 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -17,8 +17,8 @@ int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t,
>  			struct xfs_bmbt_irec *, unsigned int *);
>  int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
>  
> -void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
> -		struct xfs_bmbt_irec *);
> +int xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
> +		struct xfs_bmbt_irec *, bool shared);
>  xfs_extlen_t xfs_eof_alignment(struct xfs_inode *ip, xfs_extlen_t extsize);
>  
>  static inline xfs_filblks_t
> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
> index f44c3599527d..bde2c9f56a46 100644
> --- a/fs/xfs/xfs_pnfs.c
> +++ b/fs/xfs/xfs_pnfs.c
> @@ -185,7 +185,7 @@ xfs_fs_map_blocks(
>  	}
>  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
>  
> -	xfs_bmbt_to_iomap(ip, iomap, &imap);
> +	error = xfs_bmbt_to_iomap(ip, iomap, &imap, false);
>  	*device_generation = mp->m_generation;
>  	return error;
>  out_unlock:
> -- 
> 2.19.1
> 


* Re: [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust
  2018-12-03 22:25 ` [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust Christoph Hellwig
@ 2018-12-18 22:22   ` Darrick J. Wong
  2018-12-19 19:30     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 22:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:25:02PM -0500, Christoph Hellwig wrote:
> If we have racing buffered and direct I/O, COW fork extents under
> writeback can have been moved to the data fork by the time we call
> xfs_reflink_convert_cow from xfs_submit_ioend.  This would be mostly
> harmless as the block numbers don't change by this move, except for
> the fact that xfs_bmapi_write will crash or trigger asserts when
> not finding existing extents, even despite trying to paper over this
> with the XFS_BMAPI_CONVERT_ONLY flag.
> 
> Instead of special casing non-transaction conversions in the already
> way too complicated xfs_bmapi_write just add a new helper for the much
> simpler non-transactional COW fork case, which simply ignores not
> found extents.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 12 ++------
>  fs/xfs/libxfs/xfs_bmap.h |  8 +++---
>  fs/xfs/xfs_reflink.c     | 61 +++++++++++++++++++++++++---------------
>  3 files changed, 45 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 1992ed8a60b0..fbed7ed34a7f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -2029,7 +2029,7 @@ xfs_bmap_add_extent_delay_real(
>  /*
>   * Convert an unwritten allocation to a real allocation or vice versa.
>   */
> -STATIC int				/* error */
> +int					/* error */
>  xfs_bmap_add_extent_unwritten_real(
>  	struct xfs_trans	*tp,
>  	xfs_inode_t		*ip,	/* incore inode pointer */
> @@ -4236,9 +4236,7 @@ xfs_bmapi_write(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
> -	ASSERT(tp != NULL ||
> -	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
> -			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
> +	ASSERT(tp != NULL);
>  	ASSERT(len > 0);
>  	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> @@ -4316,9 +4314,6 @@ xfs_bmapi_write(
>  			 * locked and hence a truncate will block on them
>  			 * first.
>  			 */
> -			ASSERT(!((flags & XFS_BMAPI_CONVERT) &&
> -			         (flags & XFS_BMAPI_COWFORK)));
> -
>  			if (flags & XFS_BMAPI_DELALLOC) {
>  				if (eof || bno >= end)
>  					break;
> @@ -4333,8 +4328,7 @@ xfs_bmapi_write(
>  		 * First, deal with the hole before the allocated space
>  		 * that we found, if any.
>  		 */
> -		if ((need_alloc || wasdelay) &&
> -		    !(flags & XFS_BMAPI_CONVERT_ONLY)) {
> +		if (need_alloc || wasdelay) {
>  			bma.eof = eof;
>  			bma.conv = !!(flags & XFS_BMAPI_CONVERT);
>  			bma.wasdel = wasdelay;
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index f9a925caa70e..ee3848680684 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -98,9 +98,6 @@ struct xfs_extent_free_item
>  /* Only convert delalloc space, don't allocate entirely new extents */
>  #define XFS_BMAPI_DELALLOC	0x400
>  
> -/* Only convert unwritten extents, don't allocate new blocks */
> -#define XFS_BMAPI_CONVERT_ONLY	0x800
> -
>  /* Skip online discard of freed extents */
>  #define XFS_BMAPI_NODISCARD	0x1000
>  
> @@ -118,7 +115,6 @@ struct xfs_extent_free_item
>  	{ XFS_BMAPI_REMAP,	"REMAP" }, \
>  	{ XFS_BMAPI_COWFORK,	"COWFORK" }, \
>  	{ XFS_BMAPI_DELALLOC,	"DELALLOC" }, \
> -	{ XFS_BMAPI_CONVERT_ONLY, "CONVERT_ONLY" }, \
>  	{ XFS_BMAPI_NODISCARD,	"NODISCARD" }, \
>  	{ XFS_BMAPI_NORMAP,	"NORMAP" }
>  
> @@ -227,6 +223,10 @@ int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
>  		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
>  		struct xfs_bmbt_irec *got, struct xfs_iext_cursor *cur,
>  		int eof);
> +int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
> +		struct xfs_inode *ip, int whichfork,
> +		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
> +		struct xfs_bmbt_irec *new, int *logflagsp);
>  
>  static inline void
>  xfs_bmap_add_free(
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index d59b556d42cb..0cf13cb1b2fe 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -234,26 +234,42 @@ xfs_reflink_trim_around_shared(
>  	}
>  }
>  
> -/* Convert part of an unwritten CoW extent to a real one. */
> -STATIC int
> -xfs_reflink_convert_cow_extent(
> -	struct xfs_inode		*ip,
> -	struct xfs_bmbt_irec		*imap,
> -	xfs_fileoff_t			offset_fsb,
> -	xfs_filblks_t			count_fsb)
> +static int
> +xfs_reflink_convert_cow_locked(
> +	struct xfs_inode	*ip,
> +	xfs_fileoff_t		offset_fsb,
> +	xfs_filblks_t		count_fsb)
>  {
> -	int				nimaps = 1;
> +	struct xfs_iext_cursor	icur;
> +	struct xfs_bmbt_irec	got;
> +	struct xfs_btree_cur	*dummy_cur = NULL;
> +	int			dummy_logflags;
> +	int			error;
>  
> -	if (imap->br_state == XFS_EXT_NORM)
> +	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got))
>  		return 0;
>  
> -	xfs_trim_extent(imap, offset_fsb, count_fsb);
> -	trace_xfs_reflink_convert_cow(ip, imap);
> -	if (imap->br_blockcount == 0)
> -		return 0;
> -	return xfs_bmapi_write(NULL, ip, imap->br_startoff, imap->br_blockcount,
> -			XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT, 0, imap,
> -			&nimaps);
> +	do {
> +		if (got.br_startoff >= offset_fsb + count_fsb)
> +			break;
> +		if (got.br_state == XFS_EXT_NORM)
> +			continue;
> +		if (WARN_ON_ONCE(isnullstartblock(got.br_startblock)))
> +			return -EIO;
> +
> +		xfs_trim_extent(&got, offset_fsb, count_fsb);
> +		if (!got.br_blockcount)
> +			continue;
> +
> +		got.br_state = XFS_EXT_NORM;
> +		error = xfs_bmap_add_extent_unwritten_real(NULL, ip,
> +				XFS_COW_FORK, &icur, &dummy_cur, &got,
> +				&dummy_logflags);
> +		if (error)
> +			return error;
> +	} while (xfs_iext_next_extent(ip->i_cowfp, &icur, &got));
> +
> +	return error;
>  }
>  
>  /* Convert all of the unwritten CoW extents in a file's range to real ones. */
> @@ -267,15 +283,12 @@ xfs_reflink_convert_cow(
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
>  	xfs_filblks_t		count_fsb = end_fsb - offset_fsb;
> -	struct xfs_bmbt_irec	imap;
> -	int			nimaps = 1, error = 0;
> +	int			error;
>  
>  	ASSERT(count != 0);
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	error = xfs_bmapi_write(NULL, ip, offset_fsb, count_fsb,
> -			XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT |
> -			XFS_BMAPI_CONVERT_ONLY, 0, &imap, &nimaps);
> +	error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);

At this point you might as well convert the one remaining caller of
xfs_reflink_convert_cow to take and drop the ILOCK around the
reflink_convert_cow call...

--D
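
For anyone following the new helper quoted above, its loop shape (look up
the first extent at or after the offset, stop once past the range, skip
extents that are already normal, trim to the range, and treat "nothing
found" as success) can be modelled on a plain sorted array.  This is only
a userspace sketch: the demo_* names and the array representation are
made up for illustration and are not kernel interfaces, and instead of
converting anything it just counts the blocks that would be converted.

```c
#include <stddef.h>

enum demo_ext_state { DEMO_NORM, DEMO_UNWRITTEN };

/* Hypothetical stand-in for one COW fork extent record. */
struct demo_extent {
	unsigned long		startoff;	/* first file block */
	unsigned long		blockcount;	/* length in blocks */
	enum demo_ext_state	state;
};

/*
 * Count the unwritten blocks inside [off, off + len) in a sorted,
 * non-overlapping extent array.  Returns 0 when no extent is found,
 * mirroring how the helper ignores missing extents.
 */
unsigned long demo_blocks_to_convert(const struct demo_extent *ext, size_t n,
		unsigned long off, unsigned long len)
{
	unsigned long end = off + len;
	unsigned long total = 0;
	size_t i = 0;

	/* Lookup: first extent that contains off or starts after it. */
	while (i < n && ext[i].startoff + ext[i].blockcount <= off)
		i++;

	for (; i < n; i++) {
		unsigned long s = ext[i].startoff;
		unsigned long e = s + ext[i].blockcount;

		if (s >= end)
			break;			/* past the requested range */
		if (ext[i].state == DEMO_NORM)
			continue;		/* already converted */

		/* Trim the extent to the requested range. */
		if (s < off)
			s = off;
		if (e > end)
			e = end;
		if (e <= s)
			continue;		/* nothing left after trimming */
		total += e - s;
	}
	return total;
}
```

With extents [0,4) unwritten, [4,8) normal and [10,14) unwritten, a
request for blocks [2,12) touches two blocks of the first extent and two
of the last, and a request entirely beyond the last extent touches none.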

>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
>  }
> @@ -405,9 +418,11 @@ xfs_reflink_allocate_cow(
>  	if (nimaps == 0)
>  		return -ENOSPC;
>  convert:
> -	if (!(flags & IOMAP_DIRECT))
> +	xfs_trim_extent(imap, offset_fsb, count_fsb);
> +	if (!(flags & IOMAP_DIRECT) || imap->br_state == XFS_EXT_NORM)
>  		return 0;
> -	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
> +	trace_xfs_reflink_convert_cow(ip, imap);
> +	return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
>  
>  out_unreserve:
>  	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
> -- 
> 2.19.1
> 


* Re: [PATCH 03/11] xfs: remove the s_maxbytes checks in xfs_map_blocks
  2018-12-03 22:24 ` [PATCH 03/11] xfs: remove the s_maxbytes checks in xfs_map_blocks Christoph Hellwig
@ 2018-12-18 22:31   ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 22:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:55PM -0500, Christoph Hellwig wrote:
> We already ensure all data fits into s_maxbytes in the write / fault
> path.  The only reason we have them here is that they were copy and
> pasted from xfs_bmapi_read when we stopped using that function.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8fec6fd4c632..5b6fab283316 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -301,7 +301,8 @@ xfs_map_blocks(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	ssize_t			count = i_blocksize(inode);
> -	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
>  	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
>  	struct xfs_bmbt_irec	imap;
>  	struct xfs_iext_cursor	icur;
> @@ -356,11 +357,6 @@ xfs_map_blocks(
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
>  	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
>  	       (ip->i_df.if_flags & XFS_IFEXTENTS));
> -	ASSERT(offset <= mp->m_super->s_maxbytes);
> -
> -	if (offset > mp->m_super->s_maxbytes - count)
> -		count = mp->m_super->s_maxbytes - offset;
> -	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>  
>  	/*
>  	 * Check if this is offset is covered by a COW extents, and if yes use
> -- 
> 2.19.1
> 


* Re: [PATCH 04/11] xfs: rework the truncate race handling in the writeback path
  2018-12-03 22:24 ` [PATCH 04/11] xfs: rework the truncate race handling in the writeback path Christoph Hellwig
@ 2018-12-18 23:03   ` Darrick J. Wong
  2018-12-19 19:32     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 23:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:56PM -0500, Christoph Hellwig wrote:
> We currently try to handle the races with truncate and COW to data fork
> conversion rather ad-hoc in a few places in the writeback path:
> 
>  - xfs_map_blocks contains an i_size check for the COW fork only, and
>    returns an imap containing a hole to make the writeback code skip
>    the rest of the page
>  - xfs_iomap_write_allocate does another i_size check under ilock, and
>    does an extent tree lookup to find the last extent to skip everything
>    beyond that, returning -EAGAIN if either is invalid to make the
>    writeback code exit early
>  - xfs_bmapi_write can ignore holes for delalloc conversions, but only
>    does so if called for the COW fork
> 
> Replace this with a coherent scheme:
> 
>  - check i_size first in xfs_map_blocks, and skip any processing if we
>    already are beyond i_size by presenting a hole until the end of the
>    file to the caller
>  - in xfs_iomap_write_allocate check i_size again, and return -EAGAIN
>    if we are beyond it now that we've taken ilock.
>  - Skip holes for all delalloc conversion in xfs_bmapi_write instead
>    of doing a separate lookup before calling it
>  - in xfs_map_blocks retry the case where xfs_iomap_write_allocate
>    could not perform a conversion one single time if we were on a COW
>    fork to handle the race where an extent moved from the COW to the
>    data fork, and else return a hole to skip writeback as we must
>    have races with writeback
> 
> Overall this greatly simplifies the code, makes it more robust and also
> handles the COW to data fork race properly that we did not handle
> previously.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c |  27 ++++-----
>  fs/xfs/xfs_aops.c        |  61 +++++++++++++-------
>  fs/xfs/xfs_iomap.c       | 121 ++++++++++++---------------------------
>  3 files changed, 87 insertions(+), 122 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index f16d42abc500..1992ed8a60b0 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4305,28 +4305,21 @@ xfs_bmapi_write(
>  		/* in hole or beyond EOF? */
>  		if (eof || bma.got.br_startoff > bno) {
>  			/*
> -			 * CoW fork conversions should /never/ hit EOF or
> -			 * holes.  There should always be something for us
> -			 * to work on.
> +			 * It is possible that the extents have changed since
> +			 * we did the read call as we dropped the ilock for a
> +			 * while.  We have to be careful about truncates or hole
> +			 * punchs here - we are not allowed to allocate
> +			 * non-delalloc blocks here.
> +			 *
> +			 * The only protection against truncation is the pages
> +			 * for the range we are being asked to convert are
> +			 * locked and hence a truncate will block on them
> +			 * first.
>  			 */
>  			ASSERT(!((flags & XFS_BMAPI_CONVERT) &&
>  			         (flags & XFS_BMAPI_COWFORK)));
>  
>  			if (flags & XFS_BMAPI_DELALLOC) {
> -				/*
> -				 * For the COW fork we can reasonably get a
> -				 * request for converting an extent that races
> -				 * with other threads already having converted
> -				 * part of it, as there converting COW to
> -				 * regular blocks is not protected using the
> -				 * IOLOCK.
> -				 */
> -				ASSERT(flags & XFS_BMAPI_COWFORK);
> -				if (!(flags & XFS_BMAPI_COWFORK)) {
> -					error = -EIO;
> -					goto error0;
> -				}
> -
>  				if (eof || bno >= end)
>  					break;
>  			} else {
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 5b6fab283316..124b8de37115 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -300,6 +300,7 @@ xfs_map_blocks(
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> +	loff_t			isize = i_size_read(inode);
>  	ssize_t			count = i_blocksize(inode);
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
> @@ -308,6 +309,15 @@ xfs_map_blocks(
>  	struct xfs_iext_cursor	icur;
>  	bool			imap_valid;
>  	int			error = 0;
> +	int			retries = 0;
> +
> +	/*
> +	 * If the offset is beyong the inode size we know that we raced with
> +	 * trunacte and are done now.  Note that we'll recheck this again

"truncate" ^^^^^^^^

> +	 * under the ilock later before doing delalloc conversions.
> +	 */
> +	if (offset > isize)
> +		goto eof;
>  
>  	/*
>  	 * We have to make sure the cached mapping is within EOF to protect
> @@ -320,7 +330,7 @@ xfs_map_blocks(
>  	 * mechanism to protect us from arbitrary extent modifying contexts, not
>  	 * just eofblocks.
>  	 */
> -	xfs_trim_extent(&wpc->imap, 0, XFS_B_TO_FSB(mp, i_size_read(inode)));
> +	xfs_trim_extent(&wpc->imap, 0, XFS_B_TO_FSB(mp, isize));
>  
>  	/*
>  	 * COW fork blocks can overlap data fork blocks even if the blocks
> @@ -354,6 +364,7 @@ xfs_map_blocks(
>  	 * into real extents.  If we return without a valid map, it means we
>  	 * landed in a hole and we skip the block.
>  	 */
> +retry:
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
>  	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
>  	       (ip->i_df.if_flags & XFS_IFEXTENTS));
> @@ -370,26 +381,6 @@ xfs_map_blocks(
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
>  		wpc->fork = XFS_COW_FORK;
> -
> -		/*
> -		 * Truncate can race with writeback since writeback doesn't
> -		 * take the iolock and truncate decreases the file size before
> -		 * it starts truncating the pages between new_size and old_size.
> -		 * Therefore, we can end up in the situation where writeback
> -		 * gets a CoW fork mapping but the truncate makes the mapping
> -		 * invalid and we end up in here trying to get a new mapping.
> -		 * bail out here so that we simply never get a valid mapping
> -		 * and so we drop the write altogether.  The page truncation
> -		 * will kill the contents anyway.
> -		 */
> -		if (offset > i_size_read(inode)) {
> -			wpc->imap.br_blockcount = end_fsb - offset_fsb;
> -			wpc->imap.br_startoff = offset_fsb;
> -			wpc->imap.br_startblock = HOLESTARTBLOCK;
> -			wpc->imap.br_state = XFS_EXT_NORM;
> -			return 0;
> -		}
> -
>  		goto allocate_blocks;
>  	}
>  
> @@ -440,13 +431,39 @@ xfs_map_blocks(
>  allocate_blocks:
>  	error = xfs_iomap_write_allocate(ip, wpc->fork, offset, &imap,
>  			&wpc->cow_seq);
> -	if (error)
> +	if (error) {
> +		if (error == -EAGAIN)
> +			goto truncate_race;
>  		return error;
> +	}
>  	ASSERT(wpc->fork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
>  	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
>  	wpc->imap = imap;
>  	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->fork, &imap);
>  	return 0;
> +
> +truncate_race:
> +	/*
> +	 * If we failed to find the extent in the COW fork we might have raced
> +	 * with a COW to data fork conversion or truncate.  Restart the lookup
> +	 * to catch the extent in the data fork for the former case, but prevent
> +	 * additional retries to avoid looping forever for the latter case.
> +	 */
> +	if (wpc->fork == XFS_COW_FORK && !retries++) {
> +		imap_valid = false;
> +		goto retry;
> +	}
> +eof:
> +	/*
> +	 * If we raced with truncate there might be no data left at this offset.
> +	 * In that case we need to return a hole so that the writeback code
> +	 * skips writeback for the rest of the file.
> +	 */
> +	wpc->imap.br_startoff = offset_fsb;
> +	wpc->imap.br_blockcount = end_fsb - offset_fsb;
> +	wpc->imap.br_startblock = HOLESTARTBLOCK;
> +	wpc->imap.br_state = XFS_EXT_NORM;
> +	return 0;

This function has become rather spaghetti-like.  Any way we can clean
this up reasonably?

>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 32a7c169e096..6acfed2ae858 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -685,14 +685,13 @@ xfs_iomap_write_allocate(
>  {
>  	xfs_mount_t	*mp = ip->i_mount;
>  	struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
> -	xfs_fileoff_t	offset_fsb, last_block;
> +	xfs_fileoff_t	offset_fsb;
>  	xfs_fileoff_t	end_fsb, map_start_fsb;
>  	xfs_filblks_t	count_fsb;
>  	xfs_trans_t	*tp;
>  	int		nimaps;
>  	int		error = 0;
>  	int		flags = XFS_BMAPI_DELALLOC;
> -	int		nres;
>  
>  	if (whichfork == XFS_COW_FORK)
>  		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
> @@ -712,95 +711,51 @@ xfs_iomap_write_allocate(
>  
>  	while (count_fsb != 0) {
>  		/*
> -		 * Set up a transaction with which to allocate the
> -		 * backing store for the file.  Do allocations in a
> -		 * loop until we get some space in the range we are
> -		 * interested in.  The other space that might be allocated
> -		 * is in the delayed allocation extent on which we sit
> -		 * but before our buffer starts.
> +		 * We have already reserved space for the extent and any
> +		 * indirect blocks when creating the delalloc extent, there
> +		 * is no need to reserve space in this transaction again.
>  		 */
> -		nimaps = 0;
> -		while (nimaps == 0) {

This removal of the nimaps == 0 loop bothers me: why is doing so safe?

I see that we can return from xfs_bmapi_write with nimaps == 0 if
something is trying to punch or truncate the range that we're writing
back, but it also seems to me that bmapi_write can return zero mappings
because xfs_bmapi_allocate() didn't find any blocks.  I /think/ that's
impossible because we're converting delalloc reservations and so we
should never run out of space, right?

Anyway, when _write_allocate gets zero mappings, it'll return -EAGAIN to
xfs_map_blocks, which will retry once to cover the case of racing with
cow -> data fork remapping but otherwise it won't bother?  And that's
why it's fine to only loop once?

Am I reasoning this correctly?

--D
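
To make the question concrete, here is a toy userspace model of the
control flow as the commit message and the xfs_map_blocks hunk describe
it: -EAGAIN from the allocation step is retried exactly once, and only
when the mapping came from the COW fork; every other -EAGAIN turns into
a hole so writeback skips the block.  All demo_* names are invented for
the sketch, and the fake allocator simply returns -EAGAIN a configurable
number of times; none of this is the real kernel code.

```c
#include <errno.h>
#include <stdbool.h>

enum demo_fork { DEMO_DATA_FORK, DEMO_COW_FORK };
enum demo_result { DEMO_MAPPED, DEMO_HOLE };

struct demo_state {
	bool	extent_in_cow;	/* lookup finds the extent in the COW fork */
	int	eagain_left;	/* allocate calls that still fail with -EAGAIN */
	int	alloc_calls;	/* how often the allocator was invoked */
};

/* Hypothetical stand-in for the delalloc conversion step. */
int demo_write_allocate(struct demo_state *st)
{
	st->alloc_calls++;
	if (st->eagain_left > 0) {
		st->eagain_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Models the retry / truncate_race / eof labels from the hunk above. */
int demo_map_blocks(struct demo_state *st, enum demo_result *res)
{
	enum demo_fork fork;
	int retries = 0;
	int error;

retry:
	fork = st->extent_in_cow ? DEMO_COW_FORK : DEMO_DATA_FORK;
	error = demo_write_allocate(st);
	if (!error) {
		*res = DEMO_MAPPED;
		return 0;
	}
	if (error != -EAGAIN)
		return error;

	/* One extra attempt, and only for the COW fork case. */
	if (fork == DEMO_COW_FORK && !retries++)
		goto retry;

	/* Otherwise report a hole so the caller skips writeback. */
	*res = DEMO_HOLE;
	return 0;
}
```

In this model a COW fork mapping that fails once is mapped on the second
attempt, one that fails twice ends up as a hole after exactly two
allocator calls, and a data fork mapping gets no retry at all.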

> -			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
> -			/*
> -			 * We have already reserved space for the extent and any
> -			 * indirect blocks when creating the delalloc extent,
> -			 * there is no need to reserve space in this transaction
> -			 * again.
> -			 */
> -			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
> -					0, XFS_TRANS_RESERVE, &tp);
> -			if (error)
> -				return error;
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
> +				0, XFS_TRANS_RESERVE, &tp);
> +		if (error)
> +			return error;
>  
> -			xfs_ilock(ip, XFS_ILOCK_EXCL);
> -			xfs_trans_ijoin(tp, ip, 0);
> +		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  
> -			/*
> -			 * it is possible that the extents have changed since
> -			 * we did the read call as we dropped the ilock for a
> -			 * while. We have to be careful about truncates or hole
> -			 * punchs here - we are not allowed to allocate
> -			 * non-delalloc blocks here.
> -			 *
> -			 * The only protection against truncation is the pages
> -			 * for the range we are being asked to convert are
> -			 * locked and hence a truncate will block on them
> -			 * first.
> -			 *
> -			 * As a result, if we go beyond the range we really
> -			 * need and hit an delalloc extent boundary followed by
> -			 * a hole while we have excess blocks in the map, we
> -			 * will fill the hole incorrectly and overrun the
> -			 * transaction reservation.
> -			 *
> -			 * Using a single map prevents this as we are forced to
> -			 * check each map we look for overlap with the desired
> -			 * range and abort as soon as we find it. Also, given
> -			 * that we only return a single map, having one beyond
> -			 * what we can return is probably a bit silly.
> -			 *
> -			 * We also need to check that we don't go beyond EOF;
> -			 * this is a truncate optimisation as a truncate sets
> -			 * the new file size before block on the pages we
> -			 * currently have locked under writeback. Because they
> -			 * are about to be tossed, we don't need to write them
> -			 * back....
> -			 */
> -			nimaps = 1;
> -			end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> -			error = xfs_bmap_last_offset(ip, &last_block,
> -							XFS_DATA_FORK);
> -			if (error)
> +		/*
> +		 * We need to check that we don't go beyond EOF; this is a
> +		 * truncate optimisation as a truncate sets the new file size
> +		 * before block on the pages we currently have locked under
> +		 * writeback.  Because they are about to be tossed, we don't
> +		 * need to write them back....
> +		 */
> +		end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> +		if (map_start_fsb + count_fsb > end_fsb) {
> +			count_fsb = end_fsb - map_start_fsb;
> +			if (count_fsb == 0) {
> +				error = -EAGAIN;
>  				goto trans_cancel;
> -
> -			last_block = XFS_FILEOFF_MAX(last_block, end_fsb);
> -			if ((map_start_fsb + count_fsb) > last_block) {
> -				count_fsb = last_block - map_start_fsb;
> -				if (count_fsb == 0) {
> -					error = -EAGAIN;
> -					goto trans_cancel;
> -				}
>  			}
> +		}
>  
> -			/*
> -			 * From this point onwards we overwrite the imap
> -			 * pointer that the caller gave to us.
> -			 */
> -			error = xfs_bmapi_write(tp, ip, map_start_fsb,
> -						count_fsb, flags, nres, imap,
> -						&nimaps);
> -			if (error)
> -				goto trans_cancel;
> +		nimaps = 1;
> +		xfs_trans_ijoin(tp, ip, 0);
> +		error = xfs_bmapi_write(tp, ip, map_start_fsb, count_fsb, flags,
> +				XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK),
> +				imap, &nimaps);
> +		if (error)
> +			goto trans_cancel;
>  
> -			error = xfs_trans_commit(tp);
> -			if (error)
> -				goto error0;
> +		error = xfs_trans_commit(tp);
> +		if (error)
> +			goto error0;
>  
> -			if (whichfork == XFS_COW_FORK)
> -				*cow_seq = READ_ONCE(ifp->if_seq);
> -			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -		}
> +		if (whichfork == XFS_COW_FORK)
> +			*cow_seq = READ_ONCE(ifp->if_seq);
> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +		if (nimaps == 0)
> +			return -EAGAIN;
>  
>  		/*
>  		 * See if we were able to allocate an extent that
> -- 
> 2.19.1
> 


* Re: [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-03 22:25 ` [PATCH 11/11] xfs: introduce an always_cow mode Christoph Hellwig
@ 2018-12-18 23:24   ` Darrick J. Wong
  2018-12-19 19:37     ` Christoph Hellwig
  2018-12-19 22:43     ` Dave Chinner
  0 siblings, 2 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 23:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:25:03PM -0500, Christoph Hellwig wrote:
> Add a mode where XFS never overwrites existing blocks in place.  This
> is to aid debugging our COW code, and also put infrastructure in place
> for things like possible future support for zoned block devices, which
> can't support overwrites.
> 
> This mode is enabled globally by doing a:
> 
>     echo 1 > /sys/fs/xfs/debug/always_cow
> 
> Note that the parameter is global to allow running all tests in xfstests
> easily in this mode, which would not easily be possible with a per-fs
> sysfs file.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c    |  2 +-
>  fs/xfs/xfs_file.c    | 11 ++++++++++-
>  fs/xfs/xfs_iomap.c   | 28 ++++++++++++++++++----------
>  fs/xfs/xfs_reflink.c | 28 ++++++++++++++++++++++++----
>  fs/xfs/xfs_reflink.h | 13 +++++++++++++
>  fs/xfs/xfs_super.c   | 13 +++++++++----
>  fs/xfs/xfs_sysctl.h  |  1 +
>  fs/xfs/xfs_sysfs.c   | 24 ++++++++++++++++++++++++
>  8 files changed, 100 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 7d95a84064e7..a900924f16e1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -986,7 +986,7 @@ xfs_vm_bmap(
>  	 * Since we don't pass back blockdev info, we can't return bmap
>  	 * information for rt files either.
>  	 */
> -	if (xfs_is_reflink_inode(ip) || XFS_IS_REALTIME_INODE(ip))
> +	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
>  		return 0;
>  	return iomap_bmap(mapping, block, &xfs_iomap_ops);
>  }
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e47425071e65..8d2be043590a 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -507,7 +507,7 @@ xfs_file_dio_aio_write(
>  		 * We can't properly handle unaligned direct I/O to reflink
>  		 * files yet, as we can't unshare a partial block.
>  		 */
> -		if (xfs_is_reflink_inode(ip)) {
> +		if (xfs_is_cow_inode(ip)) {
>  			trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count);
>  			return -EREMCHG;
>  		}
> @@ -806,6 +806,15 @@ xfs_file_fallocate(
>  		return -EOPNOTSUPP;
>  
>  	xfs_ilock(ip, iolock);
> +	/*
> +	 * If always_cow mode we can't use preallocation and thus should not
> +	 * allow creating them.
> +	 */
> +	if (xfs_is_always_cow_inode(ip) && (mode & ~FALLOC_FL_KEEP_SIZE) == 0) {
> +		error = -EOPNOTSUPP;
> +		goto out_unlock;

I think this screws up both UNSHARE and ZERO_RANGE here -- if the first
mode is set, we'll cow the shared extents, but we'll also fill holes
with unwritten extents, which the comment implies isn't allowed.  In the
second case we'll punch the range but refill it with unwritten extents
that we'll never actually overwrite.

Granted, I'm still rather fuzzy on what exactly is supposed to happen
with preallocating fallocate when all writes require an allocation to
succeed?  btrfs fills holes with unwritten extents which the next write
will overwrite, but non-holes cow like normal.  That only makes sense if
you assume people only use fallocate to preallocate holes.  Maybe we
don't want to follow that route.  It's probably simpler not to support
creation of unwritten extents for always_cow files, in which case you'll
have to neuter UNSHARE too.

As for ZERO_RANGE, I think it's sufficient to punch the range, since we
COW even the unwritten extents (which makes allocating them pointless),
right?

What's the real goal here?  I assume you're targeting O_ATOMIC in
addition to being able to use SMR drives as realtime devices, right?  It
would help to have a better idea of where we're going here before adding
anything user visible, even if it's just a debug knob for now.
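
The predicate in the hunk above is easy to check in isolation.  The
sketch below copies the flag values from the uapi header
(linux/falloc.h) under made-up DEMO_ names so that it builds anywhere;
the helper name is hypothetical as well.  It shows that the test only
matches plain preallocation (mode 0, with or without KEEP_SIZE), so
ZERO_RANGE and UNSHARE_RANGE are not rejected by it.

```c
#include <stdbool.h>

/* Values mirror FALLOC_FL_* in <linux/falloc.h>; names are local. */
#define DEMO_FL_KEEP_SIZE	0x01
#define DEMO_FL_PUNCH_HOLE	0x02
#define DEMO_FL_ZERO_RANGE	0x10
#define DEMO_FL_UNSHARE_RANGE	0x40

/*
 * Same expression as the quoted check: true only when no mode bit
 * other than KEEP_SIZE is set, i.e. for plain preallocation.
 */
bool demo_rejected_in_always_cow(int mode)
{
	return (mode & ~DEMO_FL_KEEP_SIZE) == 0;
}
```

So a ZERO_RANGE or UNSHARE_RANGE request passes this check and continues
into the rest of xfs_file_fallocate, which is where the concern about
refilling ranges with unwritten extents comes from.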

> +	}
> +
>  	error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
>  	if (error)
>  		goto out_unlock;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index bbc5d2e06b06..244ea0007c09 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -395,12 +395,13 @@ xfs_quota_calc_throttle(
>  STATIC xfs_fsblock_t
>  xfs_iomap_prealloc_size(
>  	struct xfs_inode	*ip,
> +	int			whichfork,
>  	loff_t			offset,
>  	loff_t			count,
>  	struct xfs_iext_cursor	*icur)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> +	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	struct xfs_bmbt_irec	prev;
>  	int			shift = 0;
> @@ -593,7 +594,11 @@ xfs_file_iomap_begin_delay(
>  	 * themselves.  Second the lookup in the extent list is generally faster
>  	 * than going out to the shared extent tree.
>  	 */
> -	if (xfs_is_reflink_inode(ip)) {
> +	if (xfs_is_cow_inode(ip)) {
> +		if (!ip->i_cowfp) {
> +			ASSERT(!xfs_is_reflink_inode(ip));
> +			xfs_ifork_init_cow(ip);
> +		}
>  		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
>  				&ccur, &cmap);
>  		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
> @@ -609,7 +614,7 @@ xfs_file_iomap_begin_delay(
>  		 * overwriting shared extents.   This includes zeroing of
>  		 * existing extents that contain data.
>  		 */
> -		if (!xfs_is_reflink_inode(ip) ||
> +		if (!xfs_is_cow_inode(ip) ||
>  		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
>  			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
>  					&imap);
> @@ -619,7 +624,7 @@ xfs_file_iomap_begin_delay(
>  		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
>  
>  		/* Trim the mapping to the nearest shared extent boundary. */
> -		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> +		error = xfs_inode_need_cow(ip, &imap, &shared);
>  		if (error)
>  			goto out_unlock;
>  
> @@ -648,15 +653,18 @@ xfs_file_iomap_begin_delay(
>  		 */
>  		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
>  		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> +
> +		if (xfs_is_always_cow_inode(ip))
> +			whichfork = XFS_COW_FORK;
>  	}
>  
>  	error = xfs_qm_dqattach_locked(ip, false);
>  	if (error)
>  		goto out_unlock;
>  
> -	if (eof && whichfork == XFS_DATA_FORK) {
> -		prealloc_blocks = xfs_iomap_prealloc_size(ip, offset, count,
> -				&icur);
> +	if (eof) {
> +		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
> +				count, &icur);
>  		if (prealloc_blocks) {
>  			xfs_extlen_t	align;
>  			xfs_off_t	end_offset;
> @@ -987,7 +995,7 @@ xfs_ilock_for_iomap(
>  	 * COW writes may allocate delalloc space or convert unwritten COW
>  	 * extents, so we need to make sure to take the lock exclusively here.
>  	 */
> -	if (xfs_is_reflink_inode(ip) && is_write) {
> +	if (xfs_is_cow_inode(ip) && is_write) {
>  		/*
>  		 * FIXME: It could still overwrite on unshared extents and not
>  		 * need allocation.
> @@ -1021,7 +1029,7 @@ xfs_ilock_for_iomap(
>  	 * check, so if we got ILOCK_SHARED for a write and but we're now a
>  	 * reflink inode we have to switch to ILOCK_EXCL and relock.
>  	 */
> -	if (mode == XFS_ILOCK_SHARED && is_write && xfs_is_reflink_inode(ip)) {
> +	if (mode == XFS_ILOCK_SHARED && is_write && xfs_is_cow_inode(ip)) {
>  		xfs_iunlock(ip, mode);
>  		mode = XFS_ILOCK_EXCL;
>  		goto relock;
> @@ -1093,7 +1101,7 @@ xfs_file_iomap_begin(
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
>  	 */
> -	if (xfs_is_reflink_inode(ip)) {
> +	if (xfs_is_cow_inode(ip)) {
>  		struct xfs_bmbt_irec	orig = imap;
>  
>  		/* if zeroing doesn't need COW allocation, then we are done. */
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 0cf13cb1b2fe..1da46899c215 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -192,7 +192,7 @@ xfs_reflink_trim_around_shared(
>  	int			error = 0;
>  
>  	/* Holes, unwritten, and delalloc extents cannot be shared */
> -	if (!xfs_is_reflink_inode(ip) || !xfs_bmap_is_real_extent(irec)) {
> +	if (!xfs_is_cow_inode(ip) || !xfs_bmap_is_real_extent(irec)) {
>  		*shared = false;
>  		return 0;
>  	}
> @@ -234,6 +234,23 @@ xfs_reflink_trim_around_shared(
>  	}
>  }
>  
> +bool
> +xfs_inode_need_cow(
> +	struct xfs_inode	*ip,
> +	struct xfs_bmbt_irec	*imap,
> +	bool			*shared)
> +{
> +	/* We can't update any real extents in always COW mode. */
> +	if (xfs_is_always_cow_inode(ip) &&
> +	    !isnullstartblock(imap->br_startblock)) {
> +		*shared = true;
> +		return 0;
> +	}
> +
> +	/* Trim the mapping to the nearest shared extent boundary. */
> +	return xfs_reflink_trim_around_shared(ip, imap, shared);
> +}
> +
>  static int
>  xfs_reflink_convert_cow_locked(
>  	struct xfs_inode	*ip,
> @@ -321,7 +338,7 @@ xfs_find_trim_cow_extent(
>  	if (got.br_startoff > offset_fsb) {
>  		xfs_trim_extent(imap, imap->br_startoff,
>  				got.br_startoff - imap->br_startoff);
> -		return xfs_reflink_trim_around_shared(ip, imap, shared);
> +		return xfs_inode_need_cow(ip, imap, shared);
>  	}
>  
>  	*shared = true;
> @@ -356,7 +373,10 @@ xfs_reflink_allocate_cow(
>  	xfs_extlen_t		resblks = 0;
>  
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> -	ASSERT(xfs_is_reflink_inode(ip));
> +	if (!ip->i_cowfp) {
> +		ASSERT(!xfs_is_reflink_inode(ip));
> +		xfs_ifork_init_cow(ip);
> +	}
>  
>  	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
>  	if (error || !*shared)
> @@ -537,7 +557,7 @@ xfs_reflink_cancel_cow_range(
>  	int			error;
>  
>  	trace_xfs_reflink_cancel_cow_range(ip, offset, count);
> -	ASSERT(xfs_is_reflink_inode(ip));
> +	ASSERT(ip->i_cowfp);
>  
>  	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
>  	if (count == NULLFILEOFF)
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index d76fc520cac8..f6505ae37626 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -6,11 +6,24 @@
>  #ifndef __XFS_REFLINK_H
>  #define __XFS_REFLINK_H 1
>  
> +static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
> +{
> +	return xfs_globals.always_cow &&
> +		xfs_sb_version_hasreflink(&ip->i_mount->m_sb);
> +}
> +
> +static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
> +{
> +	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
> +}
> +
>  extern int xfs_reflink_find_shared(struct xfs_mount *mp, struct xfs_trans *tp,
>  		xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t aglen,
>  		xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_maximal);
>  extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *irec, bool *shared);
> +bool xfs_inode_need_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
> +		bool *shared);
>  
>  extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d3e6cd063688..f4d34749505e 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1728,11 +1728,16 @@ xfs_fs_fill_super(
>  		}
>  	}
>  
> -	if (xfs_sb_version_hasreflink(&mp->m_sb) && mp->m_sb.sb_rblocks) {
> -		xfs_alert(mp,
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		if (mp->m_sb.sb_rblocks) {
> +			xfs_alert(mp,
>  	"reflink not compatible with realtime device!");
> -		error = -EINVAL;
> -		goto out_filestream_unmount;
> +			error = -EINVAL;
> +			goto out_filestream_unmount;
> +		}
> +
> +		if (xfs_globals.always_cow)
> +			xfs_info(mp, "using DEBUG-only always_cow mode.");

How does xfs handle the situation where always_cow mode comes on after
you've already opened a file and begun writing to it?  I assume we
allocate a new cow fork for files that need it and all writes after the
switch flips will be COW?
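
Going by the xfs_file_iomap_begin_delay and xfs_reflink_allocate_cow
hunks above, the COW fork appears to be set up lazily on the write
paths, so an inode that predates the switch would pick one up on its
next write.  A simplified userspace model of that reading follows.  All
names are invented for the sketch, and the real
xfs_is_always_cow_inode() also requires the reflink feature bit, which
is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for struct xfs_ifork; the contents do not matter here. */
struct sketch_fork {
	int			nextents;
};

struct sketch_inode {
	bool			reflinked;	/* models xfs_is_reflink_inode() */
	struct sketch_fork	*cowfp;		/* models ip->i_cowfp */
};

/* Models the xfs_globals.always_cow debug knob; starts out off. */
bool sketch_always_cow;

bool sketch_is_cow_inode(const struct sketch_inode *ip)
{
	return ip->reflinked || sketch_always_cow;
}

/*
 * Called at the start of a write.  Sets *use_cow and allocates the COW
 * fork on first use.  Returns 0 on success, -1 if allocation fails.
 */
int sketch_write_begin(struct sketch_inode *ip, bool *use_cow)
{
	*use_cow = sketch_is_cow_inode(ip);
	if (*use_cow && !ip->cowfp) {
		/* Reflinked inodes are expected to have a COW fork already. */
		assert(!ip->reflinked);
		ip->cowfp = calloc(1, sizeof(*ip->cowfp));
		if (!ip->cowfp)
			return -1;
	}
	return 0;
}
```

In this model a write issued before the knob flips goes to the data
fork and leaves cowfp unset; the first write after the flip allocates
the fork and goes through COW.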

--D

>  	}
>  
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb) && mp->m_sb.sb_rblocks) {
> diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
> index 168488130a19..ad7f9be13087 100644
> --- a/fs/xfs/xfs_sysctl.h
> +++ b/fs/xfs/xfs_sysctl.h
> @@ -85,6 +85,7 @@ struct xfs_globals {
>  	int	log_recovery_delay;	/* log recovery delay (secs) */
>  	int	mount_delay;		/* mount setup delay (secs) */
>  	bool	bug_on_assert;		/* BUG() the kernel on assert failure */
> +	bool	always_cow;		/* use COW fork for all overwrites */
>  };
>  extern struct xfs_globals	xfs_globals;
>  
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index cd6a994a7250..cabda13f3c64 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -183,10 +183,34 @@ mount_delay_show(
>  }
>  XFS_SYSFS_ATTR_RW(mount_delay);
>  
> +static ssize_t
> +always_cow_store(
> +	struct kobject	*kobject,
> +	const char	*buf,
> +	size_t		count)
> +{
> +	ssize_t		ret;
> +
> +	ret = kstrtobool(buf, &xfs_globals.always_cow);
> +	if (ret < 0)
> +		return ret;
> +	return count;
> +}
> +
> +static ssize_t
> +always_cow_show(
> +	struct kobject	*kobject,
> +	char		*buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.always_cow);
> +}
> +XFS_SYSFS_ATTR_RW(always_cow);
> +
>  static struct attribute *xfs_dbg_attrs[] = {
>  	ATTR_LIST(bug_on_assert),
>  	ATTR_LIST(log_recovery_delay),
>  	ATTR_LIST(mount_delay),
> +	ATTR_LIST(always_cow),
>  	NULL,
>  };
>  
> -- 
> 2.19.1
> 


* Re: [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay
  2018-12-03 22:25 ` [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-12-18 23:36   ` Darrick J. Wong
  2018-12-19 19:38     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 23:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:25:00PM -0500, Christoph Hellwig wrote:
> Besides simplifying the code a bit this allows to actually implement
> the behavior of using COW preallocation for non-COW data mentioned
> in the current comments.
> 
> Note that this breaks the current version of xfs/420, but that is
> because the test is broken.  A separate fix will be sent for it.

(Still waiting for this...)

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c   | 132 ++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_reflink.c |  67 ----------------------
>  fs/xfs/xfs_trace.h   |   1 -
>  3 files changed, 93 insertions(+), 107 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d851abac16a9..d19f99e5476a 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -534,15 +534,16 @@ xfs_file_iomap_begin_delay(
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	xfs_fileoff_t		maxbytes_fsb =
>  		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
>  	xfs_fileoff_t		end_fsb;
> -	int			error = 0, eof = 0;
> -	struct xfs_bmbt_irec	got;
> -	struct xfs_iext_cursor	icur;
> +	struct xfs_bmbt_irec	imap, cmap;
> +	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
> +	bool			eof = false, cow_eof = false, shared;
> +	int			whichfork = XFS_DATA_FORK;
> +	int			error = 0;
>  
>  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
>  	ASSERT(!xfs_get_extsz_hint(ip));
> @@ -560,7 +561,7 @@ xfs_file_iomap_begin_delay(
>  
>  	XFS_STATS_INC(mp, xs_blk_mapw);
>  
> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> +	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
>  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
>  		if (error)
>  			goto out_unlock;
> @@ -568,51 +569,92 @@ xfs_file_iomap_begin_delay(
>  
>  	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
>  
> -	eof = !xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got);
> +	/*
> +	 * Search the data fork fork first to look up our source mapping.  We
> +	 * always need the data fork map, as we have to return it to the
> +	 * iomap code so that the higher level write code can read data in to
> +	 * perform read-modify-write cycles for unaligned writes.
> +	 */
> +	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
>  	if (eof)
> -		got.br_startoff = end_fsb; /* fake hole until the end */
> +		imap.br_startoff = end_fsb; /* fake hole until the end */
>  
> -	if (got.br_startoff <= offset_fsb) {
> +	/* We never need to allocate blocks for zeroing a hole. */
> +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Search the COW fork extent list even if we did not find a data fork
> +	 * extent.  This serves two purposes: first this implements the
> +	 * speculative preallocation using cowextisze, so that we also unshare

"cowextsize"

> +	 * block adjacent to shared blocks instead of just the shared blocks
> +	 * themselves.  Second the lookup in the extent list is generally faster
> +	 * than going out to the shared extent tree.
> +	 */
> +	if (xfs_is_reflink_inode(ip)) {
> +		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
> +				&ccur, &cmap);
> +		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
> +			trace_xfs_reflink_cow_found(ip, &cmap);
> +			whichfork = XFS_COW_FORK;
> +			goto done;
> +		}
> +	}
> +
> +	if (imap.br_startoff <= offset_fsb) {
>  		/*
>  		 * For reflink files we may need a delalloc reservation when
>  		 * overwriting shared extents.   This includes zeroing of
>  		 * existing extents that contain data.
>  		 */
> -		if (xfs_is_reflink_inode(ip) &&
> -		    ((flags & IOMAP_WRITE) ||
> -		     got.br_state != XFS_EXT_UNWRITTEN)) {
> -			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
> -			error = xfs_reflink_reserve_cow(ip, &got);
> -			if (error)
> -				goto out_unlock;
> +		if (!xfs_is_reflink_inode(ip) ||
> +		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
> +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> +					&imap);
> +			goto done;
>  		}
>  
> -		trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK, &got);
> -		goto done;
> -	}
> +		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
>  
> -	if (flags & IOMAP_ZERO) {
> -		xfs_hole_to_iomap(ip, iomap, offset_fsb, got.br_startoff);
> -		goto out_unlock;
> +		/* Trim the mapping to the nearest shared extent boundary. */
> +		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> +		if (error)
> +			goto out_unlock;
> +
> +		/* Not shared?  Just report the (potentially capped) extent. */
> +		if (!shared) {
> +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> +					&imap);
> +			goto done;
> +		}
> +
> +		/*
> +		 * Fork all the shared blocks from our write offset until the
> +		 * end of the extent.
> +		 */
> +		whichfork = XFS_COW_FORK;
> +		end_fsb = imap.br_startoff + imap.br_blockcount;
> +	} else {
> +		/*
> +		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
> +		 * pages to keep the chunks of work done where somewhat
> +		 * symmetric with the work writeback does.  This is a completely
> +		 * arbitrary number pulled out of thin air.
> +		 *
> +		 * Note that the values needs to be less than 32-bits wide until
> +		 * the lower level functions are updated.
> +		 */
> +		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> +		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
>  	}
>  
>  	error = xfs_qm_dqattach_locked(ip, false);
>  	if (error)
>  		goto out_unlock;
>  
> -	/*
> -	 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES pages
> -	 * to keep the chunks of work done where somewhat symmetric with the
> -	 * work writeback does. This is a completely arbitrary number pulled
> -	 * out of thin air as a best guess for initial testing.
> -	 *
> -	 * Note that the values needs to be less than 32-bits wide until
> -	 * the lower level functions are updated.
> -	 */
> -	count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> -	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> -
> -	if (eof) {
> +	if (eof && whichfork == XFS_DATA_FORK) {
>  		prealloc_blocks = xfs_iomap_prealloc_size(ip, offset, count,
>  				&icur);
>  		if (prealloc_blocks) {
> @@ -635,9 +677,11 @@ xfs_file_iomap_begin_delay(
>  	}
>  
>  retry:
> -	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
> -			end_fsb - offset_fsb, prealloc_blocks, &got, &icur,
> -			eof);
> +	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> +			end_fsb - offset_fsb, prealloc_blocks,
> +			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> +			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> +			whichfork == XFS_DATA_FORK ? eof : cow_eof);
>  	switch (error) {
>  	case 0:
>  		break;
> @@ -659,9 +703,19 @@ xfs_file_iomap_begin_delay(
>  	 * them out if the write happens to fail.
>  	 */
>  	iomap->flags |= IOMAP_F_NEW;
> -	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
> +	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);

I'm confused by this -- if whichfork is XFS_COW_FORK, won't ftrace
report results from the wrong fork?
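
A toy model of the concern, assuming the tracepoint should see whichever
record was just reserved.  The struct and names below are stand-ins
invented for this sketch, not the kernel's types; the selection mirrors
the ternaries the patch already passes to xfs_bmapi_reserve_delalloc().

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for XFS_DATA_FORK / XFS_COW_FORK. */
enum sketch_fork { SKETCH_DATA_FORK, SKETCH_COW_FORK };

/* Minimal stand-in for struct xfs_bmbt_irec. */
struct sketch_irec {
	unsigned long long	startoff;
	unsigned long long	blockcount;
};

/* Pick the record that belongs to the fork the reservation went into. */
const struct sketch_irec *
sketch_alloc_rec(
	enum sketch_fork		whichfork,
	const struct sketch_irec	*imap,
	const struct sketch_irec	*cmap)
{
	assert(imap != NULL && cmap != NULL);
	return whichfork == SKETCH_DATA_FORK ? imap : cmap;
}
```

Handing the result of such a helper to trace_xfs_iomap_alloc() instead
of &imap would be one way to keep the traced record consistent with the
whichfork argument.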

--D

>  done:
> -	error = xfs_bmbt_to_iomap(ip, iomap, &got, false);
> +	if (whichfork == XFS_COW_FORK) {
> +		if (imap.br_startoff > offset_fsb) {
> +			xfs_trim_extent(&cmap, offset_fsb,
> +					imap.br_startoff - offset_fsb);
> +			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, false);
> +			goto out_unlock;
> +		}
> +		/* ensure we only report blocks we have a reservation for */
> +		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
> +	}
> +	error = xfs_bmbt_to_iomap(ip, iomap, &imap, false);
>  out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index bdbaff1b3fb7..d59b556d42cb 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -234,73 +234,6 @@ xfs_reflink_trim_around_shared(
>  	}
>  }
>  
> -/*
> - * Trim the passed in imap to the next shared/unshared extent boundary, and
> - * if imap->br_startoff points to a shared extent reserve space for it in the
> - * COW fork.
> - *
> - * Note that imap will always contain the block numbers for the existing blocks
> - * in the data fork, as the upper layers need them for read-modify-write
> - * operations.
> - */
> -int
> -xfs_reflink_reserve_cow(
> -	struct xfs_inode	*ip,
> -	struct xfs_bmbt_irec	*imap)
> -{
> -	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> -	struct xfs_bmbt_irec	got;
> -	int			error = 0;
> -	bool			eof = false;
> -	struct xfs_iext_cursor	icur;
> -	bool			shared;
> -
> -	/*
> -	 * Search the COW fork extent list first.  This serves two purposes:
> -	 * first this implement the speculative preallocation using cowextisze,
> -	 * so that we also unshared block adjacent to shared blocks instead
> -	 * of just the shared blocks themselves.  Second the lookup in the
> -	 * extent list is generally faster than going out to the shared extent
> -	 * tree.
> -	 */
> -
> -	if (!xfs_iext_lookup_extent(ip, ifp, imap->br_startoff, &icur, &got))
> -		eof = true;
> -	if (!eof && got.br_startoff <= imap->br_startoff) {
> -		trace_xfs_reflink_cow_found(ip, imap);
> -		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> -		return 0;
> -	}
> -
> -	/* Trim the mapping to the nearest shared extent boundary. */
> -	error = xfs_reflink_trim_around_shared(ip, imap, &shared);
> -	if (error)
> -		return error;
> -
> -	/* Not shared?  Just report the (potentially capped) extent. */
> -	if (!shared)
> -		return 0;
> -
> -	/*
> -	 * Fork all the shared blocks from our write offset until the end of
> -	 * the extent.
> -	 */
> -	error = xfs_qm_dqattach_locked(ip, false);
> -	if (error)
> -		return error;
> -
> -	error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
> -			imap->br_blockcount, 0, &got, &icur, eof);
> -	if (error == -ENOSPC || error == -EDQUOT)
> -		trace_xfs_reflink_cow_enospc(ip, imap);
> -	if (error)
> -		return error;
> -
> -	xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> -	trace_xfs_reflink_cow_alloc(ip, &got);
> -	return 0;
> -}
> -
>  /* Convert part of an unwritten CoW extent to a real one. */
>  STATIC int
>  xfs_reflink_convert_cow_extent(
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 870865913bd8..36e74fc90700 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3179,7 +3179,6 @@ DEFINE_INODE_ERROR_EVENT(xfs_reflink_unshare_error);
>  
>  /* copy on write */
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_around_shared);
> -DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_alloc);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_found);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_enospc);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
> -- 
> 2.19.1
> 


* Re: [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay
  2018-12-03 22:25 ` [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-12-18 23:38   ` Darrick J. Wong
  2018-12-19 19:39     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 23:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:25:01PM -0500, Christoph Hellwig wrote:
> No user of it in the iomap code at the moment, but we should not
> actively report wrong information if we can trivially get it right.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d19f99e5476a..bbc5d2e06b06 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -541,7 +541,7 @@ xfs_file_iomap_begin_delay(
>  	struct xfs_bmbt_irec	imap, cmap;
>  	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
> -	bool			eof = false, cow_eof = false, shared;
> +	bool			eof = false, cow_eof = false, shared = false;
>  	int			whichfork = XFS_DATA_FORK;
>  	int			error = 0;
>  
> @@ -709,13 +709,14 @@ xfs_file_iomap_begin_delay(
>  		if (imap.br_startoff > offset_fsb) {
>  			xfs_trim_extent(&cmap, offset_fsb,
>  					imap.br_startoff - offset_fsb);
> -			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, false);
> +			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, true);

Does this belong in the previous patch?  (Maybe all of it?)

--D

>  			goto out_unlock;
>  		}
>  		/* ensure we only report blocks we have a reservation for */
>  		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
> +		shared = true;
>  	}
> -	error = xfs_bmbt_to_iomap(ip, iomap, &imap, false);
> +	error = xfs_bmbt_to_iomap(ip, iomap, &imap, shared);
>  out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
> -- 
> 2.19.1
> 


* Re: [PATCH 07/11] xfs: also truncate holes covered by COW blocks
  2018-12-03 22:24 ` [PATCH 07/11] xfs: also truncate holes covered by COW blocks Christoph Hellwig
@ 2018-12-18 23:39   ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-18 23:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Dec 03, 2018 at 05:24:59PM -0500, Christoph Hellwig wrote:
> This only matters if we want to write data through the COW fork that is
> not actually an overwrite of existing data.  Reasons for that are
> speculative COW fork allocations using the cowextsize, or a mode where
> we always write through the COW fork.  Currently both can't actually
> happen, but I plan to enable them.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 124b8de37115..7d95a84064e7 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -403,28 +403,29 @@ xfs_map_blocks(
>  
>  	wpc->fork = XFS_DATA_FORK;
>  
> +	/* landed in a hole or beyond EOF? */
>  	if (imap.br_startoff > offset_fsb) {
> -		/* landed in a hole or beyond EOF */
>  		imap.br_blockcount = imap.br_startoff - offset_fsb;
>  		imap.br_startoff = offset_fsb;
>  		imap.br_startblock = HOLESTARTBLOCK;
>  		imap.br_state = XFS_EXT_NORM;
> -	} else {
> -		/*
> -		 * Truncate to the next COW extent if there is one.  This is the
> -		 * only opportunity to do this because we can skip COW fork
> -		 * lookups for the subsequent blocks in the mapping; however,
> -		 * the requirement to treat the COW range separately remains.
> -		 */
> -		if (cow_fsb != NULLFILEOFF &&
> -		    cow_fsb < imap.br_startoff + imap.br_blockcount)
> -			imap.br_blockcount = cow_fsb - imap.br_startoff;
> -
> -		/* got a delalloc extent? */
> -		if (isnullstartblock(imap.br_startblock))
> -			goto allocate_blocks;
>  	}
>  
> +	/*
> +	 * Truncate to the next COW extent if there is one.  This is the only
> +	 * opportunity to do this because we can skip COW fork lookups for the
> +	 * subsequent blocks in the mapping; however, the requirement to treat
> +	 * the COW range separately remains.
> +	 */
> +	if (cow_fsb != NULLFILEOFF &&
> +	    cow_fsb < imap.br_startoff + imap.br_blockcount)
> +		imap.br_blockcount = cow_fsb - imap.br_startoff;
> +
> +	/* got a delalloc extent? */
> +	if (imap.br_startblock != HOLESTARTBLOCK &&
> +	    isnullstartblock(imap.br_startblock))
> +		goto allocate_blocks;
> +
>  	wpc->imap = imap;
>  	trace_xfs_map_blocks_found(ip, offset, count, wpc->fork, &imap);
>  	return 0;
> -- 
> 2.19.1
> 


* Re: COW improvements and always_cow support V3
  2018-12-18 18:05       ` Christoph Hellwig
@ 2018-12-19  0:44         ` Darrick J. Wong
  2018-12-20  7:09           ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-19  0:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Dec 18, 2018 at 07:05:51PM +0100, Christoph Hellwig wrote:
> On Mon, Dec 17, 2018 at 09:59:22AM -0800, Darrick J. Wong wrote:
> > > This and a few other fsx tests assume you can always fallocate
> > > on XFS.  I sent a series for this:
> > > 
> > > https://www.spinics.net/lists/linux-xfs/msg23433.html
> > > 
> > > But I need to rework some of the patches a little more based on the
> > > review feedback.
> > 
> > "the patches"... as in the fstests patches, or the always_cow series?
> 
> The fstests patches.

FWIW one of my test vms seems to have hung in generic/323 with the xfs
for-next and your patches applied:

MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, /dev/sdf'
MOUNT_OPTIONS='/dev/sdf /opt'

[ 7496.941223] run fstests generic/323 at 2018-12-18 15:11:28
[ 7497.423929] XFS (sda): Mounting V5 Filesystem
[ 7497.440459] XFS (sda): Ending clean mount
[ 7679.154591] INFO: task aio-last-ref-he:13058 blocked for more than 60 seconds.
[ 7679.157665]       Not tainted 4.20.0-rc6-djw #rc6
[ 7679.159654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7679.161479] aio-last-ref-he D13376 13058  12660 0x00000004
[ 7679.162846] Call Trace:
[ 7679.214222]  ? __schedule+0x420/0xb40
[ 7679.215268]  schedule+0x40/0x90
[ 7679.216809]  io_schedule+0x16/0x40
[ 7679.218164]  iomap_dio_rw+0x361/0x410
[ 7679.219673]  ? xfs_file_dio_aio_read+0x81/0x180 [xfs]
[ 7679.220986]  xfs_file_dio_aio_read+0x81/0x180 [xfs]
[ 7679.222252]  xfs_file_read_iter+0xba/0xd0 [xfs]
[ 7679.225448]  aio_read+0x16f/0x1d0
[ 7679.244440]  ? kvm_clock_read+0x14/0x30
[ 7679.245592]  ? kvm_sched_clock_read+0x5/0x10
[ 7679.247010]  ? io_submit_one+0x711/0x9b0
[ 7679.248074]  io_submit_one+0x711/0x9b0
[ 7679.249074]  ? __x64_sys_io_submit+0xa7/0x260
[ 7679.250294]  __x64_sys_io_submit+0xa7/0x260
[ 7679.258402]  ? do_syscall_64+0x50/0x170
[ 7679.259631]  ? __ia32_compat_sys_io_submit+0x250/0x250
[ 7679.260840]  do_syscall_64+0x50/0x170
[ 7679.261825]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 7679.263510] RIP: 0033:0x7fdcfeb0c697
[ 7679.265177] Code: Bad RIP value.
[ 7679.266743] RSP: 002b:00007fdc58bd0888 EFLAGS: 00000206 ORIG_RAX: 00000000000000d1
[ 7679.270014] RAX: ffffffffffffffda RBX: 00007fdc58bd0de0 RCX: 00007fdcfeb0c697
[ 7679.273167] RDX: 00007fdc58bd0920 RSI: 0000000000000001 RDI: 00007fdcfef2e000
[ 7679.276350] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000000
[ 7679.279523] R10: 00007fdc58bd0920 R11: 0000000000000206 R12: 000000000000000a
[ 7679.282657] R13: 00007fdc58bd0f20 R14: 00000000000b0000 R15: 0000000000000001
[ 7679.289113] 
               Showing all locks held in the system:
[ 7679.291907] 1 lock held by khungtaskd/34:
[ 7679.293705]  #0: 00000000e7a0f77e (rcu_read_lock){....}, at: debug_show_all_locks+0xe/0x190
[ 7679.297938] 1 lock held by in:imklog/920:
[ 7679.299796] 2 locks held by bash/1021:
[ 7679.301357]  #0: 00000000e04cb661 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x24/0x50
[ 7679.305540]  #1: 00000000a0b743c3 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xdb/0x950
[ 7679.309347] 1 lock held by aio-last-ref-he/13058:
[ 7679.311474]  #0: 00000000db76f3cc (&inode->i_rwsem){++++}, at: xfs_ilock+0x279/0x2e0 [xfs]

[ 7679.315735] =============================================

--D


* Re: [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints
  2018-12-18 21:44   ` Darrick J. Wong
@ 2018-12-19 19:29     ` Christoph Hellwig
  2018-12-19 19:32       ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-19 19:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 01:44:25PM -0800, Darrick J. Wong wrote:
> > +	uint			*lockmode,
> > +	unsigned		flags)
> 
> I'm not thrilled with passing iomap flags into the reflink code here...

So what is the alternative?  A non-descriptive bool argument?

> ...because I feel that it's easy to miss the subtlety here that for
> buffered writes we don't care if the cow extent is unwritten or written,
> but for directio we very /much/ care that the cow extent is written,
> because we're writing to it immediately.  Can this grow a comment to
> reinforce why we skip the conversion?

Sure.

> Also, can we call this 'iomap_flags' to make it clearer which flags
> we're talking about?

Sure.


* Re: [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust
  2018-12-18 22:22   ` Darrick J. Wong
@ 2018-12-19 19:30     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-19 19:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

> >  	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
> >  	xfs_filblks_t		count_fsb = end_fsb - offset_fsb;
> > -	struct xfs_bmbt_irec	imap;
> > -	int			nimaps = 1, error = 0;
> > +	int			error;
> >  
> >  	ASSERT(count != 0);
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > -	error = xfs_bmapi_write(NULL, ip, offset_fsb, count_fsb,
> > -			XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT |
> > -			XFS_BMAPI_CONVERT_ONLY, 0, &imap, &nimaps);
> > +	error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
> 
> At this point you might as well convert the one remaining caller of
> xfs_reflink_convert_cow to take and drop the ILOCK around the
> reflink_convert_cow call...

I looked at it - the ilock is not the issue, but moving the bytes
to fsb conversion into the only caller makes it look worse than
keeping a helper for that one caller.


* Re: [PATCH 04/11] xfs: rework the truncate race handling in the writeback path
  2018-12-18 23:03   ` Darrick J. Wong
@ 2018-12-19 19:32     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-19 19:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 03:03:45PM -0800, Darrick J. Wong wrote:
> > +eof:
> > +	/*
> > +	 * If we raced with truncate there might be no data left at this offset.
> > +	 * In that case we need to return a hole so that the writeback code
> > +	 * skips writeback for the rest of the file.
> > +	 */
> > +	wpc->imap.br_startoff = offset_fsb;
> > +	wpc->imap.br_blockcount = end_fsb - offset_fsb;
> > +	wpc->imap.br_startblock = HOLESTARTBLOCK;
> > +	wpc->imap.br_state = XFS_EXT_NORM;
> > +	return 0;
> 
> This function has become rather spaghetti-like.  Any way we can clean
> this up reasonably?

What is spaghetti about a small number of gotos that we jump to at
the end of the function?  Compared to the previous mess we had this
actually is a significant cleanup.

> > -		nimaps = 0;
> > -		while (nimaps == 0) {
> 
> This removal of the nimaps == 0 loop bothers me: why is doing so safe?
> 
> I see that we can return from xfs_bmapi_write with nimaps == 0 if
> something is trying to punch or truncate the range that we're writing
> back, but it also seems to me that bmapi_write can return zero mappings
> because xfs_bmapi_allocate() didn't find any blocks.  I /think/ that's
> impossible because we're converting delalloc reservations and so we
> should never run out of space, right?
> 
> Anyway, when _write_allocate gets zero mappings, it'll return -EAGAIN to
> xfs_map_blocks, which will retry once to cover the case of racing with
> cow -> data fork remapping but otherwise it won't bother?  And that's
> why it's fine to only loop once?
> 
> Am I reasoning this correctly?

Yes, exactly.  The only thing the loop did was to make sure we hit
the truncate race handling code another time on a failure return
from xfs_bmapi_write.
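
The retry-once behavior described above can be modeled in isolation
roughly as follows.  This is a hedged userspace sketch and not the
xfs_map_blocks code: sketch_alloc_fn, sketch_map_blocks and the example
callback are hypothetical names, and the only assumption carried over
from the discussion is that a zero-mapping result surfaces as -EAGAIN
and is retried a single time.

```c
#include <errno.h>

/* Hypothetical allocation callback: returns 0, -EAGAIN or another errno. */
typedef int (*sketch_alloc_fn)(void *ctx);

/*
 * Try the allocation and retry exactly once if it reports -EAGAIN.
 * A second -EAGAIN is handed back to the caller instead of looping.
 */
static int sketch_map_blocks(sketch_alloc_fn alloc, void *ctx)
{
	int retries = 0;
	int error;

retry:
	error = alloc(ctx);
	if (error == -EAGAIN && retries++ == 0)
		goto retry;
	return error;
}

/* Example callback: report -EAGAIN until the counter in ctx hits zero. */
static int sketch_eagain_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

A single transient failure is absorbed by the retry, while a persistent
one is returned to the caller after the second attempt.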

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints
  2018-12-19 19:29     ` Christoph Hellwig
@ 2018-12-19 19:32       ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Dec 19, 2018 at 08:29:24PM +0100, Christoph Hellwig wrote:
> On Tue, Dec 18, 2018 at 01:44:25PM -0800, Darrick J. Wong wrote:
> > > +	uint			*lockmode,
> > > +	unsigned		flags)
> > 
> > I'm not thrilled with passing iomap flags into the reflink code here...
> 
> So what is the alternative?  A non-descriptive bool argument?

More or less what I described below. :)

> > ...because I feel that it's easy to miss the subtlety here that for
> > buffered writes we don't care if the cow extent is unwritten or written,
> > but for directio we very /much/ care that the cow extent is written,
> > because we're writing to it immediately.  Can this grow a comment to
> > reinforce why we skip the conversion?
> 
> Sure.
> 
> > Also, can we call this 'iomap_flags' to make it clearer which flags
> > we're talking about?
> 
> Sure.

Cool!

--D

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-18 23:24   ` Darrick J. Wong
@ 2018-12-19 19:37     ` Christoph Hellwig
  2018-12-19 22:43     ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-19 19:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 03:24:37PM -0800, Darrick J. Wong wrote:
> On Mon, Dec 03, 2018 at 05:25:03PM -0500, Christoph Hellwig wrote:
> > Add a mode where XFS never overwrites existing blocks in place.  This
> > > is to aid debugging our COW code, and also put infrastructure in place
> > for things like possible future support for zoned block devices, which
> > can't support overwrites.
> > 
> > This mode is enabled globally by doing a:
> > 
> >     echo 1 > /sys/fs/xfs/debug/always_cow
> > 
> > Note that the parameter is global to allow running all tests in xfstests
> > easily in this mode, which would not easily be possible with a per-fs
> > sysfs file.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_aops.c    |  2 +-
> >  fs/xfs/xfs_file.c    | 11 ++++++++++-
> >  fs/xfs/xfs_iomap.c   | 28 ++++++++++++++++++----------
> >  fs/xfs/xfs_reflink.c | 28 ++++++++++++++++++++++++----
> >  fs/xfs/xfs_reflink.h | 13 +++++++++++++
> >  fs/xfs/xfs_super.c   | 13 +++++++++----
> >  fs/xfs/xfs_sysctl.h  |  1 +
> >  fs/xfs/xfs_sysfs.c   | 24 ++++++++++++++++++++++++
> >  8 files changed, 100 insertions(+), 20 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 7d95a84064e7..a900924f16e1 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -986,7 +986,7 @@ xfs_vm_bmap(
> >  	 * Since we don't pass back blockdev info, we can't return bmap
> >  	 * information for rt files either.
> >  	 */
> > -	if (xfs_is_reflink_inode(ip) || XFS_IS_REALTIME_INODE(ip))
> > +	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
> >  		return 0;
> >  	return iomap_bmap(mapping, block, &xfs_iomap_ops);
> >  }
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index e47425071e65..8d2be043590a 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -507,7 +507,7 @@ xfs_file_dio_aio_write(
> >  		 * We can't properly handle unaligned direct I/O to reflink
> >  		 * files yet, as we can't unshare a partial block.
> >  		 */
> > -		if (xfs_is_reflink_inode(ip)) {
> > +		if (xfs_is_cow_inode(ip)) {
> >  			trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count);
> >  			return -EREMCHG;
> >  		}
> > @@ -806,6 +806,15 @@ xfs_file_fallocate(
> >  		return -EOPNOTSUPP;
> >  
> >  	xfs_ilock(ip, iolock);
> > +	/*
> > +	 * If always_cow mode we can't use preallocation and thus should not
> > +	 * allow creating them.
> > +	 */
> > +	if (xfs_is_always_cow_inode(ip) && (mode & ~FALLOC_FL_KEEP_SIZE) == 0) {
> > +		error = -EOPNOTSUPP;
> > +		goto out_unlock;
> 
> I think this screws up both UNSHARE and ZERO_RANGE here -- if the first
> mode is set, we'll cow the shared extents, but we'll also fill holes
> with unwritten extents, which the comment implies isn't allowed.  In the
> second set we'll punch the range but refill it with unwritten extents
> that we'll never actually overwrite.

True.

> 
> Granted, I'm still rather fuzzy on what exactly is supposed to happen
> with preallocating fallocate when all writes require an allocation to
> succeed?  btrfs fills holes with unwritten extents which the next write
> will overwrite, but non-holes cow like normal.  That only makes sense if
> you assume people only use fallocate to preallocate holes.  Maybe we
> don't want to follow that route.  It's probably simpler not to support
> creation of unwritten extents for always_cow files, in which case you'll
> have to neuter UNSHARE too.
> 
> As for ZERO_RANGE, I think it's sufficient to punch the range, since we
> COW even the unwritten extents (which makes allocating them pointless),
> right?

Agreed.
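
To make the mode handling under discussion concrete, here is a small
sketch of one possible policy: reject plain preallocation and UNSHARE,
and serve ZERO_RANGE by punching only.  It is an illustration of the
review comments, not what the patch does.  The SK_* names and the
sk_always_cow_fallocate helper are hypothetical; the bit values mirror
the fallocate(2) mode flags in <linux/falloc.h>.

```c
/* Values mirror the fallocate(2) mode bits in <linux/falloc.h>. */
#define SK_FALLOC_FL_KEEP_SIZE		0x01
#define SK_FALLOC_FL_PUNCH_HOLE		0x02
#define SK_FALLOC_FL_ZERO_RANGE		0x10
#define SK_FALLOC_FL_UNSHARE_RANGE	0x40

enum sk_action {
	SK_PASS,	/* handle the request as usual */
	SK_REJECT,	/* would create unwritten extents: refuse */
	SK_PUNCH_ONLY,	/* punch the range, do not refill it */
};

/* Hypothetical policy for an inode in always_cow mode. */
static enum sk_action sk_always_cow_fallocate(int mode)
{
	/* Plain preallocation: no mode bits besides an optional KEEP_SIZE. */
	if ((mode & ~SK_FALLOC_FL_KEEP_SIZE) == 0)
		return SK_REJECT;
	/* UNSHARE also fills holes with unwritten extents. */
	if (mode & SK_FALLOC_FL_UNSHARE_RANGE)
		return SK_REJECT;
	/* ZERO_RANGE: punching is enough as every later write is COWed. */
	if (mode & SK_FALLOC_FL_ZERO_RANGE)
		return SK_PUNCH_ONLY;
	return SK_PASS;
}
```

Note that the check in the posted hunk only covers the first case, which
is the gap pointed out in the review.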

> What's the real goal here?  I assume you're targeting both O_ATOMIC in
> addition to being able to use SMR drives as realtime devices, right?  It
> would help to have a better idea of where we're going here before adding
> anything user visible, even if it's just a debug knob for now.

My SMR plan (and it is just that at the moment, except for prep work and
bits of prototype code) is indeed to allow SMR drives in place of the
realtime subvolume.  It isn't really the rt subvolume anymore as we don't
use the RT allocator, but we need the bit to differentiate between the
metadata device and the data going to SMR drives.

> >  	"reflink not compatible with realtime device!");
> > -		error = -EINVAL;
> > -		goto out_filestream_unmount;
> > +			error = -EINVAL;
> > +			goto out_filestream_unmount;
> > +		}
> > +
> > +		if (xfs_globals.always_cow)
> > +			xfs_info(mp, "using DEBUG-only always_cow mode.");
> 
> How does xfs handle the situation where always_cow mode comes on after
> you've already opened a file and begun writing to it?  I assume we
> allocate a new cow fork for files that need it and all writes after the
> switch flips will be COW?

Yes.  I guess in some ways it would be nicer to just sample the value
once at mount time - that could avoid having to deal with unexpected
corner cases in the future.
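
Sampling the knob once at mount time could look roughly like the sketch
below.  None of this is in the posted series: sk_globals, sk_mount and
the m_always_cow field are hypothetical stand-ins for xfs_globals and
struct xfs_mount, used only to show the idea.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for xfs_globals and struct xfs_mount. */
struct sk_globals {
	bool	always_cow;	/* the global debug knob */
};

struct sk_mount {
	bool	m_always_cow;	/* value sampled when the fs was mounted */
};

static struct sk_globals sk_globals;

/* Copy the global knob into the mount exactly once, at mount time. */
static void sk_mount_init(struct sk_mount *mp)
{
	mp->m_always_cow = sk_globals.always_cow;
}

/* Later checks only consult the per-mount copy. */
static bool sk_is_always_cow(const struct sk_mount *mp)
{
	return mp->m_always_cow;
}
```

Flipping the global after mount then only affects filesystems mounted
later, so an open file never changes mode underneath a writer.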

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay
  2018-12-18 23:36   ` Darrick J. Wong
@ 2018-12-19 19:38     ` Christoph Hellwig
  2018-12-19 20:20       ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-19 19:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 03:36:41PM -0800, Darrick J. Wong wrote:
> On Mon, Dec 03, 2018 at 05:25:00PM -0500, Christoph Hellwig wrote:
> > Besides simplifying the code a bit this allows to actually implement
> > the behavior of using COW preallocation for non-COW data mentioned
> > in the current comments.
> > 
> > Note that this breaks the current version of xfs/420, but that is
> > because the test is broken.  A separate fix will be sent for it.
> 
> (Still waiting for this...)

It was sent quite a while ago, but I need to resend it.

> 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_iomap.c   | 132 ++++++++++++++++++++++++++++++-------------
> >  fs/xfs/xfs_reflink.c |  67 ----------------------
> >  fs/xfs/xfs_trace.h   |   1 -
> >  3 files changed, 93 insertions(+), 107 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index d851abac16a9..d19f99e5476a 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -534,15 +534,16 @@ xfs_file_iomap_begin_delay(
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > -	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> >  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> >  	xfs_fileoff_t		maxbytes_fsb =
> >  		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> >  	xfs_fileoff_t		end_fsb;
> > -	int			error = 0, eof = 0;
> > -	struct xfs_bmbt_irec	got;
> > -	struct xfs_iext_cursor	icur;
> > +	struct xfs_bmbt_irec	imap, cmap;
> > +	struct xfs_iext_cursor	icur, ccur;
> >  	xfs_fsblock_t		prealloc_blocks = 0;
> > +	bool			eof = false, cow_eof = false, shared;
> > +	int			whichfork = XFS_DATA_FORK;
> > +	int			error = 0;
> >  
> >  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> >  	ASSERT(!xfs_get_extsz_hint(ip));
> > @@ -560,7 +561,7 @@ xfs_file_iomap_begin_delay(
> >  
> >  	XFS_STATS_INC(mp, xs_blk_mapw);
> >  
> > -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > +	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
> >  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> >  		if (error)
> >  			goto out_unlock;
> > @@ -568,51 +569,92 @@ xfs_file_iomap_begin_delay(
> >  
> >  	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> >  
> > -	eof = !xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got);
> > +	/*
> > +	 * Search the data fork fork first to look up our source mapping.  We
> > +	 * always need the data fork map, as we have to return it to the
> > +	 * iomap code so that the higher level write code can read data in to
> > +	 * perform read-modify-write cycles for unaligned writes.
> > +	 */
> > +	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
> >  	if (eof)
> > -		got.br_startoff = end_fsb; /* fake hole until the end */
> > +		imap.br_startoff = end_fsb; /* fake hole until the end */
> >  
> > -	if (got.br_startoff <= offset_fsb) {
> > +	/* We never need to allocate blocks for zeroing a hole. */
> > +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > +		goto out_unlock;
> > +	}
> > +
> > +	/*
> > +	 * Search the COW fork extent list even if we did not find a data fork
> > +	 * extent.  This serves two purposes: first this implements the
> > +	 * speculative preallocation using cowextisze, so that we also unshare
> 
> "cowextsize"
> 
> > +	 * block adjacent to shared blocks instead of just the shared blocks
> > +	 * themselves.  Second the lookup in the extent list is generally faster
> > +	 * than going out to the shared extent tree.
> > +	 */
> > +	if (xfs_is_reflink_inode(ip)) {
> > +		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
> > +				&ccur, &cmap);
> > +		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
> > +			trace_xfs_reflink_cow_found(ip, &cmap);
> > +			whichfork = XFS_COW_FORK;
> > +			goto done;
> > +		}
> > +	}
> > +
> > +	if (imap.br_startoff <= offset_fsb) {
> >  		/*
> >  		 * For reflink files we may need a delalloc reservation when
> >  		 * overwriting shared extents.   This includes zeroing of
> >  		 * existing extents that contain data.
> >  		 */
> > -		if (xfs_is_reflink_inode(ip) &&
> > -		    ((flags & IOMAP_WRITE) ||
> > -		     got.br_state != XFS_EXT_UNWRITTEN)) {
> > -			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
> > -			error = xfs_reflink_reserve_cow(ip, &got);
> > -			if (error)
> > -				goto out_unlock;
> > +		if (!xfs_is_reflink_inode(ip) ||
> > +		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
> > +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> > +					&imap);
> > +			goto done;
> >  		}
> >  
> > -		trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK, &got);
> > -		goto done;
> > -	}
> > +		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> >  
> > -	if (flags & IOMAP_ZERO) {
> > -		xfs_hole_to_iomap(ip, iomap, offset_fsb, got.br_startoff);
> > -		goto out_unlock;
> > +		/* Trim the mapping to the nearest shared extent boundary. */
> > +		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> > +		if (error)
> > +			goto out_unlock;
> > +
> > +		/* Not shared?  Just report the (potentially capped) extent. */
> > +		if (!shared) {
> > +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> > +					&imap);
> > +			goto done;
> > +		}
> > +
> > +		/*
> > +		 * Fork all the shared blocks from our write offset until the
> > +		 * end of the extent.
> > +		 */
> > +		whichfork = XFS_COW_FORK;
> > +		end_fsb = imap.br_startoff + imap.br_blockcount;
> > +	} else {
> > +		/*
> > +		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
> > +		 * pages to keep the chunks of work done where somewhat
> > +		 * symmetric with the work writeback does.  This is a completely
> > +		 * arbitrary number pulled out of thin air.
> > +		 *
> > +		 * Note that the values needs to be less than 32-bits wide until
> > +		 * the lower level functions are updated.
> > +		 */
> > +		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> > +		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> >  	}
> >  
> >  	error = xfs_qm_dqattach_locked(ip, false);
> >  	if (error)
> >  		goto out_unlock;
> >  
> > -	/*
> > -	 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES pages
> > -	 * to keep the chunks of work done where somewhat symmetric with the
> > -	 * work writeback does. This is a completely arbitrary number pulled
> > -	 * out of thin air as a best guess for initial testing.
> > -	 *
> > -	 * Note that the values needs to be less than 32-bits wide until
> > -	 * the lower level functions are updated.
> > -	 */
> > -	count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> > -	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> > -
> > -	if (eof) {
> > +	if (eof && whichfork == XFS_DATA_FORK) {
> >  		prealloc_blocks = xfs_iomap_prealloc_size(ip, offset, count,
> >  				&icur);
> >  		if (prealloc_blocks) {
> > @@ -635,9 +677,11 @@ xfs_file_iomap_begin_delay(
> >  	}
> >  
> >  retry:
> > -	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
> > -			end_fsb - offset_fsb, prealloc_blocks, &got, &icur,
> > -			eof);
> > +	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> > +			end_fsb - offset_fsb, prealloc_blocks,
> > +			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> > +			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> > +			whichfork == XFS_DATA_FORK ? eof : cow_eof);
> >  	switch (error) {
> >  	case 0:
> >  		break;
> > @@ -659,9 +703,19 @@ xfs_file_iomap_begin_delay(
> >  	 * them out if the write happens to fail.
> >  	 */
> >  	iomap->flags |= IOMAP_F_NEW;
> > -	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
> > +	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> 
> I'm confused by this, if whichfork == COW then won't ftrace report
> results from the wrong fork?

The question is what the "right" fork to trace is as both matter here.
At least we now clearly tell you which fork the trace belongs to.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay
  2018-12-18 23:38   ` Darrick J. Wong
@ 2018-12-19 19:39     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-19 19:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 03:38:28PM -0800, Darrick J. Wong wrote:
> > @@ -541,7 +541,7 @@ xfs_file_iomap_begin_delay(
> >  	struct xfs_bmbt_irec	imap, cmap;
> >  	struct xfs_iext_cursor	icur, ccur;
> >  	xfs_fsblock_t		prealloc_blocks = 0;
> > -	bool			eof = false, cow_eof = false, shared;
> > +	bool			eof = false, cow_eof = false, shared = false;
> >  	int			whichfork = XFS_DATA_FORK;
> >  	int			error = 0;
> >  
> > @@ -709,13 +709,14 @@ xfs_file_iomap_begin_delay(
> >  		if (imap.br_startoff > offset_fsb) {
> >  			xfs_trim_extent(&cmap, offset_fsb,
> >  					imap.br_startoff - offset_fsb);
> > -			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, false);
> > +			error = xfs_bmbt_to_iomap(ip, iomap, &cmap, true);
> 
> Does this belong in the previous patch?  (Maybe all of it?)

Not really, as it is just a nice to have thing that is a change of
behavior to what we did before.  So I'd rather have it in a single
patch that documents what exactly we did here and why.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay
  2018-12-19 19:38     ` Christoph Hellwig
@ 2018-12-19 20:20       ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-19 20:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Dec 19, 2018 at 08:38:43PM +0100, Christoph Hellwig wrote:
> On Tue, Dec 18, 2018 at 03:36:41PM -0800, Darrick J. Wong wrote:
> > On Mon, Dec 03, 2018 at 05:25:00PM -0500, Christoph Hellwig wrote:
> > > Besides simplifying the code a bit this allows to actually implement
> > > the behavior of using COW preallocation for non-COW data mentioned
> > > in the current comments.
> > > 
> > > Note that this breaks the current version of xfs/420, but that is
> > > because the test is broken.  A separate fix will be sent for it.
> > 
> > (Still waiting for this...)
> 
> It was sent quite a while ago, but I need to resend it..
> 
> > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_iomap.c   | 132 ++++++++++++++++++++++++++++++-------------
> > >  fs/xfs/xfs_reflink.c |  67 ----------------------
> > >  fs/xfs/xfs_trace.h   |   1 -
> > >  3 files changed, 93 insertions(+), 107 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index d851abac16a9..d19f99e5476a 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -534,15 +534,16 @@ xfs_file_iomap_begin_delay(
> > >  {
> > >  	struct xfs_inode	*ip = XFS_I(inode);
> > >  	struct xfs_mount	*mp = ip->i_mount;
> > > -	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> > >  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > >  	xfs_fileoff_t		maxbytes_fsb =
> > >  		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > >  	xfs_fileoff_t		end_fsb;
> > > -	int			error = 0, eof = 0;
> > > -	struct xfs_bmbt_irec	got;
> > > -	struct xfs_iext_cursor	icur;
> > > +	struct xfs_bmbt_irec	imap, cmap;
> > > +	struct xfs_iext_cursor	icur, ccur;
> > >  	xfs_fsblock_t		prealloc_blocks = 0;
> > > +	bool			eof = false, cow_eof = false, shared;
> > > +	int			whichfork = XFS_DATA_FORK;
> > > +	int			error = 0;
> > >  
> > >  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> > >  	ASSERT(!xfs_get_extsz_hint(ip));
> > > @@ -560,7 +561,7 @@ xfs_file_iomap_begin_delay(
> > >  
> > >  	XFS_STATS_INC(mp, xs_blk_mapw);
> > >  
> > > -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > > +	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
> > >  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> > >  		if (error)
> > >  			goto out_unlock;
> > > @@ -568,51 +569,92 @@ xfs_file_iomap_begin_delay(
> > >  
> > >  	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> > >  
> > > -	eof = !xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got);
> > > +	/*
> > > +	 * Search the data fork fork first to look up our source mapping.  We
> > > +	 * always need the data fork map, as we have to return it to the
> > > +	 * iomap code so that the higher level write code can read data in to
> > > +	 * perform read-modify-write cycles for unaligned writes.
> > > +	 */
> > > +	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
> > >  	if (eof)
> > > -		got.br_startoff = end_fsb; /* fake hole until the end */
> > > +		imap.br_startoff = end_fsb; /* fake hole until the end */
> > >  
> > > -	if (got.br_startoff <= offset_fsb) {
> > > +	/* We never need to allocate blocks for zeroing a hole. */
> > > +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > > +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Search the COW fork extent list even if we did not find a data fork
> > > +	 * extent.  This serves two purposes: first this implements the
> > > +	 * speculative preallocation using cowextisze, so that we also unshare
> > 
> > "cowextsize"
> > 
> > > +	 * block adjacent to shared blocks instead of just the shared blocks
> > > +	 * themselves.  Second the lookup in the extent list is generally faster
> > > +	 * than going out to the shared extent tree.
> > > +	 */
> > > +	if (xfs_is_reflink_inode(ip)) {
> > > +		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
> > > +				&ccur, &cmap);
> > > +		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
> > > +			trace_xfs_reflink_cow_found(ip, &cmap);
> > > +			whichfork = XFS_COW_FORK;
> > > +			goto done;
> > > +		}
> > > +	}
> > > +
> > > +	if (imap.br_startoff <= offset_fsb) {
> > >  		/*
> > >  		 * For reflink files we may need a delalloc reservation when
> > >  		 * overwriting shared extents.   This includes zeroing of
> > >  		 * existing extents that contain data.
> > >  		 */
> > > -		if (xfs_is_reflink_inode(ip) &&
> > > -		    ((flags & IOMAP_WRITE) ||
> > > -		     got.br_state != XFS_EXT_UNWRITTEN)) {
> > > -			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
> > > -			error = xfs_reflink_reserve_cow(ip, &got);
> > > -			if (error)
> > > -				goto out_unlock;
> > > +		if (!xfs_is_reflink_inode(ip) ||
> > > +		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
> > > +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> > > +					&imap);
> > > +			goto done;
> > >  		}
> > >  
> > > -		trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK, &got);
> > > -		goto done;
> > > -	}
> > > +		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> > >  
> > > -	if (flags & IOMAP_ZERO) {
> > > -		xfs_hole_to_iomap(ip, iomap, offset_fsb, got.br_startoff);
> > > -		goto out_unlock;
> > > +		/* Trim the mapping to the nearest shared extent boundary. */
> > > +		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> > > +		if (error)
> > > +			goto out_unlock;
> > > +
> > > +		/* Not shared?  Just report the (potentially capped) extent. */
> > > +		if (!shared) {
> > > +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> > > +					&imap);
> > > +			goto done;
> > > +		}
> > > +
> > > +		/*
> > > +		 * Fork all the shared blocks from our write offset until the
> > > +		 * end of the extent.
> > > +		 */
> > > +		whichfork = XFS_COW_FORK;
> > > +		end_fsb = imap.br_startoff + imap.br_blockcount;
> > > +	} else {
> > > +		/*
> > > +		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
> > > +		 * pages to keep the chunks of work done where somewhat
> > > +		 * symmetric with the work writeback does.  This is a completely
> > > +		 * arbitrary number pulled out of thin air.
> > > +		 *
> > > +		 * Note that the values needs to be less than 32-bits wide until
> > > +		 * the lower level functions are updated.
> > > +		 */
> > > +		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> > > +		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> > >  	}
> > >  
> > >  	error = xfs_qm_dqattach_locked(ip, false);
> > >  	if (error)
> > >  		goto out_unlock;
> > >  
> > > -	/*
> > > -	 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES pages
> > > -	 * to keep the chunks of work done where somewhat symmetric with the
> > > -	 * work writeback does. This is a completely arbitrary number pulled
> > > -	 * out of thin air as a best guess for initial testing.
> > > -	 *
> > > -	 * Note that the values needs to be less than 32-bits wide until
> > > -	 * the lower level functions are updated.
> > > -	 */
> > > -	count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> > > -	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> > > -
> > > -	if (eof) {
> > > +	if (eof && whichfork == XFS_DATA_FORK) {
> > >  		prealloc_blocks = xfs_iomap_prealloc_size(ip, offset, count,
> > >  				&icur);
> > >  		if (prealloc_blocks) {
> > > @@ -635,9 +677,11 @@ xfs_file_iomap_begin_delay(
> > >  	}
> > >  
> > >  retry:
> > > -	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
> > > -			end_fsb - offset_fsb, prealloc_blocks, &got, &icur,
> > > -			eof);
> > > +	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> > > +			end_fsb - offset_fsb, prealloc_blocks,
> > > +			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> > > +			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> > > +			whichfork == XFS_DATA_FORK ? eof : cow_eof);
> > >  	switch (error) {
> > >  	case 0:
> > >  		break;
> > > @@ -659,9 +703,19 @@ xfs_file_iomap_begin_delay(
> > >  	 * them out if the write happens to fail.
> > >  	 */
> > >  	iomap->flags |= IOMAP_F_NEW;
> > > -	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &got);
> > > +	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> > 
> > I'm confused by this, if whichfork == COW then won't ftrace report
> > results from the wrong fork?
> 
> The question is what the "right" fork to trace is as both matter here.
> At least we now clearly tell you which fork the trace belongs to.

True.  But I think it's misleading to have a tracepoint report say "cow"
when the extent data it records is actually from the data fork, and
particularly because we sometimes pass cmap back to the caller when
whichfork == COW.

If I were tracing through here I'd probably want to know what we found
from both forks at that particular point in time so that I could
continue following the logic.
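
As a rough illustration of keeping the fork label and the traced record
in agreement, the selection could mirror the ternaries already used for
the xfs_bmapi_reserve_delalloc call.  This is a standalone sketch with
hypothetical sk_* types, not a proposed change to the tracepoint:

```c
/* Hypothetical stand-ins for the fork constants and struct xfs_bmbt_irec. */
enum sk_fork {
	SK_DATA_FORK,
	SK_COW_FORK,
};

struct sk_irec {
	unsigned long long	startoff;
	unsigned long long	blockcount;
};

/* Pick the record that belongs to the fork named in the trace. */
static const struct sk_irec *sk_trace_rec(enum sk_fork whichfork,
		const struct sk_irec *imap, const struct sk_irec *cmap)
{
	return whichfork == SK_DATA_FORK ? imap : cmap;
}
```

Tracing both records at that point would be the other option, at the
cost of a wider tracepoint.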

--D

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-18 23:24   ` Darrick J. Wong
  2018-12-19 19:37     ` Christoph Hellwig
@ 2018-12-19 22:43     ` Dave Chinner
  2018-12-20  7:07       ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2018-12-19 22:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 03:24:37PM -0800, Darrick J. Wong wrote:
> On Mon, Dec 03, 2018 at 05:25:03PM -0500, Christoph Hellwig wrote:
> Granted, I'm still rather fuzzy on what exactly is supposed to happen
> with preallocating fallocate when all writes require an allocation to
> succeed? 

For always_cow mode, perhaps we could consider preallocating into
the COW fork rather than the data fork? That way when we go to write
the data, we've already got the space allocated regardless of
whether it is over a hole or existing data?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-19 22:43     ` Dave Chinner
@ 2018-12-20  7:07       ` Christoph Hellwig
  2018-12-20 21:03         ` Dave Chinner
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-20  7:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs

On Thu, Dec 20, 2018 at 09:43:35AM +1100, Dave Chinner wrote:
> On Tue, Dec 18, 2018 at 03:24:37PM -0800, Darrick J. Wong wrote:
> > On Mon, Dec 03, 2018 at 05:25:03PM -0500, Christoph Hellwig wrote:
> > Granted, I'm still rather fuzzy on what exactly is supposed to happen
> > with preallocating fallocate when all writes require an allocation to
> > succeed? 
> 
> For always_cow mode, perhaps we could consider preallocating into
> the COW fork rather than the data fork? That way when we go to write
> the data, we've already got the space allocated regardless of
> whether it is over a hole or existing data?

For a speculative preallocation that is what we already do.  But for
persistent preallocation that doesn't help as the COW fork is not
persistent.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: COW improvements and always_cow support V3
  2018-12-19  0:44         ` Darrick J. Wong
@ 2018-12-20  7:09           ` Christoph Hellwig
  2018-12-20 22:09             ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-20  7:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Dec 18, 2018 at 04:44:11PM -0800, Darrick J. Wong wrote:
> FWIW one of my test vms seems to have hung in generic/323 with the xfs
> for-next and your patches applied:
> 
> MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, /dev/sdf'
> MOUNT_OPTIONS='/dev/sdf /opt'

I've run generic/323 with those settings in a loop over night and
haven't reproduced it so far.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-20  7:07       ` Christoph Hellwig
@ 2018-12-20 21:03         ` Dave Chinner
  2018-12-21  6:27           ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2018-12-20 21:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs

On Thu, Dec 20, 2018 at 08:07:41AM +0100, Christoph Hellwig wrote:
> On Thu, Dec 20, 2018 at 09:43:35AM +1100, Dave Chinner wrote:
> > On Tue, Dec 18, 2018 at 03:24:37PM -0800, Darrick J. Wong wrote:
> > > On Mon, Dec 03, 2018 at 05:25:03PM -0500, Christoph Hellwig wrote:
> > > Granted, I'm still rather fuzzy on what exactly is supposed to happen
> > > with preallocating fallocate when all writes require an allocation to
> > > succeed? 
> > 
> > For always_cow mode, perhaps we could consider preallocating into
> > the COW fork rather than the data fork? That way when we go to write
> > the data, we've already got the space allocated regardless of
> > whether it is over a hole or existing data?
> 
> For a speculative preallocation that is what we already do.  But for
> persistent preallocation that doesn't help as the COW fork is not
> persistent.

Yes, I know it's not persistent, but do we care for always_cow mode?
Preallocation to prevent enospc is done just before the data is
written, and if we put it in the COW fork then it will mostly just
work and behave as expected for preventing ENOSPC on subsequent
writes. Preallocation to control data layout is largely irrelevant
to always_cow mode, so it really makes no difference to us if the
preallocation disappears when the inode is cycled out of cache....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: COW improvements and always_cow support V3
  2018-12-20  7:09           ` Christoph Hellwig
@ 2018-12-20 22:09             ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-12-20 22:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Dec 20, 2018 at 08:09:27AM +0100, Christoph Hellwig wrote:
> On Tue, Dec 18, 2018 at 04:44:11PM -0800, Darrick J. Wong wrote:
> > FWIW one of my test vms seems to have hung in generic/323 with the xfs
> > for-next and your patches applied:
> > 
> > MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, /dev/sdf'
> > MOUNT_OPTIONS='/dev/sdf /opt'
> 
> I've run generic/323 with those settings in a loop over night and
> haven't reproduced it so far.

<nod> It's now shown up in one of my dev tree and TOT test run VMs, so
it would seem that it's not specific to the always_cow patches.

--D

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] xfs: introduce an always_cow mode
  2018-12-20 21:03         ` Dave Chinner
@ 2018-12-21  6:27           ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-21  6:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs

On Fri, Dec 21, 2018 at 08:03:09AM +1100, Dave Chinner wrote:
> Yes, I know it's not persistent, but do we care for always_cow mode?
> Preallocation to prevent enospc is done just before the data is
> written, and if we put it in the COW fork then it will mostly just
> work and behave as expected for preventing ENOSPC on subsequent
> writes. Preallocation to control data layout is largely irrelevant
> to always_cow mode, so it really makes no difference to us if the
> preallocation disappears when the inode is cycled out of cache....

I'll have to see if we can get the semantics for that right.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: COW improvements and always_cow support V3
  2018-12-03 16:21 Christoph Hellwig
@ 2018-12-03 17:22 ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 17:22 UTC (permalink / raw)
  To: linux-xfs

Please discard this for now - the inflight wifi is too unstable to send
a patch series.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* COW improvements and always_cow support V3
@ 2018-12-03 16:21 Christoph Hellwig
  2018-12-03 17:22 ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2018-12-03 16:21 UTC (permalink / raw)
  To: linux-xfs

Hi all,

this series adds the always_cow mode support after improving our COW
write support a little bit first.

The always_cow mode stresses the COW path a lot, but with a few xfstests
fixups it generally looks good, except for a few tests that complain about
fragmentation, which is rather inherent in this mode, and xfs/326, which
inserts error tags into the COW path and does not get the expected result.

Changes since v2:
 - add a patch to remove xfs_trim_extent_eof
 - add a patch to remove the separate io_type and rely on existing state
   in the writeback path
 - rework the truncate race handling in the writeback path a little more

Changes since v1:
 - make delalloc and unwritten extent conversions simpler and more robust
 - add a few additional cleanups
 - support all fallocate modes but actual preallocation
 - rebase on top of a fix from Brian (which is included as first patch
   to make the patch set more usable)

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-12-21  6:27 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
2018-12-03 22:24 COW improvements and always_cow support V3 Christoph Hellwig
2018-12-03 22:24 ` [PATCH 01/11] xfs: remove xfs_trim_extent_eof Christoph Hellwig
2018-12-18 21:45   ` Darrick J. Wong
2018-12-03 22:24 ` [PATCH 02/11] xfs: remove the io_type field from the writeback context and ioend Christoph Hellwig
2018-12-18 21:45   ` Darrick J. Wong
2018-12-03 22:24 ` [PATCH 03/11] xfs: remove the s_maxbytes checks in xfs_map_blocks Christoph Hellwig
2018-12-18 22:31   ` Darrick J. Wong
2018-12-03 22:24 ` [PATCH 04/11] xfs: rework the truncate race handling in the writeback path Christoph Hellwig
2018-12-18 23:03   ` Darrick J. Wong
2018-12-19 19:32     ` Christoph Hellwig
2018-12-03 22:24 ` [PATCH 05/11] xfs: make xfs_bmbt_to_iomap more useful Christoph Hellwig
2018-12-18 21:46   ` Darrick J. Wong
2018-12-03 22:24 ` [PATCH 06/11] xfs: don't use delalloc extents for COW on files with extsize hints Christoph Hellwig
2018-12-18 21:44   ` Darrick J. Wong
2018-12-19 19:29     ` Christoph Hellwig
2018-12-19 19:32       ` Darrick J. Wong
2018-12-03 22:24 ` [PATCH 07/11] xfs: also truncate holes covered by COW blocks Christoph Hellwig
2018-12-18 23:39   ` Darrick J. Wong
2018-12-03 22:25 ` [PATCH 08/11] xfs: merge COW handling into xfs_file_iomap_begin_delay Christoph Hellwig
2018-12-18 23:36   ` Darrick J. Wong
2018-12-19 19:38     ` Christoph Hellwig
2018-12-19 20:20       ` Darrick J. Wong
2018-12-03 22:25 ` [PATCH 09/11] xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delay Christoph Hellwig
2018-12-18 23:38   ` Darrick J. Wong
2018-12-19 19:39     ` Christoph Hellwig
2018-12-03 22:25 ` [PATCH 10/11] xfs: make COW fork unwritten extent conversions more robust Christoph Hellwig
2018-12-18 22:22   ` Darrick J. Wong
2018-12-19 19:30     ` Christoph Hellwig
2018-12-03 22:25 ` [PATCH 11/11] xfs: introduce an always_cow mode Christoph Hellwig
2018-12-18 23:24   ` Darrick J. Wong
2018-12-19 19:37     ` Christoph Hellwig
2018-12-19 22:43     ` Dave Chinner
2018-12-20  7:07       ` Christoph Hellwig
2018-12-20 21:03         ` Dave Chinner
2018-12-21  6:27           ` Christoph Hellwig
2018-12-06  1:05 ` COW improvements and always_cow support V3 Darrick J. Wong
2018-12-06  4:16   ` Christoph Hellwig
2018-12-06 16:32     ` Darrick J. Wong
2018-12-06 20:09   ` Christoph Hellwig
2018-12-17 17:59     ` Darrick J. Wong
2018-12-18 18:05       ` Christoph Hellwig
2018-12-19  0:44         ` Darrick J. Wong
2018-12-20  7:09           ` Christoph Hellwig
2018-12-20 22:09             ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2018-12-03 16:21 Christoph Hellwig
2018-12-03 17:22 ` Christoph Hellwig
