All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code
@ 2017-08-25 15:05 Brian Foster
  2017-08-25 15:05 ` [PATCH 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

Hi all,

Here's a first non-rfc of the ordered buffer refactoring series. The
original motivation is to provide a function that deferred operations
can use to relog buffers. As such, the initial refactoring patches are
mostly the same as the rfc. This work became a bit more involved than
originally anticipated because the refactoring that implemented more
strict buffer ordering rules also teased out the fact that the extent
swap code wasn't using ordered buffers quite correctly.

The primary differences in this version are the addition of some new
cleanup/bugfix patches and an update to make buffer ordering conditional
on the blf dirty state of the buffer. This is driven by the fact that
the existing bmbt owner change algorithm misuses ordered buffers by not
considering the dirty state of such buffers before they are relogged as
ordered. After some refactoring, the final patch updates the the bmbt
owner change mechanism to physically relog such buffers and updates
extent swap to use a rolling transaction to deal with this case.

Thoughts, reviews, flames appreciated.

Brian

v1:
- Drop unnecessary buffer type set in xfs_btree_block_change_owner().
- Several new cleanup patches.
- Fix bmbt owner change (extent swap) log recovery.
- Update buffer ordering to fail on previously dirty buffers.
- Rework bmbt owner change to use ordered buffers correctly.
rfc: https://marc.info/?l=linux-xfs&m=150272969407981&w=2

Brian Foster (9):
  xfs: open-code xfs_buf_item_dirty()
  xfs: remove unnecessary dirty bli format check for ordered bufs
  xfs: ordered buffer log items are never formatted
  xfs: refactor buffer logging into buffer dirtying helper
  xfs: don't log dirty ranges for ordered buffers
  xfs: skip bmbt block ino validation during owner change
  xfs: move bmbt owner change to last step of extent swap
  xfs: disallow marking previously dirty buffers as ordered
  xfs: relog dirty buffers during swapext bmbt owner change

 fs/xfs/libxfs/xfs_bmap_btree.c |  1 +
 fs/xfs/libxfs/xfs_btree.c      | 27 ++++++++----
 fs/xfs/libxfs/xfs_btree.h      |  3 +-
 fs/xfs/libxfs/xfs_ialloc.c     |  2 -
 fs/xfs/xfs_bmap_util.c         | 93 +++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_buf_item.c          | 59 +++++++++++++--------------
 fs/xfs/xfs_buf_item.h          |  2 +-
 fs/xfs/xfs_trace.h             |  1 -
 fs/xfs/xfs_trans.h             |  6 ++-
 fs/xfs/xfs_trans_buf.c         | 79 +++++++++++++++++++++--------------
 10 files changed, 172 insertions(+), 101 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/9] xfs: open-code xfs_buf_item_dirty()
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 15:26   ` Darrick J. Wong
  2017-08-28  9:20   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

It checks a single flag and has one caller. It probably isn't worth
its own function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c  | 11 -----------
 fs/xfs/xfs_buf_item.h  |  1 -
 fs/xfs/xfs_trans_buf.c |  2 +-
 3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 573fc72..cdae0ad5e0 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -945,17 +945,6 @@ xfs_buf_item_log(
 }
 
 
-/*
- * Return 1 if the buffer has been logged or ordered in a transaction (at any
- * point, not just the current transaction) and 0 if not.
- */
-uint
-xfs_buf_item_dirty(
-	xfs_buf_log_item_t	*bip)
-{
-	return (bip->bli_flags & XFS_BLI_DIRTY);
-}
-
 STATIC void
 xfs_buf_item_free(
 	xfs_buf_log_item_t	*bip)
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 530686e..e0e744a 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -64,7 +64,6 @@ typedef struct xfs_buf_log_item {
 int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
 void	xfs_buf_item_relse(struct xfs_buf *);
 void	xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
-uint	xfs_buf_item_dirty(xfs_buf_log_item_t *);
 void	xfs_buf_attach_iodone(struct xfs_buf *,
 			      void(*)(struct xfs_buf *, xfs_log_item_t *),
 			      xfs_log_item_t *);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 86987d8..cac8abb 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -435,7 +435,7 @@ xfs_trans_brelse(xfs_trans_t	*tp,
 	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
 		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
 		xfs_buf_item_relse(bp);
-	} else if (!xfs_buf_item_dirty(bip)) {
+	} else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
 /***
 		ASSERT(bp->b_pincount == 0);
 ***/
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
  2017-08-25 15:05 ` [PATCH 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 15:51   ` Darrick J. Wong
  2017-08-28  9:25   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 3/9] xfs: ordered buffer log items are never formatted Brian Foster
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

xfs_buf_item_unlock() historically checked the dirty state of the
buffer by manually checking the buffer log formats for dirty
segments. The introduction of ordered buffers invalidated this check
because ordered buffers have dirty bli's but no dirty (logged)
segments. The check was updated to accommodate ordered buffers by
looking at the bli state first and considering the blf only if the
bli is clean.

This logic is safe but unnecessary. There is no valid case where the
bli is clean yet the blf has dirty segments. The bli is set dirty
whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).

Remove the conditional blf dirty checks and replace with an assert
that should catch any discrepencies between bli and blf dirty
states. Refactor the old blf dirty check into a helper function to
be used by the assert.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c | 46 +++++++++++++++++++++++++++++++---------------
 fs/xfs/xfs_buf_item.h |  1 +
 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index cdae0ad5e0..9f4dbca 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -575,8 +575,9 @@ xfs_buf_item_unlock(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	struct xfs_buf		*bp = bip->bli_buf;
-	bool			clean;
+	bool			dirty;
 	bool			aborted;
+	bool			ordered;
 	int			flags;
 
 	/* Clear the buffer's association with this transaction. */
@@ -620,20 +621,13 @@ xfs_buf_item_unlock(
 	 * regardless of whether it is dirty or not. A dirty abort implies a
 	 * shutdown, anyway.
 	 *
-	 * Ordered buffers are dirty but may have no recorded changes, so ensure
-	 * we only release clean items here.
+	 * The bli dirty state should match whether the blf has logged segments
+	 * except for ordered buffers, where only the bli should be dirty.
 	 */
-	clean = (flags & XFS_BLI_DIRTY) ? false : true;
-	if (clean) {
-		int i;
-		for (i = 0; i < bip->bli_format_count; i++) {
-			if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
-				     bip->bli_formats[i].blf_map_size)) {
-				clean = false;
-				break;
-			}
-		}
-	}
+	dirty = (flags & XFS_BLI_DIRTY) ? true : false;
+	ordered = (flags & XFS_BLI_ORDERED) ? true : false;
+	ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
+	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
 
 	/*
 	 * Clean buffers, by definition, cannot be in the AIL. However, aborted
@@ -652,7 +646,7 @@ xfs_buf_item_unlock(
 			ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp));
 			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
-		} else if (clean)
+		} else if (!dirty)
 			xfs_buf_item_relse(bp);
 	}
 
@@ -945,6 +939,28 @@ xfs_buf_item_log(
 }
 
 
+/*
+ * Return true if the buffer has any ranges logged/dirtied by a transaction,
+ * false otherwise.
+ */
+bool
+xfs_buf_item_dirty_format(
+	struct xfs_buf_log_item	*bip)
+{
+	int			i;
+	bool			dirty = false;
+
+	for (i = 0; i < bip->bli_format_count; i++) {
+		if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
+			     bip->bli_formats[i].blf_map_size)) {
+			dirty = true;
+			break;
+		}
+	}
+
+	return dirty;
+}
+
 STATIC void
 xfs_buf_item_free(
 	xfs_buf_log_item_t	*bip)
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index e0e744a..9690ce6 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -64,6 +64,7 @@ typedef struct xfs_buf_log_item {
 int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
 void	xfs_buf_item_relse(struct xfs_buf *);
 void	xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
+bool	xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
 void	xfs_buf_attach_iodone(struct xfs_buf *,
 			      void(*)(struct xfs_buf *, xfs_log_item_t *),
 			      xfs_log_item_t *);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/9] xfs: ordered buffer log items are never formatted
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
  2017-08-25 15:05 ` [PATCH 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
  2017-08-25 15:05 ` [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 15:26   ` Darrick J. Wong
  2017-08-28  9:26   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

Ordered buffers pass through the logging infrastructure without ever
being written to the log. The way this works is that the ordered
buffer status is transferred to the log vector at commit time via
the ->iop_size() callback. In xlog_cil_insert_format_items(),
ordered log vectors bypass ->iop_format() processing altogether.

Therefore it is unnecessary for xfs_buf_item_format() to handle
ordered buffers. Remove the unnecessary logic and assert that an
ordered buffer never reaches this point.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c | 12 ++----------
 fs/xfs/xfs_trace.h    |  1 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 9f4dbca..4df4581 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -323,6 +323,8 @@ xfs_buf_item_format(
 	ASSERT((bip->bli_flags & XFS_BLI_STALE) ||
 	       (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF
 	        && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF));
+	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED) ||
+	       bip->bli_flags & XFS_BLI_STALE);
 
 
 	/*
@@ -347,16 +349,6 @@ xfs_buf_item_format(
 		bip->bli_flags &= ~XFS_BLI_INODE_BUF;
 	}
 
-	if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
-							XFS_BLI_ORDERED) {
-		/*
-		 * The buffer has been logged just to order it.  It is not being
-		 * included in the transaction commit, so don't format it.
-		 */
-		trace_xfs_buf_item_format_ordered(bip);
-		return;
-	}
-
 	for (i = 0; i < bip->bli_format_count; i++) {
 		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
 					    &bip->bli_formats[i]);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 6881047..e839ab4 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -517,7 +517,6 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
-DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/9] xfs: refactor buffer logging into buffer dirtying helper
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
                   ` (2 preceding siblings ...)
  2017-08-25 15:05 ` [PATCH 3/9] xfs: ordered buffer log items are never formatted Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-28  9:28   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple use cases
that need to to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer.  One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.

Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible to dirty the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_trans.h     |  4 +++-
 fs/xfs/xfs_trans_buf.c | 46 ++++++++++++++++++++++++++++++----------------
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 7d62772..b3632eb 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -217,7 +217,9 @@ void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
 void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
 void		xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
-void		xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
+void		xfs_trans_log_buf(struct xfs_trans *, struct xfs_buf *, uint,
+				  uint);
+void		xfs_trans_dirty_buf(struct xfs_trans *, struct xfs_buf *);
 void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
 
 void		xfs_extent_free_init_defer_op(void);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index cac8abb..8c99813 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -493,25 +493,17 @@ xfs_trans_bhold_release(xfs_trans_t	*tp,
 }
 
 /*
- * This is called to mark bytes first through last inclusive of the given
- * buffer as needing to be logged when the transaction is committed.
- * The buffer must already be associated with the given transaction.
- *
- * First and last are numbers relative to the beginning of this buffer,
- * so the first byte in the buffer is numbered 0 regardless of the
- * value of b_blkno.
+ * Mark a buffer dirty in the transaction.
  */
 void
-xfs_trans_log_buf(xfs_trans_t	*tp,
-		  xfs_buf_t	*bp,
-		  uint		first,
-		  uint		last)
+xfs_trans_dirty_buf(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp)
 {
-	xfs_buf_log_item_t	*bip = bp->b_fspriv;
+	struct xfs_buf_log_item	*bip = bp->b_fspriv;
 
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
-	ASSERT(first <= last && last < BBTOB(bp->b_length));
 	ASSERT(bp->b_iodone == NULL ||
 	       bp->b_iodone == xfs_buf_iodone_callbacks);
 
@@ -531,8 +523,6 @@ xfs_trans_log_buf(xfs_trans_t	*tp,
 	bp->b_iodone = xfs_buf_iodone_callbacks;
 	bip->bli_item.li_cb = xfs_buf_iodone;
 
-	trace_xfs_trans_log_buf(bip);
-
 	/*
 	 * If we invalidated the buffer within this transaction, then
 	 * cancel the invalidation now that we're dirtying the buffer
@@ -545,15 +535,39 @@ xfs_trans_log_buf(xfs_trans_t	*tp,
 		bp->b_flags &= ~XBF_STALE;
 		bip->__bli_format.blf_flags &= ~XFS_BLF_CANCEL;
 	}
+	bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
 
 	tp->t_flags |= XFS_TRANS_DIRTY;
 	bip->bli_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+}
+
+/*
+ * This is called to mark bytes first through last inclusive of the given
+ * buffer as needing to be logged when the transaction is committed.
+ * The buffer must already be associated with the given transaction.
+ *
+ * First and last are numbers relative to the beginning of this buffer,
+ * so the first byte in the buffer is numbered 0 regardless of the
+ * value of b_blkno.
+ */
+void
+xfs_trans_log_buf(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp,
+	uint			first,
+	uint			last)
+{
+	struct xfs_buf_log_item	*bip = bp->b_fspriv;
+
+	ASSERT(first <= last && last < BBTOB(bp->b_length));
+
+	xfs_trans_dirty_buf(tp, bp);
 
 	/*
 	 * If we have an ordered buffer we are not logging any dirty range but
 	 * it still needs to be marked dirty and that it has been logged.
 	 */
-	bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
+	trace_xfs_trans_log_buf(bip);
 	if (!(bip->bli_flags & XFS_BLI_ORDERED))
 		xfs_buf_item_log(bip, first, last);
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
                   ` (3 preceding siblings ...)
  2017-08-25 15:05 ` [PATCH 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 15:51   ` Darrick J. Wong
  2017-08-28  9:29   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c  |  6 ++----
 fs/xfs/libxfs/xfs_ialloc.c |  2 --
 fs/xfs/xfs_trans_buf.c     | 26 ++++++++++++++------------
 3 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index e0bcc4a..0b7905a 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4464,12 +4464,10 @@ xfs_btree_block_change_owner(
 	 * though, so everything is consistent in memory.
 	 */
 	if (bp) {
-		if (cur->bc_tp) {
+		if (cur->bc_tp)
 			xfs_trans_ordered_buf(cur->bc_tp, bp);
-			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
-		} else {
+		else
 			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
-		}
 	} else {
 		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
 		ASSERT(level == cur->bc_nlevels - 1);
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 1e0658a..988bb3f 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -378,8 +378,6 @@ xfs_ialloc_inode_init(
 				 * transaction and pin the log appropriately.
 				 */
 				xfs_trans_ordered_buf(tp, fbuf);
-				xfs_trans_log_buf(tp, fbuf, 0,
-						  BBTOB(fbuf->b_length) - 1);
 			}
 		} else {
 			fbuf->b_flags |= XBF_DONE;
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 8c99813..3089e80 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -560,16 +560,12 @@ xfs_trans_log_buf(
 	struct xfs_buf_log_item	*bip = bp->b_fspriv;
 
 	ASSERT(first <= last && last < BBTOB(bp->b_length));
+	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED));
 
 	xfs_trans_dirty_buf(tp, bp);
 
-	/*
-	 * If we have an ordered buffer we are not logging any dirty range but
-	 * it still needs to be marked dirty and that it has been logged.
-	 */
 	trace_xfs_trans_log_buf(bip);
-	if (!(bip->bli_flags & XFS_BLI_ORDERED))
-		xfs_buf_item_log(bip, first, last);
+	xfs_buf_item_log(bip, first, last);
 }
 
 
@@ -722,12 +718,11 @@ xfs_trans_inode_alloc_buf(
 }
 
 /*
- * Mark the buffer as ordered for this transaction. This means
- * that the contents of the buffer are not recorded in the transaction
- * but it is tracked in the AIL as though it was. This allows us
- * to record logical changes in transactions rather than the physical
- * changes we make to the buffer without changing writeback ordering
- * constraints of metadata buffers.
+ * Mark the buffer as ordered for this transaction. This means that the contents
+ * of the buffer are not recorded in the transaction but it is tracked in the
+ * AIL as though it was. This allows us to record logical changes in
+ * transactions rather than the physical changes we make to the buffer without
+ * changing writeback ordering constraints of metadata buffers.
  */
 void
 xfs_trans_ordered_buf(
@@ -739,9 +734,16 @@ xfs_trans_ordered_buf(
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
+	ASSERT(!xfs_buf_item_dirty_format(bip));
 
 	bip->bli_flags |= XFS_BLI_ORDERED;
 	trace_xfs_buf_item_ordered(bip);
+
+	/*
+	 * We don't log a dirty range of an ordered buffer but it still needs
+	 * to be marked dirty and that it has been logged.
+	 */
+	xfs_trans_dirty_buf(tp, bp);
 }
 
 /*
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/9] xfs: skip bmbt block ino validation during owner change
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
                   ` (4 preceding siblings ...)
  2017-08-25 15:05 ` [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 15:35   ` Darrick J. Wong
  2017-08-28  9:44   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap_btree.c | 1 +
 fs/xfs/libxfs/xfs_btree.c      | 1 +
 fs/xfs/libxfs/xfs_btree.h      | 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 85de225..6f05995 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -858,6 +858,7 @@ xfs_bmbt_change_owner(
 	cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork);
 	if (!cur)
 		return -ENOMEM;
+	cur->bc_private.b.flags |= XFS_BTCUR_BPRV_INVALIDINO;
 
 	error = xfs_btree_change_owner(cur, new_owner, buffer_list);
 	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 0b7905a..d06b04d 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1791,6 +1791,7 @@ xfs_btree_lookup_get_block(
 
 	/* Check the inode owner since the verifiers don't. */
 	if (xfs_sb_version_hascrc(&cur->bc_mp->m_sb) &&
+	    !(cur->bc_private.b.flags & XFS_BTCUR_BPRV_INVALIDINO) &&
 	    (cur->bc_flags & XFS_BTREE_LONG_PTRS) &&
 	    be64_to_cpu((*blkp)->bb_u.l.bb_owner) !=
 			cur->bc_private.b.ip->i_ino)
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 9c95e96..c803ddc 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -233,7 +233,8 @@ typedef struct xfs_btree_cur
 			short		forksize;	/* fork's inode space */
 			char		whichfork;	/* data or attr fork */
 			char		flags;		/* flags */
-#define	XFS_BTCUR_BPRV_WASDEL	1			/* was delayed */
+#define	XFS_BTCUR_BPRV_WASDEL		(1<<0)		/* was delayed */
+#define	XFS_BTCUR_BPRV_INVALIDINO	(1<<1)		/* for ext swap */
 		} b;
 	}		bc_private;	/* per-btree type data */
 } xfs_btree_cur_t;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
                   ` (5 preceding siblings ...)
  2017-08-25 15:05 ` [PATCH 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 15:57   ` Darrick J. Wong
  2017-08-28  9:46   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
  2017-08-25 15:05 ` [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

The extent swap operation currently resets bmbt block owners before
the inode forks are swapped. The bmbt buffers are marked as ordered
so they do not have to be physically logged in the transaction.

This use of ordered buffers is not safe as bmbt buffers may have
been previously physically logged. The bmbt owner change algorithm
needs to be updated to physically log buffers that are already dirty
when/if they are encountered. This means that an extent swap will
eventually require multiple rolling transactions to handle large
btrees. In addition, all inode related changes must be logged before
the bmbt owner change scan begins and can roll the transaction for
the first time to preserve fs consistency via log recovery.

In preparation for such fixes to the bmbt owner change algorithm,
refactor the bmbt scan out of the extent fork swap code to the last
operation before the transaction is committed. Update
xfs_swap_extent_forks() to only set the inode log flags when an
owner change scan is necessary. Update xfs_swap_extents() to trigger
the owner change based on the inode log flags. Note that since the
owner change now occurs after the extent fork swap, the inode btrees
must be fixed up with the inode number of the current inode (similar
to log recovery).

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 93e9552..ee8fb9a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1840,29 +1840,18 @@ xfs_swap_extent_forks(
 	}
 
 	/*
-	 * Before we've swapped the forks, lets set the owners of the forks
-	 * appropriately. We have to do this as we are demand paging the btree
-	 * buffers, and so the validation done on read will expect the owner
-	 * field to be correctly set. Once we change the owners, we can swap the
-	 * inode forks.
+	 * Btree format (v3) inodes have the inode number stamped in the bmbt
+	 * block headers. We can't start changing the bmbt blocks until the
+	 * inode owner change is logged so recovery does the right thing in the
+	 * event of a crash. Set the owner change log flags now and leave the
+	 * bmbt scan as the last step.
 	 */
 	if (ip->i_d.di_version == 3 &&
-	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE)
 		(*target_log_flags) |= XFS_ILOG_DOWNER;
-		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
-					      tip->i_ino, NULL);
-		if (error)
-			return error;
-	}
-
 	if (tip->i_d.di_version == 3 &&
-	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
 		(*src_log_flags) |= XFS_ILOG_DOWNER;
-		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
-					      ip->i_ino, NULL);
-		if (error)
-			return error;
-	}
 
 	/*
 	 * Swap the data forks of the inodes
@@ -2092,6 +2081,25 @@ xfs_swap_extents(
 	xfs_trans_log_inode(tp, tip, target_log_flags);
 
 	/*
+	 * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems
+	 * have inode number owner values in the bmbt blocks that still refer to
+	 * the old inode. Scan each bmbt to fix up the owner values with the
+	 * inode number of the current inode.
+	 */
+	if (src_log_flags & XFS_ILOG_DOWNER) {
+		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
+					      ip->i_ino, NULL);
+		if (error)
+			goto out_trans_cancel;
+	}
+	if (target_log_flags & XFS_ILOG_DOWNER) {
+		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
+					      tip->i_ino, NULL);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	/*
 	 * If this is a synchronous mount, make sure that the
 	 * transaction goes to disk before returning to the user.
 	 */
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
                   ` (6 preceding siblings ...)
  2017-08-25 15:05 ` [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 16:50   ` Darrick J. Wong
  2017-08-28  9:34   ` Christoph Hellwig
  2017-08-25 15:05 ` [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_trans.h     | 2 +-
 fs/xfs/xfs_trans_buf.c | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index b3632eb..4709823 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -212,7 +212,7 @@ void		xfs_trans_bhold_release(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
-void		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
+bool		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
 void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 3089e80..3ba7a96 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -724,7 +724,7 @@ xfs_trans_inode_alloc_buf(
  * transactions rather than the physical changes we make to the buffer without
  * changing writeback ordering constraints of metadata buffers.
  */
-void
+bool
 xfs_trans_ordered_buf(
 	struct xfs_trans	*tp,
 	struct xfs_buf		*bp)
@@ -734,7 +734,9 @@ xfs_trans_ordered_buf(
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
-	ASSERT(!xfs_buf_item_dirty_format(bip));
+
+	if (xfs_buf_item_dirty_format(bip))
+		return false;
 
 	bip->bli_flags |= XFS_BLI_ORDERED;
 	trace_xfs_buf_item_ordered(bip);
@@ -744,6 +746,7 @@ xfs_trans_ordered_buf(
 	 * to be marked dirty and that it has been logged.
 	 */
 	xfs_trans_dirty_buf(tp, bp);
+	return true;
 }
 
 /*
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change
  2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
                   ` (7 preceding siblings ...)
  2017-08-25 15:05 ` [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
@ 2017-08-25 15:05 ` Brian Foster
  2017-08-25 16:53   ` Darrick J. Wong
  2017-08-28  9:51   ` Christoph Hellwig
  8 siblings, 2 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 15:05 UTC (permalink / raw)
  To: linux-xfs

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_btree.c | 26 ++++++++++++++-------
 fs/xfs/xfs_bmap_util.c    | 57 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index d06b04d..c466a23 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4452,10 +4452,15 @@ xfs_btree_block_change_owner(
 
 	/* modify the owner */
 	block = xfs_btree_get_block(cur, level, &bp);
-	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		if (block->bb_u.l.bb_owner == cpu_to_be64(bbcoi->new_owner))
+			return 0;
 		block->bb_u.l.bb_owner = cpu_to_be64(bbcoi->new_owner);
-	else
+	} else {
+		if (block->bb_u.s.bb_owner == cpu_to_be32(bbcoi->new_owner))
+			return 0;
 		block->bb_u.s.bb_owner = cpu_to_be32(bbcoi->new_owner);
+	}
 
 	/*
 	 * If the block is a root block hosted in an inode, we might not have a
@@ -4464,14 +4469,19 @@ xfs_btree_block_change_owner(
 	 * block is formatted into the on-disk inode fork. We still change it,
 	 * though, so everything is consistent in memory.
 	 */
-	if (bp) {
-		if (cur->bc_tp)
-			xfs_trans_ordered_buf(cur->bc_tp, bp);
-		else
-			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
-	} else {
+	if (!bp) {
 		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
 		ASSERT(level == cur->bc_nlevels - 1);
+		return 0;
+	}
+
+	if (cur->bc_tp) {
+		if (!xfs_trans_ordered_buf(cur->bc_tp, bp)) {
+			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
+			return -EAGAIN;
+		}
+	} else {
+		xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
 	}
 
 	return 0;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index ee8fb9a..3e9b7a4 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1929,6 +1929,48 @@ xfs_swap_extent_forks(
 	return 0;
 }
 
+/*
+ * Fix up the owners of the bmbt blocks to refer to the current inode. The
+ * change owner scan attempts to order all modified buffers in the current
+ * transaction. In the event of ordered buffer failure, the offending buffer is
+ * physically logged as a fallback and the scan returns -EAGAIN. We must roll
+ * the transaction in this case to replenish the fallback log reservation and
+ * restart the scan. This process repeats until the scan completes.
+ */
+static int
+xfs_swap_change_owner(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	struct xfs_inode	*tmpip)
+{
+	int			error;
+	struct xfs_trans	*tp = *tpp;
+
+	do {
+		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino,
+					      NULL);
+		/* success or fatal error */
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_trans_roll(tpp, NULL);
+		if (error)
+			break;
+		tp = *tpp;
+
+		/*
+		 * Redirty both inodes so they can relog and keep the log tail
+		 * moving forward.
+		 */
+		xfs_trans_ijoin(tp, ip, 0);
+		xfs_trans_ijoin(tp, tmpip, 0);
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE);
+	} while (true);
+
+	return error;
+}
+
 int
 xfs_swap_extents(
 	struct xfs_inode	*ip,	/* target inode */
@@ -1943,7 +1985,7 @@ xfs_swap_extents(
 	int			lock_flags;
 	struct xfs_ifork	*cowfp;
 	uint64_t		f;
-	int			resblks;
+	int			resblks = 0;
 
 	/*
 	 * Lock the inodes against other IO, page faults and truncate to
@@ -1991,11 +2033,8 @@ xfs_swap_extents(
 			  XFS_SWAP_RMAP_SPACE_RES(mp,
 				XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK),
 				XFS_DATA_FORK);
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
-				0, 0, &tp);
-	} else
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0,
-				0, 0, &tp);
+	}
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	if (error)
 		goto out_unlock;
 
@@ -2087,14 +2126,12 @@ xfs_swap_extents(
 	 * inode number of the current inode.
 	 */
 	if (src_log_flags & XFS_ILOG_DOWNER) {
-		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
-					      ip->i_ino, NULL);
+		error = xfs_swap_change_owner(&tp, ip, tip);
 		if (error)
 			goto out_trans_cancel;
 	}
 	if (target_log_flags & XFS_ILOG_DOWNER) {
-		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
-					      tip->i_ino, NULL);
+		error = xfs_swap_change_owner(&tp, tip, ip);
 		if (error)
 			goto out_trans_cancel;
 	}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/9] xfs: ordered buffer log items are never formatted
  2017-08-25 15:05 ` [PATCH 3/9] xfs: ordered buffer log items are never formatted Brian Foster
@ 2017-08-25 15:26   ` Darrick J. Wong
  2017-08-28  9:26   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 15:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:51AM -0400, Brian Foster wrote:
> Ordered buffers pass through the logging infrastructure without ever
> being written to the log. The way this works is that the ordered
> buffer status is transferred to the log vector at commit time via
> the ->iop_size() callback. In xlog_cil_insert_format_items(),
> ordered log vectors bypass ->iop_format() processing altogether.
> 
> Therefore it is unnecessary for xfs_buf_item_format() to handle
> ordered buffers. Remove the unnecessary logic and assert that an
> ordered buffer never reaches this point.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf_item.c | 12 ++----------
>  fs/xfs/xfs_trace.h    |  1 -
>  2 files changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 9f4dbca..4df4581 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -323,6 +323,8 @@ xfs_buf_item_format(
>  	ASSERT((bip->bli_flags & XFS_BLI_STALE) ||
>  	       (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF
>  	        && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF));
> +	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED) ||
> +	       bip->bli_flags & XFS_BLI_STALE);

Needs parentheses around the stale check.

--D

>  
>  
>  	/*
> @@ -347,16 +349,6 @@ xfs_buf_item_format(
>  		bip->bli_flags &= ~XFS_BLI_INODE_BUF;
>  	}
>  
> -	if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
> -							XFS_BLI_ORDERED) {
> -		/*
> -		 * The buffer has been logged just to order it.  It is not being
> -		 * included in the transaction commit, so don't format it.
> -		 */
> -		trace_xfs_buf_item_format_ordered(bip);
> -		return;
> -	}
> -
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
>  					    &bip->bli_formats[i]);
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 6881047..e839ab4 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -517,7 +517,6 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
> -DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/9] xfs: open-code xfs_buf_item_dirty()
  2017-08-25 15:05 ` [PATCH 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
@ 2017-08-25 15:26   ` Darrick J. Wong
  2017-08-28  9:20   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 15:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:49AM -0400, Brian Foster wrote:
> It checks a single flag and has one caller. It probably isn't worth
> its own function.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_buf_item.c  | 11 -----------
>  fs/xfs/xfs_buf_item.h  |  1 -
>  fs/xfs/xfs_trans_buf.c |  2 +-
>  3 files changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 573fc72..cdae0ad5e0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -945,17 +945,6 @@ xfs_buf_item_log(
>  }
>  
>  
> -/*
> - * Return 1 if the buffer has been logged or ordered in a transaction (at any
> - * point, not just the current transaction) and 0 if not.
> - */
> -uint
> -xfs_buf_item_dirty(
> -	xfs_buf_log_item_t	*bip)
> -{
> -	return (bip->bli_flags & XFS_BLI_DIRTY);
> -}
> -
>  STATIC void
>  xfs_buf_item_free(
>  	xfs_buf_log_item_t	*bip)
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index 530686e..e0e744a 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -64,7 +64,6 @@ typedef struct xfs_buf_log_item {
>  int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
>  void	xfs_buf_item_relse(struct xfs_buf *);
>  void	xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
> -uint	xfs_buf_item_dirty(xfs_buf_log_item_t *);
>  void	xfs_buf_attach_iodone(struct xfs_buf *,
>  			      void(*)(struct xfs_buf *, xfs_log_item_t *),
>  			      xfs_log_item_t *);
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 86987d8..cac8abb 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -435,7 +435,7 @@ xfs_trans_brelse(xfs_trans_t	*tp,
>  	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
>  		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
>  		xfs_buf_item_relse(bp);
> -	} else if (!xfs_buf_item_dirty(bip)) {
> +	} else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
>  /***
>  		ASSERT(bp->b_pincount == 0);
>  ***/
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/9] xfs: skip bmbt block ino validation during owner change
  2017-08-25 15:05 ` [PATCH 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
@ 2017-08-25 15:35   ` Darrick J. Wong
  2017-08-25 18:11     ` Brian Foster
  2017-08-28  9:44   ` Christoph Hellwig
  1 sibling, 1 reply; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 15:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:54AM -0400, Brian Foster wrote:
> Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
> owners on v5 (!rmapbt) filesystems. The bmbt scan uses
> xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
> current owner of the block against the parent inode of the bmbt.
> This works during extent swap because the bmbt owners are updated to
> the opposite inode number before the inode extent forks are swapped.
> 
> The modified bmbt blocks are marked as ordered buffers which allows
> everything to commit in a single transaction. If the transaction
> commits to the log and the system crashes such that recovery of the
> extent swap is required, log recovery restarts the bmbt scan to fix
> up any bmbt blocks that may have not been written back before the
> crash. The log recovery bmbt scan occurs after the inode forks have
> been swapped, however. This causes the bmbt block owner verification
> to fail, leads to log recovery failure and requires xfs_repair to
> zap the log to recover.
> 
> Define a new invalid inode owner flag to inform the btree block
> lookup mechanism that the current inode may be invalid with respect
> to the current owner of the bmbt block. Set this flag on the cursor
> used for change owner scans to allow this operation to work at
> runtime and during log recovery.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap_btree.c | 1 +
>  fs/xfs/libxfs/xfs_btree.c      | 1 +
>  fs/xfs/libxfs/xfs_btree.h      | 3 ++-
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 85de225..6f05995 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -858,6 +858,7 @@ xfs_bmbt_change_owner(
>  	cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork);
>  	if (!cur)
>  		return -ENOMEM;
> +	cur->bc_private.b.flags |= XFS_BTCUR_BPRV_INVALIDINO;
>  
>  	error = xfs_btree_change_owner(cur, new_owner, buffer_list);
>  	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 0b7905a..d06b04d 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -1791,6 +1791,7 @@ xfs_btree_lookup_get_block(
>  
>  	/* Check the inode owner since the verifiers don't. */
>  	if (xfs_sb_version_hascrc(&cur->bc_mp->m_sb) &&
> +	    !(cur->bc_private.b.flags & XFS_BTCUR_BPRV_INVALIDINO) &&
>  	    (cur->bc_flags & XFS_BTREE_LONG_PTRS) &&
>  	    be64_to_cpu((*blkp)->bb_u.l.bb_owner) !=
>  			cur->bc_private.b.ip->i_ino)
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 9c95e96..c803ddc 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -233,7 +233,8 @@ typedef struct xfs_btree_cur
>  			short		forksize;	/* fork's inode space */
>  			char		whichfork;	/* data or attr fork */
>  			char		flags;		/* flags */
> -#define	XFS_BTCUR_BPRV_WASDEL	1			/* was delayed */
> +#define	XFS_BTCUR_BPRV_WASDEL		(1<<0)		/* was delayed */
> +#define	XFS_BTCUR_BPRV_INVALIDINO	(1<<1)		/* for ext swap */

I think I'd call this invalid owner since that's the name of the field
that we're (not) validating.

Otherwise looks fine,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>  		} b;
>  	}		bc_private;	/* per-btree type data */
>  } xfs_btree_cur_t;
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs
  2017-08-25 15:05 ` [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
@ 2017-08-25 15:51   ` Darrick J. Wong
  2017-08-28  9:25   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 15:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:50AM -0400, Brian Foster wrote:
> xfs_buf_item_unlock() historically checked the dirty state of the
> buffer by manually checking the buffer log formats for dirty
> segments. The introduction of ordered buffers invalidated this check
> because ordered buffers have dirty bli's but no dirty (logged)
> segments. The check was updated to accommodate ordered buffers by
> looking at the bli state first and considering the blf only if the
> bli is clean.
> 
> This logic is safe but unnecessary. There is no valid case where the
> bli is clean yet the blf has dirty segments. The bli is set dirty
> whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
> cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).
> 
> Remove the conditional blf dirty checks and replace with an assert
> that should catch any discrepencies between bli and blf dirty
> states. Refactor the old blf dirty check into a helper function to
> be used by the assert.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf_item.c | 46 +++++++++++++++++++++++++++++++---------------
>  fs/xfs/xfs_buf_item.h |  1 +
>  2 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index cdae0ad5e0..9f4dbca 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -575,8 +575,9 @@ xfs_buf_item_unlock(
>  {
>  	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
>  	struct xfs_buf		*bp = bip->bli_buf;
> -	bool			clean;
> +	bool			dirty;
>  	bool			aborted;
> +	bool			ordered;
>  	int			flags;
>  
>  	/* Clear the buffer's association with this transaction. */
> @@ -620,20 +621,13 @@ xfs_buf_item_unlock(
>  	 * regardless of whether it is dirty or not. A dirty abort implies a
>  	 * shutdown, anyway.
>  	 *
> -	 * Ordered buffers are dirty but may have no recorded changes, so ensure
> -	 * we only release clean items here.
> +	 * The bli dirty state should match whether the blf has logged segments
> +	 * except for ordered buffers, where only the bli should be dirty.
>  	 */
> -	clean = (flags & XFS_BLI_DIRTY) ? false : true;
> -	if (clean) {
> -		int i;
> -		for (i = 0; i < bip->bli_format_count; i++) {
> -			if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
> -				     bip->bli_formats[i].blf_map_size)) {
> -				clean = false;
> -				break;
> -			}
> -		}
> -	}
> +	dirty = (flags & XFS_BLI_DIRTY) ? true : false;
> +	ordered = (flags & XFS_BLI_ORDERED) ? true : false;
> +	ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
> +	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
>  
>  	/*
>  	 * Clean buffers, by definition, cannot be in the AIL. However, aborted
> @@ -652,7 +646,7 @@ xfs_buf_item_unlock(
>  			ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp));
>  			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
>  			xfs_buf_item_relse(bp);
> -		} else if (clean)
> +		} else if (!dirty)
>  			xfs_buf_item_relse(bp);
>  	}
>  
> @@ -945,6 +939,28 @@ xfs_buf_item_log(
>  }
>  
>  
> +/*
> + * Return true if the buffer has any ranges logged/dirtied by a transaction,
> + * false otherwise.
> + */
> +bool
> +xfs_buf_item_dirty_format(
> +	struct xfs_buf_log_item	*bip)
> +{
> +	int			i;
> +	bool			dirty = false;
> +
> +	for (i = 0; i < bip->bli_format_count; i++) {
> +		if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
> +			     bip->bli_formats[i].blf_map_size)) {
> +			dirty = true;
> +			break;

/me imagines this could be shortened to 'return true', but looks ok otherwise,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +		}
> +	}
> +
> +	return dirty;
> +}
> +
>  STATIC void
>  xfs_buf_item_free(
>  	xfs_buf_log_item_t	*bip)
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index e0e744a..9690ce6 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -64,6 +64,7 @@ typedef struct xfs_buf_log_item {
>  int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
>  void	xfs_buf_item_relse(struct xfs_buf *);
>  void	xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
> +bool	xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
>  void	xfs_buf_attach_iodone(struct xfs_buf *,
>  			      void(*)(struct xfs_buf *, xfs_log_item_t *),
>  			      xfs_log_item_t *);
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers
  2017-08-25 15:05 ` [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
@ 2017-08-25 15:51   ` Darrick J. Wong
  2017-08-28  9:29   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 15:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:53AM -0400, Brian Foster wrote:
> Ordered buffers are attached to transactions and pushed through the
> logging infrastructure just like normal buffers with the exception
> that they are not actually written to the log. Therefore, we don't
> need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
> called on ordered buffers to set up all of the dirty state on the
> transaction, buffer and log item and prepare the buffer for I/O.
> 
> Now that xfs_trans_dirty_buf() is available, call it from
> xfs_trans_ordered_buf() so the latter is now mutually exclusive with
> xfs_trans_log_buf(). This reflects the implementation of ordered
> buffers and helps eliminate confusion over the need to log ranges of
> ordered buffers just to set up internal log state.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_btree.c  |  6 ++----
>  fs/xfs/libxfs/xfs_ialloc.c |  2 --
>  fs/xfs/xfs_trans_buf.c     | 26 ++++++++++++++------------
>  3 files changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index e0bcc4a..0b7905a 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -4464,12 +4464,10 @@ xfs_btree_block_change_owner(
>  	 * though, so everything is consistent in memory.
>  	 */
>  	if (bp) {
> -		if (cur->bc_tp) {
> +		if (cur->bc_tp)
>  			xfs_trans_ordered_buf(cur->bc_tp, bp);
> -			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
> -		} else {
> +		else
>  			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
> -		}
>  	} else {
>  		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
>  		ASSERT(level == cur->bc_nlevels - 1);
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 1e0658a..988bb3f 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -378,8 +378,6 @@ xfs_ialloc_inode_init(
>  				 * transaction and pin the log appropriately.
>  				 */
>  				xfs_trans_ordered_buf(tp, fbuf);
> -				xfs_trans_log_buf(tp, fbuf, 0,
> -						  BBTOB(fbuf->b_length) - 1);
>  			}
>  		} else {
>  			fbuf->b_flags |= XBF_DONE;
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 8c99813..3089e80 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -560,16 +560,12 @@ xfs_trans_log_buf(
>  	struct xfs_buf_log_item	*bip = bp->b_fspriv;
>  
>  	ASSERT(first <= last && last < BBTOB(bp->b_length));
> +	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED));
>  
>  	xfs_trans_dirty_buf(tp, bp);
>  
> -	/*
> -	 * If we have an ordered buffer we are not logging any dirty range but
> -	 * it still needs to be marked dirty and that it has been logged.
> -	 */
>  	trace_xfs_trans_log_buf(bip);
> -	if (!(bip->bli_flags & XFS_BLI_ORDERED))
> -		xfs_buf_item_log(bip, first, last);
> +	xfs_buf_item_log(bip, first, last);
>  }
>  
>  
> @@ -722,12 +718,11 @@ xfs_trans_inode_alloc_buf(
>  }
>  
>  /*
> - * Mark the buffer as ordered for this transaction. This means
> - * that the contents of the buffer are not recorded in the transaction
> - * but it is tracked in the AIL as though it was. This allows us
> - * to record logical changes in transactions rather than the physical
> - * changes we make to the buffer without changing writeback ordering
> - * constraints of metadata buffers.
> + * Mark the buffer as ordered for this transaction. This means that the contents
> + * of the buffer are not recorded in the transaction but it is tracked in the
> + * AIL as though it was. This allows us to record logical changes in
> + * transactions rather than the physical changes we make to the buffer without
> + * changing writeback ordering constraints of metadata buffers.
>   */
>  void
>  xfs_trans_ordered_buf(
> @@ -739,9 +734,16 @@ xfs_trans_ordered_buf(
>  	ASSERT(bp->b_transp == tp);
>  	ASSERT(bip != NULL);
>  	ASSERT(atomic_read(&bip->bli_refcount) > 0);
> +	ASSERT(!xfs_buf_item_dirty_format(bip));
>  
>  	bip->bli_flags |= XFS_BLI_ORDERED;
>  	trace_xfs_buf_item_ordered(bip);
> +
> +	/*
> +	 * We don't log a dirty range of an ordered buffer but it still needs
> +	 * to be marked dirty and that it has been logged.
> +	 */
> +	xfs_trans_dirty_buf(tp, bp);
>  }
>  
>  /*
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap
  2017-08-25 15:05 ` [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
@ 2017-08-25 15:57   ` Darrick J. Wong
  2017-08-28  9:46   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 15:57 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:55AM -0400, Brian Foster wrote:
> The extent swap operation currently resets bmbt block owners before
> the inode forks are swapped. The bmbt buffers are marked as ordered
> so they do not have to be physically logged in the transaction.
> 
> This use of ordered buffers is not safe as bmbt buffers may have
> been previously physically logged. The bmbt owner change algorithm
> needs to be updated to physically log buffers that are already dirty
> when/if they are encountered. This means that an extent swap will
> eventually require multiple rolling transactions to handle large
> btrees. In addition, all inode related changes must be logged before
> the bmbt owner change scan begins and can roll the transaction for
> the first time to preserve fs consistency via log recovery.
> 
> In preparation for such fixes to the bmbt owner change algorithm,
> refactor the bmbt scan out of the extent fork swap code to the last
> operation before the transaction is committed. Update
> xfs_swap_extent_forks() to only set the inode log flags when an
> owner change scan is necessary. Update xfs_swap_extents() to trigger
> the owner change based on the inode log flags. Note that since the
> owner change now occurs after the extent fork swap, the inode btrees
> must be fixed up with the inode number of the current inode (similar
> to log recovery).
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok I think,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_bmap_util.c | 44 ++++++++++++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 93e9552..ee8fb9a 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1840,29 +1840,18 @@ xfs_swap_extent_forks(
>  	}
>  
>  	/*
> -	 * Before we've swapped the forks, lets set the owners of the forks
> -	 * appropriately. We have to do this as we are demand paging the btree
> -	 * buffers, and so the validation done on read will expect the owner
> -	 * field to be correctly set. Once we change the owners, we can swap the
> -	 * inode forks.
> +	 * Btree format (v3) inodes have the inode number stamped in the bmbt
> +	 * block headers. We can't start changing the bmbt blocks until the
> +	 * inode owner change is logged so recovery does the right thing in the
> +	 * event of a crash. Set the owner change log flags now and leave the
> +	 * bmbt scan as the last step.
>  	 */
>  	if (ip->i_d.di_version == 3 &&
> -	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> +	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE)
>  		(*target_log_flags) |= XFS_ILOG_DOWNER;
> -		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
> -					      tip->i_ino, NULL);
> -		if (error)
> -			return error;
> -	}
> -
>  	if (tip->i_d.di_version == 3 &&
> -	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> +	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
>  		(*src_log_flags) |= XFS_ILOG_DOWNER;
> -		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
> -					      ip->i_ino, NULL);
> -		if (error)
> -			return error;
> -	}
>  
>  	/*
>  	 * Swap the data forks of the inodes
> @@ -2092,6 +2081,25 @@ xfs_swap_extents(
>  	xfs_trans_log_inode(tp, tip, target_log_flags);
>  
>  	/*
> +	 * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems
> +	 * have inode number owner values in the bmbt blocks that still refer to
> +	 * the old inode. Scan each bmbt to fix up the owner values with the
> +	 * inode number of the current inode.
> +	 */
> +	if (src_log_flags & XFS_ILOG_DOWNER) {
> +		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
> +					      ip->i_ino, NULL);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +	if (target_log_flags & XFS_ILOG_DOWNER) {
> +		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
> +					      tip->i_ino, NULL);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
> +	/*
>  	 * If this is a synchronous mount, make sure that the
>  	 * transaction goes to disk before returning to the user.
>  	 */
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered
  2017-08-25 15:05 ` [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
@ 2017-08-25 16:50   ` Darrick J. Wong
  2017-08-28  9:34   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 16:50 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:56AM -0400, Brian Foster wrote:
> Ordered buffers are used in situations where the buffer is not
> physically logged but must pass through the transaction/logging
> pipeline for a particular transaction. As a result, ordered buffers
> are not unpinned and written back until the transaction commits to
> the log. Ordered buffers have a strict requirement that the target
> buffer must not be currently dirty and resident in the log pipeline
> at the time it is marked ordered. If a dirty+ordered buffer is
> committed, the buffer is reinserted to the AIL but not physically
> relogged at the LSN of the associated checkpoint. The buffer log
> item is assigned the LSN of the latest checkpoint and the AIL
> effectively releases the previously logged buffer content from the
> active log before the buffer has been written back. If the tail
> pushes forward and a filesystem crash occurs while in this state, an
> inconsistent filesystem could result.
> 
> It is currently the caller responsibility to ensure an ordered
> buffer is not already dirty from a previous modification. This is
> unclear and error prone when not used in situations where it is
> guaranteed a buffer has not been previously modified (such as new
> metadata allocations).
> 
> To facilitate general purpose use of ordered buffers, update
> xfs_trans_ordered_buf() to conditionally order the buffer based on
> state of the log item and return the status of the result. If the
> bli is dirty, do not order the buffer and return false. The caller
> must either physically log the buffer (having acquired the
> appropriate log reservation) or push it from the AIL to clean it
> before it can be marked ordered in the current transaction.
> 
> Note that ordered buffers are currently only used in two situations:
> 1.) inode chunk allocation where previously logged buffers are not
> possible and 2.) extent swap which will be updated to handle ordered
> buffer failures in a separate patch.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_trans.h     | 2 +-
>  fs/xfs/xfs_trans_buf.c | 7 +++++--
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index b3632eb..4709823 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -212,7 +212,7 @@ void		xfs_trans_bhold_release(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
> -void		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
> +bool		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
>  void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 3089e80..3ba7a96 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -724,7 +724,7 @@ xfs_trans_inode_alloc_buf(
>   * transactions rather than the physical changes we make to the buffer without
>   * changing writeback ordering constraints of metadata buffers.
>   */
> -void
> +bool
>  xfs_trans_ordered_buf(
>  	struct xfs_trans	*tp,
>  	struct xfs_buf		*bp)
> @@ -734,7 +734,9 @@ xfs_trans_ordered_buf(
>  	ASSERT(bp->b_transp == tp);
>  	ASSERT(bip != NULL);
>  	ASSERT(atomic_read(&bip->bli_refcount) > 0);
> -	ASSERT(!xfs_buf_item_dirty_format(bip));
> +
> +	if (xfs_buf_item_dirty_format(bip))
> +		return false;
>  
>  	bip->bli_flags |= XFS_BLI_ORDERED;
>  	trace_xfs_buf_item_ordered(bip);
> @@ -744,6 +746,7 @@ xfs_trans_ordered_buf(
>  	 * to be marked dirty and that it has been logged.
>  	 */
>  	xfs_trans_dirty_buf(tp, bp);
> +	return true;
>  }
>  
>  /*
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change
  2017-08-25 15:05 ` [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
@ 2017-08-25 16:53   ` Darrick J. Wong
  2017-08-28  9:51   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-08-25 16:53 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 11:05:57AM -0400, Brian Foster wrote:
> The owner change bmbt scan that occurs during extent swap operations
> does not handle ordered buffer failures. Buffers that cannot be
> marked ordered must be physically logged so previously dirty ranges
> of the buffer can be relogged in the transaction.
> 
> Since the bmbt scan may need to process and potentially log a large
> number of blocks, we can't expect to complete this operation in a
> single transaction. Update extent swap to use a permanent
> transaction with enough log reservation to physically log a buffer.
> Update the bmbt scan to physically log any buffers that cannot be
> ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
> caller rolls the transaction and restarts the scan. Finally, update
> the bmbt scan helper function to skip bmbt blocks that already match
> the expected owner so they are not reprocessed after scan restarts.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok, I think...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_btree.c | 26 ++++++++++++++-------
>  fs/xfs/xfs_bmap_util.c    | 57 ++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 65 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index d06b04d..c466a23 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -4452,10 +4452,15 @@ xfs_btree_block_change_owner(
>  
>  	/* modify the owner */
>  	block = xfs_btree_get_block(cur, level, &bp);
> -	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +		if (block->bb_u.l.bb_owner == cpu_to_be64(bbcoi->new_owner))
> +			return 0;
>  		block->bb_u.l.bb_owner = cpu_to_be64(bbcoi->new_owner);
> -	else
> +	} else {
> +		if (block->bb_u.s.bb_owner == cpu_to_be32(bbcoi->new_owner))
> +			return 0;
>  		block->bb_u.s.bb_owner = cpu_to_be32(bbcoi->new_owner);
> +	}
>  
>  	/*
>  	 * If the block is a root block hosted in an inode, we might not have a
> @@ -4464,14 +4469,19 @@ xfs_btree_block_change_owner(
>  	 * block is formatted into the on-disk inode fork. We still change it,
>  	 * though, so everything is consistent in memory.
>  	 */
> -	if (bp) {
> -		if (cur->bc_tp)
> -			xfs_trans_ordered_buf(cur->bc_tp, bp);
> -		else
> -			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
> -	} else {
> +	if (!bp) {
>  		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
>  		ASSERT(level == cur->bc_nlevels - 1);
> +		return 0;
> +	}
> +
> +	if (cur->bc_tp) {
> +		if (!xfs_trans_ordered_buf(cur->bc_tp, bp)) {
> +			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
> +			return -EAGAIN;
> +		}
> +	} else {
> +		xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
>  	}
>  
>  	return 0;
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index ee8fb9a..3e9b7a4 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1929,6 +1929,48 @@ xfs_swap_extent_forks(
>  	return 0;
>  }
>  
> +/*
> + * Fix up the owners of the bmbt blocks to refer to the current inode. The
> + * change owner scan attempts to order all modified buffers in the current
> + * transaction. In the event of ordered buffer failure, the offending buffer is
> + * physically logged as a fallback and the scan returns -EAGAIN. We must roll
> + * the transaction in this case to replenish the fallback log reservation and
> + * restart the scan. This process repeats until the scan completes.
> + */
> +static int
> +xfs_swap_change_owner(
> +	struct xfs_trans	**tpp,
> +	struct xfs_inode	*ip,
> +	struct xfs_inode	*tmpip)
> +{
> +	int			error;
> +	struct xfs_trans	*tp = *tpp;
> +
> +	do {
> +		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino,
> +					      NULL);
> +		/* success or fatal error */
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_trans_roll(tpp, NULL);
> +		if (error)
> +			break;
> +		tp = *tpp;
> +
> +		/*
> +		 * Redirty both inodes so they can relog and keep the log tail
> +		 * moving forward.
> +		 */
> +		xfs_trans_ijoin(tp, ip, 0);
> +		xfs_trans_ijoin(tp, tmpip, 0);
> +		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +		xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE);
> +	} while (true);
> +
> +	return error;
> +}
> +
>  int
>  xfs_swap_extents(
>  	struct xfs_inode	*ip,	/* target inode */
> @@ -1943,7 +1985,7 @@ xfs_swap_extents(
>  	int			lock_flags;
>  	struct xfs_ifork	*cowfp;
>  	uint64_t		f;
> -	int			resblks;
> +	int			resblks = 0;
>  
>  	/*
>  	 * Lock the inodes against other IO, page faults and truncate to
> @@ -1991,11 +2033,8 @@ xfs_swap_extents(
>  			  XFS_SWAP_RMAP_SPACE_RES(mp,
>  				XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK),
>  				XFS_DATA_FORK);
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
> -				0, 0, &tp);
> -	} else
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0,
> -				0, 0, &tp);
> +	}
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
>  	if (error)
>  		goto out_unlock;
>  
> @@ -2087,14 +2126,12 @@ xfs_swap_extents(
>  	 * inode number of the current inode.
>  	 */
>  	if (src_log_flags & XFS_ILOG_DOWNER) {
> -		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
> -					      ip->i_ino, NULL);
> +		error = xfs_swap_change_owner(&tp, ip, tip);
>  		if (error)
>  			goto out_trans_cancel;
>  	}
>  	if (target_log_flags & XFS_ILOG_DOWNER) {
> -		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
> -					      tip->i_ino, NULL);
> +		error = xfs_swap_change_owner(&tp, tip, ip);
>  		if (error)
>  			goto out_trans_cancel;
>  	}
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/9] xfs: skip bmbt block ino validation during owner change
  2017-08-25 15:35   ` Darrick J. Wong
@ 2017-08-25 18:11     ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-25 18:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Aug 25, 2017 at 08:35:53AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 25, 2017 at 11:05:54AM -0400, Brian Foster wrote:
> > Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
> > owners on v5 (!rmapbt) filesystems. The bmbt scan uses
> > xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
> > current owner of the block against the parent inode of the bmbt.
> > This works during extent swap because the bmbt owners are updated to
> > the opposite inode number before the inode extent forks are swapped.
> > 
> > The modified bmbt blocks are marked as ordered buffers which allows
> > everything to commit in a single transaction. If the transaction
> > commits to the log and the system crashes such that recovery of the
> > extent swap is required, log recovery restarts the bmbt scan to fix
> > up any bmbt blocks that may have not been written back before the
> > crash. The log recovery bmbt scan occurs after the inode forks have
> > been swapped, however. This causes the bmbt block owner verification
> > to fail, leads to log recovery failure and requires xfs_repair to
> > zap the log to recover.
> > 
> > Define a new invalid inode owner flag to inform the btree block
> > lookup mechanism that the current inode may be invalid with respect
> > to the current owner of the bmbt block. Set this flag on the cursor
> > used for change owner scans to allow this operation to work at
> > runtime and during log recovery.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap_btree.c | 1 +
> >  fs/xfs/libxfs/xfs_btree.c      | 1 +
> >  fs/xfs/libxfs/xfs_btree.h      | 3 ++-
> >  3 files changed, 4 insertions(+), 1 deletion(-)
> > 
...
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 9c95e96..c803ddc 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -233,7 +233,8 @@ typedef struct xfs_btree_cur
> >  			short		forksize;	/* fork's inode space */
> >  			char		whichfork;	/* data or attr fork */
> >  			char		flags;		/* flags */
> > -#define	XFS_BTCUR_BPRV_WASDEL	1			/* was delayed */
> > +#define	XFS_BTCUR_BPRV_WASDEL		(1<<0)		/* was delayed */
> > +#define	XFS_BTCUR_BPRV_INVALIDINO	(1<<1)		/* for ext swap */
> 
> I think I'd call this invalid owner since that's the name of the field
> that we're (not) validating.
> 

Sure, I'll change it to XFS_BTCUR_BPRV_INVALID_OWNER unless I hear
otherwise and incorporate the other cleanups for v2. Thanks for the
reviews.

Brian

> Otherwise looks fine,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> --D
> 
> >  		} b;
> >  	}		bc_private;	/* per-btree type data */
> >  } xfs_btree_cur_t;
> > -- 
> > 2.9.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/9] xfs: open-code xfs_buf_item_dirty()
  2017-08-25 15:05 ` [PATCH 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
  2017-08-25 15:26   ` Darrick J. Wong
@ 2017-08-28  9:20   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs
  2017-08-25 15:05 ` [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
  2017-08-25 15:51   ` Darrick J. Wong
@ 2017-08-28  9:25   ` Christoph Hellwig
  2017-08-28 10:51     ` Brian Foster
  1 sibling, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:25 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

This looks generally good, but I have a few nitpicks, too :)

On Fri, Aug 25, 2017 at 11:05:50AM -0400, Brian Foster wrote:
> -	bool			clean;
> +	bool			dirty;
>  	bool			aborted;
> +	bool			ordered;

Can we just initialize all these variables here at the declaration

> +	 * The bli dirty state should match whether the blf has logged segments
> +	 * except for ordered buffers, where only the bli should be dirty.
>  	 */
> +	dirty = (flags & XFS_BLI_DIRTY) ? true : false;
> +	ordered = (flags & XFS_BLI_ORDERED) ? true : false;

No need for the "? true : false" both for these two and the existing
aborted case.

> +/*
> + * Return true if the buffer has any ranges logged/dirtied by a transaction,
> + * false otherwise.
> + */
> +bool
> +xfs_buf_item_dirty_format(

Do you need this outside of xfs_buf_item.c later?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/9] xfs: ordered buffer log items are never formatted
  2017-08-25 15:05 ` [PATCH 3/9] xfs: ordered buffer log items are never formatted Brian Foster
  2017-08-25 15:26   ` Darrick J. Wong
@ 2017-08-28  9:26   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

(I don't think the additional brace are strictly needed, but they
really help parsing the statement)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/9] xfs: refactor buffer logging into buffer dirtying helper
  2017-08-25 15:05 ` [PATCH 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
@ 2017-08-28  9:28   ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:28 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers
  2017-08-25 15:05 ` [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
  2017-08-25 15:51   ` Darrick J. Wong
@ 2017-08-28  9:29   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered
  2017-08-25 15:05 ` [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
  2017-08-25 16:50   ` Darrick J. Wong
@ 2017-08-28  9:34   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/9] xfs: skip bmbt block ino validation during owner change
  2017-08-25 15:05 ` [PATCH 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
  2017-08-25 15:35   ` Darrick J. Wong
@ 2017-08-28  9:44   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:44 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks fine with this or another name for the flag:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap
  2017-08-25 15:05 ` [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
  2017-08-25 15:57   ` Darrick J. Wong
@ 2017-08-28  9:46   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change
  2017-08-25 15:05 ` [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
  2017-08-25 16:53   ` Darrick J. Wong
@ 2017-08-28  9:51   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-08-28  9:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs
  2017-08-28  9:25   ` Christoph Hellwig
@ 2017-08-28 10:51     ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2017-08-28 10:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Aug 28, 2017 at 02:25:43AM -0700, Christoph Hellwig wrote:
> This looks generally good, but I have a few nitpicks, too :)
> 
> On Fri, Aug 25, 2017 at 11:05:50AM -0400, Brian Foster wrote:
> > -	bool			clean;
> > +	bool			dirty;
> >  	bool			aborted;
> > +	bool			ordered;
> 
> Can we just initialize all these variables here at the declaration
> 

Sure..

> > +	 * The bli dirty state should match whether the blf has logged segments
> > +	 * except for ordered buffers, where only the bli should be dirty.
> >  	 */
> > +	dirty = (flags & XFS_BLI_DIRTY) ? true : false;
> > +	ordered = (flags & XFS_BLI_ORDERED) ? true : false;
> 
> No need for the "? true : false" both for these two and the existing
> aborted case.
> 

Ok, so I assume you mean to change these to something like:

	bool			aborted = !!(bip->bli_flags & XFS_BLI_ABORTED);
	...

> > +/*
> > + * Return true if the buffer has any ranges logged/dirtied by a transaction,
> > + * false otherwise.
> > + */
> > +bool
> > +xfs_buf_item_dirty_format(
> 
> Do you need this outside of xfs_buf_item.c later?
> 

It's used in xfs_trans_buf.c as well.

Brian

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-08-28 10:51 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-25 15:05 [PATCH 0/9] xfs: xfs: refactor ordered buffer logging code Brian Foster
2017-08-25 15:05 ` [PATCH 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
2017-08-25 15:26   ` Darrick J. Wong
2017-08-28  9:20   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
2017-08-25 15:51   ` Darrick J. Wong
2017-08-28  9:25   ` Christoph Hellwig
2017-08-28 10:51     ` Brian Foster
2017-08-25 15:05 ` [PATCH 3/9] xfs: ordered buffer log items are never formatted Brian Foster
2017-08-25 15:26   ` Darrick J. Wong
2017-08-28  9:26   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
2017-08-28  9:28   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
2017-08-25 15:51   ` Darrick J. Wong
2017-08-28  9:29   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
2017-08-25 15:35   ` Darrick J. Wong
2017-08-25 18:11     ` Brian Foster
2017-08-28  9:44   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
2017-08-25 15:57   ` Darrick J. Wong
2017-08-28  9:46   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
2017-08-25 16:50   ` Darrick J. Wong
2017-08-28  9:34   ` Christoph Hellwig
2017-08-25 15:05 ` [PATCH 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
2017-08-25 16:53   ` Darrick J. Wong
2017-08-28  9:51   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.