All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/9] xfs: refactor ordered buffer logging code
@ 2017-08-29 14:37 Brian Foster
  2017-08-29 14:37 ` [PATCH v2 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

Hi all,

Here's v2 of the ordered buffer / buffer relogging refactoring series.
Only minor updates in this version. I've also tagged patch 6 for stable
as it fixes a log recovery regression for bmbt owner change operations.

Brian

v2:
- Various cleanups.
- Rename bmbt scan invalid owner flag.
- Added stable CC for bmbt owner change log recovery fix (6/9).
v1: https://marc.info/?l=linux-xfs&m=150367356112090&w=2
- Drop unnecessary buffer type set in xfs_btree_block_change_owner().
- Several new cleanup patches.
- Fix bmbt owner change (extent swap) log recovery.
- Update buffer ordering to fail on previously dirty buffers.
- Rework bmbt owner change to use ordered buffers correctly.
rfc: https://marc.info/?l=linux-xfs&m=150272969407981&w=2

Brian Foster (9):
  xfs: open-code xfs_buf_item_dirty()
  xfs: remove unnecessary dirty bli format check for ordered bufs
  xfs: ordered buffer log items are never formatted
  xfs: refactor buffer logging into buffer dirtying helper
  xfs: don't log dirty ranges for ordered buffers
  xfs: skip bmbt block ino validation during owner change
  xfs: move bmbt owner change to last step of extent swap
  xfs: disallow marking previously dirty buffers as ordered
  xfs: relog dirty buffers during swapext bmbt owner change

 fs/xfs/libxfs/xfs_bmap_btree.c |  1 +
 fs/xfs/libxfs/xfs_btree.c      | 27 ++++++++----
 fs/xfs/libxfs/xfs_btree.h      |  3 +-
 fs/xfs/libxfs/xfs_ialloc.c     |  2 -
 fs/xfs/xfs_bmap_util.c         | 93 +++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_buf_item.c          | 75 +++++++++++++---------------------
 fs/xfs/xfs_buf_item.h          |  2 +-
 fs/xfs/xfs_trace.h             |  1 -
 fs/xfs/xfs_trans.h             |  6 ++-
 fs/xfs/xfs_trans_buf.c         | 79 +++++++++++++++++++++--------------
 10 files changed, 173 insertions(+), 116 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/9] xfs: open-code xfs_buf_item_dirty()
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

It checks a single flag and has one caller. It probably isn't worth
its own function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf_item.c  | 11 -----------
 fs/xfs/xfs_buf_item.h  |  1 -
 fs/xfs/xfs_trans_buf.c |  2 +-
 3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 573fc72..cdae0ad5e0 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -945,17 +945,6 @@ xfs_buf_item_log(
 }
 
 
-/*
- * Return 1 if the buffer has been logged or ordered in a transaction (at any
- * point, not just the current transaction) and 0 if not.
- */
-uint
-xfs_buf_item_dirty(
-	xfs_buf_log_item_t	*bip)
-{
-	return (bip->bli_flags & XFS_BLI_DIRTY);
-}
-
 STATIC void
 xfs_buf_item_free(
 	xfs_buf_log_item_t	*bip)
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 530686e..e0e744a 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -64,7 +64,6 @@ typedef struct xfs_buf_log_item {
 int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
 void	xfs_buf_item_relse(struct xfs_buf *);
 void	xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
-uint	xfs_buf_item_dirty(xfs_buf_log_item_t *);
 void	xfs_buf_attach_iodone(struct xfs_buf *,
 			      void(*)(struct xfs_buf *, xfs_log_item_t *),
 			      xfs_log_item_t *);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 86987d8..cac8abb 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -435,7 +435,7 @@ xfs_trans_brelse(xfs_trans_t	*tp,
 	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
 		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
 		xfs_buf_item_relse(bp);
-	} else if (!xfs_buf_item_dirty(bip)) {
+	} else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
 /***
 		ASSERT(bp->b_pincount == 0);
 ***/
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
  2017-08-29 14:37 ` [PATCH v2 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 3/9] xfs: ordered buffer log items are never formatted Brian Foster
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

xfs_buf_item_unlock() historically checked the dirty state of the
buffer by manually checking the buffer log formats for dirty
segments. The introduction of ordered buffers invalidated this check
because ordered buffers have dirty bli's but no dirty (logged)
segments. The check was updated to accommodate ordered buffers by
looking at the bli state first and considering the blf only if the
bli is clean.

This logic is safe but unnecessary. There is no valid case where the
bli is clean yet the blf has dirty segments. The bli is set dirty
whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).

Remove the conditional blf dirty checks and replace with an assert
that should catch any discrepencies between bli and blf dirty
states. Refactor the old blf dirty check into a helper function to
be used by the assert.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_buf_item.c | 62 ++++++++++++++++++++++++++-------------------------
 fs/xfs/xfs_buf_item.h |  1 +
 2 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index cdae0ad5e0..ff076d1 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -575,26 +575,18 @@ xfs_buf_item_unlock(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	struct xfs_buf		*bp = bip->bli_buf;
-	bool			clean;
-	bool			aborted;
-	int			flags;
+	bool			aborted = !!(lip->li_flags & XFS_LI_ABORTED);
+	bool			hold = !!(bip->bli_flags & XFS_BLI_HOLD);
+	bool			dirty = !!(bip->bli_flags & XFS_BLI_DIRTY);
+	bool			ordered = !!(bip->bli_flags & XFS_BLI_ORDERED);
 
 	/* Clear the buffer's association with this transaction. */
 	bp->b_transp = NULL;
 
 	/*
-	 * If this is a transaction abort, don't return early.  Instead, allow
-	 * the brelse to happen.  Normally it would be done for stale
-	 * (cancelled) buffers at unpin time, but we'll never go through the
-	 * pin/unpin cycle if we abort inside commit.
+	 * The per-transaction state has been copied above so clear it from the
+	 * bli.
 	 */
-	aborted = (lip->li_flags & XFS_LI_ABORTED) ? true : false;
-	/*
-	 * Before possibly freeing the buf item, copy the per-transaction state
-	 * so we can reference it safely later after clearing it from the
-	 * buffer log item.
-	 */
-	flags = bip->bli_flags;
 	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
 
 	/*
@@ -602,7 +594,7 @@ xfs_buf_item_unlock(
 	 * unlock the buffer and free the buf item when the buffer is unpinned
 	 * for the last time.
 	 */
-	if (flags & XFS_BLI_STALE) {
+	if (bip->bli_flags & XFS_BLI_STALE) {
 		trace_xfs_buf_item_unlock_stale(bip);
 		ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
 		if (!aborted) {
@@ -620,20 +612,11 @@ xfs_buf_item_unlock(
 	 * regardless of whether it is dirty or not. A dirty abort implies a
 	 * shutdown, anyway.
 	 *
-	 * Ordered buffers are dirty but may have no recorded changes, so ensure
-	 * we only release clean items here.
+	 * The bli dirty state should match whether the blf has logged segments
+	 * except for ordered buffers, where only the bli should be dirty.
 	 */
-	clean = (flags & XFS_BLI_DIRTY) ? false : true;
-	if (clean) {
-		int i;
-		for (i = 0; i < bip->bli_format_count; i++) {
-			if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
-				     bip->bli_formats[i].blf_map_size)) {
-				clean = false;
-				break;
-			}
-		}
-	}
+	ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
+	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
 
 	/*
 	 * Clean buffers, by definition, cannot be in the AIL. However, aborted
@@ -652,11 +635,11 @@ xfs_buf_item_unlock(
 			ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp));
 			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
-		} else if (clean)
+		} else if (!dirty)
 			xfs_buf_item_relse(bp);
 	}
 
-	if (!(flags & XFS_BLI_HOLD))
+	if (!hold)
 		xfs_buf_relse(bp);
 }
 
@@ -945,6 +928,25 @@ xfs_buf_item_log(
 }
 
 
+/*
+ * Return true if the buffer has any ranges logged/dirtied by a transaction,
+ * false otherwise.
+ */
+bool
+xfs_buf_item_dirty_format(
+	struct xfs_buf_log_item	*bip)
+{
+	int			i;
+
+	for (i = 0; i < bip->bli_format_count; i++) {
+		if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
+			     bip->bli_formats[i].blf_map_size))
+			return true;
+	}
+
+	return false;
+}
+
 STATIC void
 xfs_buf_item_free(
 	xfs_buf_log_item_t	*bip)
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index e0e744a..9690ce6 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -64,6 +64,7 @@ typedef struct xfs_buf_log_item {
 int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
 void	xfs_buf_item_relse(struct xfs_buf *);
 void	xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
+bool	xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
 void	xfs_buf_attach_iodone(struct xfs_buf *,
 			      void(*)(struct xfs_buf *, xfs_log_item_t *),
 			      xfs_log_item_t *);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/9] xfs: ordered buffer log items are never formatted
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
  2017-08-29 14:37 ` [PATCH v2 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
  2017-08-29 14:37 ` [PATCH v2 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 18:49   ` Darrick J. Wong
  2017-08-29 14:37 ` [PATCH v2 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

Ordered buffers pass through the logging infrastructure without ever
being written to the log. The way this works is that the ordered
buffer status is transferred to the log vector at commit time via
the ->iop_size() callback. In xlog_cil_insert_format_items(),
ordered log vectors bypass ->iop_format() processing altogether.

Therefore it is unnecessary for xfs_buf_item_format() to handle
ordered buffers. Remove the unnecessary logic and assert that an
ordered buffer never reaches this point.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf_item.c | 12 ++----------
 fs/xfs/xfs_trace.h    |  1 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index ff076d1..ef2c137 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -323,6 +323,8 @@ xfs_buf_item_format(
 	ASSERT((bip->bli_flags & XFS_BLI_STALE) ||
 	       (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF
 	        && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF));
+	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED) ||
+	       (bip->bli_flags & XFS_BLI_STALE));
 
 
 	/*
@@ -347,16 +349,6 @@ xfs_buf_item_format(
 		bip->bli_flags &= ~XFS_BLI_INODE_BUF;
 	}
 
-	if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
-							XFS_BLI_ORDERED) {
-		/*
-		 * The buffer has been logged just to order it.  It is not being
-		 * included in the transaction commit, so don't format it.
-		 */
-		trace_xfs_buf_item_format_ordered(bip);
-		return;
-	}
-
 	for (i = 0; i < bip->bli_format_count; i++) {
 		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
 					    &bip->bli_formats[i]);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 6881047..e839ab4 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -517,7 +517,6 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
-DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/9] xfs: refactor buffer logging into buffer dirtying helper
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (2 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 3/9] xfs: ordered buffer log items are never formatted Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple use cases
that need to to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer.  One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.

Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible to dirty the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_trans.h     |  4 +++-
 fs/xfs/xfs_trans_buf.c | 46 ++++++++++++++++++++++++++++++----------------
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 7d62772..b3632eb 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -217,7 +217,9 @@ void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
 void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
 void		xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
-void		xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
+void		xfs_trans_log_buf(struct xfs_trans *, struct xfs_buf *, uint,
+				  uint);
+void		xfs_trans_dirty_buf(struct xfs_trans *, struct xfs_buf *);
 void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
 
 void		xfs_extent_free_init_defer_op(void);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index cac8abb..8c99813 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -493,25 +493,17 @@ xfs_trans_bhold_release(xfs_trans_t	*tp,
 }
 
 /*
- * This is called to mark bytes first through last inclusive of the given
- * buffer as needing to be logged when the transaction is committed.
- * The buffer must already be associated with the given transaction.
- *
- * First and last are numbers relative to the beginning of this buffer,
- * so the first byte in the buffer is numbered 0 regardless of the
- * value of b_blkno.
+ * Mark a buffer dirty in the transaction.
  */
 void
-xfs_trans_log_buf(xfs_trans_t	*tp,
-		  xfs_buf_t	*bp,
-		  uint		first,
-		  uint		last)
+xfs_trans_dirty_buf(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp)
 {
-	xfs_buf_log_item_t	*bip = bp->b_fspriv;
+	struct xfs_buf_log_item	*bip = bp->b_fspriv;
 
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
-	ASSERT(first <= last && last < BBTOB(bp->b_length));
 	ASSERT(bp->b_iodone == NULL ||
 	       bp->b_iodone == xfs_buf_iodone_callbacks);
 
@@ -531,8 +523,6 @@ xfs_trans_log_buf(xfs_trans_t	*tp,
 	bp->b_iodone = xfs_buf_iodone_callbacks;
 	bip->bli_item.li_cb = xfs_buf_iodone;
 
-	trace_xfs_trans_log_buf(bip);
-
 	/*
 	 * If we invalidated the buffer within this transaction, then
 	 * cancel the invalidation now that we're dirtying the buffer
@@ -545,15 +535,39 @@ xfs_trans_log_buf(xfs_trans_t	*tp,
 		bp->b_flags &= ~XBF_STALE;
 		bip->__bli_format.blf_flags &= ~XFS_BLF_CANCEL;
 	}
+	bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
 
 	tp->t_flags |= XFS_TRANS_DIRTY;
 	bip->bli_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+}
+
+/*
+ * This is called to mark bytes first through last inclusive of the given
+ * buffer as needing to be logged when the transaction is committed.
+ * The buffer must already be associated with the given transaction.
+ *
+ * First and last are numbers relative to the beginning of this buffer,
+ * so the first byte in the buffer is numbered 0 regardless of the
+ * value of b_blkno.
+ */
+void
+xfs_trans_log_buf(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp,
+	uint			first,
+	uint			last)
+{
+	struct xfs_buf_log_item	*bip = bp->b_fspriv;
+
+	ASSERT(first <= last && last < BBTOB(bp->b_length));
+
+	xfs_trans_dirty_buf(tp, bp);
 
 	/*
 	 * If we have an ordered buffer we are not logging any dirty range but
 	 * it still needs to be marked dirty and that it has been logged.
 	 */
-	bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
+	trace_xfs_trans_log_buf(bip);
 	if (!(bip->bli_flags & XFS_BLI_ORDERED))
 		xfs_buf_item_log(bip, first, last);
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 5/9] xfs: don't log dirty ranges for ordered buffers
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (3 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_btree.c  |  6 ++----
 fs/xfs/libxfs/xfs_ialloc.c |  2 --
 fs/xfs/xfs_trans_buf.c     | 26 ++++++++++++++------------
 3 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index e0bcc4a..0b7905a 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4464,12 +4464,10 @@ xfs_btree_block_change_owner(
 	 * though, so everything is consistent in memory.
 	 */
 	if (bp) {
-		if (cur->bc_tp) {
+		if (cur->bc_tp)
 			xfs_trans_ordered_buf(cur->bc_tp, bp);
-			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
-		} else {
+		else
 			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
-		}
 	} else {
 		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
 		ASSERT(level == cur->bc_nlevels - 1);
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 1e0658a..988bb3f 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -378,8 +378,6 @@ xfs_ialloc_inode_init(
 				 * transaction and pin the log appropriately.
 				 */
 				xfs_trans_ordered_buf(tp, fbuf);
-				xfs_trans_log_buf(tp, fbuf, 0,
-						  BBTOB(fbuf->b_length) - 1);
 			}
 		} else {
 			fbuf->b_flags |= XBF_DONE;
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 8c99813..3089e80 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -560,16 +560,12 @@ xfs_trans_log_buf(
 	struct xfs_buf_log_item	*bip = bp->b_fspriv;
 
 	ASSERT(first <= last && last < BBTOB(bp->b_length));
+	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED));
 
 	xfs_trans_dirty_buf(tp, bp);
 
-	/*
-	 * If we have an ordered buffer we are not logging any dirty range but
-	 * it still needs to be marked dirty and that it has been logged.
-	 */
 	trace_xfs_trans_log_buf(bip);
-	if (!(bip->bli_flags & XFS_BLI_ORDERED))
-		xfs_buf_item_log(bip, first, last);
+	xfs_buf_item_log(bip, first, last);
 }
 
 
@@ -722,12 +718,11 @@ xfs_trans_inode_alloc_buf(
 }
 
 /*
- * Mark the buffer as ordered for this transaction. This means
- * that the contents of the buffer are not recorded in the transaction
- * but it is tracked in the AIL as though it was. This allows us
- * to record logical changes in transactions rather than the physical
- * changes we make to the buffer without changing writeback ordering
- * constraints of metadata buffers.
+ * Mark the buffer as ordered for this transaction. This means that the contents
+ * of the buffer are not recorded in the transaction but it is tracked in the
+ * AIL as though it was. This allows us to record logical changes in
+ * transactions rather than the physical changes we make to the buffer without
+ * changing writeback ordering constraints of metadata buffers.
  */
 void
 xfs_trans_ordered_buf(
@@ -739,9 +734,16 @@ xfs_trans_ordered_buf(
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
+	ASSERT(!xfs_buf_item_dirty_format(bip));
 
 	bip->bli_flags |= XFS_BLI_ORDERED;
 	trace_xfs_buf_item_ordered(bip);
+
+	/*
+	 * We don't log a dirty range of an ordered buffer but it still needs
+	 * to be marked dirty and that it has been logged.
+	 */
+	xfs_trans_dirty_buf(tp, bp);
 }
 
 /*
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 6/9] xfs: skip bmbt block ino validation during owner change
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (4 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs; +Cc: stable

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap_btree.c | 1 +
 fs/xfs/libxfs/xfs_btree.c      | 1 +
 fs/xfs/libxfs/xfs_btree.h      | 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 85de225..a6331ff 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -858,6 +858,7 @@ xfs_bmbt_change_owner(
 	cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork);
 	if (!cur)
 		return -ENOMEM;
+	cur->bc_private.b.flags |= XFS_BTCUR_BPRV_INVALID_OWNER;
 
 	error = xfs_btree_change_owner(cur, new_owner, buffer_list);
 	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 0b7905a..1d15d04 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1791,6 +1791,7 @@ xfs_btree_lookup_get_block(
 
 	/* Check the inode owner since the verifiers don't. */
 	if (xfs_sb_version_hascrc(&cur->bc_mp->m_sb) &&
+	    !(cur->bc_private.b.flags & XFS_BTCUR_BPRV_INVALID_OWNER) &&
 	    (cur->bc_flags & XFS_BTREE_LONG_PTRS) &&
 	    be64_to_cpu((*blkp)->bb_u.l.bb_owner) !=
 			cur->bc_private.b.ip->i_ino)
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 9c95e96..f2a88c3 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -233,7 +233,8 @@ typedef struct xfs_btree_cur
 			short		forksize;	/* fork's inode space */
 			char		whichfork;	/* data or attr fork */
 			char		flags;		/* flags */
-#define	XFS_BTCUR_BPRV_WASDEL	1			/* was delayed */
+#define	XFS_BTCUR_BPRV_WASDEL		(1<<0)		/* was delayed */
+#define	XFS_BTCUR_BPRV_INVALID_OWNER	(1<<1)		/* for ext swap */
 		} b;
 	}		bc_private;	/* per-btree type data */
 } xfs_btree_cur_t;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 7/9] xfs: move bmbt owner change to last step of extent swap
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (5 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

The extent swap operation currently resets bmbt block owners before
the inode forks are swapped. The bmbt buffers are marked as ordered
so they do not have to be physically logged in the transaction.

This use of ordered buffers is not safe as bmbt buffers may have
been previously physically logged. The bmbt owner change algorithm
needs to be updated to physically log buffers that are already dirty
when/if they are encountered. This means that an extent swap will
eventually require multiple rolling transactions to handle large
btrees. In addition, all inode related changes must be logged before
the bmbt owner change scan begins and can roll the transaction for
the first time to preserve fs consistency via log recovery.

In preparation for such fixes to the bmbt owner change algorithm,
refactor the bmbt scan out of the extent fork swap code to the last
operation before the transaction is committed. Update
xfs_swap_extent_forks() to only set the inode log flags when an
owner change scan is necessary. Update xfs_swap_extents() to trigger
the owner change based on the inode log flags. Note that since the
owner change now occurs after the extent fork swap, the inode btrees
must be fixed up with the inode number of the current inode (similar
to log recovery).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 93e9552..ee8fb9a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1840,29 +1840,18 @@ xfs_swap_extent_forks(
 	}
 
 	/*
-	 * Before we've swapped the forks, lets set the owners of the forks
-	 * appropriately. We have to do this as we are demand paging the btree
-	 * buffers, and so the validation done on read will expect the owner
-	 * field to be correctly set. Once we change the owners, we can swap the
-	 * inode forks.
+	 * Btree format (v3) inodes have the inode number stamped in the bmbt
+	 * block headers. We can't start changing the bmbt blocks until the
+	 * inode owner change is logged so recovery does the right thing in the
+	 * event of a crash. Set the owner change log flags now and leave the
+	 * bmbt scan as the last step.
 	 */
 	if (ip->i_d.di_version == 3 &&
-	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE)
 		(*target_log_flags) |= XFS_ILOG_DOWNER;
-		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
-					      tip->i_ino, NULL);
-		if (error)
-			return error;
-	}
-
 	if (tip->i_d.di_version == 3 &&
-	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
 		(*src_log_flags) |= XFS_ILOG_DOWNER;
-		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
-					      ip->i_ino, NULL);
-		if (error)
-			return error;
-	}
 
 	/*
 	 * Swap the data forks of the inodes
@@ -2092,6 +2081,25 @@ xfs_swap_extents(
 	xfs_trans_log_inode(tp, tip, target_log_flags);
 
 	/*
+	 * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems
+	 * have inode number owner values in the bmbt blocks that still refer to
+	 * the old inode. Scan each bmbt to fix up the owner values with the
+	 * inode number of the current inode.
+	 */
+	if (src_log_flags & XFS_ILOG_DOWNER) {
+		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
+					      ip->i_ino, NULL);
+		if (error)
+			goto out_trans_cancel;
+	}
+	if (target_log_flags & XFS_ILOG_DOWNER) {
+		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
+					      tip->i_ino, NULL);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	/*
 	 * If this is a synchronous mount, make sure that the
 	 * transaction goes to disk before returning to the user.
 	 */
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 8/9] xfs: disallow marking previously dirty buffers as ordered
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (6 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-29 14:37 ` [PATCH v2 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
  2017-08-30 16:08 ` [PATCH v2 0/9] xfs: refactor ordered buffer logging code Darrick J. Wong
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_trans.h     | 2 +-
 fs/xfs/xfs_trans_buf.c | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index b3632eb..4709823 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -212,7 +212,7 @@ void		xfs_trans_bhold_release(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
-void		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
+bool		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
 void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 3089e80..3ba7a96 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -724,7 +724,7 @@ xfs_trans_inode_alloc_buf(
  * transactions rather than the physical changes we make to the buffer without
  * changing writeback ordering constraints of metadata buffers.
  */
-void
+bool
 xfs_trans_ordered_buf(
 	struct xfs_trans	*tp,
 	struct xfs_buf		*bp)
@@ -734,7 +734,9 @@ xfs_trans_ordered_buf(
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
-	ASSERT(!xfs_buf_item_dirty_format(bip));
+
+	if (xfs_buf_item_dirty_format(bip))
+		return false;
 
 	bip->bli_flags |= XFS_BLI_ORDERED;
 	trace_xfs_buf_item_ordered(bip);
@@ -744,6 +746,7 @@ xfs_trans_ordered_buf(
 	 * to be marked dirty and that it has been logged.
 	 */
 	xfs_trans_dirty_buf(tp, bp);
+	return true;
 }
 
 /*
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 9/9] xfs: relog dirty buffers during swapext bmbt owner change
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (7 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
@ 2017-08-29 14:37 ` Brian Foster
  2017-08-30 16:08 ` [PATCH v2 0/9] xfs: refactor ordered buffer logging code Darrick J. Wong
  9 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-08-29 14:37 UTC (permalink / raw)
  To: linux-xfs

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_btree.c | 26 ++++++++++++++-------
 fs/xfs/xfs_bmap_util.c    | 57 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 1d15d04..5bfb882 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4452,10 +4452,15 @@ xfs_btree_block_change_owner(
 
 	/* modify the owner */
 	block = xfs_btree_get_block(cur, level, &bp);
-	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		if (block->bb_u.l.bb_owner == cpu_to_be64(bbcoi->new_owner))
+			return 0;
 		block->bb_u.l.bb_owner = cpu_to_be64(bbcoi->new_owner);
-	else
+	} else {
+		if (block->bb_u.s.bb_owner == cpu_to_be32(bbcoi->new_owner))
+			return 0;
 		block->bb_u.s.bb_owner = cpu_to_be32(bbcoi->new_owner);
+	}
 
 	/*
 	 * If the block is a root block hosted in an inode, we might not have a
@@ -4464,14 +4469,19 @@ xfs_btree_block_change_owner(
 	 * block is formatted into the on-disk inode fork. We still change it,
 	 * though, so everything is consistent in memory.
 	 */
-	if (bp) {
-		if (cur->bc_tp)
-			xfs_trans_ordered_buf(cur->bc_tp, bp);
-		else
-			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
-	} else {
+	if (!bp) {
 		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
 		ASSERT(level == cur->bc_nlevels - 1);
+		return 0;
+	}
+
+	if (cur->bc_tp) {
+		if (!xfs_trans_ordered_buf(cur->bc_tp, bp)) {
+			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
+			return -EAGAIN;
+		}
+	} else {
+		xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
 	}
 
 	return 0;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index ee8fb9a..3e9b7a4 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1929,6 +1929,48 @@ xfs_swap_extent_forks(
 	return 0;
 }
 
+/*
+ * Fix up the owners of the bmbt blocks to refer to the current inode. The
+ * change owner scan attempts to order all modified buffers in the current
+ * transaction. In the event of ordered buffer failure, the offending buffer is
+ * physically logged as a fallback and the scan returns -EAGAIN. We must roll
+ * the transaction in this case to replenish the fallback log reservation and
+ * restart the scan. This process repeats until the scan completes.
+ */
+static int
+xfs_swap_change_owner(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	struct xfs_inode	*tmpip)
+{
+	int			error;
+	struct xfs_trans	*tp = *tpp;
+
+	do {
+		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino,
+					      NULL);
+		/* success or fatal error */
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_trans_roll(tpp, NULL);
+		if (error)
+			break;
+		tp = *tpp;
+
+		/*
+		 * Redirty both inodes so they can relog and keep the log tail
+		 * moving forward.
+		 */
+		xfs_trans_ijoin(tp, ip, 0);
+		xfs_trans_ijoin(tp, tmpip, 0);
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE);
+	} while (true);
+
+	return error;
+}
+
 int
 xfs_swap_extents(
 	struct xfs_inode	*ip,	/* target inode */
@@ -1943,7 +1985,7 @@ xfs_swap_extents(
 	int			lock_flags;
 	struct xfs_ifork	*cowfp;
 	uint64_t		f;
-	int			resblks;
+	int			resblks = 0;
 
 	/*
 	 * Lock the inodes against other IO, page faults and truncate to
@@ -1991,11 +2033,8 @@ xfs_swap_extents(
 			  XFS_SWAP_RMAP_SPACE_RES(mp,
 				XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK),
 				XFS_DATA_FORK);
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
-				0, 0, &tp);
-	} else
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0,
-				0, 0, &tp);
+	}
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	if (error)
 		goto out_unlock;
 
@@ -2087,14 +2126,12 @@ xfs_swap_extents(
 	 * inode number of the current inode.
 	 */
 	if (src_log_flags & XFS_ILOG_DOWNER) {
-		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
-					      ip->i_ino, NULL);
+		error = xfs_swap_change_owner(&tp, ip, tip);
 		if (error)
 			goto out_trans_cancel;
 	}
 	if (target_log_flags & XFS_ILOG_DOWNER) {
-		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
-					      tip->i_ino, NULL);
+		error = xfs_swap_change_owner(&tp, tip, ip);
 		if (error)
 			goto out_trans_cancel;
 	}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/9] xfs: ordered buffer log items are never formatted
  2017-08-29 14:37 ` [PATCH v2 3/9] xfs: ordered buffer log items are never formatted Brian Foster
@ 2017-08-29 18:49   ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2017-08-29 18:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Aug 29, 2017 at 10:37:52AM -0400, Brian Foster wrote:
> Ordered buffers pass through the logging infrastructure without ever
> being written to the log. The way this works is that the ordered
> buffer status is transferred to the log vector at commit time via
> the ->iop_size() callback. In xlog_cil_insert_format_items(),
> ordered log vectors bypass ->iop_format() processing altogether.
> 
> Therefore it is unnecessary for xfs_buf_item_format() to handle
> ordered buffers. Remove the unnecessary logic and assert that an
> ordered buffer never reaches this point.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_buf_item.c | 12 ++----------
>  fs/xfs/xfs_trace.h    |  1 -
>  2 files changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index ff076d1..ef2c137 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -323,6 +323,8 @@ xfs_buf_item_format(
>  	ASSERT((bip->bli_flags & XFS_BLI_STALE) ||
>  	       (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF
>  	        && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF));
> +	ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED) ||
> +	       (bip->bli_flags & XFS_BLI_STALE));
>  
>  
>  	/*
> @@ -347,16 +349,6 @@ xfs_buf_item_format(
>  		bip->bli_flags &= ~XFS_BLI_INODE_BUF;
>  	}
>  
> -	if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
> -							XFS_BLI_ORDERED) {
> -		/*
> -		 * The buffer has been logged just to order it.  It is not being
> -		 * included in the transaction commit, so don't format it.
> -		 */
> -		trace_xfs_buf_item_format_ordered(bip);
> -		return;
> -	}
> -
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
>  					    &bip->bli_formats[i]);
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 6881047..e839ab4 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -517,7 +517,6 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
> -DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
>  DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/9] xfs: refactor ordered buffer logging code
  2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
                   ` (8 preceding siblings ...)
  2017-08-29 14:37 ` [PATCH v2 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
@ 2017-08-30 16:08 ` Darrick J. Wong
  9 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2017-08-30 16:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Aug 29, 2017 at 10:37:49AM -0400, Brian Foster wrote:
> Hi all,
> 
> Here's v2 of the ordered buffer / buffer relogging refactoring series.
> Only minor updates in this version. I've also tagged patch 6 for stable
> as it fixes a log recovery regression for bmbt owner change operations.

Seems to have passed testing cycle, queued for next.

--D

> 
> Brian
> 
> v2:
> - Various cleanups.
> - Rename bmbt scan invalid owner flag.
> - Added stable CC for bmbt owner change log recovery fix (6/9).
> v1: https://marc.info/?l=linux-xfs&m=150367356112090&w=2
> - Drop unnecessary buffer type set in xfs_btree_block_change_owner().
> - Several new cleanup patches.
> - Fix bmbt owner change (extent swap) log recovery.
> - Update buffer ordering to fail on previously dirty buffers.
> - Rework bmbt owner change to use ordered buffers correctly.
> rfc: https://marc.info/?l=linux-xfs&m=150272969407981&w=2
> 
> Brian Foster (9):
>   xfs: open-code xfs_buf_item_dirty()
>   xfs: remove unnecessary dirty bli format check for ordered bufs
>   xfs: ordered buffer log items are never formatted
>   xfs: refactor buffer logging into buffer dirtying helper
>   xfs: don't log dirty ranges for ordered buffers
>   xfs: skip bmbt block ino validation during owner change
>   xfs: move bmbt owner change to last step of extent swap
>   xfs: disallow marking previously dirty buffers as ordered
>   xfs: relog dirty buffers during swapext bmbt owner change
> 
>  fs/xfs/libxfs/xfs_bmap_btree.c |  1 +
>  fs/xfs/libxfs/xfs_btree.c      | 27 ++++++++----
>  fs/xfs/libxfs/xfs_btree.h      |  3 +-
>  fs/xfs/libxfs/xfs_ialloc.c     |  2 -
>  fs/xfs/xfs_bmap_util.c         | 93 +++++++++++++++++++++++++++++++-----------
>  fs/xfs/xfs_buf_item.c          | 75 +++++++++++++---------------------
>  fs/xfs/xfs_buf_item.h          |  2 +-
>  fs/xfs/xfs_trace.h             |  1 -
>  fs/xfs/xfs_trans.h             |  6 ++-
>  fs/xfs/xfs_trans_buf.c         | 79 +++++++++++++++++++++--------------
>  10 files changed, 173 insertions(+), 116 deletions(-)
> 
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-08-30 16:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29 14:37 [PATCH v2 0/9] xfs: refactor ordered buffer logging code Brian Foster
2017-08-29 14:37 ` [PATCH v2 1/9] xfs: open-code xfs_buf_item_dirty() Brian Foster
2017-08-29 14:37 ` [PATCH v2 2/9] xfs: remove unnecessary dirty bli format check for ordered bufs Brian Foster
2017-08-29 14:37 ` [PATCH v2 3/9] xfs: ordered buffer log items are never formatted Brian Foster
2017-08-29 18:49   ` Darrick J. Wong
2017-08-29 14:37 ` [PATCH v2 4/9] xfs: refactor buffer logging into buffer dirtying helper Brian Foster
2017-08-29 14:37 ` [PATCH v2 5/9] xfs: don't log dirty ranges for ordered buffers Brian Foster
2017-08-29 14:37 ` [PATCH v2 6/9] xfs: skip bmbt block ino validation during owner change Brian Foster
2017-08-29 14:37 ` [PATCH v2 7/9] xfs: move bmbt owner change to last step of extent swap Brian Foster
2017-08-29 14:37 ` [PATCH v2 8/9] xfs: disallow marking previously dirty buffers as ordered Brian Foster
2017-08-29 14:37 ` [PATCH v2 9/9] xfs: relog dirty buffers during swapext bmbt owner change Brian Foster
2017-08-30 16:08 ` [PATCH v2 0/9] xfs: refactor ordered buffer logging code Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.