* [PATCH 00/19] xfs: refactor log recovery
@ 2020-04-22  2:06 Darrick J. Wong
  2020-04-22  2:06 ` [PATCH 01/19] xfs: complain when we don't recognize the log item type Darrick J. Wong
                   ` (20 more replies)
  0 siblings, 21 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This series refactors log recovery by moving recovery code for each log
item type into the source code for the rest of that log item type and
using dispatch function pointers to virtualize the interactions.  This
dramatically reduces the amount of code in xfs_log_recover.c and
increases cohesion throughout the log code.
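
To make the shape of that change concrete, here's a tiny standalone sketch
of the pattern.  The names below are made up for illustration only -- the
real hooks added by this series are the reorder/reorder_fn, ra_pass2_fn,
commit_pass1_fn and commit_pass2_fn members of struct
xlog_recover_item_type, looked up via xlog_item_for_type():

/*
 * Illustrative only: each log item type supplies a small ops table, and
 * the generic recovery loop calls through the table instead of switching
 * on the item type everywhere.
 */
#include <stdio.h>

struct recover_item;

struct recover_item_ops {
	/* Replay one recovered item; optional. */
	int	(*commit)(struct recover_item *item);
};

struct recover_item {
	int				type;
	const struct recover_item_ops	*ops;
};

static int buf_item_commit(struct recover_item *item)
{
	printf("replaying a buffer log item\n");
	return 0;
}

static const struct recover_item_ops buf_item_ops = {
	.commit	= buf_item_commit,
};

/* The only per-type switch that remains: map a type code to its ops. */
static const struct recover_item_ops *ops_for_type(int type)
{
	switch (type) {
	case 0x123c:	/* stand-in for one of the XFS_LI_* values */
		return &buf_item_ops;
	default:
		return NULL;
	}
}

int main(void)
{
	struct recover_item	item = { .type = 0x123c };

	item.ops = ops_for_type(item.type);
	if (item.ops && item.ops->commit)
		return item.ops->commit(&item);
	return 0;
}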

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-log-recovery


* [PATCH 01/19] xfs: complain when we don't recognize the log item type
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-22 16:17   ` Brian Foster
  2020-04-25 17:42   ` Christoph Hellwig
  2020-04-22  2:06 ` [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure Darrick J. Wong
                   ` (19 subsequent siblings)
  20 siblings, 2 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

When we're sorting recovered log items ahead of recovering them and
encounter a log item of unknown type, actually print the type code when
we're rejecting the whole transaction to aid in debugging.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 11c3502b07b1..5f803083ddc3 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1887,8 +1887,8 @@ xlog_recover_reorder_trans(
 			break;
 		default:
 			xfs_warn(log->l_mp,
-				"%s: unrecognized type of log operation",
-				__func__);
+				"%s: unrecognized type of log operation (%d)",
+				__func__, ITEM_TYPE(item));
 			ASSERT(0);
 			/*
 			 * return the remaining items back to the transaction



* [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
  2020-04-22  2:06 ` [PATCH 01/19] xfs: complain when we don't recognize the log item type Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-25 18:13   ` Christoph Hellwig
  2020-04-22  2:06 ` [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readahead functions Darrick J. Wong
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a generic dispatch structure to delegate recovery of different
log item types into various code modules.  This will enable us to move
code specific to a particular log item type out of xfs_log_recover.c and
into the log item source.

The first operation we virtualize is the log item sorting.
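
With this in place, a log item type either states a fixed sort bucket or
supplies a callback that inspects the recovered item.  Both forms appear
in the hunks below; condensed:

const struct xlog_recover_item_type xlog_inode_item_type = {
	.reorder		= XLOG_REORDER_INODE_LIST,
};

const struct xlog_recover_item_type xlog_buf_item_type = {
	.reorder_fn		= xlog_buf_reorder_fn,
};

xlog_recover_reorder_trans() then maps each recovered item to its type
structure via xlog_item_for_type() and asks the structure which list the
item belongs on.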

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |   39 +++++++++++++++++
 fs/xfs/xfs_buf_item.c           |   20 ++++++++-
 fs/xfs/xfs_dquot_item.c         |   10 ++++
 fs/xfs/xfs_icreate_item.c       |    6 +++
 fs/xfs/xfs_inode_item.c         |    6 +++
 fs/xfs/xfs_log_recover.c        |   88 +++++++++++++++++++++++++++------------
 6 files changed, 139 insertions(+), 30 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 3bf671637a91..60a6afb93049 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -6,6 +6,43 @@
 #ifndef	__XFS_LOG_RECOVER_H__
 #define __XFS_LOG_RECOVER_H__
 
+/*
+ * Each log item type (XFS_LI_*) gets its own xlog_recover_item_type to
+ * define how recovery should work for that type of log item.
+ */
+struct xlog_recover_item;
+
+/* Sorting hat for log items as they're read in. */
+enum xlog_recover_reorder {
+	XLOG_REORDER_UNKNOWN,
+	XLOG_REORDER_BUFFER_LIST,
+	XLOG_REORDER_CANCEL_LIST,
+	XLOG_REORDER_INODE_BUFFER_LIST,
+	XLOG_REORDER_INODE_LIST,
+};
+
+typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
+		struct xlog_recover_item *item);
+
+struct xlog_recover_item_type {
+	/*
+	 * These two items decide how to sort recovered log items during
+	 * recovery.  If reorder_fn is non-NULL it will be called; otherwise,
+	 * reorder will be used to decide.  See the comment above
+	 * xlog_recover_reorder_trans for more details about what the values
+	 * mean.
+	 */
+	enum xlog_recover_reorder	reorder;
+	xlog_recover_reorder_fn		reorder_fn;
+};
+
+extern const struct xlog_recover_item_type xlog_icreate_item_type;
+extern const struct xlog_recover_item_type xlog_buf_item_type;
+extern const struct xlog_recover_item_type xlog_inode_item_type;
+extern const struct xlog_recover_item_type xlog_dquot_item_type;
+extern const struct xlog_recover_item_type xlog_quotaoff_item_type;
+extern const struct xlog_recover_item_type xlog_intent_item_type;
+
 /*
  * Macros, structures, prototypes for internal log manager use.
  */
@@ -24,10 +61,10 @@
  */
 typedef struct xlog_recover_item {
 	struct list_head	ri_list;
-	int			ri_type;
 	int			ri_cnt;	/* count of regions found */
 	int			ri_total;	/* total regions */
 	xfs_log_iovec_t		*ri_buf;	/* ptr to regions buffer */
+	const struct xlog_recover_item_type *ri_type;
 } xlog_recover_item_t;
 
 struct xlog_recover {
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 1545657c3ca0..bf7480e18889 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -16,7 +16,8 @@
 #include "xfs_trans_priv.h"
 #include "xfs_trace.h"
 #include "xfs_log.h"
-
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 kmem_zone_t	*xfs_buf_item_zone;
 
@@ -1287,3 +1288,20 @@ xfs_buf_resubmit_failed_buffers(
 
 	return ret;
 }
+
+STATIC enum xlog_recover_reorder
+xlog_buf_reorder_fn(
+	struct xlog_recover_item	*item)
+{
+	struct xfs_buf_log_format	*buf_f = item->ri_buf[0].i_addr;
+
+	if (buf_f->blf_flags & XFS_BLF_CANCEL)
+		return XLOG_REORDER_CANCEL_LIST;
+	if (buf_f->blf_flags & XFS_BLF_INODE_BUF)
+		return XLOG_REORDER_INODE_BUFFER_LIST;
+	return XLOG_REORDER_BUFFER_LIST;
+}
+
+const struct xlog_recover_item_type xlog_buf_item_type = {
+	.reorder_fn		= xlog_buf_reorder_fn,
+};
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index baad1748d0d1..2558586b4d45 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -17,6 +17,8 @@
 #include "xfs_trans_priv.h"
 #include "xfs_qm.h"
 #include "xfs_log.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip)
 {
@@ -383,3 +385,11 @@ xfs_qm_qoff_logitem_init(
 	qf->qql_flags = flags;
 	return qf;
 }
+
+const struct xlog_recover_item_type xlog_dquot_item_type = {
+	.reorder		= XLOG_REORDER_INODE_LIST,
+};
+
+const struct xlog_recover_item_type xlog_quotaoff_item_type = {
+	.reorder		= XLOG_REORDER_INODE_LIST,
+};
diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
index 490fee22b878..0a1ed4dc1c3d 100644
--- a/fs/xfs/xfs_icreate_item.c
+++ b/fs/xfs/xfs_icreate_item.c
@@ -11,6 +11,8 @@
 #include "xfs_trans_priv.h"
 #include "xfs_icreate_item.h"
 #include "xfs_log.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 kmem_zone_t	*xfs_icreate_zone;		/* inode create item zone */
 
@@ -107,3 +109,7 @@ xfs_icreate_log(
 	tp->t_flags |= XFS_TRANS_DIRTY;
 	set_bit(XFS_LI_DIRTY, &icp->ic_item.li_flags);
 }
+
+const struct xlog_recover_item_type xlog_icreate_item_type = {
+	.reorder		= XLOG_REORDER_BUFFER_LIST,
+};
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f779cca2346f..b04e9c5330b7 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -18,6 +18,8 @@
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 #include "xfs_error.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 #include <linux/iversion.h>
 
@@ -843,3 +845,7 @@ xfs_inode_item_format_convert(
 	in_f->ilf_boffset = in_f32->ilf_boffset;
 	return 0;
 }
+
+const struct xlog_recover_item_type xlog_inode_item_type = {
+	.reorder		= XLOG_REORDER_INODE_LIST,
+};
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 5f803083ddc3..e7a9f899f657 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1779,6 +1779,12 @@ xlog_clear_stale_blocks(
 	return 0;
 }
 
+/* Log intent item dispatching. */
+
+const struct xlog_recover_item_type xlog_intent_item_type = {
+	.reorder		= XLOG_REORDER_INODE_LIST,
+};
+
 /******************************************************************************
  *
  *		Log recover routines
@@ -1786,6 +1792,39 @@ xlog_clear_stale_blocks(
  ******************************************************************************
  */
 
+static const struct xlog_recover_item_type *
+xlog_item_for_type(
+	unsigned short	type)
+{
+	switch (type) {
+	case XFS_LI_ICREATE:
+		return &xlog_icreate_item_type;
+	case XFS_LI_BUF:
+		return &xlog_buf_item_type;
+	case XFS_LI_EFD:
+	case XFS_LI_EFI:
+	case XFS_LI_RUI:
+	case XFS_LI_RUD:
+	case XFS_LI_CUI:
+	case XFS_LI_CUD:
+	case XFS_LI_BUI:
+	case XFS_LI_BUD:
+		return &xlog_intent_item_type;
+	case XFS_LI_INODE:
+		return &xlog_inode_item_type;
+	case XFS_LI_DQUOT:
+		return &xlog_dquot_item_type;
+	case XFS_LI_QUOTAOFF:
+		return &xlog_quotaoff_item_type;
+	case XFS_LI_IUNLINK:
+		/* Not implemented? */
+		return NULL;
+	default:
+		/* Unknown type, go away. */
+		return NULL;
+	}
+}
+
 /*
  * Sort the log items in the transaction.
  *
@@ -1851,41 +1890,34 @@ xlog_recover_reorder_trans(
 
 	list_splice_init(&trans->r_itemq, &sort_list);
 	list_for_each_entry_safe(item, n, &sort_list, ri_list) {
-		xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
+		enum xlog_recover_reorder	fate = XLOG_REORDER_UNKNOWN;
+
+		item->ri_type = xlog_item_for_type(ITEM_TYPE(item));
+		if (item->ri_type) {
+			if (item->ri_type->reorder_fn)
+				fate = item->ri_type->reorder_fn(item);
+			else
+				fate = item->ri_type->reorder;
+		}
 
-		switch (ITEM_TYPE(item)) {
-		case XFS_LI_ICREATE:
+		switch (fate) {
+		case XLOG_REORDER_BUFFER_LIST:
 			list_move_tail(&item->ri_list, &buffer_list);
 			break;
-		case XFS_LI_BUF:
-			if (buf_f->blf_flags & XFS_BLF_CANCEL) {
-				trace_xfs_log_recover_item_reorder_head(log,
-							trans, item, pass);
-				list_move(&item->ri_list, &cancel_list);
-				break;
-			}
-			if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
-				list_move(&item->ri_list, &inode_buffer_list);
-				break;
-			}
-			list_move_tail(&item->ri_list, &buffer_list);
+		case XLOG_REORDER_CANCEL_LIST:
+			trace_xfs_log_recover_item_reorder_head(log,
+					trans, item, pass);
+			list_move(&item->ri_list, &cancel_list);
 			break;
-		case XFS_LI_INODE:
-		case XFS_LI_DQUOT:
-		case XFS_LI_QUOTAOFF:
-		case XFS_LI_EFD:
-		case XFS_LI_EFI:
-		case XFS_LI_RUI:
-		case XFS_LI_RUD:
-		case XFS_LI_CUI:
-		case XFS_LI_CUD:
-		case XFS_LI_BUI:
-		case XFS_LI_BUD:
+		case XLOG_REORDER_INODE_BUFFER_LIST:
+			list_move(&item->ri_list, &inode_buffer_list);
+			break;
+		case XLOG_REORDER_INODE_LIST:
 			trace_xfs_log_recover_item_reorder_tail(log,
-							trans, item, pass);
+					trans, item, pass);
 			list_move_tail(&item->ri_list, &inode_list);
 			break;
-		default:
+		case XLOG_REORDER_UNKNOWN:
 			xfs_warn(log->l_mp,
 				"%s: unrecognized type of log operation (%d)",
 				__func__, ITEM_TYPE(item));



* [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readahead functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
  2020-04-22  2:06 ` [PATCH 01/19] xfs: complain when we don't recognize the log item type Darrick J. Wong
  2020-04-22  2:06 ` [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-25 18:19   ` Christoph Hellwig
  2020-04-22  2:06 ` [PATCH 04/19] xfs: refactor log recovery item dispatch for pass1 commit functions Darrick J. Wong
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the pass2 readahead code into the per-item source code files and use
the dispatch function to call them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |   19 ++++++
 fs/xfs/xfs_buf_item.c           |   18 +++++
 fs/xfs/xfs_dquot_item.c         |   39 ++++++++++++
 fs/xfs/xfs_inode_item.c         |   28 ++++++++
 fs/xfs/xfs_log_recover.c        |  129 +--------------------------------------
 5 files changed, 108 insertions(+), 125 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 60a6afb93049..010fc22e5fdf 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -23,6 +23,8 @@ enum xlog_recover_reorder {
 
 typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
 		struct xlog_recover_item *item);
+typedef void (*xlog_recover_ra_pass2_fn)(struct xlog *log,
+		struct xlog_recover_item *item);
 
 struct xlog_recover_item_type {
 	/*
@@ -34,6 +36,9 @@ struct xlog_recover_item_type {
 	 */
 	enum xlog_recover_reorder	reorder;
 	xlog_recover_reorder_fn		reorder_fn;
+
+	/* Start readahead for pass2, if provided. */
+	xlog_recover_ra_pass2_fn	ra_pass2_fn;
 };
 
 extern const struct xlog_recover_item_type xlog_icreate_item_type;
@@ -88,4 +93,18 @@ struct xlog_recover {
 #define	XLOG_RECOVER_PASS1	1
 #define	XLOG_RECOVER_PASS2	2
 
+/*
+ * This structure is used during recovery to record the buf log items which
+ * have been canceled and should not be replayed.
+ */
+struct xfs_buf_cancel {
+	xfs_daddr_t		bc_blkno;
+	uint			bc_len;
+	int			bc_refcount;
+	struct list_head	bc_list;
+};
+
+struct xfs_buf_cancel *xlog_peek_buffer_cancelled(struct xlog *log,
+		xfs_daddr_t blkno, uint len, unsigned short flags);
+
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index bf7480e18889..1ad514cc501c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1302,6 +1302,24 @@ xlog_buf_reorder_fn(
 	return XLOG_REORDER_BUFFER_LIST;
 }
 
+STATIC void
+xlog_recover_buffer_ra_pass2(
+	struct xlog                     *log,
+	struct xlog_recover_item        *item)
+{
+	struct xfs_buf_log_format	*buf_f = item->ri_buf[0].i_addr;
+	struct xfs_mount		*mp = log->l_mp;
+
+	if (xlog_peek_buffer_cancelled(log, buf_f->blf_blkno,
+			buf_f->blf_len, buf_f->blf_flags)) {
+		return;
+	}
+
+	xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
+				buf_f->blf_len, NULL);
+}
+
 const struct xlog_recover_item_type xlog_buf_item_type = {
 	.reorder_fn		= xlog_buf_reorder_fn,
+	.ra_pass2_fn		= xlog_recover_buffer_ra_pass2,
 };
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 2558586b4d45..d4c794abf900 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -386,8 +386,47 @@ xfs_qm_qoff_logitem_init(
 	return qf;
 }
 
+STATIC void
+xlog_recover_dquot_ra_pass2(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	struct xfs_mount	*mp = log->l_mp;
+	struct xfs_disk_dquot	*recddq;
+	struct xfs_dq_logformat	*dq_f;
+	uint			type;
+	int			len;
+
+
+	if (mp->m_qflags == 0)
+		return;
+
+	recddq = item->ri_buf[1].i_addr;
+	if (recddq == NULL)
+		return;
+	if (item->ri_buf[1].i_len < sizeof(struct xfs_disk_dquot))
+		return;
+
+	type = recddq->d_flags & (XFS_DQ_USER | XFS_DQ_PROJ | XFS_DQ_GROUP);
+	ASSERT(type);
+	if (log->l_quotaoffs_flag & type)
+		return;
+
+	dq_f = item->ri_buf[0].i_addr;
+	ASSERT(dq_f);
+	ASSERT(dq_f->qlf_len == 1);
+
+	len = XFS_FSB_TO_BB(mp, dq_f->qlf_len);
+	if (xlog_peek_buffer_cancelled(log, dq_f->qlf_blkno, len, 0))
+		return;
+
+	xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, len,
+			  &xfs_dquot_buf_ra_ops);
+}
+
 const struct xlog_recover_item_type xlog_dquot_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
+	.ra_pass2_fn		= xlog_recover_dquot_ra_pass2,
 };
 
 const struct xlog_recover_item_type xlog_quotaoff_item_type = {
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index b04e9c5330b7..bb0fe064aa1b 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -846,6 +846,34 @@ xfs_inode_item_format_convert(
 	return 0;
 }
 
+STATIC void
+xlog_recover_inode_ra_pass2(
+	struct xlog                     *log,
+	struct xlog_recover_item        *item)
+{
+	struct xfs_inode_log_format	ilf_buf;
+	struct xfs_inode_log_format	*ilfp;
+	struct xfs_mount		*mp = log->l_mp;
+	int			error;
+
+	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
+		ilfp = item->ri_buf[0].i_addr;
+	} else {
+		ilfp = &ilf_buf;
+		memset(ilfp, 0, sizeof(*ilfp));
+		error = xfs_inode_item_format_convert(&item->ri_buf[0], ilfp);
+		if (error)
+			return;
+	}
+
+	if (xlog_peek_buffer_cancelled(log, ilfp->ilf_blkno, ilfp->ilf_len, 0))
+		return;
+
+	xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
+				ilfp->ilf_len, &xfs_inode_buf_ra_ops);
+}
+
 const struct xlog_recover_item_type xlog_inode_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
+	.ra_pass2_fn		= xlog_recover_inode_ra_pass2,
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e7a9f899f657..9ee8eb9b93a2 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -55,17 +55,6 @@ STATIC int
 xlog_do_recovery_pass(
         struct xlog *, xfs_daddr_t, xfs_daddr_t, int, xfs_daddr_t *);
 
-/*
- * This structure is used during recovery to record the buf log items which
- * have been canceled and should not be replayed.
- */
-struct xfs_buf_cancel {
-	xfs_daddr_t		bc_blkno;
-	uint			bc_len;
-	int			bc_refcount;
-	struct list_head	bc_list;
-};
-
 /*
  * Sector aligned buffer routines for buffer create/read/write/access
  */
@@ -2009,12 +1998,12 @@ xlog_recover_buffer_pass1(
  * entry in the buffer cancel record table. If it is, return the cancel
  * buffer structure to the caller.
  */
-STATIC struct xfs_buf_cancel *
+struct xfs_buf_cancel *
 xlog_peek_buffer_cancelled(
 	struct xlog		*log,
 	xfs_daddr_t		blkno,
 	uint			len,
-	unsigned short			flags)
+	unsigned short		flags)
 {
 	struct list_head	*bucket;
 	struct xfs_buf_cancel	*bcp;
@@ -3900,117 +3889,6 @@ xlog_recover_do_icreate_pass2(
 				     length, be32_to_cpu(icl->icl_gen));
 }
 
-STATIC void
-xlog_recover_buffer_ra_pass2(
-	struct xlog                     *log,
-	struct xlog_recover_item        *item)
-{
-	struct xfs_buf_log_format	*buf_f = item->ri_buf[0].i_addr;
-	struct xfs_mount		*mp = log->l_mp;
-
-	if (xlog_peek_buffer_cancelled(log, buf_f->blf_blkno,
-			buf_f->blf_len, buf_f->blf_flags)) {
-		return;
-	}
-
-	xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
-				buf_f->blf_len, NULL);
-}
-
-STATIC void
-xlog_recover_inode_ra_pass2(
-	struct xlog                     *log,
-	struct xlog_recover_item        *item)
-{
-	struct xfs_inode_log_format	ilf_buf;
-	struct xfs_inode_log_format	*ilfp;
-	struct xfs_mount		*mp = log->l_mp;
-	int			error;
-
-	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
-		ilfp = item->ri_buf[0].i_addr;
-	} else {
-		ilfp = &ilf_buf;
-		memset(ilfp, 0, sizeof(*ilfp));
-		error = xfs_inode_item_format_convert(&item->ri_buf[0], ilfp);
-		if (error)
-			return;
-	}
-
-	if (xlog_peek_buffer_cancelled(log, ilfp->ilf_blkno, ilfp->ilf_len, 0))
-		return;
-
-	xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
-				ilfp->ilf_len, &xfs_inode_buf_ra_ops);
-}
-
-STATIC void
-xlog_recover_dquot_ra_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	struct xfs_mount	*mp = log->l_mp;
-	struct xfs_disk_dquot	*recddq;
-	struct xfs_dq_logformat	*dq_f;
-	uint			type;
-	int			len;
-
-
-	if (mp->m_qflags == 0)
-		return;
-
-	recddq = item->ri_buf[1].i_addr;
-	if (recddq == NULL)
-		return;
-	if (item->ri_buf[1].i_len < sizeof(struct xfs_disk_dquot))
-		return;
-
-	type = recddq->d_flags & (XFS_DQ_USER | XFS_DQ_PROJ | XFS_DQ_GROUP);
-	ASSERT(type);
-	if (log->l_quotaoffs_flag & type)
-		return;
-
-	dq_f = item->ri_buf[0].i_addr;
-	ASSERT(dq_f);
-	ASSERT(dq_f->qlf_len == 1);
-
-	len = XFS_FSB_TO_BB(mp, dq_f->qlf_len);
-	if (xlog_peek_buffer_cancelled(log, dq_f->qlf_blkno, len, 0))
-		return;
-
-	xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, len,
-			  &xfs_dquot_buf_ra_ops);
-}
-
-STATIC void
-xlog_recover_ra_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	switch (ITEM_TYPE(item)) {
-	case XFS_LI_BUF:
-		xlog_recover_buffer_ra_pass2(log, item);
-		break;
-	case XFS_LI_INODE:
-		xlog_recover_inode_ra_pass2(log, item);
-		break;
-	case XFS_LI_DQUOT:
-		xlog_recover_dquot_ra_pass2(log, item);
-		break;
-	case XFS_LI_EFI:
-	case XFS_LI_EFD:
-	case XFS_LI_QUOTAOFF:
-	case XFS_LI_RUI:
-	case XFS_LI_RUD:
-	case XFS_LI_CUI:
-	case XFS_LI_CUD:
-	case XFS_LI_BUI:
-	case XFS_LI_BUD:
-	default:
-		break;
-	}
-}
-
 STATIC int
 xlog_recover_commit_pass1(
 	struct xlog			*log,
@@ -4147,7 +4025,8 @@ xlog_recover_commit_trans(
 			error = xlog_recover_commit_pass1(log, trans, item);
 			break;
 		case XLOG_RECOVER_PASS2:
-			xlog_recover_ra_pass2(log, item);
+			if (item->ri_type && item->ri_type->ra_pass2_fn)
+				item->ri_type->ra_pass2_fn(log, item);
 			list_move_tail(&item->ri_list, &ra_list);
 			items_queued++;
 			if (items_queued >= XLOG_RECOVER_COMMIT_QUEUE_MAX) {



* [PATCH 04/19] xfs: refactor log recovery item dispatch for pass1 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-04-22  2:06 ` [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readahead functions Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-25 18:20   ` Christoph Hellwig
  2020-04-22  2:06 ` [PATCH 05/19] xfs: refactor log recovery buffer item dispatch for pass2 " Darrick J. Wong
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the pass1 commit code into the per-item source code files and use
the dispatch function to call them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    5 ++
 fs/xfs/xfs_buf_item.c           |   62 ++++++++++++++++++++++
 fs/xfs/xfs_buf_item.h           |    1 
 fs/xfs/xfs_dquot_item.c         |   28 ++++++++++
 fs/xfs/xfs_log_recover.c        |  109 +--------------------------------------
 5 files changed, 98 insertions(+), 107 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 010fc22e5fdf..dca12058bb98 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -25,6 +25,8 @@ typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
 		struct xlog_recover_item *item);
 typedef void (*xlog_recover_ra_pass2_fn)(struct xlog *log,
 		struct xlog_recover_item *item);
+typedef int (*xlog_recover_commit_pass1_fn)(struct xlog *log,
+		struct xlog_recover_item *item);
 
 struct xlog_recover_item_type {
 	/*
@@ -39,6 +41,9 @@ struct xlog_recover_item_type {
 
 	/* Start readahead for pass2, if provided. */
 	xlog_recover_ra_pass2_fn	ra_pass2_fn;
+
+	/* Do whatever work we need to do for pass1, if provided. */
+	xlog_recover_commit_pass1_fn	commit_pass1_fn;
 };
 
 extern const struct xlog_recover_item_type xlog_icreate_item_type;
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 1ad514cc501c..351c42f354e3 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -29,7 +29,7 @@ static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip)
 STATIC void	xfs_buf_do_callbacks(struct xfs_buf *bp);
 
 /* Is this log iovec plausibly large enough to contain the buffer log format? */
-bool
+STATIC bool
 xfs_buf_log_check_iovec(
 	struct xfs_log_iovec		*iovec)
 {
@@ -1319,7 +1319,67 @@ xlog_recover_buffer_ra_pass2(
 				buf_f->blf_len, NULL);
 }
 
+/*
+ * Build up the table of buf cancel records so that we don't replay
+ * cancelled data in the second pass.  For buffer records that are
+ * not cancel records, there is nothing to do here so we just return.
+ *
+ * If we get a cancel record which is already in the table, this indicates
+ * that the buffer was cancelled multiple times.  In order to ensure
+ * that during pass 2 we keep the record in the table until we reach its
+ * last occurrence in the log, we keep a reference count in the cancel
+ * record in the table to tell us how many times we expect to see this
+ * record during the second pass.
+ */
+STATIC int
+xlog_recover_buffer_pass1(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
+	struct list_head	*bucket;
+	struct xfs_buf_cancel	*bcp;
+
+	if (!xfs_buf_log_check_iovec(&item->ri_buf[0])) {
+		xfs_err(log->l_mp, "bad buffer log item size (%d)",
+				item->ri_buf[0].i_len);
+		return -EFSCORRUPTED;
+	}
+
+	/*
+	 * If this isn't a cancel buffer item, then just return.
+	 */
+	if (!(buf_f->blf_flags & XFS_BLF_CANCEL)) {
+		trace_xfs_log_recover_buf_not_cancel(log, buf_f);
+		return 0;
+	}
+
+	/*
+	 * Insert an xfs_buf_cancel record into the hash table of them.
+	 * If there is already an identical record, bump its reference count.
+	 */
+	bucket = XLOG_BUF_CANCEL_BUCKET(log, buf_f->blf_blkno);
+	list_for_each_entry(bcp, bucket, bc_list) {
+		if (bcp->bc_blkno == buf_f->blf_blkno &&
+		    bcp->bc_len == buf_f->blf_len) {
+			bcp->bc_refcount++;
+			trace_xfs_log_recover_buf_cancel_ref_inc(log, buf_f);
+			return 0;
+		}
+	}
+
+	bcp = kmem_alloc(sizeof(struct xfs_buf_cancel), 0);
+	bcp->bc_blkno = buf_f->blf_blkno;
+	bcp->bc_len = buf_f->blf_len;
+	bcp->bc_refcount = 1;
+	list_add_tail(&bcp->bc_list, bucket);
+
+	trace_xfs_log_recover_buf_cancel_add(log, buf_f);
+	return 0;
+}
+
 const struct xlog_recover_item_type xlog_buf_item_type = {
 	.reorder_fn		= xlog_buf_reorder_fn,
 	.ra_pass2_fn		= xlog_recover_buffer_ra_pass2,
+	.commit_pass1_fn	= xlog_recover_buffer_pass1,
 };
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 30114b510332..4a054b11011a 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -61,7 +61,6 @@ void	xfs_buf_iodone_callbacks(struct xfs_buf *);
 void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
 bool	xfs_buf_resubmit_failed_buffers(struct xfs_buf *,
 					struct list_head *);
-bool	xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
 
 extern kmem_zone_t	*xfs_buf_item_zone;
 
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index d4c794abf900..e3a54b1fb20a 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -429,6 +429,34 @@ const struct xlog_recover_item_type xlog_dquot_item_type = {
 	.ra_pass2_fn		= xlog_recover_dquot_ra_pass2,
 };
 
+/*
+ * Recover QUOTAOFF records. We simply make a note of it in the xlog
+ * structure, so that we know not to do any dquot item or dquot buffer recovery,
+ * of that type.
+ */
+STATIC int
+xlog_recover_quotaoff_pass1(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	xfs_qoff_logformat_t	*qoff_f = item->ri_buf[0].i_addr;
+	ASSERT(qoff_f);
+
+	/*
+	 * The logitem format's flag tells us if this was user quotaoff,
+	 * group/project quotaoff or both.
+	 */
+	if (qoff_f->qf_flags & XFS_UQUOTA_ACCT)
+		log->l_quotaoffs_flag |= XFS_DQ_USER;
+	if (qoff_f->qf_flags & XFS_PQUOTA_ACCT)
+		log->l_quotaoffs_flag |= XFS_DQ_PROJ;
+	if (qoff_f->qf_flags & XFS_GQUOTA_ACCT)
+		log->l_quotaoffs_flag |= XFS_DQ_GROUP;
+
+	return 0;
+}
+
 const struct xlog_recover_item_type xlog_quotaoff_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
+	.commit_pass1_fn	= xlog_recover_quotaoff_pass1,
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9ee8eb9b93a2..12e864ef6d43 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1934,65 +1934,6 @@ xlog_recover_reorder_trans(
 	return error;
 }
 
-/*
- * Build up the table of buf cancel records so that we don't replay
- * cancelled data in the second pass.  For buffer records that are
- * not cancel records, there is nothing to do here so we just return.
- *
- * If we get a cancel record which is already in the table, this indicates
- * that the buffer was cancelled multiple times.  In order to ensure
- * that during pass 2 we keep the record in the table until we reach its
- * last occurrence in the log, we keep a reference count in the cancel
- * record in the table to tell us how many times we expect to see this
- * record during the second pass.
- */
-STATIC int
-xlog_recover_buffer_pass1(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
-	struct list_head	*bucket;
-	struct xfs_buf_cancel	*bcp;
-
-	if (!xfs_buf_log_check_iovec(&item->ri_buf[0])) {
-		xfs_err(log->l_mp, "bad buffer log item size (%d)",
-				item->ri_buf[0].i_len);
-		return -EFSCORRUPTED;
-	}
-
-	/*
-	 * If this isn't a cancel buffer item, then just return.
-	 */
-	if (!(buf_f->blf_flags & XFS_BLF_CANCEL)) {
-		trace_xfs_log_recover_buf_not_cancel(log, buf_f);
-		return 0;
-	}
-
-	/*
-	 * Insert an xfs_buf_cancel record into the hash table of them.
-	 * If there is already an identical record, bump its reference count.
-	 */
-	bucket = XLOG_BUF_CANCEL_BUCKET(log, buf_f->blf_blkno);
-	list_for_each_entry(bcp, bucket, bc_list) {
-		if (bcp->bc_blkno == buf_f->blf_blkno &&
-		    bcp->bc_len == buf_f->blf_len) {
-			bcp->bc_refcount++;
-			trace_xfs_log_recover_buf_cancel_ref_inc(log, buf_f);
-			return 0;
-		}
-	}
-
-	bcp = kmem_alloc(sizeof(struct xfs_buf_cancel), 0);
-	bcp->bc_blkno = buf_f->blf_blkno;
-	bcp->bc_len = buf_f->blf_len;
-	bcp->bc_refcount = 1;
-	list_add_tail(&bcp->bc_list, bucket);
-
-	trace_xfs_log_recover_buf_cancel_add(log, buf_f);
-	return 0;
-}
-
 /*
  * Check to see whether the buffer being recovered has a corresponding
  * entry in the buffer cancel record table. If it is, return the cancel
@@ -3196,33 +3137,6 @@ xlog_recover_inode_pass2(
 	return error;
 }
 
-/*
- * Recover QUOTAOFF records. We simply make a note of it in the xlog
- * structure, so that we know not to do any dquot item or dquot buffer recovery,
- * of that type.
- */
-STATIC int
-xlog_recover_quotaoff_pass1(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	xfs_qoff_logformat_t	*qoff_f = item->ri_buf[0].i_addr;
-	ASSERT(qoff_f);
-
-	/*
-	 * The logitem format's flag tells us if this was user quotaoff,
-	 * group/project quotaoff or both.
-	 */
-	if (qoff_f->qf_flags & XFS_UQUOTA_ACCT)
-		log->l_quotaoffs_flag |= XFS_DQ_USER;
-	if (qoff_f->qf_flags & XFS_PQUOTA_ACCT)
-		log->l_quotaoffs_flag |= XFS_DQ_PROJ;
-	if (qoff_f->qf_flags & XFS_GQUOTA_ACCT)
-		log->l_quotaoffs_flag |= XFS_DQ_GROUP;
-
-	return 0;
-}
-
 /*
  * Recover a dquot record
  */
@@ -3897,30 +3811,15 @@ xlog_recover_commit_pass1(
 {
 	trace_xfs_log_recover_item_recover(log, trans, item, XLOG_RECOVER_PASS1);
 
-	switch (ITEM_TYPE(item)) {
-	case XFS_LI_BUF:
-		return xlog_recover_buffer_pass1(log, item);
-	case XFS_LI_QUOTAOFF:
-		return xlog_recover_quotaoff_pass1(log, item);
-	case XFS_LI_INODE:
-	case XFS_LI_EFI:
-	case XFS_LI_EFD:
-	case XFS_LI_DQUOT:
-	case XFS_LI_ICREATE:
-	case XFS_LI_RUI:
-	case XFS_LI_RUD:
-	case XFS_LI_CUI:
-	case XFS_LI_CUD:
-	case XFS_LI_BUI:
-	case XFS_LI_BUD:
-		/* nothing to do in pass 1 */
-		return 0;
-	default:
+	if (!item->ri_type) {
 		xfs_warn(log->l_mp, "%s: invalid item type (%d)",
 			__func__, ITEM_TYPE(item));
 		ASSERT(0);
 		return -EFSCORRUPTED;
 	}
+	if (!item->ri_type->commit_pass1_fn)
+		return 0;
+	return item->ri_type->commit_pass1_fn(log, item);
 }
 
 STATIC int



* [PATCH 05/19] xfs: refactor log recovery buffer item dispatch for pass2 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (3 preceding siblings ...)
  2020-04-22  2:06 ` [PATCH 04/19] xfs: refactor log recovery item dispatch for pass1 commit functions Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-22  2:06 ` [PATCH 06/19] xfs: refactor log recovery inode " Darrick J. Wong
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the log buffer item pass2 commit code into the per-item source code
files and use the dispatch function to call it.  We do these one at a
time because there's a lot of code to move.  No functional changes.
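
The xfs_log_recover.c side of this patch is almost entirely deletions.  By
the time the whole series has converted every item type, the pass2 caller
can shrink to roughly the following -- a sketch that mirrors the pass1
conversion in the previous patch, not a verbatim hunk, so the exact
argument plumbing may differ:

/* Sketch only; mirrors the pass1 dispatch conversion. */
STATIC int
xlog_recover_commit_pass2(
	struct xlog			*log,
	struct xlog_recover		*trans,
	struct list_head		*buffer_list,
	struct xlog_recover_item	*item)
{
	trace_xfs_log_recover_item_recover(log, trans, item,
			XLOG_RECOVER_PASS2);

	if (!item->ri_type) {
		xfs_warn(log->l_mp, "%s: invalid item type (%d)",
				__func__, ITEM_TYPE(item));
		ASSERT(0);
		return -EFSCORRUPTED;
	}
	if (!item->ri_type->commit_pass2_fn)
		return 0;
	return item->ri_type->commit_pass2_fn(log, buffer_list, item,
			trans->r_lsn);
}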

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |   12 +
 fs/xfs/xfs_buf_item.c           |  782 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_log_recover.c        |  790 ---------------------------------------
 3 files changed, 801 insertions(+), 783 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index dca12058bb98..1eca24cc83a9 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -27,6 +27,9 @@ typedef void (*xlog_recover_ra_pass2_fn)(struct xlog *log,
 		struct xlog_recover_item *item);
 typedef int (*xlog_recover_commit_pass1_fn)(struct xlog *log,
 		struct xlog_recover_item *item);
+typedef int (*xlog_recover_commit_pass2_fn)(struct xlog *log,
+		struct list_head *buffer_list, struct xlog_recover_item *item,
+		xfs_lsn_t lsn);
 
 struct xlog_recover_item_type {
 	/*
@@ -44,6 +47,12 @@ struct xlog_recover_item_type {
 
 	/* Do whatever work we need to do for pass1, if provided. */
 	xlog_recover_commit_pass1_fn	commit_pass1_fn;
+
+	/*
+	 * This function should do whatever work is needed for pass2 of log
+	 * recovery, if provided.
+	 */
+	xlog_recover_commit_pass2_fn	commit_pass2_fn;
 };
 
 extern const struct xlog_recover_item_type xlog_icreate_item_type;
@@ -111,5 +120,8 @@ struct xfs_buf_cancel {
 
 struct xfs_buf_cancel *xlog_peek_buffer_cancelled(struct xlog *log,
 		xfs_daddr_t blkno, uint len, unsigned short flags);
+void xlog_recover_iodone(struct xfs_buf *bp);
+int xlog_check_buffer_cancelled(struct xlog *log, xfs_daddr_t blkno, uint len,
+		unsigned short flags);
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 351c42f354e3..b6f0ebc11cc9 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -18,6 +18,10 @@
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
+#include "xfs_error.h"
+#include "xfs_inode.h"
+#include "xfs_dir2.h"
+#include "xfs_quota.h"
 
 kmem_zone_t	*xfs_buf_item_zone;
 
@@ -1378,8 +1382,786 @@ xlog_recover_buffer_pass1(
 	return 0;
 }
 
+/*
+ * Validate the recovered buffer is of the correct type and attach the
+ * appropriate buffer operations to them for writeback. Magic numbers are in a
+ * few places:
+ *	the first 16 bits of the buffer (inode buffer, dquot buffer),
+ *	the first 32 bits of the buffer (most blocks),
+ *	inside a struct xfs_da_blkinfo at the start of the buffer.
+ */
+static void
+xlog_recover_validate_buf_type(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	xfs_buf_log_format_t	*buf_f,
+	xfs_lsn_t		current_lsn)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+	uint32_t		magic32;
+	uint16_t		magic16;
+	uint16_t		magicda;
+	char			*warnmsg = NULL;
+
+	/*
+	 * We can only do post recovery validation on items on CRC enabled
+	 * filesystems as we need to know when the buffer was written to be able
+	 * to determine if we should have replayed the item. If we replay old
+	 * metadata over a newer buffer, then it will enter a temporarily
+	 * inconsistent state resulting in verification failures. Hence for now
+	 * just avoid the verification stage for non-crc filesystems
+	 */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return;
+
+	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
+	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
+	magicda = be16_to_cpu(info->magic);
+	switch (xfs_blft_from_flags(buf_f)) {
+	case XFS_BLFT_BTREE_BUF:
+		switch (magic32) {
+		case XFS_ABTB_CRC_MAGIC:
+		case XFS_ABTB_MAGIC:
+			bp->b_ops = &xfs_bnobt_buf_ops;
+			break;
+		case XFS_ABTC_CRC_MAGIC:
+		case XFS_ABTC_MAGIC:
+			bp->b_ops = &xfs_cntbt_buf_ops;
+			break;
+		case XFS_IBT_CRC_MAGIC:
+		case XFS_IBT_MAGIC:
+			bp->b_ops = &xfs_inobt_buf_ops;
+			break;
+		case XFS_FIBT_CRC_MAGIC:
+		case XFS_FIBT_MAGIC:
+			bp->b_ops = &xfs_finobt_buf_ops;
+			break;
+		case XFS_BMAP_CRC_MAGIC:
+		case XFS_BMAP_MAGIC:
+			bp->b_ops = &xfs_bmbt_buf_ops;
+			break;
+		case XFS_RMAP_CRC_MAGIC:
+			bp->b_ops = &xfs_rmapbt_buf_ops;
+			break;
+		case XFS_REFC_CRC_MAGIC:
+			bp->b_ops = &xfs_refcountbt_buf_ops;
+			break;
+		default:
+			warnmsg = "Bad btree block magic!";
+			break;
+		}
+		break;
+	case XFS_BLFT_AGF_BUF:
+		if (magic32 != XFS_AGF_MAGIC) {
+			warnmsg = "Bad AGF block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_agf_buf_ops;
+		break;
+	case XFS_BLFT_AGFL_BUF:
+		if (magic32 != XFS_AGFL_MAGIC) {
+			warnmsg = "Bad AGFL block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_agfl_buf_ops;
+		break;
+	case XFS_BLFT_AGI_BUF:
+		if (magic32 != XFS_AGI_MAGIC) {
+			warnmsg = "Bad AGI block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_agi_buf_ops;
+		break;
+	case XFS_BLFT_UDQUOT_BUF:
+	case XFS_BLFT_PDQUOT_BUF:
+	case XFS_BLFT_GDQUOT_BUF:
+#ifdef CONFIG_XFS_QUOTA
+		if (magic16 != XFS_DQUOT_MAGIC) {
+			warnmsg = "Bad DQUOT block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_dquot_buf_ops;
+#else
+		xfs_alert(mp,
+	"Trying to recover dquots without QUOTA support built in!");
+		ASSERT(0);
+#endif
+		break;
+	case XFS_BLFT_DINO_BUF:
+		if (magic16 != XFS_DINODE_MAGIC) {
+			warnmsg = "Bad INODE block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_inode_buf_ops;
+		break;
+	case XFS_BLFT_SYMLINK_BUF:
+		if (magic32 != XFS_SYMLINK_MAGIC) {
+			warnmsg = "Bad symlink block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_symlink_buf_ops;
+		break;
+	case XFS_BLFT_DIR_BLOCK_BUF:
+		if (magic32 != XFS_DIR2_BLOCK_MAGIC &&
+		    magic32 != XFS_DIR3_BLOCK_MAGIC) {
+			warnmsg = "Bad dir block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_dir3_block_buf_ops;
+		break;
+	case XFS_BLFT_DIR_DATA_BUF:
+		if (magic32 != XFS_DIR2_DATA_MAGIC &&
+		    magic32 != XFS_DIR3_DATA_MAGIC) {
+			warnmsg = "Bad dir data magic!";
+			break;
+		}
+		bp->b_ops = &xfs_dir3_data_buf_ops;
+		break;
+	case XFS_BLFT_DIR_FREE_BUF:
+		if (magic32 != XFS_DIR2_FREE_MAGIC &&
+		    magic32 != XFS_DIR3_FREE_MAGIC) {
+			warnmsg = "Bad dir3 free magic!";
+			break;
+		}
+		bp->b_ops = &xfs_dir3_free_buf_ops;
+		break;
+	case XFS_BLFT_DIR_LEAF1_BUF:
+		if (magicda != XFS_DIR2_LEAF1_MAGIC &&
+		    magicda != XFS_DIR3_LEAF1_MAGIC) {
+			warnmsg = "Bad dir leaf1 magic!";
+			break;
+		}
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		break;
+	case XFS_BLFT_DIR_LEAFN_BUF:
+		if (magicda != XFS_DIR2_LEAFN_MAGIC &&
+		    magicda != XFS_DIR3_LEAFN_MAGIC) {
+			warnmsg = "Bad dir leafn magic!";
+			break;
+		}
+		bp->b_ops = &xfs_dir3_leafn_buf_ops;
+		break;
+	case XFS_BLFT_DA_NODE_BUF:
+		if (magicda != XFS_DA_NODE_MAGIC &&
+		    magicda != XFS_DA3_NODE_MAGIC) {
+			warnmsg = "Bad da node magic!";
+			break;
+		}
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		break;
+	case XFS_BLFT_ATTR_LEAF_BUF:
+		if (magicda != XFS_ATTR_LEAF_MAGIC &&
+		    magicda != XFS_ATTR3_LEAF_MAGIC) {
+			warnmsg = "Bad attr leaf magic!";
+			break;
+		}
+		bp->b_ops = &xfs_attr3_leaf_buf_ops;
+		break;
+	case XFS_BLFT_ATTR_RMT_BUF:
+		if (magic32 != XFS_ATTR3_RMT_MAGIC) {
+			warnmsg = "Bad attr remote magic!";
+			break;
+		}
+		bp->b_ops = &xfs_attr3_rmt_buf_ops;
+		break;
+	case XFS_BLFT_SB_BUF:
+		if (magic32 != XFS_SB_MAGIC) {
+			warnmsg = "Bad SB block magic!";
+			break;
+		}
+		bp->b_ops = &xfs_sb_buf_ops;
+		break;
+#ifdef CONFIG_XFS_RT
+	case XFS_BLFT_RTBITMAP_BUF:
+	case XFS_BLFT_RTSUMMARY_BUF:
+		/* no magic numbers for verification of RT buffers */
+		bp->b_ops = &xfs_rtbuf_ops;
+		break;
+#endif /* CONFIG_XFS_RT */
+	default:
+		xfs_warn(mp, "Unknown buffer type %d!",
+			 xfs_blft_from_flags(buf_f));
+		break;
+	}
+
+	/*
+	 * Nothing else to do in the case of a NULL current LSN as this means
+	 * the buffer is more recent than the change in the log and will be
+	 * skipped.
+	 */
+	if (current_lsn == NULLCOMMITLSN)
+		return;
+
+	if (warnmsg) {
+		xfs_warn(mp, warnmsg);
+		ASSERT(0);
+	}
+
+	/*
+	 * We must update the metadata LSN of the buffer as it is written out to
+	 * ensure that older transactions never replay over this one and corrupt
+	 * the buffer. This can occur if log recovery is interrupted at some
+	 * point after the current transaction completes, at which point a
+	 * subsequent mount starts recovery from the beginning.
+	 *
+	 * Write verifiers update the metadata LSN from log items attached to
+	 * the buffer. Therefore, initialize a bli purely to carry the LSN to
+	 * the verifier. We'll clean it up in our ->iodone() callback.
+	 */
+	if (bp->b_ops) {
+		struct xfs_buf_log_item	*bip;
+
+		ASSERT(!bp->b_iodone || bp->b_iodone == xlog_recover_iodone);
+		bp->b_iodone = xlog_recover_iodone;
+		xfs_buf_item_init(bp, mp);
+		bip = bp->b_log_item;
+		bip->bli_item.li_lsn = current_lsn;
+	}
+}
+
+/*
+ * Perform a 'normal' buffer recovery.  Each logged region of the
+ * buffer should be copied over the corresponding region in the
+ * given buffer.  The bitmap in the buf log format structure indicates
+ * where to place the logged data.
+ */
+STATIC void
+xlog_recover_do_reg_buffer(
+	struct xfs_mount	*mp,
+	xlog_recover_item_t	*item,
+	struct xfs_buf		*bp,
+	xfs_buf_log_format_t	*buf_f,
+	xfs_lsn_t		current_lsn)
+{
+	int			i;
+	int			bit;
+	int			nbits;
+	xfs_failaddr_t		fa;
+	const size_t		size_disk_dquot = sizeof(struct xfs_disk_dquot);
+
+	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
+
+	bit = 0;
+	i = 1;  /* 0 is the buf format structure */
+	while (1) {
+		bit = xfs_next_bit(buf_f->blf_data_map,
+				   buf_f->blf_map_size, bit);
+		if (bit == -1)
+			break;
+		nbits = xfs_contig_bits(buf_f->blf_data_map,
+					buf_f->blf_map_size, bit);
+		ASSERT(nbits > 0);
+		ASSERT(item->ri_buf[i].i_addr != NULL);
+		ASSERT(item->ri_buf[i].i_len % XFS_BLF_CHUNK == 0);
+		ASSERT(BBTOB(bp->b_length) >=
+		       ((uint)bit << XFS_BLF_SHIFT) + (nbits << XFS_BLF_SHIFT));
+
+		/*
+		 * The dirty regions logged in the buffer, even though
+		 * contiguous, may span multiple chunks. This is because the
+		 * dirty region may span a physical page boundary in a buffer
+		 * and hence be split into two separate vectors for writing into
+		 * the log. Hence we need to trim nbits back to the length of
+		 * the current region being copied out of the log.
+		 */
+		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
+			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
+
+		/*
+		 * Do a sanity check if this is a dquot buffer. Just checking
+		 * the first dquot in the buffer should do. XXXThis is
+		 * probably a good thing to do for other buf types also.
+		 */
+		fa = NULL;
+		if (buf_f->blf_flags &
+		   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
+			if (item->ri_buf[i].i_addr == NULL) {
+				xfs_alert(mp,
+					"XFS: NULL dquot in %s.", __func__);
+				goto next;
+			}
+			if (item->ri_buf[i].i_len < size_disk_dquot) {
+				xfs_alert(mp,
+					"XFS: dquot too small (%d) in %s.",
+					item->ri_buf[i].i_len, __func__);
+				goto next;
+			}
+			fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr,
+					       -1, 0);
+			if (fa) {
+				xfs_alert(mp,
+	"dquot corrupt at %pS trying to replay into block 0x%llx",
+					fa, bp->b_bn);
+				goto next;
+			}
+		}
+
+		memcpy(xfs_buf_offset(bp,
+			(uint)bit << XFS_BLF_SHIFT),	/* dest */
+			item->ri_buf[i].i_addr,		/* source */
+			nbits<<XFS_BLF_SHIFT);		/* length */
+ next:
+		i++;
+		bit += nbits;
+	}
+
+	/* Shouldn't be any more regions */
+	ASSERT(i == item->ri_total);
+
+	xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
+}
+
+/*
+ * Perform a dquot buffer recovery.
+ * Simple algorithm: if we have found a QUOTAOFF log item of the same type
+ * (ie. USR or GRP), then just toss this buffer away; don't recover it.
+ * Else, treat it as a regular buffer and do recovery.
+ *
+ * Return false if the buffer was tossed and true if we recovered the buffer to
+ * indicate to the caller if the buffer needs writing.
+ */
+STATIC bool
+xlog_recover_do_dquot_buffer(
+	struct xfs_mount		*mp,
+	struct xlog			*log,
+	struct xlog_recover_item	*item,
+	struct xfs_buf			*bp,
+	struct xfs_buf_log_format	*buf_f)
+{
+	uint			type;
+
+	trace_xfs_log_recover_buf_dquot_buf(log, buf_f);
+
+	/*
+	 * Filesystems are required to send in quota flags at mount time.
+	 */
+	if (!mp->m_qflags)
+		return false;
+
+	type = 0;
+	if (buf_f->blf_flags & XFS_BLF_UDQUOT_BUF)
+		type |= XFS_DQ_USER;
+	if (buf_f->blf_flags & XFS_BLF_PDQUOT_BUF)
+		type |= XFS_DQ_PROJ;
+	if (buf_f->blf_flags & XFS_BLF_GDQUOT_BUF)
+		type |= XFS_DQ_GROUP;
+	/*
+	 * This type of quotas was turned off, so ignore this buffer
+	 */
+	if (log->l_quotaoffs_flag & type)
+		return false;
+
+	xlog_recover_do_reg_buffer(mp, item, bp, buf_f, NULLCOMMITLSN);
+	return true;
+}
+
+/*
+ * Perform recovery for a buffer full of inodes.  In these buffers, the only
+ * data which should be recovered is that which corresponds to the
+ * di_next_unlinked pointers in the on disk inode structures.  The rest of the
+ * data for the inodes is always logged through the inodes themselves rather
+ * than the inode buffer and is recovered in xlog_recover_inode_pass2().
+ *
+ * The only time when buffers full of inodes are fully recovered is when the
+ * buffer is full of newly allocated inodes.  In this case the buffer will
+ * not be marked as an inode buffer and so will be sent to
+ * xlog_recover_do_reg_buffer() below during recovery.
+ */
+STATIC int
+xlog_recover_do_inode_buffer(
+	struct xfs_mount	*mp,
+	xlog_recover_item_t	*item,
+	struct xfs_buf		*bp,
+	xfs_buf_log_format_t	*buf_f)
+{
+	int			i;
+	int			item_index = 0;
+	int			bit = 0;
+	int			nbits = 0;
+	int			reg_buf_offset = 0;
+	int			reg_buf_bytes = 0;
+	int			next_unlinked_offset;
+	int			inodes_per_buf;
+	xfs_agino_t		*logged_nextp;
+	xfs_agino_t		*buffer_nextp;
+
+	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
+
+	/*
+	 * Post recovery validation only works properly on CRC enabled
+	 * filesystems.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		bp->b_ops = &xfs_inode_buf_ops;
+
+	inodes_per_buf = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
+	for (i = 0; i < inodes_per_buf; i++) {
+		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
+			offsetof(xfs_dinode_t, di_next_unlinked);
+
+		while (next_unlinked_offset >=
+		       (reg_buf_offset + reg_buf_bytes)) {
+			/*
+			 * The next di_next_unlinked field is beyond
+			 * the current logged region.  Find the next
+			 * logged region that contains or is beyond
+			 * the current di_next_unlinked field.
+			 */
+			bit += nbits;
+			bit = xfs_next_bit(buf_f->blf_data_map,
+					   buf_f->blf_map_size, bit);
+
+			/*
+			 * If there are no more logged regions in the
+			 * buffer, then we're done.
+			 */
+			if (bit == -1)
+				return 0;
+
+			nbits = xfs_contig_bits(buf_f->blf_data_map,
+						buf_f->blf_map_size, bit);
+			ASSERT(nbits > 0);
+			reg_buf_offset = bit << XFS_BLF_SHIFT;
+			reg_buf_bytes = nbits << XFS_BLF_SHIFT;
+			item_index++;
+		}
+
+		/*
+		 * If the current logged region starts after the current
+		 * di_next_unlinked field, then move on to the next
+		 * di_next_unlinked field.
+		 */
+		if (next_unlinked_offset < reg_buf_offset)
+			continue;
+
+		ASSERT(item->ri_buf[item_index].i_addr != NULL);
+		ASSERT((item->ri_buf[item_index].i_len % XFS_BLF_CHUNK) == 0);
+		ASSERT((reg_buf_offset + reg_buf_bytes) <= BBTOB(bp->b_length));
+
+		/*
+		 * The current logged region contains a copy of the
+		 * current di_next_unlinked field.  Extract its value
+		 * and copy it to the buffer copy.
+		 */
+		logged_nextp = item->ri_buf[item_index].i_addr +
+				next_unlinked_offset - reg_buf_offset;
+		if (XFS_IS_CORRUPT(mp, *logged_nextp == 0)) {
+			xfs_alert(mp,
+		"Bad inode buffer log record (ptr = "PTR_FMT", bp = "PTR_FMT"). "
+		"Trying to replay bad (0) inode di_next_unlinked field.",
+				item, bp);
+			return -EFSCORRUPTED;
+		}
+
+		buffer_nextp = xfs_buf_offset(bp, next_unlinked_offset);
+		*buffer_nextp = *logged_nextp;
+
+		/*
+		 * If necessary, recalculate the CRC in the on-disk inode. We
+		 * have to leave the inode in a consistent state for whoever
+		 * reads it next....
+		 */
+		xfs_dinode_calc_crc(mp,
+				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
+
+	}
+
+	return 0;
+}
+
+/*
+ * V5 filesystems know the age of the buffer on disk being recovered. We can
+ * have newer objects on disk than we are replaying, and so for these cases we
+ * don't want to replay the current change as that will make the buffer contents
+ * temporarily invalid on disk.
+ *
+ * The magic number might not match the buffer type we are going to recover
+ * (e.g. reallocated blocks), so we ignore the xfs_buf_log_format flags.  Hence
+ * extract the LSN of the existing object in the buffer based on its current
+ * magic number.  If we don't recognise the magic number in the buffer, then
+ * return a LSN of -1 so that the caller knows it was an unrecognised block and
+ * so can recover the buffer.
+ *
+ * Note: we cannot rely solely on magic number matches to determine that the
+ * buffer has a valid LSN - we also need to verify that it belongs to this
+ * filesystem, so we need to extract the object's LSN and compare it to that
+ * which we read from the superblock. If the UUIDs don't match, then we've got a
+ * stale metadata block from an old filesystem instance that we need to recover
+ * over the top of.
+ */
+static xfs_lsn_t
+xlog_recover_get_buf_lsn(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp)
+{
+	uint32_t		magic32;
+	uint16_t		magic16;
+	uint16_t		magicda;
+	void			*blk = bp->b_addr;
+	uuid_t			*uuid;
+	xfs_lsn_t		lsn = -1;
+
+	/* v4 filesystems always recover immediately */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		goto recover_immediately;
+
+	magic32 = be32_to_cpu(*(__be32 *)blk);
+	switch (magic32) {
+	case XFS_ABTB_CRC_MAGIC:
+	case XFS_ABTC_CRC_MAGIC:
+	case XFS_ABTB_MAGIC:
+	case XFS_ABTC_MAGIC:
+	case XFS_RMAP_CRC_MAGIC:
+	case XFS_REFC_CRC_MAGIC:
+	case XFS_IBT_CRC_MAGIC:
+	case XFS_IBT_MAGIC: {
+		struct xfs_btree_block *btb = blk;
+
+		lsn = be64_to_cpu(btb->bb_u.s.bb_lsn);
+		uuid = &btb->bb_u.s.bb_uuid;
+		break;
+	}
+	case XFS_BMAP_CRC_MAGIC:
+	case XFS_BMAP_MAGIC: {
+		struct xfs_btree_block *btb = blk;
+
+		lsn = be64_to_cpu(btb->bb_u.l.bb_lsn);
+		uuid = &btb->bb_u.l.bb_uuid;
+		break;
+	}
+	case XFS_AGF_MAGIC:
+		lsn = be64_to_cpu(((struct xfs_agf *)blk)->agf_lsn);
+		uuid = &((struct xfs_agf *)blk)->agf_uuid;
+		break;
+	case XFS_AGFL_MAGIC:
+		lsn = be64_to_cpu(((struct xfs_agfl *)blk)->agfl_lsn);
+		uuid = &((struct xfs_agfl *)blk)->agfl_uuid;
+		break;
+	case XFS_AGI_MAGIC:
+		lsn = be64_to_cpu(((struct xfs_agi *)blk)->agi_lsn);
+		uuid = &((struct xfs_agi *)blk)->agi_uuid;
+		break;
+	case XFS_SYMLINK_MAGIC:
+		lsn = be64_to_cpu(((struct xfs_dsymlink_hdr *)blk)->sl_lsn);
+		uuid = &((struct xfs_dsymlink_hdr *)blk)->sl_uuid;
+		break;
+	case XFS_DIR3_BLOCK_MAGIC:
+	case XFS_DIR3_DATA_MAGIC:
+	case XFS_DIR3_FREE_MAGIC:
+		lsn = be64_to_cpu(((struct xfs_dir3_blk_hdr *)blk)->lsn);
+		uuid = &((struct xfs_dir3_blk_hdr *)blk)->uuid;
+		break;
+	case XFS_ATTR3_RMT_MAGIC:
+		/*
+		 * Remote attr blocks are written synchronously, rather than
+		 * being logged. That means they do not contain a valid LSN
+		 * (i.e. transactionally ordered) in them, and hence any time we
+		 * see a buffer to replay over the top of a remote attribute
+		 * block we should simply do so.
+		 */
+		goto recover_immediately;
+	case XFS_SB_MAGIC:
+		/*
+		 * superblock uuids are magic. We may or may not have a
+		 * sb_meta_uuid on disk, but it will be set in the in-core
+		 * superblock. We set the uuid pointer for verification
+		 * according to the superblock feature mask to ensure we check
+		 * the relevant UUID in the superblock.
+		 */
+		lsn = be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn);
+		if (xfs_sb_version_hasmetauuid(&mp->m_sb))
+			uuid = &((struct xfs_dsb *)blk)->sb_meta_uuid;
+		else
+			uuid = &((struct xfs_dsb *)blk)->sb_uuid;
+		break;
+	default:
+		break;
+	}
+
+	if (lsn != (xfs_lsn_t)-1) {
+		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
+			goto recover_immediately;
+		return lsn;
+	}
+
+	magicda = be16_to_cpu(((struct xfs_da_blkinfo *)blk)->magic);
+	switch (magicda) {
+	case XFS_DIR3_LEAF1_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		lsn = be64_to_cpu(((struct xfs_da3_blkinfo *)blk)->lsn);
+		uuid = &((struct xfs_da3_blkinfo *)blk)->uuid;
+		break;
+	default:
+		break;
+	}
+
+	if (lsn != (xfs_lsn_t)-1) {
+		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
+			goto recover_immediately;
+		return lsn;
+	}
+
+	/*
+	 * We do individual object checks on dquot and inode buffers as they
+	 * have their own individual LSN records. Also, we could have a stale
+	 * buffer here, so we have to at least recognise these buffer types.
+	 *
+	 * A noted complexity here is inode unlinked list processing - it logs
+	 * the inode directly in the buffer, but we don't know which inodes have
+	 * been modified, and there is no global buffer LSN. Hence we need to
+	 * recover all inode buffer types immediately. This problem will be
+	 * fixed by logical logging of the unlinked list modifications.
+	 */
+	magic16 = be16_to_cpu(*(__be16 *)blk);
+	switch (magic16) {
+	case XFS_DQUOT_MAGIC:
+	case XFS_DINODE_MAGIC:
+		goto recover_immediately;
+	default:
+		break;
+	}
+
+	/* unknown buffer contents, recover immediately */
+
+recover_immediately:
+	return (xfs_lsn_t)-1;
+
+}
+
+/*
+ * This routine replays a modification made to a buffer at runtime.
+ * There are actually two types of buffer, regular and inode, which
+ * are handled differently.  Inode buffers are handled differently
+ * in that we only recover a specific set of data from them, namely
+ * the inode di_next_unlinked fields.  This is because all other inode
+ * data is actually logged via inode records and any data we replay
+ * here which overlaps that may be stale.
+ *
+ * When meta-data buffers are freed at run time we log a buffer item
+ * with the XFS_BLF_CANCEL bit set to indicate that previous copies
+ * of the buffer in the log should not be replayed at recovery time.
+ * This is so that if the blocks covered by the buffer are reused for
+ * file data before we crash we don't end up replaying old, freed
+ * meta-data into a user's file.
+ *
+ * To handle the cancellation of buffer log items, we make two passes
+ * over the log during recovery.  During the first we build a table of
+ * those buffers which have been cancelled, and during the second we
+ * only replay those buffers which do not have corresponding cancel
+ * records in the table.  See xlog_recover_buffer_pass[1,2] above
+ * for more details on the implementation of the table of cancel records.
+ */
+STATIC int
+xlog_recover_buffer_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
+{
+	xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
+	xfs_mount_t		*mp = log->l_mp;
+	xfs_buf_t		*bp;
+	int			error;
+	uint			buf_flags;
+	xfs_lsn_t		lsn;
+
+	/*
+	 * In this pass we only want to recover all the buffers which have
+	 * not been cancelled and are not cancellation buffers themselves.
+	 */
+	if (xlog_check_buffer_cancelled(log, buf_f->blf_blkno,
+			buf_f->blf_len, buf_f->blf_flags)) {
+		trace_xfs_log_recover_buf_cancel(log, buf_f);
+		return 0;
+	}
+
+	trace_xfs_log_recover_buf_recover(log, buf_f);
+
+	buf_flags = 0;
+	if (buf_f->blf_flags & XFS_BLF_INODE_BUF)
+		buf_flags |= XBF_UNMAPPED;
+
+	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno, buf_f->blf_len,
+			  buf_flags, &bp, NULL);
+	if (error)
+		return error;
+
+	/*
+	 * Recover the buffer only if we get an LSN from it and it's less than
+	 * the lsn of the transaction we are replaying.
+	 *
+	 * Note that we have to be extremely careful of readahead here.
+	 * Readahead does not attach verifiers to the buffers, so if we don't
+	 * actually do any replay after readahead because the LSN we found
+	 * in the buffer is more recent than the current transaction, then we
+	 * need to attach the verifier directly. Failure to do so means that
+	 * future recovery actions (e.g. EFI and unlinked list recovery) can
+	 * operate on the buffers without the verifier attached. This
+	 * can lead to blocks on disk having the correct content but a stale
+	 * CRC.
+	 *
+	 * It is safe to assume these clean buffers are currently up to date.
+	 * If the buffer is dirtied by a later transaction being replayed, then
+	 * the verifier will be reset to match whatever recovery turns that
+	 * buffer into.
+	 */
+	lsn = xlog_recover_get_buf_lsn(mp, bp);
+	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
+		trace_xfs_log_recover_buf_skip(log, buf_f);
+		xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+		goto out_release;
+	}
+
+	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
+		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
+		if (error)
+			goto out_release;
+	} else if (buf_f->blf_flags &
+		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
+		bool	dirty;
+
+		dirty = xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f);
+		if (!dirty)
+			goto out_release;
+	} else {
+		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
+	}
+
+	/*
+	 * Perform delayed write on the buffer.  Asynchronous writes will be
+	 * slower when taking into account all the buffers to be flushed.
+	 *
+	 * Also make sure that only inode buffers with good sizes stay in
+	 * the buffer cache.  The kernel moves inodes in buffers of 1 block
+	 * or inode_cluster_size bytes, whichever is bigger.  The inode
+	 * buffers in the log can be a different size if the log was generated
+	 * by an older kernel using unclustered inode buffers or a newer kernel
+	 * running with a different inode cluster size.  Regardless, if
+	 * the inode buffer size isn't max(blocksize, inode_cluster_size)
+	 * for *our* value of inode_cluster_size, then we need to keep
+	 * the buffer out of the buffer cache so that the buffer won't
+	 * overlap with future reads of those inodes.
+	 */
+	if (XFS_DINODE_MAGIC ==
+	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
+	    (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size)) {
+		xfs_buf_stale(bp);
+		error = xfs_bwrite(bp);
+	} else {
+		ASSERT(bp->b_mount == mp);
+		bp->b_iodone = xlog_recover_iodone;
+		xfs_buf_delwri_queue(bp, buffer_list);
+	}
+
+out_release:
+	xfs_buf_relse(bp);
+	return error;
+}
+
 const struct xlog_recover_item_type xlog_buf_item_type = {
 	.reorder_fn		= xlog_buf_reorder_fn,
 	.ra_pass2_fn		= xlog_recover_buffer_ra_pass2,
 	.commit_pass1_fn	= xlog_recover_buffer_pass1,
+	.commit_pass2_fn	= xlog_recover_buffer_pass2,
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 12e864ef6d43..c620117fba65 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -273,7 +273,7 @@ xlog_header_check_mount(
 	return 0;
 }
 
-STATIC void
+void
 xlog_recover_iodone(
 	struct xfs_buf	*bp)
 {
@@ -1979,12 +1979,12 @@ xlog_peek_buffer_cancelled(
  * occurrence in the log so that if the same buffer is re-used again after its
  * last cancellation we actually replay the changes made at that point.
  */
-STATIC int
+int
 xlog_check_buffer_cancelled(
 	struct xlog		*log,
 	xfs_daddr_t		blkno,
 	uint			len,
-	unsigned short			flags)
+	unsigned short		flags)
 {
 	struct xfs_buf_cancel	*bcp;
 
@@ -2007,783 +2007,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * Perform recovery for a buffer full of inodes.  In these buffers, the only
- * data which should be recovered is that which corresponds to the
- * di_next_unlinked pointers in the on disk inode structures.  The rest of the
- * data for the inodes is always logged through the inodes themselves rather
- * than the inode buffer and is recovered in xlog_recover_inode_pass2().
- *
- * The only time when buffers full of inodes are fully recovered is when the
- * buffer is full of newly allocated inodes.  In this case the buffer will
- * not be marked as an inode buffer and so will be sent to
- * xlog_recover_do_reg_buffer() below during recovery.
- */
-STATIC int
-xlog_recover_do_inode_buffer(
-	struct xfs_mount	*mp,
-	xlog_recover_item_t	*item,
-	struct xfs_buf		*bp,
-	xfs_buf_log_format_t	*buf_f)
-{
-	int			i;
-	int			item_index = 0;
-	int			bit = 0;
-	int			nbits = 0;
-	int			reg_buf_offset = 0;
-	int			reg_buf_bytes = 0;
-	int			next_unlinked_offset;
-	int			inodes_per_buf;
-	xfs_agino_t		*logged_nextp;
-	xfs_agino_t		*buffer_nextp;
-
-	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
-
-	/*
-	 * Post recovery validation only works properly on CRC enabled
-	 * filesystems.
-	 */
-	if (xfs_sb_version_hascrc(&mp->m_sb))
-		bp->b_ops = &xfs_inode_buf_ops;
-
-	inodes_per_buf = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
-	for (i = 0; i < inodes_per_buf; i++) {
-		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
-			offsetof(xfs_dinode_t, di_next_unlinked);
-
-		while (next_unlinked_offset >=
-		       (reg_buf_offset + reg_buf_bytes)) {
-			/*
-			 * The next di_next_unlinked field is beyond
-			 * the current logged region.  Find the next
-			 * logged region that contains or is beyond
-			 * the current di_next_unlinked field.
-			 */
-			bit += nbits;
-			bit = xfs_next_bit(buf_f->blf_data_map,
-					   buf_f->blf_map_size, bit);
-
-			/*
-			 * If there are no more logged regions in the
-			 * buffer, then we're done.
-			 */
-			if (bit == -1)
-				return 0;
-
-			nbits = xfs_contig_bits(buf_f->blf_data_map,
-						buf_f->blf_map_size, bit);
-			ASSERT(nbits > 0);
-			reg_buf_offset = bit << XFS_BLF_SHIFT;
-			reg_buf_bytes = nbits << XFS_BLF_SHIFT;
-			item_index++;
-		}
-
-		/*
-		 * If the current logged region starts after the current
-		 * di_next_unlinked field, then move on to the next
-		 * di_next_unlinked field.
-		 */
-		if (next_unlinked_offset < reg_buf_offset)
-			continue;
-
-		ASSERT(item->ri_buf[item_index].i_addr != NULL);
-		ASSERT((item->ri_buf[item_index].i_len % XFS_BLF_CHUNK) == 0);
-		ASSERT((reg_buf_offset + reg_buf_bytes) <= BBTOB(bp->b_length));
-
-		/*
-		 * The current logged region contains a copy of the
-		 * current di_next_unlinked field.  Extract its value
-		 * and copy it to the buffer copy.
-		 */
-		logged_nextp = item->ri_buf[item_index].i_addr +
-				next_unlinked_offset - reg_buf_offset;
-		if (XFS_IS_CORRUPT(mp, *logged_nextp == 0)) {
-			xfs_alert(mp,
-		"Bad inode buffer log record (ptr = "PTR_FMT", bp = "PTR_FMT"). "
-		"Trying to replay bad (0) inode di_next_unlinked field.",
-				item, bp);
-			return -EFSCORRUPTED;
-		}
-
-		buffer_nextp = xfs_buf_offset(bp, next_unlinked_offset);
-		*buffer_nextp = *logged_nextp;
-
-		/*
-		 * If necessary, recalculate the CRC in the on-disk inode. We
-		 * have to leave the inode in a consistent state for whoever
-		 * reads it next....
-		 */
-		xfs_dinode_calc_crc(mp,
-				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
-
-	}
-
-	return 0;
-}
-
-/*
- * V5 filesystems know the age of the buffer on disk being recovered. We can
- * have newer objects on disk than we are replaying, and so for these cases we
- * don't want to replay the current change as that will make the buffer contents
- * temporarily invalid on disk.
- *
- * The magic number might not match the buffer type we are going to recover
- * (e.g. reallocated blocks), so we ignore the xfs_buf_log_format flags.  Hence
- * extract the LSN of the existing object in the buffer based on it's current
- * magic number.  If we don't recognise the magic number in the buffer, then
- * return a LSN of -1 so that the caller knows it was an unrecognised block and
- * so can recover the buffer.
- *
- * Note: we cannot rely solely on magic number matches to determine that the
- * buffer has a valid LSN - we also need to verify that it belongs to this
- * filesystem, so we need to extract the object's LSN and compare it to that
- * which we read from the superblock. If the UUIDs don't match, then we've got a
- * stale metadata block from an old filesystem instance that we need to recover
- * over the top of.
- */
-static xfs_lsn_t
-xlog_recover_get_buf_lsn(
-	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
-{
-	uint32_t		magic32;
-	uint16_t		magic16;
-	uint16_t		magicda;
-	void			*blk = bp->b_addr;
-	uuid_t			*uuid;
-	xfs_lsn_t		lsn = -1;
-
-	/* v4 filesystems always recover immediately */
-	if (!xfs_sb_version_hascrc(&mp->m_sb))
-		goto recover_immediately;
-
-	magic32 = be32_to_cpu(*(__be32 *)blk);
-	switch (magic32) {
-	case XFS_ABTB_CRC_MAGIC:
-	case XFS_ABTC_CRC_MAGIC:
-	case XFS_ABTB_MAGIC:
-	case XFS_ABTC_MAGIC:
-	case XFS_RMAP_CRC_MAGIC:
-	case XFS_REFC_CRC_MAGIC:
-	case XFS_IBT_CRC_MAGIC:
-	case XFS_IBT_MAGIC: {
-		struct xfs_btree_block *btb = blk;
-
-		lsn = be64_to_cpu(btb->bb_u.s.bb_lsn);
-		uuid = &btb->bb_u.s.bb_uuid;
-		break;
-	}
-	case XFS_BMAP_CRC_MAGIC:
-	case XFS_BMAP_MAGIC: {
-		struct xfs_btree_block *btb = blk;
-
-		lsn = be64_to_cpu(btb->bb_u.l.bb_lsn);
-		uuid = &btb->bb_u.l.bb_uuid;
-		break;
-	}
-	case XFS_AGF_MAGIC:
-		lsn = be64_to_cpu(((struct xfs_agf *)blk)->agf_lsn);
-		uuid = &((struct xfs_agf *)blk)->agf_uuid;
-		break;
-	case XFS_AGFL_MAGIC:
-		lsn = be64_to_cpu(((struct xfs_agfl *)blk)->agfl_lsn);
-		uuid = &((struct xfs_agfl *)blk)->agfl_uuid;
-		break;
-	case XFS_AGI_MAGIC:
-		lsn = be64_to_cpu(((struct xfs_agi *)blk)->agi_lsn);
-		uuid = &((struct xfs_agi *)blk)->agi_uuid;
-		break;
-	case XFS_SYMLINK_MAGIC:
-		lsn = be64_to_cpu(((struct xfs_dsymlink_hdr *)blk)->sl_lsn);
-		uuid = &((struct xfs_dsymlink_hdr *)blk)->sl_uuid;
-		break;
-	case XFS_DIR3_BLOCK_MAGIC:
-	case XFS_DIR3_DATA_MAGIC:
-	case XFS_DIR3_FREE_MAGIC:
-		lsn = be64_to_cpu(((struct xfs_dir3_blk_hdr *)blk)->lsn);
-		uuid = &((struct xfs_dir3_blk_hdr *)blk)->uuid;
-		break;
-	case XFS_ATTR3_RMT_MAGIC:
-		/*
-		 * Remote attr blocks are written synchronously, rather than
-		 * being logged. That means they do not contain a valid LSN
-		 * (i.e. transactionally ordered) in them, and hence any time we
-		 * see a buffer to replay over the top of a remote attribute
-		 * block we should simply do so.
-		 */
-		goto recover_immediately;
-	case XFS_SB_MAGIC:
-		/*
-		 * superblock uuids are magic. We may or may not have a
-		 * sb_meta_uuid on disk, but it will be set in the in-core
-		 * superblock. We set the uuid pointer for verification
-		 * according to the superblock feature mask to ensure we check
-		 * the relevant UUID in the superblock.
-		 */
-		lsn = be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn);
-		if (xfs_sb_version_hasmetauuid(&mp->m_sb))
-			uuid = &((struct xfs_dsb *)blk)->sb_meta_uuid;
-		else
-			uuid = &((struct xfs_dsb *)blk)->sb_uuid;
-		break;
-	default:
-		break;
-	}
-
-	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
-			goto recover_immediately;
-		return lsn;
-	}
-
-	magicda = be16_to_cpu(((struct xfs_da_blkinfo *)blk)->magic);
-	switch (magicda) {
-	case XFS_DIR3_LEAF1_MAGIC:
-	case XFS_DIR3_LEAFN_MAGIC:
-	case XFS_DA3_NODE_MAGIC:
-		lsn = be64_to_cpu(((struct xfs_da3_blkinfo *)blk)->lsn);
-		uuid = &((struct xfs_da3_blkinfo *)blk)->uuid;
-		break;
-	default:
-		break;
-	}
-
-	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
-			goto recover_immediately;
-		return lsn;
-	}
-
-	/*
-	 * We do individual object checks on dquot and inode buffers as they
-	 * have their own individual LSN records. Also, we could have a stale
-	 * buffer here, so we have to at least recognise these buffer types.
-	 *
-	 * A notd complexity here is inode unlinked list processing - it logs
-	 * the inode directly in the buffer, but we don't know which inodes have
-	 * been modified, and there is no global buffer LSN. Hence we need to
-	 * recover all inode buffer types immediately. This problem will be
-	 * fixed by logical logging of the unlinked list modifications.
-	 */
-	magic16 = be16_to_cpu(*(__be16 *)blk);
-	switch (magic16) {
-	case XFS_DQUOT_MAGIC:
-	case XFS_DINODE_MAGIC:
-		goto recover_immediately;
-	default:
-		break;
-	}
-
-	/* unknown buffer contents, recover immediately */
-
-recover_immediately:
-	return (xfs_lsn_t)-1;
-
-}
-
-/*
- * Validate the recovered buffer is of the correct type and attach the
- * appropriate buffer operations to them for writeback. Magic numbers are in a
- * few places:
- *	the first 16 bits of the buffer (inode buffer, dquot buffer),
- *	the first 32 bits of the buffer (most blocks),
- *	inside a struct xfs_da_blkinfo at the start of the buffer.
- */
-static void
-xlog_recover_validate_buf_type(
-	struct xfs_mount	*mp,
-	struct xfs_buf		*bp,
-	xfs_buf_log_format_t	*buf_f,
-	xfs_lsn_t		current_lsn)
-{
-	struct xfs_da_blkinfo	*info = bp->b_addr;
-	uint32_t		magic32;
-	uint16_t		magic16;
-	uint16_t		magicda;
-	char			*warnmsg = NULL;
-
-	/*
-	 * We can only do post recovery validation on items on CRC enabled
-	 * fielsystems as we need to know when the buffer was written to be able
-	 * to determine if we should have replayed the item. If we replay old
-	 * metadata over a newer buffer, then it will enter a temporarily
-	 * inconsistent state resulting in verification failures. Hence for now
-	 * just avoid the verification stage for non-crc filesystems
-	 */
-	if (!xfs_sb_version_hascrc(&mp->m_sb))
-		return;
-
-	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
-	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
-	magicda = be16_to_cpu(info->magic);
-	switch (xfs_blft_from_flags(buf_f)) {
-	case XFS_BLFT_BTREE_BUF:
-		switch (magic32) {
-		case XFS_ABTB_CRC_MAGIC:
-		case XFS_ABTB_MAGIC:
-			bp->b_ops = &xfs_bnobt_buf_ops;
-			break;
-		case XFS_ABTC_CRC_MAGIC:
-		case XFS_ABTC_MAGIC:
-			bp->b_ops = &xfs_cntbt_buf_ops;
-			break;
-		case XFS_IBT_CRC_MAGIC:
-		case XFS_IBT_MAGIC:
-			bp->b_ops = &xfs_inobt_buf_ops;
-			break;
-		case XFS_FIBT_CRC_MAGIC:
-		case XFS_FIBT_MAGIC:
-			bp->b_ops = &xfs_finobt_buf_ops;
-			break;
-		case XFS_BMAP_CRC_MAGIC:
-		case XFS_BMAP_MAGIC:
-			bp->b_ops = &xfs_bmbt_buf_ops;
-			break;
-		case XFS_RMAP_CRC_MAGIC:
-			bp->b_ops = &xfs_rmapbt_buf_ops;
-			break;
-		case XFS_REFC_CRC_MAGIC:
-			bp->b_ops = &xfs_refcountbt_buf_ops;
-			break;
-		default:
-			warnmsg = "Bad btree block magic!";
-			break;
-		}
-		break;
-	case XFS_BLFT_AGF_BUF:
-		if (magic32 != XFS_AGF_MAGIC) {
-			warnmsg = "Bad AGF block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_agf_buf_ops;
-		break;
-	case XFS_BLFT_AGFL_BUF:
-		if (magic32 != XFS_AGFL_MAGIC) {
-			warnmsg = "Bad AGFL block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_agfl_buf_ops;
-		break;
-	case XFS_BLFT_AGI_BUF:
-		if (magic32 != XFS_AGI_MAGIC) {
-			warnmsg = "Bad AGI block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_agi_buf_ops;
-		break;
-	case XFS_BLFT_UDQUOT_BUF:
-	case XFS_BLFT_PDQUOT_BUF:
-	case XFS_BLFT_GDQUOT_BUF:
-#ifdef CONFIG_XFS_QUOTA
-		if (magic16 != XFS_DQUOT_MAGIC) {
-			warnmsg = "Bad DQUOT block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_dquot_buf_ops;
-#else
-		xfs_alert(mp,
-	"Trying to recover dquots without QUOTA support built in!");
-		ASSERT(0);
-#endif
-		break;
-	case XFS_BLFT_DINO_BUF:
-		if (magic16 != XFS_DINODE_MAGIC) {
-			warnmsg = "Bad INODE block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_inode_buf_ops;
-		break;
-	case XFS_BLFT_SYMLINK_BUF:
-		if (magic32 != XFS_SYMLINK_MAGIC) {
-			warnmsg = "Bad symlink block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_symlink_buf_ops;
-		break;
-	case XFS_BLFT_DIR_BLOCK_BUF:
-		if (magic32 != XFS_DIR2_BLOCK_MAGIC &&
-		    magic32 != XFS_DIR3_BLOCK_MAGIC) {
-			warnmsg = "Bad dir block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_dir3_block_buf_ops;
-		break;
-	case XFS_BLFT_DIR_DATA_BUF:
-		if (magic32 != XFS_DIR2_DATA_MAGIC &&
-		    magic32 != XFS_DIR3_DATA_MAGIC) {
-			warnmsg = "Bad dir data magic!";
-			break;
-		}
-		bp->b_ops = &xfs_dir3_data_buf_ops;
-		break;
-	case XFS_BLFT_DIR_FREE_BUF:
-		if (magic32 != XFS_DIR2_FREE_MAGIC &&
-		    magic32 != XFS_DIR3_FREE_MAGIC) {
-			warnmsg = "Bad dir3 free magic!";
-			break;
-		}
-		bp->b_ops = &xfs_dir3_free_buf_ops;
-		break;
-	case XFS_BLFT_DIR_LEAF1_BUF:
-		if (magicda != XFS_DIR2_LEAF1_MAGIC &&
-		    magicda != XFS_DIR3_LEAF1_MAGIC) {
-			warnmsg = "Bad dir leaf1 magic!";
-			break;
-		}
-		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
-		break;
-	case XFS_BLFT_DIR_LEAFN_BUF:
-		if (magicda != XFS_DIR2_LEAFN_MAGIC &&
-		    magicda != XFS_DIR3_LEAFN_MAGIC) {
-			warnmsg = "Bad dir leafn magic!";
-			break;
-		}
-		bp->b_ops = &xfs_dir3_leafn_buf_ops;
-		break;
-	case XFS_BLFT_DA_NODE_BUF:
-		if (magicda != XFS_DA_NODE_MAGIC &&
-		    magicda != XFS_DA3_NODE_MAGIC) {
-			warnmsg = "Bad da node magic!";
-			break;
-		}
-		bp->b_ops = &xfs_da3_node_buf_ops;
-		break;
-	case XFS_BLFT_ATTR_LEAF_BUF:
-		if (magicda != XFS_ATTR_LEAF_MAGIC &&
-		    magicda != XFS_ATTR3_LEAF_MAGIC) {
-			warnmsg = "Bad attr leaf magic!";
-			break;
-		}
-		bp->b_ops = &xfs_attr3_leaf_buf_ops;
-		break;
-	case XFS_BLFT_ATTR_RMT_BUF:
-		if (magic32 != XFS_ATTR3_RMT_MAGIC) {
-			warnmsg = "Bad attr remote magic!";
-			break;
-		}
-		bp->b_ops = &xfs_attr3_rmt_buf_ops;
-		break;
-	case XFS_BLFT_SB_BUF:
-		if (magic32 != XFS_SB_MAGIC) {
-			warnmsg = "Bad SB block magic!";
-			break;
-		}
-		bp->b_ops = &xfs_sb_buf_ops;
-		break;
-#ifdef CONFIG_XFS_RT
-	case XFS_BLFT_RTBITMAP_BUF:
-	case XFS_BLFT_RTSUMMARY_BUF:
-		/* no magic numbers for verification of RT buffers */
-		bp->b_ops = &xfs_rtbuf_ops;
-		break;
-#endif /* CONFIG_XFS_RT */
-	default:
-		xfs_warn(mp, "Unknown buffer type %d!",
-			 xfs_blft_from_flags(buf_f));
-		break;
-	}
-
-	/*
-	 * Nothing else to do in the case of a NULL current LSN as this means
-	 * the buffer is more recent than the change in the log and will be
-	 * skipped.
-	 */
-	if (current_lsn == NULLCOMMITLSN)
-		return;
-
-	if (warnmsg) {
-		xfs_warn(mp, warnmsg);
-		ASSERT(0);
-	}
-
-	/*
-	 * We must update the metadata LSN of the buffer as it is written out to
-	 * ensure that older transactions never replay over this one and corrupt
-	 * the buffer. This can occur if log recovery is interrupted at some
-	 * point after the current transaction completes, at which point a
-	 * subsequent mount starts recovery from the beginning.
-	 *
-	 * Write verifiers update the metadata LSN from log items attached to
-	 * the buffer. Therefore, initialize a bli purely to carry the LSN to
-	 * the verifier. We'll clean it up in our ->iodone() callback.
-	 */
-	if (bp->b_ops) {
-		struct xfs_buf_log_item	*bip;
-
-		ASSERT(!bp->b_iodone || bp->b_iodone == xlog_recover_iodone);
-		bp->b_iodone = xlog_recover_iodone;
-		xfs_buf_item_init(bp, mp);
-		bip = bp->b_log_item;
-		bip->bli_item.li_lsn = current_lsn;
-	}
-}
-
-/*
- * Perform a 'normal' buffer recovery.  Each logged region of the
- * buffer should be copied over the corresponding region in the
- * given buffer.  The bitmap in the buf log format structure indicates
- * where to place the logged data.
- */
-STATIC void
-xlog_recover_do_reg_buffer(
-	struct xfs_mount	*mp,
-	xlog_recover_item_t	*item,
-	struct xfs_buf		*bp,
-	xfs_buf_log_format_t	*buf_f,
-	xfs_lsn_t		current_lsn)
-{
-	int			i;
-	int			bit;
-	int			nbits;
-	xfs_failaddr_t		fa;
-	const size_t		size_disk_dquot = sizeof(struct xfs_disk_dquot);
-
-	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
-
-	bit = 0;
-	i = 1;  /* 0 is the buf format structure */
-	while (1) {
-		bit = xfs_next_bit(buf_f->blf_data_map,
-				   buf_f->blf_map_size, bit);
-		if (bit == -1)
-			break;
-		nbits = xfs_contig_bits(buf_f->blf_data_map,
-					buf_f->blf_map_size, bit);
-		ASSERT(nbits > 0);
-		ASSERT(item->ri_buf[i].i_addr != NULL);
-		ASSERT(item->ri_buf[i].i_len % XFS_BLF_CHUNK == 0);
-		ASSERT(BBTOB(bp->b_length) >=
-		       ((uint)bit << XFS_BLF_SHIFT) + (nbits << XFS_BLF_SHIFT));
-
-		/*
-		 * The dirty regions logged in the buffer, even though
-		 * contiguous, may span multiple chunks. This is because the
-		 * dirty region may span a physical page boundary in a buffer
-		 * and hence be split into two separate vectors for writing into
-		 * the log. Hence we need to trim nbits back to the length of
-		 * the current region being copied out of the log.
-		 */
-		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
-			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
-
-		/*
-		 * Do a sanity check if this is a dquot buffer. Just checking
-		 * the first dquot in the buffer should do. XXXThis is
-		 * probably a good thing to do for other buf types also.
-		 */
-		fa = NULL;
-		if (buf_f->blf_flags &
-		   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-			if (item->ri_buf[i].i_addr == NULL) {
-				xfs_alert(mp,
-					"XFS: NULL dquot in %s.", __func__);
-				goto next;
-			}
-			if (item->ri_buf[i].i_len < size_disk_dquot) {
-				xfs_alert(mp,
-					"XFS: dquot too small (%d) in %s.",
-					item->ri_buf[i].i_len, __func__);
-				goto next;
-			}
-			fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr,
-					       -1, 0);
-			if (fa) {
-				xfs_alert(mp,
-	"dquot corrupt at %pS trying to replay into block 0x%llx",
-					fa, bp->b_bn);
-				goto next;
-			}
-		}
-
-		memcpy(xfs_buf_offset(bp,
-			(uint)bit << XFS_BLF_SHIFT),	/* dest */
-			item->ri_buf[i].i_addr,		/* source */
-			nbits<<XFS_BLF_SHIFT);		/* length */
- next:
-		i++;
-		bit += nbits;
-	}
-
-	/* Shouldn't be any more regions */
-	ASSERT(i == item->ri_total);
-
-	xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
-}
-
-/*
- * Perform a dquot buffer recovery.
- * Simple algorithm: if we have found a QUOTAOFF log item of the same type
- * (ie. USR or GRP), then just toss this buffer away; don't recover it.
- * Else, treat it as a regular buffer and do recovery.
- *
- * Return false if the buffer was tossed and true if we recovered the buffer to
- * indicate to the caller if the buffer needs writing.
- */
-STATIC bool
-xlog_recover_do_dquot_buffer(
-	struct xfs_mount		*mp,
-	struct xlog			*log,
-	struct xlog_recover_item	*item,
-	struct xfs_buf			*bp,
-	struct xfs_buf_log_format	*buf_f)
-{
-	uint			type;
-
-	trace_xfs_log_recover_buf_dquot_buf(log, buf_f);
-
-	/*
-	 * Filesystems are required to send in quota flags at mount time.
-	 */
-	if (!mp->m_qflags)
-		return false;
-
-	type = 0;
-	if (buf_f->blf_flags & XFS_BLF_UDQUOT_BUF)
-		type |= XFS_DQ_USER;
-	if (buf_f->blf_flags & XFS_BLF_PDQUOT_BUF)
-		type |= XFS_DQ_PROJ;
-	if (buf_f->blf_flags & XFS_BLF_GDQUOT_BUF)
-		type |= XFS_DQ_GROUP;
-	/*
-	 * This type of quotas was turned off, so ignore this buffer
-	 */
-	if (log->l_quotaoffs_flag & type)
-		return false;
-
-	xlog_recover_do_reg_buffer(mp, item, bp, buf_f, NULLCOMMITLSN);
-	return true;
-}
-
-/*
- * This routine replays a modification made to a buffer at runtime.
- * There are actually two types of buffer, regular and inode, which
- * are handled differently.  Inode buffers are handled differently
- * in that we only recover a specific set of data from them, namely
- * the inode di_next_unlinked fields.  This is because all other inode
- * data is actually logged via inode records and any data we replay
- * here which overlaps that may be stale.
- *
- * When meta-data buffers are freed at run time we log a buffer item
- * with the XFS_BLF_CANCEL bit set to indicate that previous copies
- * of the buffer in the log should not be replayed at recovery time.
- * This is so that if the blocks covered by the buffer are reused for
- * file data before we crash we don't end up replaying old, freed
- * meta-data into a user's file.
- *
- * To handle the cancellation of buffer log items, we make two passes
- * over the log during recovery.  During the first we build a table of
- * those buffers which have been cancelled, and during the second we
- * only replay those buffers which do not have corresponding cancel
- * records in the table.  See xlog_recover_buffer_pass[1,2] above
- * for more details on the implementation of the table of cancel records.
- */
-STATIC int
-xlog_recover_buffer_pass2(
-	struct xlog			*log,
-	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			current_lsn)
-{
-	xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
-	xfs_mount_t		*mp = log->l_mp;
-	xfs_buf_t		*bp;
-	int			error;
-	uint			buf_flags;
-	xfs_lsn_t		lsn;
-
-	/*
-	 * In this pass we only want to recover all the buffers which have
-	 * not been cancelled and are not cancellation buffers themselves.
-	 */
-	if (xlog_check_buffer_cancelled(log, buf_f->blf_blkno,
-			buf_f->blf_len, buf_f->blf_flags)) {
-		trace_xfs_log_recover_buf_cancel(log, buf_f);
-		return 0;
-	}
-
-	trace_xfs_log_recover_buf_recover(log, buf_f);
-
-	buf_flags = 0;
-	if (buf_f->blf_flags & XFS_BLF_INODE_BUF)
-		buf_flags |= XBF_UNMAPPED;
-
-	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno, buf_f->blf_len,
-			  buf_flags, &bp, NULL);
-	if (error)
-		return error;
-
-	/*
-	 * Recover the buffer only if we get an LSN from it and it's less than
-	 * the lsn of the transaction we are replaying.
-	 *
-	 * Note that we have to be extremely careful of readahead here.
-	 * Readahead does not attach verfiers to the buffers so if we don't
-	 * actually do any replay after readahead because of the LSN we found
-	 * in the buffer if more recent than that current transaction then we
-	 * need to attach the verifier directly. Failure to do so can lead to
-	 * future recovery actions (e.g. EFI and unlinked list recovery) can
-	 * operate on the buffers and they won't get the verifier attached. This
-	 * can lead to blocks on disk having the correct content but a stale
-	 * CRC.
-	 *
-	 * It is safe to assume these clean buffers are currently up to date.
-	 * If the buffer is dirtied by a later transaction being replayed, then
-	 * the verifier will be reset to match whatever recover turns that
-	 * buffer into.
-	 */
-	lsn = xlog_recover_get_buf_lsn(mp, bp);
-	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
-		trace_xfs_log_recover_buf_skip(log, buf_f);
-		xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
-		goto out_release;
-	}
-
-	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
-		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
-		if (error)
-			goto out_release;
-	} else if (buf_f->blf_flags &
-		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-		bool	dirty;
-
-		dirty = xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f);
-		if (!dirty)
-			goto out_release;
-	} else {
-		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
-	}
-
-	/*
-	 * Perform delayed write on the buffer.  Asynchronous writes will be
-	 * slower when taking into account all the buffers to be flushed.
-	 *
-	 * Also make sure that only inode buffers with good sizes stay in
-	 * the buffer cache.  The kernel moves inodes in buffers of 1 block
-	 * or inode_cluster_size bytes, whichever is bigger.  The inode
-	 * buffers in the log can be a different size if the log was generated
-	 * by an older kernel using unclustered inode buffers or a newer kernel
-	 * running with a different inode cluster size.  Regardless, if the
-	 * the inode buffer size isn't max(blocksize, inode_cluster_size)
-	 * for *our* value of inode_cluster_size, then we need to keep
-	 * the buffer out of the buffer cache so that the buffer won't
-	 * overlap with future reads of those inodes.
-	 */
-	if (XFS_DINODE_MAGIC ==
-	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
-	    (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size)) {
-		xfs_buf_stale(bp);
-		error = xfs_bwrite(bp);
-	} else {
-		ASSERT(bp->b_mount == mp);
-		bp->b_iodone = xlog_recover_iodone;
-		xfs_buf_delwri_queue(bp, buffer_list);
-	}
-
-out_release:
-	xfs_buf_relse(bp);
-	return error;
-}
-
 /*
  * Inode fork owner changes
  *
@@ -3831,10 +3054,11 @@ xlog_recover_commit_pass2(
 {
 	trace_xfs_log_recover_item_recover(log, trans, item, XLOG_RECOVER_PASS2);
 
+	if (item->ri_type && item->ri_type->commit_pass2_fn)
+		return item->ri_type->commit_pass2_fn(log, buffer_list, item,
+				trans->r_lsn);
+
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_BUF:
-		return xlog_recover_buffer_pass2(log, buffer_list, item,
-						 trans->r_lsn);
 	case XFS_LI_INODE:
 		return xlog_recover_inode_pass2(log, buffer_list, item,
 						 trans->r_lsn);

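For reference, here is a minimal, standalone model of the ops-table dispatch this patch applies to buffer items.  Every name and type below is simplified for illustration only (the real hooks take the xlog/buffer_list/item arguments shown in the diff above), but the shape is the same: a per-item-type table of function pointers, and a core that prefers the hook and falls back to the old switch for item types that have not been converted yet.

/*
 * Standalone model of the ops-table dispatch (illustrative names only,
 * not kernel code).  Each log item type registers a table of function
 * pointers; the recovery core calls the commit_pass2 hook when one is
 * present and falls back to the legacy switch otherwise.
 */
#include <stdio.h>

struct demo_item;

struct demo_item_ops {
	const char	*name;
	int		(*commit_pass2)(struct demo_item *item, long lsn);
};

struct demo_item {
	const struct demo_item_ops	*ops;
	int				type;	/* legacy numeric item type */
};

static int demo_buf_commit_pass2(struct demo_item *item, long lsn)
{
	printf("replaying %s item at lsn %ld\n", item->ops->name, lsn);
	return 0;
}

static const struct demo_item_ops demo_buf_item_ops = {
	.name		= "buf",
	.commit_pass2	= demo_buf_commit_pass2,
};

static int demo_commit_pass2(struct demo_item *item, long lsn)
{
	if (item->ops && item->ops->commit_pass2)
		return item->ops->commit_pass2(item, lsn);

	switch (item->type) {
	/* item types that have not been converted keep their cases here */
	default:
		return -1;
	}
}

int main(void)
{
	struct demo_item item = { .ops = &demo_buf_item_ops, .type = 0 };

	return demo_commit_pass2(&item, 42);
}

In this model, converting another item type only means defining another ops table; demo_commit_pass2() never changes again.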

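The two-pass cancel scheme described in the comment above xlog_recover_buffer_pass2() can also be sketched in isolation.  This is a toy model only: the kernel's table is hashed by block number and reference counted, while the version below is a flat array with invented names.

/*
 * Toy model of the cancel table: pass 1 records cancelled buffer ranges,
 * pass 2 refuses to replay buffers with a matching cancel record.
 */
#include <stdbool.h>
#include <stddef.h>

struct demo_cancel {
	long		blkno;
	unsigned int	len;
};

#define DEMO_MAX_CANCELS	64

static struct demo_cancel demo_cancel_table[DEMO_MAX_CANCELS];
static size_t demo_nr_cancels;

/* pass 1: remember buffers logged with the cancel flag set */
static void demo_note_cancelled(long blkno, unsigned int len)
{
	if (demo_nr_cancels < DEMO_MAX_CANCELS) {
		demo_cancel_table[demo_nr_cancels].blkno = blkno;
		demo_cancel_table[demo_nr_cancels].len = len;
		demo_nr_cancels++;
	}
}

/* pass 2: skip replay of any buffer that was cancelled later in the log */
static bool demo_buffer_cancelled(long blkno, unsigned int len)
{
	size_t	i;

	for (i = 0; i < demo_nr_cancels; i++) {
		if (demo_cancel_table[i].blkno == blkno &&
		    demo_cancel_table[i].len == len)
			return true;
	}
	return false;
}
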
^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 06/19] xfs: refactor log recovery inode item dispatch for pass2 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (4 preceding siblings ...)
  2020-04-22  2:06 ` [PATCH 05/19] xfs: refactor log recovery buffer item dispatch for pass2 " Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-22  2:06 ` [PATCH 07/19] xfs: refactor log recovery intent " Darrick J. Wong
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the log inode item pass2 commit code into the per-item source code
files and use the dispatch function to call it.  We do these one at a
time because there's a lot of code to move.  No functional changes.
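
The moved code keeps the same replay-or-skip rule the buffer path already uses: if the inode on disk carries an LSN at or beyond the transaction being replayed, the replay is skipped (although, as the comments in the code note, a pending fork owner change still has to be processed).  A minimal standalone model of that decision, with the LSN comparison simplified to a plain signed compare instead of the kernel's XFS_LSN_CMP:

/*
 * Minimal model of the "is the on-disk copy newer than the log?" check.
 * The real code compares cycle/block via XFS_LSN_CMP and treats 0 and -1
 * as "no LSN recorded"; a plain signed compare keeps the sketch
 * self-contained.
 */
#include <stdbool.h>
#include <stdint.h>

typedef int64_t demo_lsn_t;

static bool demo_should_replay(demo_lsn_t disk_lsn, demo_lsn_t current_lsn)
{
	/* no valid LSN stamped on disk: always recover */
	if (disk_lsn == 0 || disk_lsn == (demo_lsn_t)-1)
		return true;

	/* replay only when the logged change is newer than the disk copy */
	return disk_lsn < current_lsn;
}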

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode_item.c  |  356 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_log_recover.c |  356 ----------------------------------------------
 2 files changed, 356 insertions(+), 356 deletions(-)


diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index bb0fe064aa1b..6efc192e86ac 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -20,6 +20,8 @@
 #include "xfs_error.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
+#include "xfs_icache.h"
+#include "xfs_bmap_btree.h"
 
 #include <linux/iversion.h>
 
@@ -873,7 +875,361 @@ xlog_recover_inode_ra_pass2(
 				ilfp->ilf_len, &xfs_inode_buf_ra_ops);
 }
 
+/*
+ * Inode fork owner changes
+ *
+ * If we have been told that we have to reparent the inode fork, it's because an
+ * extent swap operation on a CRC enabled filesystem has been done and we are
+ * replaying it. We need to walk the BMBT of the appropriate fork and change the
+ * owners of it.
+ *
+ * The complexity here is that we don't have an inode context to work with, so
+ * after we've replayed the inode we need to instantiate one.  This is where the
+ * fun begins.
+ *
+ * We are in the middle of log recovery, so we can't run transactions. That
+ * means we cannot use cache coherent inode instantiation via xfs_iget(), as
+ * that will result in the corresponding iput() running the inode through
+ * xfs_inactive(). If we've just replayed an inode core that changes the link
+ * count to zero (i.e. it's been unlinked), then xfs_inactive() will run
+ * transactions (bad!).
+ *
+ * So, to avoid this, we instantiate an inode directly from the inode core we've
+ * just recovered. We have the buffer still locked, and all we really need to
+ * instantiate is the inode core and the forks being modified. We can do this
+ * manually, then run the inode btree owner change, and then tear down the
+ * xfs_inode without having to run any transactions at all.
+ *
+ * Also, because we don't have a transaction context available here but we
+ * need to gather all the buffers we modify for writeback, we pass the
+ * buffer_list instead for the operation to use.
+ */
+
+STATIC int
+xfs_recover_inode_owner_change(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip,
+	struct xfs_inode_log_format *in_f,
+	struct list_head	*buffer_list)
+{
+	struct xfs_inode	*ip;
+	int			error;
+
+	ASSERT(in_f->ilf_fields & (XFS_ILOG_DOWNER|XFS_ILOG_AOWNER));
+
+	ip = xfs_inode_alloc(mp, in_f->ilf_ino);
+	if (!ip)
+		return -ENOMEM;
+
+	/* instantiate the inode */
+	ASSERT(dip->di_version >= 3);
+	xfs_inode_from_disk(ip, dip);
+
+	error = xfs_iformat_fork(ip, dip);
+	if (error)
+		goto out_free_ip;
+
+	if (!xfs_inode_verify_forks(ip)) {
+		error = -EFSCORRUPTED;
+		goto out_free_ip;
+	}
+
+	if (in_f->ilf_fields & XFS_ILOG_DOWNER) {
+		ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT);
+		error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK,
+					      ip->i_ino, buffer_list);
+		if (error)
+			goto out_free_ip;
+	}
+
+	if (in_f->ilf_fields & XFS_ILOG_AOWNER) {
+		ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT);
+		error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK,
+					      ip->i_ino, buffer_list);
+		if (error)
+			goto out_free_ip;
+	}
+
+out_free_ip:
+	xfs_inode_free(ip);
+	return error;
+}
+
+STATIC int
+xlog_recover_inode_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
+{
+	struct xfs_inode_log_format	*in_f;
+	xfs_mount_t		*mp = log->l_mp;
+	xfs_buf_t		*bp;
+	xfs_dinode_t		*dip;
+	int			len;
+	char			*src;
+	char			*dest;
+	int			error;
+	int			attr_index;
+	uint			fields;
+	struct xfs_log_dinode	*ldip;
+	uint			isize;
+	int			need_free = 0;
+
+	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
+		in_f = item->ri_buf[0].i_addr;
+	} else {
+		in_f = kmem_alloc(sizeof(struct xfs_inode_log_format), 0);
+		need_free = 1;
+		error = xfs_inode_item_format_convert(&item->ri_buf[0], in_f);
+		if (error)
+			goto error;
+	}
+
+	/*
+	 * Inode buffers can be freed, look out for it,
+	 * and do not replay the inode.
+	 */
+	if (xlog_check_buffer_cancelled(log, in_f->ilf_blkno,
+					in_f->ilf_len, 0)) {
+		error = 0;
+		trace_xfs_log_recover_inode_cancel(log, in_f);
+		goto error;
+	}
+	trace_xfs_log_recover_inode_recover(log, in_f);
+
+	error = xfs_buf_read(mp->m_ddev_targp, in_f->ilf_blkno, in_f->ilf_len,
+			0, &bp, &xfs_inode_buf_ops);
+	if (error)
+		goto error;
+	ASSERT(in_f->ilf_fields & XFS_ILOG_CORE);
+	dip = xfs_buf_offset(bp, in_f->ilf_boffset);
+
+	/*
+	 * Make sure the place we're flushing out to really looks
+	 * like an inode!
+	 */
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_magic16(bp, dip->di_magic))) {
+		xfs_alert(mp,
+	"%s: Bad inode magic number, dip = "PTR_FMT", dino bp = "PTR_FMT", ino = %Ld",
+			__func__, dip, bp, in_f->ilf_ino);
+		error = -EFSCORRUPTED;
+		goto out_release;
+	}
+	ldip = item->ri_buf[1].i_addr;
+	if (XFS_IS_CORRUPT(mp, ldip->di_magic != XFS_DINODE_MAGIC)) {
+		xfs_alert(mp,
+			"%s: Bad inode log record, rec ptr "PTR_FMT", ino %Ld",
+			__func__, item, in_f->ilf_ino);
+		error = -EFSCORRUPTED;
+		goto out_release;
+	}
+
+	/*
+	 * If the inode has an LSN in it, recover the inode only if it's less
+	 * than the lsn of the transaction we are replaying. Note: we still
+	 * need to replay an owner change even though the inode is more recent
+	 * than the transaction as there is no guarantee that all the btree
+	 * blocks are more recent than this transaction, too.
+	 */
+	if (dip->di_version >= 3) {
+		xfs_lsn_t	lsn = be64_to_cpu(dip->di_lsn);
+
+		if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
+			trace_xfs_log_recover_inode_skip(log, in_f);
+			error = 0;
+			goto out_owner_change;
+		}
+	}
+
+	/*
+	 * di_flushiter is only valid for v1/2 inodes. All changes for v3 inodes
+	 * are transactional and if ordering is necessary we can determine that
+	 * more accurately by the LSN field in the V3 inode core. Don't trust
+	 * the inode versions as we might be changing them here - use the
+	 * superblock flag to determine whether we need to look at di_flushiter
+	 * to skip replay when the on disk inode is newer than the log one.
+	 */
+	if (!xfs_sb_version_has_v3inode(&mp->m_sb) &&
+	    ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
+		/*
+		 * Deal with the wrap case, DI_MAX_FLUSH is less
+		 * than smaller numbers
+		 */
+		if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
+		    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
+			/* do nothing */
+		} else {
+			trace_xfs_log_recover_inode_skip(log, in_f);
+			error = 0;
+			goto out_release;
+		}
+	}
+
+	/* Take the opportunity to reset the flush iteration count */
+	ldip->di_flushiter = 0;
+
+	if (unlikely(S_ISREG(ldip->di_mode))) {
+		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
+		    (ldip->di_format != XFS_DINODE_FMT_BTREE)) {
+			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
+					 XFS_ERRLEVEL_LOW, mp, ldip,
+					 sizeof(*ldip));
+			xfs_alert(mp,
+		"%s: Bad regular inode log record, rec ptr "PTR_FMT", "
+		"ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
+				__func__, item, dip, bp, in_f->ilf_ino);
+			error = -EFSCORRUPTED;
+			goto out_release;
+		}
+	} else if (unlikely(S_ISDIR(ldip->di_mode))) {
+		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
+		    (ldip->di_format != XFS_DINODE_FMT_BTREE) &&
+		    (ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
+			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(4)",
+					     XFS_ERRLEVEL_LOW, mp, ldip,
+					     sizeof(*ldip));
+			xfs_alert(mp,
+		"%s: Bad dir inode log record, rec ptr "PTR_FMT", "
+		"ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
+				__func__, item, dip, bp, in_f->ilf_ino);
+			error = -EFSCORRUPTED;
+			goto out_release;
+		}
+	}
+	if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
+		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
+				     XFS_ERRLEVEL_LOW, mp, ldip,
+				     sizeof(*ldip));
+		xfs_alert(mp,
+	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
+	"dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
+			__func__, item, dip, bp, in_f->ilf_ino,
+			ldip->di_nextents + ldip->di_anextents,
+			ldip->di_nblocks);
+		error = -EFSCORRUPTED;
+		goto out_release;
+	}
+	if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
+		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
+				     XFS_ERRLEVEL_LOW, mp, ldip,
+				     sizeof(*ldip));
+		xfs_alert(mp,
+	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
+	"dino bp "PTR_FMT", ino %Ld, forkoff 0x%x", __func__,
+			item, dip, bp, in_f->ilf_ino, ldip->di_forkoff);
+		error = -EFSCORRUPTED;
+		goto out_release;
+	}
+	isize = xfs_log_dinode_size(mp);
+	if (unlikely(item->ri_buf[1].i_len > isize)) {
+		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
+				     XFS_ERRLEVEL_LOW, mp, ldip,
+				     sizeof(*ldip));
+		xfs_alert(mp,
+			"%s: Bad inode log record length %d, rec ptr "PTR_FMT,
+			__func__, item->ri_buf[1].i_len, item);
+		error = -EFSCORRUPTED;
+		goto out_release;
+	}
+
+	/* recover the log dinode into the on disk inode */
+	xfs_log_dinode_to_disk(ldip, dip);
+
+	fields = in_f->ilf_fields;
+	if (fields & XFS_ILOG_DEV)
+		xfs_dinode_put_rdev(dip, in_f->ilf_u.ilfu_rdev);
+
+	if (in_f->ilf_size == 2)
+		goto out_owner_change;
+	len = item->ri_buf[2].i_len;
+	src = item->ri_buf[2].i_addr;
+	ASSERT(in_f->ilf_size <= 4);
+	ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK));
+	ASSERT(!(fields & XFS_ILOG_DFORK) ||
+	       (len == in_f->ilf_dsize));
+
+	switch (fields & XFS_ILOG_DFORK) {
+	case XFS_ILOG_DDATA:
+	case XFS_ILOG_DEXT:
+		memcpy(XFS_DFORK_DPTR(dip), src, len);
+		break;
+
+	case XFS_ILOG_DBROOT:
+		xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src, len,
+				 (xfs_bmdr_block_t *)XFS_DFORK_DPTR(dip),
+				 XFS_DFORK_DSIZE(dip, mp));
+		break;
+
+	default:
+		/*
+		 * There are no data fork flags set.
+		 */
+		ASSERT((fields & XFS_ILOG_DFORK) == 0);
+		break;
+	}
+
+	/*
+	 * If we logged any attribute data, recover it.  There may or
+	 * may not have been any other non-core data logged in this
+	 * transaction.
+	 */
+	if (in_f->ilf_fields & XFS_ILOG_AFORK) {
+		if (in_f->ilf_fields & XFS_ILOG_DFORK) {
+			attr_index = 3;
+		} else {
+			attr_index = 2;
+		}
+		len = item->ri_buf[attr_index].i_len;
+		src = item->ri_buf[attr_index].i_addr;
+		ASSERT(len == in_f->ilf_asize);
+
+		switch (in_f->ilf_fields & XFS_ILOG_AFORK) {
+		case XFS_ILOG_ADATA:
+		case XFS_ILOG_AEXT:
+			dest = XFS_DFORK_APTR(dip);
+			ASSERT(len <= XFS_DFORK_ASIZE(dip, mp));
+			memcpy(dest, src, len);
+			break;
+
+		case XFS_ILOG_ABROOT:
+			dest = XFS_DFORK_APTR(dip);
+			xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src,
+					 len, (xfs_bmdr_block_t*)dest,
+					 XFS_DFORK_ASIZE(dip, mp));
+			break;
+
+		default:
+			xfs_warn(log->l_mp, "%s: Invalid flag", __func__);
+			ASSERT(0);
+			error = -EFSCORRUPTED;
+			goto out_release;
+		}
+	}
+
+out_owner_change:
+	/* Recover the swapext owner change unless inode has been deleted */
+	if ((in_f->ilf_fields & (XFS_ILOG_DOWNER|XFS_ILOG_AOWNER)) &&
+	    (dip->di_mode != 0))
+		error = xfs_recover_inode_owner_change(mp, dip, in_f,
+						       buffer_list);
+	/* re-generate the checksum. */
+	xfs_dinode_calc_crc(log->l_mp, dip);
+
+	ASSERT(bp->b_mount == mp);
+	bp->b_iodone = xlog_recover_iodone;
+	xfs_buf_delwri_queue(bp, buffer_list);
+
+out_release:
+	xfs_buf_relse(bp);
+error:
+	if (need_free)
+		kmem_free(in_f);
+	return error;
+}
+
 const struct xlog_recover_item_type xlog_inode_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
 	.ra_pass2_fn		= xlog_recover_inode_ra_pass2,
+	.commit_pass2_fn	= xlog_recover_inode_pass2,
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c620117fba65..2e4f400d3f6e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2007,359 +2007,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * Inode fork owner changes
- *
- * If we have been told that we have to reparent the inode fork, it's because an
- * extent swap operation on a CRC enabled filesystem has been done and we are
- * replaying it. We need to walk the BMBT of the appropriate fork and change the
- * owners of it.
- *
- * The complexity here is that we don't have an inode context to work with, so
- * after we've replayed the inode we need to instantiate one.  This is where the
- * fun begins.
- *
- * We are in the middle of log recovery, so we can't run transactions. That
- * means we cannot use cache coherent inode instantiation via xfs_iget(), as
- * that will result in the corresponding iput() running the inode through
- * xfs_inactive(). If we've just replayed an inode core that changes the link
- * count to zero (i.e. it's been unlinked), then xfs_inactive() will run
- * transactions (bad!).
- *
- * So, to avoid this, we instantiate an inode directly from the inode core we've
- * just recovered. We have the buffer still locked, and all we really need to
- * instantiate is the inode core and the forks being modified. We can do this
- * manually, then run the inode btree owner change, and then tear down the
- * xfs_inode without having to run any transactions at all.
- *
- * Also, because we don't have a transaction context available here but need to
- * gather all the buffers we modify for writeback so we pass the buffer_list
- * instead for the operation to use.
- */
-
-STATIC int
-xfs_recover_inode_owner_change(
-	struct xfs_mount	*mp,
-	struct xfs_dinode	*dip,
-	struct xfs_inode_log_format *in_f,
-	struct list_head	*buffer_list)
-{
-	struct xfs_inode	*ip;
-	int			error;
-
-	ASSERT(in_f->ilf_fields & (XFS_ILOG_DOWNER|XFS_ILOG_AOWNER));
-
-	ip = xfs_inode_alloc(mp, in_f->ilf_ino);
-	if (!ip)
-		return -ENOMEM;
-
-	/* instantiate the inode */
-	ASSERT(dip->di_version >= 3);
-	xfs_inode_from_disk(ip, dip);
-
-	error = xfs_iformat_fork(ip, dip);
-	if (error)
-		goto out_free_ip;
-
-	if (!xfs_inode_verify_forks(ip)) {
-		error = -EFSCORRUPTED;
-		goto out_free_ip;
-	}
-
-	if (in_f->ilf_fields & XFS_ILOG_DOWNER) {
-		ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT);
-		error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK,
-					      ip->i_ino, buffer_list);
-		if (error)
-			goto out_free_ip;
-	}
-
-	if (in_f->ilf_fields & XFS_ILOG_AOWNER) {
-		ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT);
-		error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK,
-					      ip->i_ino, buffer_list);
-		if (error)
-			goto out_free_ip;
-	}
-
-out_free_ip:
-	xfs_inode_free(ip);
-	return error;
-}
-
-STATIC int
-xlog_recover_inode_pass2(
-	struct xlog			*log,
-	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			current_lsn)
-{
-	struct xfs_inode_log_format	*in_f;
-	xfs_mount_t		*mp = log->l_mp;
-	xfs_buf_t		*bp;
-	xfs_dinode_t		*dip;
-	int			len;
-	char			*src;
-	char			*dest;
-	int			error;
-	int			attr_index;
-	uint			fields;
-	struct xfs_log_dinode	*ldip;
-	uint			isize;
-	int			need_free = 0;
-
-	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
-		in_f = item->ri_buf[0].i_addr;
-	} else {
-		in_f = kmem_alloc(sizeof(struct xfs_inode_log_format), 0);
-		need_free = 1;
-		error = xfs_inode_item_format_convert(&item->ri_buf[0], in_f);
-		if (error)
-			goto error;
-	}
-
-	/*
-	 * Inode buffers can be freed, look out for it,
-	 * and do not replay the inode.
-	 */
-	if (xlog_check_buffer_cancelled(log, in_f->ilf_blkno,
-					in_f->ilf_len, 0)) {
-		error = 0;
-		trace_xfs_log_recover_inode_cancel(log, in_f);
-		goto error;
-	}
-	trace_xfs_log_recover_inode_recover(log, in_f);
-
-	error = xfs_buf_read(mp->m_ddev_targp, in_f->ilf_blkno, in_f->ilf_len,
-			0, &bp, &xfs_inode_buf_ops);
-	if (error)
-		goto error;
-	ASSERT(in_f->ilf_fields & XFS_ILOG_CORE);
-	dip = xfs_buf_offset(bp, in_f->ilf_boffset);
-
-	/*
-	 * Make sure the place we're flushing out to really looks
-	 * like an inode!
-	 */
-	if (XFS_IS_CORRUPT(mp, !xfs_verify_magic16(bp, dip->di_magic))) {
-		xfs_alert(mp,
-	"%s: Bad inode magic number, dip = "PTR_FMT", dino bp = "PTR_FMT", ino = %Ld",
-			__func__, dip, bp, in_f->ilf_ino);
-		error = -EFSCORRUPTED;
-		goto out_release;
-	}
-	ldip = item->ri_buf[1].i_addr;
-	if (XFS_IS_CORRUPT(mp, ldip->di_magic != XFS_DINODE_MAGIC)) {
-		xfs_alert(mp,
-			"%s: Bad inode log record, rec ptr "PTR_FMT", ino %Ld",
-			__func__, item, in_f->ilf_ino);
-		error = -EFSCORRUPTED;
-		goto out_release;
-	}
-
-	/*
-	 * If the inode has an LSN in it, recover the inode only if it's less
-	 * than the lsn of the transaction we are replaying. Note: we still
-	 * need to replay an owner change even though the inode is more recent
-	 * than the transaction as there is no guarantee that all the btree
-	 * blocks are more recent than this transaction, too.
-	 */
-	if (dip->di_version >= 3) {
-		xfs_lsn_t	lsn = be64_to_cpu(dip->di_lsn);
-
-		if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
-			trace_xfs_log_recover_inode_skip(log, in_f);
-			error = 0;
-			goto out_owner_change;
-		}
-	}
-
-	/*
-	 * di_flushiter is only valid for v1/2 inodes. All changes for v3 inodes
-	 * are transactional and if ordering is necessary we can determine that
-	 * more accurately by the LSN field in the V3 inode core. Don't trust
-	 * the inode versions we might be changing them here - use the
-	 * superblock flag to determine whether we need to look at di_flushiter
-	 * to skip replay when the on disk inode is newer than the log one
-	 */
-	if (!xfs_sb_version_has_v3inode(&mp->m_sb) &&
-	    ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
-		/*
-		 * Deal with the wrap case, DI_MAX_FLUSH is less
-		 * than smaller numbers
-		 */
-		if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
-		    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
-			/* do nothing */
-		} else {
-			trace_xfs_log_recover_inode_skip(log, in_f);
-			error = 0;
-			goto out_release;
-		}
-	}
-
-	/* Take the opportunity to reset the flush iteration count */
-	ldip->di_flushiter = 0;
-
-	if (unlikely(S_ISREG(ldip->di_mode))) {
-		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
-		    (ldip->di_format != XFS_DINODE_FMT_BTREE)) {
-			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
-					 XFS_ERRLEVEL_LOW, mp, ldip,
-					 sizeof(*ldip));
-			xfs_alert(mp,
-		"%s: Bad regular inode log record, rec ptr "PTR_FMT", "
-		"ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
-				__func__, item, dip, bp, in_f->ilf_ino);
-			error = -EFSCORRUPTED;
-			goto out_release;
-		}
-	} else if (unlikely(S_ISDIR(ldip->di_mode))) {
-		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
-		    (ldip->di_format != XFS_DINODE_FMT_BTREE) &&
-		    (ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
-			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(4)",
-					     XFS_ERRLEVEL_LOW, mp, ldip,
-					     sizeof(*ldip));
-			xfs_alert(mp,
-		"%s: Bad dir inode log record, rec ptr "PTR_FMT", "
-		"ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
-				__func__, item, dip, bp, in_f->ilf_ino);
-			error = -EFSCORRUPTED;
-			goto out_release;
-		}
-	}
-	if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
-		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
-				     XFS_ERRLEVEL_LOW, mp, ldip,
-				     sizeof(*ldip));
-		xfs_alert(mp,
-	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
-	"dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
-			__func__, item, dip, bp, in_f->ilf_ino,
-			ldip->di_nextents + ldip->di_anextents,
-			ldip->di_nblocks);
-		error = -EFSCORRUPTED;
-		goto out_release;
-	}
-	if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
-		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
-				     XFS_ERRLEVEL_LOW, mp, ldip,
-				     sizeof(*ldip));
-		xfs_alert(mp,
-	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
-	"dino bp "PTR_FMT", ino %Ld, forkoff 0x%x", __func__,
-			item, dip, bp, in_f->ilf_ino, ldip->di_forkoff);
-		error = -EFSCORRUPTED;
-		goto out_release;
-	}
-	isize = xfs_log_dinode_size(mp);
-	if (unlikely(item->ri_buf[1].i_len > isize)) {
-		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
-				     XFS_ERRLEVEL_LOW, mp, ldip,
-				     sizeof(*ldip));
-		xfs_alert(mp,
-			"%s: Bad inode log record length %d, rec ptr "PTR_FMT,
-			__func__, item->ri_buf[1].i_len, item);
-		error = -EFSCORRUPTED;
-		goto out_release;
-	}
-
-	/* recover the log dinode inode into the on disk inode */
-	xfs_log_dinode_to_disk(ldip, dip);
-
-	fields = in_f->ilf_fields;
-	if (fields & XFS_ILOG_DEV)
-		xfs_dinode_put_rdev(dip, in_f->ilf_u.ilfu_rdev);
-
-	if (in_f->ilf_size == 2)
-		goto out_owner_change;
-	len = item->ri_buf[2].i_len;
-	src = item->ri_buf[2].i_addr;
-	ASSERT(in_f->ilf_size <= 4);
-	ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK));
-	ASSERT(!(fields & XFS_ILOG_DFORK) ||
-	       (len == in_f->ilf_dsize));
-
-	switch (fields & XFS_ILOG_DFORK) {
-	case XFS_ILOG_DDATA:
-	case XFS_ILOG_DEXT:
-		memcpy(XFS_DFORK_DPTR(dip), src, len);
-		break;
-
-	case XFS_ILOG_DBROOT:
-		xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src, len,
-				 (xfs_bmdr_block_t *)XFS_DFORK_DPTR(dip),
-				 XFS_DFORK_DSIZE(dip, mp));
-		break;
-
-	default:
-		/*
-		 * There are no data fork flags set.
-		 */
-		ASSERT((fields & XFS_ILOG_DFORK) == 0);
-		break;
-	}
-
-	/*
-	 * If we logged any attribute data, recover it.  There may or
-	 * may not have been any other non-core data logged in this
-	 * transaction.
-	 */
-	if (in_f->ilf_fields & XFS_ILOG_AFORK) {
-		if (in_f->ilf_fields & XFS_ILOG_DFORK) {
-			attr_index = 3;
-		} else {
-			attr_index = 2;
-		}
-		len = item->ri_buf[attr_index].i_len;
-		src = item->ri_buf[attr_index].i_addr;
-		ASSERT(len == in_f->ilf_asize);
-
-		switch (in_f->ilf_fields & XFS_ILOG_AFORK) {
-		case XFS_ILOG_ADATA:
-		case XFS_ILOG_AEXT:
-			dest = XFS_DFORK_APTR(dip);
-			ASSERT(len <= XFS_DFORK_ASIZE(dip, mp));
-			memcpy(dest, src, len);
-			break;
-
-		case XFS_ILOG_ABROOT:
-			dest = XFS_DFORK_APTR(dip);
-			xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src,
-					 len, (xfs_bmdr_block_t*)dest,
-					 XFS_DFORK_ASIZE(dip, mp));
-			break;
-
-		default:
-			xfs_warn(log->l_mp, "%s: Invalid flag", __func__);
-			ASSERT(0);
-			error = -EFSCORRUPTED;
-			goto out_release;
-		}
-	}
-
-out_owner_change:
-	/* Recover the swapext owner change unless inode has been deleted */
-	if ((in_f->ilf_fields & (XFS_ILOG_DOWNER|XFS_ILOG_AOWNER)) &&
-	    (dip->di_mode != 0))
-		error = xfs_recover_inode_owner_change(mp, dip, in_f,
-						       buffer_list);
-	/* re-generate the checksum. */
-	xfs_dinode_calc_crc(log->l_mp, dip);
-
-	ASSERT(bp->b_mount == mp);
-	bp->b_iodone = xlog_recover_iodone;
-	xfs_buf_delwri_queue(bp, buffer_list);
-
-out_release:
-	xfs_buf_relse(bp);
-error:
-	if (need_free)
-		kmem_free(in_f);
-	return error;
-}
-
 /*
  * Recover a dquot record
  */
@@ -3059,9 +2706,6 @@ xlog_recover_commit_pass2(
 				trans->r_lsn);
 
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_INODE:
-		return xlog_recover_inode_pass2(log, buffer_list, item,
-						 trans->r_lsn);
 	case XFS_LI_EFI:
 		return xlog_recover_efi_pass2(log, item, trans->r_lsn);
 	case XFS_LI_EFD:

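A subtle piece of the moved inode code is the di_flushiter comparison used for pre-v3 inodes, including the counter wrap case.  Restated on its own, with DEMO_MAX_FLUSH standing in for DI_MAX_FLUSH and the v3-inode check omitted:

/*
 * Standalone restatement of the v1/v2 inode di_flushiter skip decision.
 * The real code only applies this when the filesystem has no v3 inodes.
 */
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_FLUSH	0xffff

static bool demo_skip_old_inode(uint16_t log_flushiter, uint16_t disk_flushiter)
{
	/* the logged copy is not older than what is on disk */
	if (log_flushiter >= disk_flushiter)
		return false;

	/* wrap case: disk counter pinned at max, log counter freshly wrapped */
	if (disk_flushiter == DEMO_MAX_FLUSH &&
	    log_flushiter < (DEMO_MAX_FLUSH >> 1))
		return false;

	/* on-disk inode is genuinely newer than the log copy: skip replay */
	return true;
}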

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 07/19] xfs: refactor log recovery intent item dispatch for pass2 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (5 preceding siblings ...)
  2020-04-22  2:06 ` [PATCH 06/19] xfs: refactor log recovery inode " Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-25 18:24   ` Christoph Hellwig
  2020-04-22  2:06 ` [PATCH 08/19] xfs: refactor log recovery dquot " Darrick J. Wong
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the log intent item pass2 commit code into the per-item source code
files and use the dispatch function to call it.  We do these one at a
time because there's a lot of code to move.  No functional changes.
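
The intent and done items take an intermediate step here: instead of converting eight item types at once, they all share a single ops table whose pass2 hook still switches on the numeric item type, so the individual handlers can be moved out in later patches without touching the core again.  A compressed model of that shape, with invented names and -EINVAL standing in for -EFSCORRUPTED so it builds outside the kernel:

/*
 * Compressed model of the shared intent dispatcher (invented names).
 */
#include <errno.h>

enum demo_item_type {
	DEMO_LI_EFI = 1, DEMO_LI_EFD, DEMO_LI_RUI, DEMO_LI_RUD,
};

struct demo_item {
	enum demo_item_type	type;
};

static int demo_efi_pass2(struct demo_item *item, long lsn) { return 0; }
static int demo_efd_pass2(struct demo_item *item) { return 0; }

/* one commit_pass2 hook shared by every intent/done item type */
static int demo_intent_commit_pass2(struct demo_item *item, long lsn)
{
	switch (item->type) {
	case DEMO_LI_EFI:
		return demo_efi_pass2(item, lsn);
	case DEMO_LI_EFD:
		return demo_efd_pass2(item);
	/* ... RUI/RUD, CUI/CUD and BUI/BUD are dispatched the same way ... */
	default:
		return -EINVAL;	/* the kernel code returns -EFSCORRUPTED */
	}
}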

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |   44 +++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 2e4f400d3f6e..2abcca26e4c7 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1770,8 +1770,13 @@ xlog_clear_stale_blocks(
 
 /* Log intent item dispatching. */
 
+STATIC int xlog_recover_intent_pass2(struct xlog *log,
+		struct list_head *buffer_list, struct xlog_recover_item *item,
+		xfs_lsn_t current_lsn);
+
 const struct xlog_recover_item_type xlog_intent_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
+	.commit_pass2_fn	= xlog_recover_intent_pass2,
 };
 
 /******************************************************************************
@@ -2693,35 +2698,48 @@ xlog_recover_commit_pass1(
 }
 
 STATIC int
-xlog_recover_commit_pass2(
+xlog_recover_intent_pass2(
 	struct xlog			*log,
-	struct xlog_recover		*trans,
 	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item)
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
 {
-	trace_xfs_log_recover_item_recover(log, trans, item, XLOG_RECOVER_PASS2);
-
-	if (item->ri_type && item->ri_type->commit_pass2_fn)
-		return item->ri_type->commit_pass2_fn(log, buffer_list, item,
-				trans->r_lsn);
-
 	switch (ITEM_TYPE(item)) {
 	case XFS_LI_EFI:
-		return xlog_recover_efi_pass2(log, item, trans->r_lsn);
+		return xlog_recover_efi_pass2(log, item, current_lsn);
 	case XFS_LI_EFD:
 		return xlog_recover_efd_pass2(log, item);
 	case XFS_LI_RUI:
-		return xlog_recover_rui_pass2(log, item, trans->r_lsn);
+		return xlog_recover_rui_pass2(log, item, current_lsn);
 	case XFS_LI_RUD:
 		return xlog_recover_rud_pass2(log, item);
 	case XFS_LI_CUI:
-		return xlog_recover_cui_pass2(log, item, trans->r_lsn);
+		return xlog_recover_cui_pass2(log, item, current_lsn);
 	case XFS_LI_CUD:
 		return xlog_recover_cud_pass2(log, item);
 	case XFS_LI_BUI:
-		return xlog_recover_bui_pass2(log, item, trans->r_lsn);
+		return xlog_recover_bui_pass2(log, item, current_lsn);
 	case XFS_LI_BUD:
 		return xlog_recover_bud_pass2(log, item);
+	}
+
+	return -EFSCORRUPTED;
+}
+
+STATIC int
+xlog_recover_commit_pass2(
+	struct xlog			*log,
+	struct xlog_recover		*trans,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item)
+{
+	trace_xfs_log_recover_item_recover(log, trans, item, XLOG_RECOVER_PASS2);
+
+	if (item->ri_type && item->ri_type->commit_pass2_fn)
+		return item->ri_type->commit_pass2_fn(log, buffer_list, item,
+				trans->r_lsn);
+
+	switch (ITEM_TYPE(item)) {
 	case XFS_LI_DQUOT:
 		return xlog_recover_dquot_pass2(log, buffer_list, item,
 						trans->r_lsn);


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 08/19] xfs: refactor log recovery dquot item dispatch for pass2 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (6 preceding siblings ...)
  2020-04-22  2:06 ` [PATCH 07/19] xfs: refactor log recovery intent " Darrick J. Wong
@ 2020-04-22  2:06 ` Darrick J. Wong
  2020-04-22  2:07 ` [PATCH 09/19] xfs: refactor log recovery icreate " Darrick J. Wong
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:06 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the log dquot item pass2 commit code into the per-item source code
files and use the dispatch function to call it.  We do these one at a
time because there's a lot of code to move.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_dquot_item.c  |  110 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_log_recover.c |  112 ----------------------------------------------
 2 files changed, 110 insertions(+), 112 deletions(-)
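
The interesting rule in the moved code is the LSN gate: on a V5
(CRC-enabled) filesystem, the logged dquot is only written back if the
LSN already stamped in the on-disk dquot block is older than the LSN of
the transaction being replayed.  Here is a standalone sketch of that
comparison, with xfs_lsn_t and XFS_LSN_CMP() reduced to simple stand-ins
(real LSNs encode a log cycle and block number):

/*
 * Sketch of the LSN gate in xlog_recover_dquot_pass2(): replay only if
 * the on-disk block is older than the transaction being replayed.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t lsn_t;			/* stand-in for xfs_lsn_t */

static int lsn_cmp(lsn_t a, lsn_t b)	/* stand-in for XFS_LSN_CMP() */
{
	return (a > b) - (a < b);
}

static bool should_replay(lsn_t disk_lsn, lsn_t current_lsn)
{
	/* unstamped (0) or invalid (-1) blocks are always replayed */
	if (disk_lsn == 0 || disk_lsn == -1)
		return true;
	/* skip the copy if the block is already at or past this LSN */
	return lsn_cmp(disk_lsn, current_lsn) < 0;
}

int main(void)
{
	printf("%d\n", should_replay(100, 200));	/* 1: replay */
	printf("%d\n", should_replay(200, 200));	/* 0: skip */
	printf("%d\n", should_replay(0, 200));		/* 1: replay */
	return 0;
}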


diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index e3a54b1fb20a..986112288630 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -424,9 +424,119 @@ xlog_recover_dquot_ra_pass2(
 			  &xfs_dquot_buf_ra_ops);
 }
 
+/*
+ * Recover a dquot record
+ */
+STATIC int
+xlog_recover_dquot_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
+{
+	xfs_mount_t		*mp = log->l_mp;
+	xfs_buf_t		*bp;
+	struct xfs_disk_dquot	*ddq, *recddq;
+	xfs_failaddr_t		fa;
+	int			error;
+	xfs_dq_logformat_t	*dq_f;
+	uint			type;
+
+
+	/*
+	 * Filesystems are required to send in quota flags at mount time.
+	 */
+	if (mp->m_qflags == 0)
+		return 0;
+
+	recddq = item->ri_buf[1].i_addr;
+	if (recddq == NULL) {
+		xfs_alert(log->l_mp, "NULL dquot in %s.", __func__);
+		return -EFSCORRUPTED;
+	}
+	if (item->ri_buf[1].i_len < sizeof(struct xfs_disk_dquot)) {
+		xfs_alert(log->l_mp, "dquot too small (%d) in %s.",
+			item->ri_buf[1].i_len, __func__);
+		return -EFSCORRUPTED;
+	}
+
+	/*
+	 * This type of quotas was turned off, so ignore this record.
+	 */
+	type = recddq->d_flags & (XFS_DQ_USER | XFS_DQ_PROJ | XFS_DQ_GROUP);
+	ASSERT(type);
+	if (log->l_quotaoffs_flag & type)
+		return 0;
+
+	/*
+	 * At this point we know that quota was _not_ turned off.
+	 * Since the mount flags are not indicating to us otherwise, this
+	 * must mean that quota is on, and the dquot needs to be replayed.
+	 * Remember that we may not have fully recovered the superblock yet,
+	 * so we can't do the usual trick of looking at the SB quota bits.
+	 *
+	 * The other possibility, of course, is that the quota subsystem was
+	 * removed since the last mount - ENOSYS.
+	 */
+	dq_f = item->ri_buf[0].i_addr;
+	ASSERT(dq_f);
+	fa = xfs_dquot_verify(mp, recddq, dq_f->qlf_id, 0);
+	if (fa) {
+		xfs_alert(mp, "corrupt dquot ID 0x%x in log at %pS",
+				dq_f->qlf_id, fa);
+		return -EFSCORRUPTED;
+	}
+	ASSERT(dq_f->qlf_len == 1);
+
+	/*
+	 * At this point we are assuming that the dquots have been allocated
+	 * and hence the buffer has valid dquots stamped in it. It should,
+	 * therefore, pass verifier validation. If the dquot is bad, then the
+	 * we'll return an error here, so we don't need to specifically check
+	 * the dquot in the buffer after the verifier has run.
+	 */
+	error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dq_f->qlf_blkno,
+				   XFS_FSB_TO_BB(mp, dq_f->qlf_len), 0, &bp,
+				   &xfs_dquot_buf_ops);
+	if (error)
+		return error;
+
+	ASSERT(bp);
+	ddq = xfs_buf_offset(bp, dq_f->qlf_boffset);
+
+	/*
+	 * If the dquot has an LSN in it, recover the dquot only if it's less
+	 * than the lsn of the transaction we are replaying.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		struct xfs_dqblk *dqb = (struct xfs_dqblk *)ddq;
+		xfs_lsn_t	lsn = be64_to_cpu(dqb->dd_lsn);
+
+		if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
+			goto out_release;
+		}
+	}
+
+	memcpy(ddq, recddq, item->ri_buf[1].i_len);
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		xfs_update_cksum((char *)ddq, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
+
+	ASSERT(dq_f->qlf_size == 2);
+	ASSERT(bp->b_mount == mp);
+	bp->b_iodone = xlog_recover_iodone;
+	xfs_buf_delwri_queue(bp, buffer_list);
+
+out_release:
+	xfs_buf_relse(bp);
+	return 0;
+}
+
 const struct xlog_recover_item_type xlog_dquot_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
 	.ra_pass2_fn		= xlog_recover_dquot_ra_pass2,
+	.commit_pass2_fn	= xlog_recover_dquot_pass2,
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 2abcca26e4c7..5b3c5df22e88 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2012,115 +2012,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * Recover a dquot record
- */
-STATIC int
-xlog_recover_dquot_pass2(
-	struct xlog			*log,
-	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			current_lsn)
-{
-	xfs_mount_t		*mp = log->l_mp;
-	xfs_buf_t		*bp;
-	struct xfs_disk_dquot	*ddq, *recddq;
-	xfs_failaddr_t		fa;
-	int			error;
-	xfs_dq_logformat_t	*dq_f;
-	uint			type;
-
-
-	/*
-	 * Filesystems are required to send in quota flags at mount time.
-	 */
-	if (mp->m_qflags == 0)
-		return 0;
-
-	recddq = item->ri_buf[1].i_addr;
-	if (recddq == NULL) {
-		xfs_alert(log->l_mp, "NULL dquot in %s.", __func__);
-		return -EFSCORRUPTED;
-	}
-	if (item->ri_buf[1].i_len < sizeof(struct xfs_disk_dquot)) {
-		xfs_alert(log->l_mp, "dquot too small (%d) in %s.",
-			item->ri_buf[1].i_len, __func__);
-		return -EFSCORRUPTED;
-	}
-
-	/*
-	 * This type of quotas was turned off, so ignore this record.
-	 */
-	type = recddq->d_flags & (XFS_DQ_USER | XFS_DQ_PROJ | XFS_DQ_GROUP);
-	ASSERT(type);
-	if (log->l_quotaoffs_flag & type)
-		return 0;
-
-	/*
-	 * At this point we know that quota was _not_ turned off.
-	 * Since the mount flags are not indicating to us otherwise, this
-	 * must mean that quota is on, and the dquot needs to be replayed.
-	 * Remember that we may not have fully recovered the superblock yet,
-	 * so we can't do the usual trick of looking at the SB quota bits.
-	 *
-	 * The other possibility, of course, is that the quota subsystem was
-	 * removed since the last mount - ENOSYS.
-	 */
-	dq_f = item->ri_buf[0].i_addr;
-	ASSERT(dq_f);
-	fa = xfs_dquot_verify(mp, recddq, dq_f->qlf_id, 0);
-	if (fa) {
-		xfs_alert(mp, "corrupt dquot ID 0x%x in log at %pS",
-				dq_f->qlf_id, fa);
-		return -EFSCORRUPTED;
-	}
-	ASSERT(dq_f->qlf_len == 1);
-
-	/*
-	 * At this point we are assuming that the dquots have been allocated
-	 * and hence the buffer has valid dquots stamped in it. It should,
-	 * therefore, pass verifier validation. If the dquot is bad, then the
-	 * we'll return an error here, so we don't need to specifically check
-	 * the dquot in the buffer after the verifier has run.
-	 */
-	error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dq_f->qlf_blkno,
-				   XFS_FSB_TO_BB(mp, dq_f->qlf_len), 0, &bp,
-				   &xfs_dquot_buf_ops);
-	if (error)
-		return error;
-
-	ASSERT(bp);
-	ddq = xfs_buf_offset(bp, dq_f->qlf_boffset);
-
-	/*
-	 * If the dquot has an LSN in it, recover the dquot only if it's less
-	 * than the lsn of the transaction we are replaying.
-	 */
-	if (xfs_sb_version_hascrc(&mp->m_sb)) {
-		struct xfs_dqblk *dqb = (struct xfs_dqblk *)ddq;
-		xfs_lsn_t	lsn = be64_to_cpu(dqb->dd_lsn);
-
-		if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
-			goto out_release;
-		}
-	}
-
-	memcpy(ddq, recddq, item->ri_buf[1].i_len);
-	if (xfs_sb_version_hascrc(&mp->m_sb)) {
-		xfs_update_cksum((char *)ddq, sizeof(struct xfs_dqblk),
-				 XFS_DQUOT_CRC_OFF);
-	}
-
-	ASSERT(dq_f->qlf_size == 2);
-	ASSERT(bp->b_mount == mp);
-	bp->b_iodone = xlog_recover_iodone;
-	xfs_buf_delwri_queue(bp, buffer_list);
-
-out_release:
-	xfs_buf_relse(bp);
-	return 0;
-}
-
 /*
  * This routine is called to create an in-core extent free intent
  * item from the efi format structure which was logged on disk.
@@ -2740,9 +2631,6 @@ xlog_recover_commit_pass2(
 				trans->r_lsn);
 
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_DQUOT:
-		return xlog_recover_dquot_pass2(log, buffer_list, item,
-						trans->r_lsn);
 	case XFS_LI_ICREATE:
 		return xlog_recover_do_icreate_pass2(log, buffer_list, item);
 	case XFS_LI_QUOTAOFF:


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 09/19] xfs: refactor log recovery icreate item dispatch for pass2 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (7 preceding siblings ...)
  2020-04-22  2:06 ` [PATCH 08/19] xfs: refactor log recovery dquot " Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-22  2:07 ` [PATCH 10/19] xfs: refactor log recovery quotaoff " Darrick J. Wong
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the log icreate item pass2 commit code into the per-item source code
files and use the dispatch function to call it.  We do these one at a
time because there's a lot of code to move.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_icreate_item.c |  132 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_log_recover.c  |  126 -------------------------------------------
 2 files changed, 132 insertions(+), 126 deletions(-)


diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
index 0a1ed4dc1c3d..15415421ea81 100644
--- a/fs/xfs/xfs_icreate_item.c
+++ b/fs/xfs/xfs_icreate_item.c
@@ -6,13 +6,19 @@
 #include "xfs.h"
 #include "xfs_fs.h"
 #include "xfs_shared.h"
+#include "xfs_format.h"
 #include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_trans_priv.h"
 #include "xfs_icreate_item.h"
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
+#include "xfs_ialloc.h"
+#include "xfs_trace.h"
 
 kmem_zone_t	*xfs_icreate_zone;		/* inode create item zone */
 
@@ -110,6 +116,132 @@ xfs_icreate_log(
 	set_bit(XFS_LI_DIRTY, &icp->ic_item.li_flags);
 }
 
+/*
+ * This routine is called when an inode create format structure is found in a
+ * committed transaction in the log.  It's purpose is to initialise the inodes
+ * being allocated on disk. This requires us to get inode cluster buffers that
+ * match the range to be initialised, stamped with inode templates and written
+ * by delayed write so that subsequent modifications will hit the cached buffer
+ * and only need writing out at the end of recovery.
+ */
+STATIC int
+xlog_recover_do_icreate_pass2(
+	struct xlog		*log,
+	struct list_head	*buffer_list,
+	struct xlog_recover_item *item,
+	xfs_lsn_t		current_lsn)
+{
+	struct xfs_mount	*mp = log->l_mp;
+	struct xfs_icreate_log	*icl;
+	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	unsigned int		count;
+	unsigned int		isize;
+	xfs_agblock_t		length;
+	int			bb_per_cluster;
+	int			cancel_count;
+	int			nbufs;
+	int			i;
+
+	icl = (struct xfs_icreate_log *)item->ri_buf[0].i_addr;
+	if (icl->icl_type != XFS_LI_ICREATE) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad type");
+		return -EINVAL;
+	}
+
+	if (icl->icl_size != 1) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad icl size");
+		return -EINVAL;
+	}
+
+	agno = be32_to_cpu(icl->icl_ag);
+	if (agno >= mp->m_sb.sb_agcount) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agno");
+		return -EINVAL;
+	}
+	agbno = be32_to_cpu(icl->icl_agbno);
+	if (!agbno || agbno == NULLAGBLOCK || agbno >= mp->m_sb.sb_agblocks) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agbno");
+		return -EINVAL;
+	}
+	isize = be32_to_cpu(icl->icl_isize);
+	if (isize != mp->m_sb.sb_inodesize) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad isize");
+		return -EINVAL;
+	}
+	count = be32_to_cpu(icl->icl_count);
+	if (!count) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count");
+		return -EINVAL;
+	}
+	length = be32_to_cpu(icl->icl_length);
+	if (!length || length >= mp->m_sb.sb_agblocks) {
+		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad length");
+		return -EINVAL;
+	}
+
+	/*
+	 * The inode chunk is either full or sparse and we only support
+	 * m_ino_geo.ialloc_min_blks sized sparse allocations at this time.
+	 */
+	if (length != igeo->ialloc_blks &&
+	    length != igeo->ialloc_min_blks) {
+		xfs_warn(log->l_mp,
+			 "%s: unsupported chunk length", __FUNCTION__);
+		return -EINVAL;
+	}
+
+	/* verify inode count is consistent with extent length */
+	if ((count >> mp->m_sb.sb_inopblog) != length) {
+		xfs_warn(log->l_mp,
+			 "%s: inconsistent inode count and chunk length",
+			 __FUNCTION__);
+		return -EINVAL;
+	}
+
+	/*
+	 * The icreate transaction can cover multiple cluster buffers and these
+	 * buffers could have been freed and reused. Check the individual
+	 * buffers for cancellation so we don't overwrite anything written after
+	 * a cancellation.
+	 */
+	bb_per_cluster = XFS_FSB_TO_BB(mp, igeo->blocks_per_cluster);
+	nbufs = length / igeo->blocks_per_cluster;
+	for (i = 0, cancel_count = 0; i < nbufs; i++) {
+		xfs_daddr_t	daddr;
+
+		daddr = XFS_AGB_TO_DADDR(mp, agno,
+				agbno + i * igeo->blocks_per_cluster);
+		if (xlog_check_buffer_cancelled(log, daddr, bb_per_cluster, 0))
+			cancel_count++;
+	}
+
+	/*
+	 * We currently only use icreate for a single allocation at a time. This
+	 * means we should expect either all or none of the buffers to be
+	 * cancelled. Be conservative and skip replay if at least one buffer is
+	 * cancelled, but warn the user that something is awry if the buffers
+	 * are not consistent.
+	 *
+	 * XXX: This must be refined to only skip cancelled clusters once we use
+	 * icreate for multiple chunk allocations.
+	 */
+	ASSERT(!cancel_count || cancel_count == nbufs);
+	if (cancel_count) {
+		if (cancel_count != nbufs)
+			xfs_warn(mp,
+	"WARNING: partial inode chunk cancellation, skipped icreate.");
+		trace_xfs_log_recover_icreate_cancel(log, icl);
+		return 0;
+	}
+
+	trace_xfs_log_recover_icreate_recover(log, icl);
+	return xfs_ialloc_inode_init(mp, NULL, buffer_list, count, agno, agbno,
+				     length, be32_to_cpu(icl->icl_gen));
+}
+
 const struct xlog_recover_item_type xlog_icreate_item_type = {
 	.reorder		= XLOG_REORDER_BUFFER_LIST,
+	.commit_pass2_fn	= xlog_recover_do_icreate_pass2,
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 5b3c5df22e88..18b797ca4a6c 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2445,130 +2445,6 @@ xlog_recover_bud_pass2(
 	return 0;
 }
 
-/*
- * This routine is called when an inode create format structure is found in a
- * committed transaction in the log.  It's purpose is to initialise the inodes
- * being allocated on disk. This requires us to get inode cluster buffers that
- * match the range to be initialised, stamped with inode templates and written
- * by delayed write so that subsequent modifications will hit the cached buffer
- * and only need writing out at the end of recovery.
- */
-STATIC int
-xlog_recover_do_icreate_pass2(
-	struct xlog		*log,
-	struct list_head	*buffer_list,
-	xlog_recover_item_t	*item)
-{
-	struct xfs_mount	*mp = log->l_mp;
-	struct xfs_icreate_log	*icl;
-	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
-	xfs_agnumber_t		agno;
-	xfs_agblock_t		agbno;
-	unsigned int		count;
-	unsigned int		isize;
-	xfs_agblock_t		length;
-	int			bb_per_cluster;
-	int			cancel_count;
-	int			nbufs;
-	int			i;
-
-	icl = (struct xfs_icreate_log *)item->ri_buf[0].i_addr;
-	if (icl->icl_type != XFS_LI_ICREATE) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad type");
-		return -EINVAL;
-	}
-
-	if (icl->icl_size != 1) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad icl size");
-		return -EINVAL;
-	}
-
-	agno = be32_to_cpu(icl->icl_ag);
-	if (agno >= mp->m_sb.sb_agcount) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agno");
-		return -EINVAL;
-	}
-	agbno = be32_to_cpu(icl->icl_agbno);
-	if (!agbno || agbno == NULLAGBLOCK || agbno >= mp->m_sb.sb_agblocks) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agbno");
-		return -EINVAL;
-	}
-	isize = be32_to_cpu(icl->icl_isize);
-	if (isize != mp->m_sb.sb_inodesize) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad isize");
-		return -EINVAL;
-	}
-	count = be32_to_cpu(icl->icl_count);
-	if (!count) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count");
-		return -EINVAL;
-	}
-	length = be32_to_cpu(icl->icl_length);
-	if (!length || length >= mp->m_sb.sb_agblocks) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad length");
-		return -EINVAL;
-	}
-
-	/*
-	 * The inode chunk is either full or sparse and we only support
-	 * m_ino_geo.ialloc_min_blks sized sparse allocations at this time.
-	 */
-	if (length != igeo->ialloc_blks &&
-	    length != igeo->ialloc_min_blks) {
-		xfs_warn(log->l_mp,
-			 "%s: unsupported chunk length", __FUNCTION__);
-		return -EINVAL;
-	}
-
-	/* verify inode count is consistent with extent length */
-	if ((count >> mp->m_sb.sb_inopblog) != length) {
-		xfs_warn(log->l_mp,
-			 "%s: inconsistent inode count and chunk length",
-			 __FUNCTION__);
-		return -EINVAL;
-	}
-
-	/*
-	 * The icreate transaction can cover multiple cluster buffers and these
-	 * buffers could have been freed and reused. Check the individual
-	 * buffers for cancellation so we don't overwrite anything written after
-	 * a cancellation.
-	 */
-	bb_per_cluster = XFS_FSB_TO_BB(mp, igeo->blocks_per_cluster);
-	nbufs = length / igeo->blocks_per_cluster;
-	for (i = 0, cancel_count = 0; i < nbufs; i++) {
-		xfs_daddr_t	daddr;
-
-		daddr = XFS_AGB_TO_DADDR(mp, agno,
-				agbno + i * igeo->blocks_per_cluster);
-		if (xlog_check_buffer_cancelled(log, daddr, bb_per_cluster, 0))
-			cancel_count++;
-	}
-
-	/*
-	 * We currently only use icreate for a single allocation at a time. This
-	 * means we should expect either all or none of the buffers to be
-	 * cancelled. Be conservative and skip replay if at least one buffer is
-	 * cancelled, but warn the user that something is awry if the buffers
-	 * are not consistent.
-	 *
-	 * XXX: This must be refined to only skip cancelled clusters once we use
-	 * icreate for multiple chunk allocations.
-	 */
-	ASSERT(!cancel_count || cancel_count == nbufs);
-	if (cancel_count) {
-		if (cancel_count != nbufs)
-			xfs_warn(mp,
-	"WARNING: partial inode chunk cancellation, skipped icreate.");
-		trace_xfs_log_recover_icreate_cancel(log, icl);
-		return 0;
-	}
-
-	trace_xfs_log_recover_icreate_recover(log, icl);
-	return xfs_ialloc_inode_init(mp, NULL, buffer_list, count, agno, agbno,
-				     length, be32_to_cpu(icl->icl_gen));
-}
-
 STATIC int
 xlog_recover_commit_pass1(
 	struct xlog			*log,
@@ -2631,8 +2507,6 @@ xlog_recover_commit_pass2(
 				trans->r_lsn);
 
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_ICREATE:
-		return xlog_recover_do_icreate_pass2(log, buffer_list, item);
 	case XFS_LI_QUOTAOFF:
 		/* nothing to do in pass2 */
 		return 0;


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 10/19] xfs: refactor log recovery quotaoff item dispatch for pass2 commit functions
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (8 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 09/19] xfs: refactor log recovery icreate " Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-22  2:07 ` [PATCH 11/19] xfs: refactor EFI log item recovery dispatch Darrick J. Wong
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The quotaoff item has nothing to do during pass2 commit, so there is no
code to move into a per-item source file.  Drop its case and, with every
log item type now converted, remove the switch entirely: the dispatcher
warns about items with no ->ri_type and treats a missing
->commit_pass2_fn as "nothing to do".  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 18b797ca4a6c..5e7e5c66327e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2502,20 +2502,16 @@ xlog_recover_commit_pass2(
 {
 	trace_xfs_log_recover_item_recover(log, trans, item, XLOG_RECOVER_PASS2);
 
-	if (item->ri_type && item->ri_type->commit_pass2_fn)
-		return item->ri_type->commit_pass2_fn(log, buffer_list, item,
-				trans->r_lsn);
-
-	switch (ITEM_TYPE(item)) {
-	case XFS_LI_QUOTAOFF:
-		/* nothing to do in pass2 */
-		return 0;
-	default:
+	if (!item->ri_type) {
 		xfs_warn(log->l_mp, "%s: invalid item type (%d)",
 			__func__, ITEM_TYPE(item));
 		ASSERT(0);
 		return -EFSCORRUPTED;
 	}
+	if (!item->ri_type->commit_pass2_fn)
+		return 0;
+	return item->ri_type->commit_pass2_fn(log, buffer_list, item,
+			trans->r_lsn);
 }
 
 STATIC int


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (9 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 10/19] xfs: refactor log recovery quotaoff " Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-25 18:28   ` Christoph Hellwig
  2020-04-22  2:07 ` [PATCH 12/19] xfs: refactor RUI " Darrick J. Wong
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the extent free intent and intent-done log recovery code into the
per-item source code files and use dispatch functions to call them.  We
do these one at a time because there's a lot of code to move.  No
functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |   42 +++++++++
 fs/xfs/xfs_extfree_item.c       |  153 +++++++++++++++++++++++++++++++-
 fs/xfs/xfs_extfree_item.h       |    9 --
 fs/xfs/xfs_log_recover.c        |  184 +++++++++------------------------------
 4 files changed, 232 insertions(+), 156 deletions(-)
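
The new xlog_recover_intent_type structure gives each intent/intent-done
pair a single ops table that the generic recovery loops can look up from
the log item type code.  The userspace sketch below shows the
lookup-and-dispatch idea with only two of the four hooks (the real
structure also carries ->process_intent and ->cancel_intent); the type
codes and names are illustrative stand-ins, not the kernel definitions.

/*
 * Sketch of per-intent-type ops tables: both the intent and its "done"
 * item map to the same table, so the generic code needs no per-type
 * cases.
 */
#include <stdio.h>

enum { LI_EFI = 1, LI_EFD = 2 };	/* stand-ins for XFS_LI_EFI/EFD */

struct intent_type {
	int (*recover_intent)(int type);	/* recreate the in-core intent */
	int (*recover_done)(int type);		/* drop the matching intent */
};

static int extfree_recover_intent(int type)
{
	printf("recreate in-core EFI for item type %d\n", type);
	return 0;
}

static int extfree_recover_done(int type)
{
	printf("EFD seen: release the matching EFI (type %d)\n", type);
	return 0;
}

static const struct intent_type extfree_type = {
	.recover_intent	= extfree_recover_intent,
	.recover_done	= extfree_recover_done,
};

/* mirrors xlog_intent_for_type(): one table per intent/done pair */
static const struct intent_type *intent_for_type(int type)
{
	switch (type) {
	case LI_EFI:
	case LI_EFD:
		return &extfree_type;
	default:
		return NULL;
	}
}

int main(void)
{
	const struct intent_type *ops;

	ops = intent_for_type(LI_EFI);
	if (ops)
		ops->recover_intent(LI_EFI);

	ops = intent_for_type(LI_EFD);
	if (ops)
		ops->recover_done(LI_EFD);
	return 0;
}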


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 1eca24cc83a9..188f27ccf2ec 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -124,4 +124,46 @@ void xlog_recover_iodone(struct xfs_buf *bp);
 int xlog_check_buffer_cancelled(struct xlog *log, xfs_daddr_t blkno, uint len,
 		unsigned short flags);
 
+/* Log intent item types */
+
+typedef int (*xlog_recover_intent_fn)(struct xlog *xlog,
+		struct xlog_recover_item *item, xfs_lsn_t lsn);
+typedef int (*xlog_recover_done_fn)(struct xlog *xlog,
+		struct xlog_recover_item *item);
+typedef int (*xlog_recover_process_intent_fn)(struct xlog *log,
+		struct xfs_trans *tp, struct xfs_log_item *lip);
+typedef void (*xlog_recover_cancel_intent_fn)(struct xlog *log,
+		struct xfs_log_item *lip);
+
+struct xlog_recover_intent_type {
+	/*
+	 * This function should parse the recovered log item (which will be an
+	 * intent log item) to construct an in-core log intent item and insert
+	 * it into the AIL.  The in-core log intent item should have 1 refcount
+	 * so that ->recover_done or ->cancel_intent can drop it.
+	 */
+	xlog_recover_intent_fn		recover_intent;
+
+	/*
+	 * This function should do the actual work of replaying an unfinished
+	 * log intent item.
+	 */
+	xlog_recover_process_intent_fn	process_intent;
+
+	/*
+	 * This function is called to release an incore log intent item if
+	 * recovery fails.
+	 */
+	xlog_recover_cancel_intent_fn	cancel_intent;
+
+	/*
+	 * This function should parse the recovered log item (which will be an
+	 * intent done log item) to find the id of the corresponding intent log
+	 * item.  Find the incore item in the AIL and release it.
+	 */
+	xlog_recover_done_fn		recover_done;
+};
+
+extern const struct xlog_recover_intent_type xlog_recover_extfree_type;
+
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 6ea847f6e298..cd2beb581b40 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -22,6 +22,8 @@
 #include "xfs_bmap.h"
 #include "xfs_trace.h"
 #include "xfs_error.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 kmem_zone_t	*xfs_efi_zone;
 kmem_zone_t	*xfs_efd_zone;
@@ -31,7 +33,7 @@ static inline struct xfs_efi_log_item *EFI_ITEM(struct xfs_log_item *lip)
 	return container_of(lip, struct xfs_efi_log_item, efi_item);
 }
 
-void
+STATIC void
 xfs_efi_item_free(
 	struct xfs_efi_log_item	*efip)
 {
@@ -49,7 +51,7 @@ xfs_efi_item_free(
  * committed vs unpin operations in bulk insert operations. Hence the reference
  * count to ensure only the last caller frees the EFI.
  */
-void
+STATIC void
 xfs_efi_release(
 	struct xfs_efi_log_item	*efip)
 {
@@ -150,7 +152,7 @@ static const struct xfs_item_ops xfs_efi_item_ops = {
 /*
  * Allocate and initialize an efi item with the given number of extents.
  */
-struct xfs_efi_log_item *
+STATIC struct xfs_efi_log_item *
 xfs_efi_init(
 	struct xfs_mount	*mp,
 	uint			nextents)
@@ -184,7 +186,7 @@ xfs_efi_init(
  * one of which will be the native format for this kernel.
  * It will handle the conversion of formats if necessary.
  */
-int
+STATIC int
 xfs_efi_copy_format(xfs_log_iovec_t *buf, xfs_efi_log_format_t *dst_efi_fmt)
 {
 	xfs_efi_log_format_t *src_efi_fmt = buf->i_addr;
@@ -592,7 +594,7 @@ const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
  * Process an extent free intent item that was recovered from
  * the log.  We need to free the extents that it describes.
  */
-int
+STATIC int
 xfs_efi_recover(
 	struct xfs_mount	*mp,
 	struct xfs_efi_log_item	*efip)
@@ -652,3 +654,144 @@ xfs_efi_recover(
 	xfs_trans_cancel(tp);
 	return error;
 }
+
+/*
+ * This routine is called to create an in-core extent free intent
+ * item from the efi format structure which was logged on disk.
+ * It allocates an in-core efi, copies the extents from the format
+ * structure into it, and adds the efi to the AIL with the given
+ * LSN.
+ */
+STATIC int
+xlog_recover_efi(
+	struct xlog			*log,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_mount		*mp = log->l_mp;
+	struct xfs_efi_log_item		*efip;
+	struct xfs_efi_log_format	*efi_formatp;
+	int				error;
+
+	efi_formatp = item->ri_buf[0].i_addr;
+
+	efip = xfs_efi_init(mp, efi_formatp->efi_nextents);
+	error = xfs_efi_copy_format(&item->ri_buf[0], &efip->efi_format);
+	if (error) {
+		xfs_efi_item_free(efip);
+		return error;
+	}
+	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
+
+	spin_lock(&log->l_ailp->ail_lock);
+	/*
+	 * The EFI has two references. One for the EFD and one for EFI to ensure
+	 * it makes it into the AIL. Insert the EFI into the AIL directly and
+	 * drop the EFI reference. Note that xfs_trans_ail_update() drops the
+	 * AIL lock.
+	 */
+	xfs_trans_ail_update(log->l_ailp, &efip->efi_item, lsn);
+	xfs_efi_release(efip);
+	return 0;
+}
+
+
+/*
+ * This routine is called when an EFD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding EFI if it
+ * was still in the log. To do this it searches the AIL for the EFI with an id
+ * equal to that in the EFD format structure. If we find it we drop the EFD
+ * reference, which removes the EFI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_efd(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	struct xfs_ail_cursor		cur;
+	struct xfs_efd_log_format	*efd_formatp;
+	struct xfs_efi_log_item		*efip = NULL;
+	struct xfs_log_item		*lip;
+	struct xfs_ail			*ailp = log->l_ailp;
+	uint64_t			efi_id;
+
+	efd_formatp = item->ri_buf[0].i_addr;
+	ASSERT((item->ri_buf[0].i_len == (sizeof(xfs_efd_log_format_32_t) +
+		((efd_formatp->efd_nextents - 1) * sizeof(xfs_extent_32_t)))) ||
+	       (item->ri_buf[0].i_len == (sizeof(xfs_efd_log_format_64_t) +
+		((efd_formatp->efd_nextents - 1) * sizeof(xfs_extent_64_t)))));
+	efi_id = efd_formatp->efd_efi_id;
+
+	/*
+	 * Search for the EFI with the id in the EFD format structure in the
+	 * AIL.
+	 */
+	spin_lock(&ailp->ail_lock);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	while (lip != NULL) {
+		if (lip->li_type == XFS_LI_EFI) {
+			efip = (xfs_efi_log_item_t *)lip;
+			if (efip->efi_format.efi_id == efi_id) {
+				/*
+				 * Drop the EFD reference to the EFI. This
+				 * removes the EFI from the AIL and frees it.
+				 */
+				spin_unlock(&ailp->ail_lock);
+				xfs_efi_release(efip);
+				spin_lock(&ailp->ail_lock);
+				break;
+			}
+		}
+		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+	}
+
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+
+	return 0;
+}
+
+/* Recover the EFI if necessary. */
+STATIC int
+xlog_recover_process_efi(
+	struct xlog			*log,
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
+	int				error;
+
+	/*
+	 * Skip EFIs that we've already processed.
+	 */
+	if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
+		return 0;
+
+	spin_unlock(&ailp->ail_lock);
+	error = xfs_efi_recover(tp->t_mountp, efip);
+	spin_lock(&ailp->ail_lock);
+
+	return error;
+}
+
+/* Release the EFI since we're cancelling everything. */
+STATIC void
+xlog_recover_cancel_efi(
+	struct xlog			*log,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
+
+	spin_unlock(&ailp->ail_lock);
+	xfs_efi_release(efip);
+	spin_lock(&ailp->ail_lock);
+}
+
+const struct xlog_recover_intent_type xlog_recover_extfree_type = {
+	.recover_intent		= xlog_recover_efi,
+	.recover_done		= xlog_recover_efd,
+	.process_intent		= xlog_recover_process_efi,
+	.cancel_intent		= xlog_recover_cancel_efi,
+};
diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h
index 16aaab06d4ec..23e3758b5dbb 100644
--- a/fs/xfs/xfs_extfree_item.h
+++ b/fs/xfs/xfs_extfree_item.h
@@ -78,13 +78,4 @@ typedef struct xfs_efd_log_item {
 extern struct kmem_zone	*xfs_efi_zone;
 extern struct kmem_zone	*xfs_efd_zone;
 
-xfs_efi_log_item_t	*xfs_efi_init(struct xfs_mount *, uint);
-int			xfs_efi_copy_format(xfs_log_iovec_t *buf,
-					    xfs_efi_log_format_t *dst_efi_fmt);
-void			xfs_efi_item_free(xfs_efi_log_item_t *);
-void			xfs_efi_release(struct xfs_efi_log_item *);
-
-int			xfs_efi_recover(struct xfs_mount *mp,
-					struct xfs_efi_log_item *efip);
-
 #endif	/* __XFS_EXTFREE_ITEM_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 5e7e5c66327e..4d5eb81dadac 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2012,102 +2012,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * This routine is called to create an in-core extent free intent
- * item from the efi format structure which was logged on disk.
- * It allocates an in-core efi, copies the extents from the format
- * structure into it, and adds the efi to the AIL with the given
- * LSN.
- */
-STATIC int
-xlog_recover_efi_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			lsn)
-{
-	int				error;
-	struct xfs_mount		*mp = log->l_mp;
-	struct xfs_efi_log_item		*efip;
-	struct xfs_efi_log_format	*efi_formatp;
-
-	efi_formatp = item->ri_buf[0].i_addr;
-
-	efip = xfs_efi_init(mp, efi_formatp->efi_nextents);
-	error = xfs_efi_copy_format(&item->ri_buf[0], &efip->efi_format);
-	if (error) {
-		xfs_efi_item_free(efip);
-		return error;
-	}
-	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The EFI has two references. One for the EFD and one for EFI to ensure
-	 * it makes it into the AIL. Insert the EFI into the AIL directly and
-	 * drop the EFI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &efip->efi_item, lsn);
-	xfs_efi_release(efip);
-	return 0;
-}
-
-
-/*
- * This routine is called when an EFD format structure is found in a committed
- * transaction in the log. Its purpose is to cancel the corresponding EFI if it
- * was still in the log. To do this it searches the AIL for the EFI with an id
- * equal to that in the EFD format structure. If we find it we drop the EFD
- * reference, which removes the EFI from the AIL and frees it.
- */
-STATIC int
-xlog_recover_efd_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	xfs_efd_log_format_t	*efd_formatp;
-	xfs_efi_log_item_t	*efip = NULL;
-	struct xfs_log_item	*lip;
-	uint64_t		efi_id;
-	struct xfs_ail_cursor	cur;
-	struct xfs_ail		*ailp = log->l_ailp;
-
-	efd_formatp = item->ri_buf[0].i_addr;
-	ASSERT((item->ri_buf[0].i_len == (sizeof(xfs_efd_log_format_32_t) +
-		((efd_formatp->efd_nextents - 1) * sizeof(xfs_extent_32_t)))) ||
-	       (item->ri_buf[0].i_len == (sizeof(xfs_efd_log_format_64_t) +
-		((efd_formatp->efd_nextents - 1) * sizeof(xfs_extent_64_t)))));
-	efi_id = efd_formatp->efd_efi_id;
-
-	/*
-	 * Search for the EFI with the id in the EFD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_EFI) {
-			efip = (xfs_efi_log_item_t *)lip;
-			if (efip->efi_format.efi_id == efi_id) {
-				/*
-				 * Drop the EFD reference to the EFI. This
-				 * removes the EFI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_efi_release(efip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-
-	return 0;
-}
-
 /*
  * This routine is called to create an in-core extent rmap update
  * item from the rui format structure which was logged on disk.
@@ -2464,6 +2368,31 @@ xlog_recover_commit_pass1(
 	return item->ri_type->commit_pass1_fn(log, item);
 }
 
+static inline const struct xlog_recover_intent_type *
+xlog_intent_for_type(
+	unsigned short			type)
+{
+	switch (type) {
+	case XFS_LI_EFD:
+	case XFS_LI_EFI:
+		return &xlog_recover_extfree_type;
+	default:
+		return NULL;
+	}
+}
+
+static inline bool
+xlog_is_intent_done_item(
+	struct xlog_recover_item	*item)
+{
+	switch (ITEM_TYPE(item)) {
+	case XFS_LI_EFD:
+		return true;
+	default:
+		return false;
+	}
+}
+
 STATIC int
 xlog_recover_intent_pass2(
 	struct xlog			*log,
@@ -2471,11 +2400,16 @@ xlog_recover_intent_pass2(
 	struct xlog_recover_item	*item,
 	xfs_lsn_t			current_lsn)
 {
+	const struct xlog_recover_intent_type *type;
+
+	type = xlog_intent_for_type(ITEM_TYPE(item));
+	if (type) {
+		if (xlog_is_intent_done_item(item))
+			return type->recover_done(log, item);
+		return type->recover_intent(log, item, current_lsn);
+	}
+
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_EFI:
-		return xlog_recover_efi_pass2(log, item, current_lsn);
-	case XFS_LI_EFD:
-		return xlog_recover_efd_pass2(log, item);
 	case XFS_LI_RUI:
 		return xlog_recover_rui_pass2(log, item, current_lsn);
 	case XFS_LI_RUD:
@@ -3017,46 +2951,6 @@ xlog_recover_process_data(
 	return 0;
 }
 
-/* Recover the EFI if necessary. */
-STATIC int
-xlog_recover_process_efi(
-	struct xfs_mount		*mp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_efi_log_item		*efip;
-	int				error;
-
-	/*
-	 * Skip EFIs that we've already processed.
-	 */
-	efip = container_of(lip, struct xfs_efi_log_item, efi_item);
-	if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
-		return 0;
-
-	spin_unlock(&ailp->ail_lock);
-	error = xfs_efi_recover(mp, efip);
-	spin_lock(&ailp->ail_lock);
-
-	return error;
-}
-
-/* Release the EFI since we're cancelling everything. */
-STATIC void
-xlog_recover_cancel_efi(
-	struct xfs_mount		*mp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_efi_log_item		*efip;
-
-	efip = container_of(lip, struct xfs_efi_log_item, efi_item);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_efi_release(efip);
-	spin_lock(&ailp->ail_lock);
-}
-
 /* Recover the RUI if necessary. */
 STATIC int
 xlog_recover_process_rui(
@@ -3274,6 +3168,8 @@ xlog_recover_process_intents(
 	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
 #endif
 	while (lip != NULL) {
+		const struct xlog_recover_intent_type	*type;
+
 		/*
 		 * We're done when we see something other than an intent.
 		 * There should be no intents left in the AIL now.
@@ -3301,7 +3197,8 @@ xlog_recover_process_intents(
 		 */
 		switch (lip->li_type) {
 		case XFS_LI_EFI:
-			error = xlog_recover_process_efi(log->l_mp, ailp, lip);
+			type = xlog_intent_for_type(lip->li_type);
+			error = type->process_intent(log, parent_tp, lip);
 			break;
 		case XFS_LI_RUI:
 			error = xlog_recover_process_rui(log->l_mp, ailp, lip);
@@ -3343,6 +3240,8 @@ xlog_recover_cancel_intents(
 	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 	while (lip != NULL) {
+		const struct xlog_recover_intent_type	*type;
+
 		/*
 		 * We're done when we see something other than an intent.
 		 * There should be no intents left in the AIL now.
@@ -3357,7 +3256,8 @@ xlog_recover_cancel_intents(
 
 		switch (lip->li_type) {
 		case XFS_LI_EFI:
-			xlog_recover_cancel_efi(log->l_mp, ailp, lip);
+			type = xlog_intent_for_type(lip->li_type);
+			type->cancel_intent(log, lip);
 			break;
 		case XFS_LI_RUI:
 			xlog_recover_cancel_rui(log->l_mp, ailp, lip);


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 12/19] xfs: refactor RUI log item recovery dispatch
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (10 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 11/19] xfs: refactor EFI log item recovery dispatch Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-25 18:28   ` Christoph Hellwig
  2020-04-22  2:07 ` [PATCH 13/19] xfs: refactor CUI " Darrick J. Wong
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the rmap update intent and intent-done log recovery code into the
per-item source code files and use dispatch functions to call them.  We
do these one at a time because there's a lot of code to move.  No
functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    1 
 fs/xfs/xfs_log_recover.c        |  149 ++------------------------------------
 fs/xfs/xfs_rmap_item.c          |  153 +++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_rmap_item.h          |    7 --
 4 files changed, 154 insertions(+), 156 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 188f27ccf2ec..325f4907c563 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -165,5 +165,6 @@ struct xlog_recover_intent_type {
 };
 
 extern const struct xlog_recover_intent_type xlog_recover_extfree_type;
+extern const struct xlog_recover_intent_type xlog_recover_rmap_type;
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 4d5eb81dadac..ce63660d29f6 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2012,99 +2012,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * This routine is called to create an in-core extent rmap update
- * item from the rui format structure which was logged on disk.
- * It allocates an in-core rui, copies the extents from the format
- * structure into it, and adds the rui to the AIL with the given
- * LSN.
- */
-STATIC int
-xlog_recover_rui_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			lsn)
-{
-	int				error;
-	struct xfs_mount		*mp = log->l_mp;
-	struct xfs_rui_log_item		*ruip;
-	struct xfs_rui_log_format	*rui_formatp;
-
-	rui_formatp = item->ri_buf[0].i_addr;
-
-	ruip = xfs_rui_init(mp, rui_formatp->rui_nextents);
-	error = xfs_rui_copy_format(&item->ri_buf[0], &ruip->rui_format);
-	if (error) {
-		xfs_rui_item_free(ruip);
-		return error;
-	}
-	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The RUI has two references. One for the RUD and one for RUI to ensure
-	 * it makes it into the AIL. Insert the RUI into the AIL directly and
-	 * drop the RUI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &ruip->rui_item, lsn);
-	xfs_rui_release(ruip);
-	return 0;
-}
-
-
-/*
- * This routine is called when an RUD format structure is found in a committed
- * transaction in the log. Its purpose is to cancel the corresponding RUI if it
- * was still in the log. To do this it searches the AIL for the RUI with an id
- * equal to that in the RUD format structure. If we find it we drop the RUD
- * reference, which removes the RUI from the AIL and frees it.
- */
-STATIC int
-xlog_recover_rud_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	struct xfs_rud_log_format	*rud_formatp;
-	struct xfs_rui_log_item		*ruip = NULL;
-	struct xfs_log_item		*lip;
-	uint64_t			rui_id;
-	struct xfs_ail_cursor		cur;
-	struct xfs_ail			*ailp = log->l_ailp;
-
-	rud_formatp = item->ri_buf[0].i_addr;
-	ASSERT(item->ri_buf[0].i_len == sizeof(struct xfs_rud_log_format));
-	rui_id = rud_formatp->rud_rui_id;
-
-	/*
-	 * Search for the RUI with the id in the RUD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_RUI) {
-			ruip = (struct xfs_rui_log_item *)lip;
-			if (ruip->rui_format.rui_id == rui_id) {
-				/*
-				 * Drop the RUD reference to the RUI. This
-				 * removes the RUI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_rui_release(ruip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-
-	return 0;
-}
-
 /*
  * Copy an CUI format buffer from the given buf, and into the destination
  * CUI format structure.  The CUI/CUD items were designed not to need any
@@ -2376,6 +2283,9 @@ xlog_intent_for_type(
 	case XFS_LI_EFD:
 	case XFS_LI_EFI:
 		return &xlog_recover_extfree_type;
+	case XFS_LI_RUD:
+	case XFS_LI_RUI:
+		return &xlog_recover_rmap_type;
 	default:
 		return NULL;
 	}
@@ -2387,6 +2297,7 @@ xlog_is_intent_done_item(
 {
 	switch (ITEM_TYPE(item)) {
 	case XFS_LI_EFD:
+	case XFS_LI_RUD:
 		return true;
 	default:
 		return false;
@@ -2410,10 +2321,6 @@ xlog_recover_intent_pass2(
 	}
 
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_RUI:
-		return xlog_recover_rui_pass2(log, item, current_lsn);
-	case XFS_LI_RUD:
-		return xlog_recover_rud_pass2(log, item);
 	case XFS_LI_CUI:
 		return xlog_recover_cui_pass2(log, item, current_lsn);
 	case XFS_LI_CUD:
@@ -2951,46 +2858,6 @@ xlog_recover_process_data(
 	return 0;
 }
 
-/* Recover the RUI if necessary. */
-STATIC int
-xlog_recover_process_rui(
-	struct xfs_mount		*mp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_rui_log_item		*ruip;
-	int				error;
-
-	/*
-	 * Skip RUIs that we've already processed.
-	 */
-	ruip = container_of(lip, struct xfs_rui_log_item, rui_item);
-	if (test_bit(XFS_RUI_RECOVERED, &ruip->rui_flags))
-		return 0;
-
-	spin_unlock(&ailp->ail_lock);
-	error = xfs_rui_recover(mp, ruip);
-	spin_lock(&ailp->ail_lock);
-
-	return error;
-}
-
-/* Release the RUI since we're cancelling everything. */
-STATIC void
-xlog_recover_cancel_rui(
-	struct xfs_mount		*mp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_rui_log_item		*ruip;
-
-	ruip = container_of(lip, struct xfs_rui_log_item, rui_item);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_rui_release(ruip);
-	spin_lock(&ailp->ail_lock);
-}
-
 /* Recover the CUI if necessary. */
 STATIC int
 xlog_recover_process_cui(
@@ -3197,12 +3064,10 @@ xlog_recover_process_intents(
 		 */
 		switch (lip->li_type) {
 		case XFS_LI_EFI:
+		case XFS_LI_RUI:
 			type = xlog_intent_for_type(lip->li_type);
 			error = type->process_intent(log, parent_tp, lip);
 			break;
-		case XFS_LI_RUI:
-			error = xlog_recover_process_rui(log->l_mp, ailp, lip);
-			break;
 		case XFS_LI_CUI:
 			error = xlog_recover_process_cui(parent_tp, ailp, lip);
 			break;
@@ -3256,12 +3121,10 @@ xlog_recover_cancel_intents(
 
 		switch (lip->li_type) {
 		case XFS_LI_EFI:
+		case XFS_LI_RUI:
 			type = xlog_intent_for_type(lip->li_type);
 			type->cancel_intent(log, lip);
 			break;
-		case XFS_LI_RUI:
-			xlog_recover_cancel_rui(log->l_mp, ailp, lip);
-			break;
 		case XFS_LI_CUI:
 			xlog_recover_cancel_cui(log->l_mp, ailp, lip);
 			break;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 4911b68f95dd..1ef752563f37 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -18,6 +18,8 @@
 #include "xfs_log.h"
 #include "xfs_rmap.h"
 #include "xfs_error.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 kmem_zone_t	*xfs_rui_zone;
 kmem_zone_t	*xfs_rud_zone;
@@ -27,7 +29,7 @@ static inline struct xfs_rui_log_item *RUI_ITEM(struct xfs_log_item *lip)
 	return container_of(lip, struct xfs_rui_log_item, rui_item);
 }
 
-void
+STATIC void
 xfs_rui_item_free(
 	struct xfs_rui_log_item	*ruip)
 {
@@ -44,7 +46,7 @@ xfs_rui_item_free(
  * committed vs unpin operations in bulk insert operations. Hence the reference
  * count to ensure only the last caller frees the RUI.
  */
-void
+STATIC void
 xfs_rui_release(
 	struct xfs_rui_log_item	*ruip)
 {
@@ -132,7 +134,7 @@ static const struct xfs_item_ops xfs_rui_item_ops = {
 /*
  * Allocate and initialize an rui item with the given number of extents.
  */
-struct xfs_rui_log_item *
+STATIC struct xfs_rui_log_item *
 xfs_rui_init(
 	struct xfs_mount		*mp,
 	uint				nextents)
@@ -160,7 +162,7 @@ xfs_rui_init(
  * RUI format structure.  The RUI/RUD items were designed not to need any
  * special alignment handling.
  */
-int
+STATIC int
 xfs_rui_copy_format(
 	struct xfs_log_iovec		*buf,
 	struct xfs_rui_log_format	*dst_rui_fmt)
@@ -487,9 +489,9 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
  * Process an rmap update intent item that was recovered from the log.
  * We need to update the rmapbt.
  */
-int
+STATIC int
 xfs_rui_recover(
-	struct xfs_mount		*mp,
+	struct xfs_trans		*parent_tp,
 	struct xfs_rui_log_item		*ruip)
 {
 	int				i;
@@ -503,6 +505,7 @@ xfs_rui_recover(
 	xfs_exntst_t			state;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
+	struct xfs_mount		*mp = parent_tp->t_mountp;
 
 	ASSERT(!test_bit(XFS_RUI_RECOVERED, &ruip->rui_flags));
 
@@ -606,3 +609,141 @@ xfs_rui_recover(
 	xfs_trans_cancel(tp);
 	return error;
 }
+
+/*
+ * This routine is called to create an in-core extent rmap update
+ * item from the rui format structure which was logged on disk.
+ * It allocates an in-core rui, copies the extents from the format
+ * structure into it, and adds the rui to the AIL with the given
+ * LSN.
+ */
+STATIC int
+xlog_recover_rui(
+	struct xlog			*log,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	int				error;
+	struct xfs_mount		*mp = log->l_mp;
+	struct xfs_rui_log_item		*ruip;
+	struct xfs_rui_log_format	*rui_formatp;
+
+	rui_formatp = item->ri_buf[0].i_addr;
+
+	ruip = xfs_rui_init(mp, rui_formatp->rui_nextents);
+	error = xfs_rui_copy_format(&item->ri_buf[0], &ruip->rui_format);
+	if (error) {
+		xfs_rui_item_free(ruip);
+		return error;
+	}
+	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
+
+	spin_lock(&log->l_ailp->ail_lock);
+	/*
+	 * The RUI has two references. One for the RUD and one for RUI to ensure
+	 * it makes it into the AIL. Insert the RUI into the AIL directly and
+	 * drop the RUI reference. Note that xfs_trans_ail_update() drops the
+	 * AIL lock.
+	 */
+	xfs_trans_ail_update(log->l_ailp, &ruip->rui_item, lsn);
+	xfs_rui_release(ruip);
+	return 0;
+}
+
+
+/*
+ * This routine is called when an RUD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding RUI if it
+ * was still in the log. To do this it searches the AIL for the RUI with an id
+ * equal to that in the RUD format structure. If we find it we drop the RUD
+ * reference, which removes the RUI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_rud(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	struct xfs_rud_log_format	*rud_formatp;
+	struct xfs_rui_log_item		*ruip = NULL;
+	struct xfs_log_item		*lip;
+	uint64_t			rui_id;
+	struct xfs_ail_cursor		cur;
+	struct xfs_ail			*ailp = log->l_ailp;
+
+	rud_formatp = item->ri_buf[0].i_addr;
+	ASSERT(item->ri_buf[0].i_len == sizeof(struct xfs_rud_log_format));
+	rui_id = rud_formatp->rud_rui_id;
+
+	/*
+	 * Search for the RUI with the id in the RUD format structure in the
+	 * AIL.
+	 */
+	spin_lock(&ailp->ail_lock);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	while (lip != NULL) {
+		if (lip->li_type == XFS_LI_RUI) {
+			ruip = (struct xfs_rui_log_item *)lip;
+			if (ruip->rui_format.rui_id == rui_id) {
+				/*
+				 * Drop the RUD reference to the RUI. This
+				 * removes the RUI from the AIL and frees it.
+				 */
+				spin_unlock(&ailp->ail_lock);
+				xfs_rui_release(ruip);
+				spin_lock(&ailp->ail_lock);
+				break;
+			}
+		}
+		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+	}
+
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+
+	return 0;
+}
+
+/* Recover the RUI if necessary. */
+STATIC int
+xlog_recover_process_rui(
+	struct xlog			*log,
+	struct xfs_trans		*parent_tp,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
+	int				error;
+
+	/*
+	 * Skip RUIs that we've already processed.
+	 */
+	if (test_bit(XFS_RUI_RECOVERED, &ruip->rui_flags))
+		return 0;
+
+	spin_unlock(&ailp->ail_lock);
+	error = xfs_rui_recover(parent_tp, ruip);
+	spin_lock(&ailp->ail_lock);
+
+	return error;
+}
+
+/* Release the RUI since we're cancelling everything. */
+STATIC void
+xlog_recover_cancel_rui(
+	struct xlog			*log,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
+
+	spin_unlock(&ailp->ail_lock);
+	xfs_rui_release(ruip);
+	spin_lock(&ailp->ail_lock);
+}
+
+const struct xlog_recover_intent_type xlog_recover_rmap_type = {
+	.recover_intent		= xlog_recover_rui,
+	.recover_done		= xlog_recover_rud,
+	.process_intent		= xlog_recover_process_rui,
+	.cancel_intent		= xlog_recover_cancel_rui,
+};
diff --git a/fs/xfs/xfs_rmap_item.h b/fs/xfs/xfs_rmap_item.h
index 8708e4a5aa5c..48a77a6f5c94 100644
--- a/fs/xfs/xfs_rmap_item.h
+++ b/fs/xfs/xfs_rmap_item.h
@@ -77,11 +77,4 @@ struct xfs_rud_log_item {
 extern struct kmem_zone	*xfs_rui_zone;
 extern struct kmem_zone	*xfs_rud_zone;
 
-struct xfs_rui_log_item *xfs_rui_init(struct xfs_mount *, uint);
-int xfs_rui_copy_format(struct xfs_log_iovec *buf,
-		struct xfs_rui_log_format *dst_rui_fmt);
-void xfs_rui_item_free(struct xfs_rui_log_item *);
-void xfs_rui_release(struct xfs_rui_log_item *);
-int xfs_rui_recover(struct xfs_mount *mp, struct xfs_rui_log_item *ruip);
-
 #endif	/* __XFS_RMAP_ITEM_H__ */


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 13/19] xfs: refactor CUI log item recovery dispatch
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (11 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 12/19] xfs: refactor RUI " Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-22  2:07 ` [PATCH 14/19] xfs: refactor BUI " Darrick J. Wong
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the refcount update intent and intent-done log recovery code into
the per-item source code files and use dispatch functions to call them.
We do these one at a time because there's a lot of code to move.  No
functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    1 
 fs/xfs/xfs_log_recover.c        |  176 +--------------------------------------
 fs/xfs/xfs_refcount_item.c      |  175 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_refcount_item.h      |    5 -
 4 files changed, 178 insertions(+), 179 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 325f4907c563..936da98ff370 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -166,5 +166,6 @@ struct xlog_recover_intent_type {
 
 extern const struct xlog_recover_intent_type xlog_recover_extfree_type;
 extern const struct xlog_recover_intent_type xlog_recover_rmap_type;
+extern const struct xlog_recover_intent_type xlog_recover_refcount_type;
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index ce63660d29f6..8a112f22583b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2012,126 +2012,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * Copy an CUI format buffer from the given buf, and into the destination
- * CUI format structure.  The CUI/CUD items were designed not to need any
- * special alignment handling.
- */
-static int
-xfs_cui_copy_format(
-	struct xfs_log_iovec		*buf,
-	struct xfs_cui_log_format	*dst_cui_fmt)
-{
-	struct xfs_cui_log_format	*src_cui_fmt;
-	uint				len;
-
-	src_cui_fmt = buf->i_addr;
-	len = xfs_cui_log_format_sizeof(src_cui_fmt->cui_nextents);
-
-	if (buf->i_len == len) {
-		memcpy(dst_cui_fmt, src_cui_fmt, len);
-		return 0;
-	}
-	XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
-	return -EFSCORRUPTED;
-}
-
-/*
- * This routine is called to create an in-core extent refcount update
- * item from the cui format structure which was logged on disk.
- * It allocates an in-core cui, copies the extents from the format
- * structure into it, and adds the cui to the AIL with the given
- * LSN.
- */
-STATIC int
-xlog_recover_cui_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			lsn)
-{
-	int				error;
-	struct xfs_mount		*mp = log->l_mp;
-	struct xfs_cui_log_item		*cuip;
-	struct xfs_cui_log_format	*cui_formatp;
-
-	cui_formatp = item->ri_buf[0].i_addr;
-
-	cuip = xfs_cui_init(mp, cui_formatp->cui_nextents);
-	error = xfs_cui_copy_format(&item->ri_buf[0], &cuip->cui_format);
-	if (error) {
-		xfs_cui_item_free(cuip);
-		return error;
-	}
-	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The CUI has two references. One for the CUD and one for CUI to ensure
-	 * it makes it into the AIL. Insert the CUI into the AIL directly and
-	 * drop the CUI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &cuip->cui_item, lsn);
-	xfs_cui_release(cuip);
-	return 0;
-}
-
-
-/*
- * This routine is called when an CUD format structure is found in a committed
- * transaction in the log. Its purpose is to cancel the corresponding CUI if it
- * was still in the log. To do this it searches the AIL for the CUI with an id
- * equal to that in the CUD format structure. If we find it we drop the CUD
- * reference, which removes the CUI from the AIL and frees it.
- */
-STATIC int
-xlog_recover_cud_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	struct xfs_cud_log_format	*cud_formatp;
-	struct xfs_cui_log_item		*cuip = NULL;
-	struct xfs_log_item		*lip;
-	uint64_t			cui_id;
-	struct xfs_ail_cursor		cur;
-	struct xfs_ail			*ailp = log->l_ailp;
-
-	cud_formatp = item->ri_buf[0].i_addr;
-	if (item->ri_buf[0].i_len != sizeof(struct xfs_cud_log_format)) {
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
-		return -EFSCORRUPTED;
-	}
-	cui_id = cud_formatp->cud_cui_id;
-
-	/*
-	 * Search for the CUI with the id in the CUD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_CUI) {
-			cuip = (struct xfs_cui_log_item *)lip;
-			if (cuip->cui_format.cui_id == cui_id) {
-				/*
-				 * Drop the CUD reference to the CUI. This
-				 * removes the CUI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_cui_release(cuip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-
-	return 0;
-}
-
 /*
  * Copy an BUI format buffer from the given buf, and into the destination
  * BUI format structure.  The BUI/BUD items were designed not to need any
@@ -2286,6 +2166,9 @@ xlog_intent_for_type(
 	case XFS_LI_RUD:
 	case XFS_LI_RUI:
 		return &xlog_recover_rmap_type;
+	case XFS_LI_CUI:
+	case XFS_LI_CUD:
+		return &xlog_recover_refcount_type;
 	default:
 		return NULL;
 	}
@@ -2298,6 +2181,7 @@ xlog_is_intent_done_item(
 	switch (ITEM_TYPE(item)) {
 	case XFS_LI_EFD:
 	case XFS_LI_RUD:
+	case XFS_LI_CUD:
 		return true;
 	default:
 		return false;
@@ -2321,10 +2205,6 @@ xlog_recover_intent_pass2(
 	}
 
 	switch (ITEM_TYPE(item)) {
-	case XFS_LI_CUI:
-		return xlog_recover_cui_pass2(log, item, current_lsn);
-	case XFS_LI_CUD:
-		return xlog_recover_cud_pass2(log, item);
 	case XFS_LI_BUI:
 		return xlog_recover_bui_pass2(log, item, current_lsn);
 	case XFS_LI_BUD:
@@ -2858,46 +2738,6 @@ xlog_recover_process_data(
 	return 0;
 }
 
-/* Recover the CUI if necessary. */
-STATIC int
-xlog_recover_process_cui(
-	struct xfs_trans		*parent_tp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_cui_log_item		*cuip;
-	int				error;
-
-	/*
-	 * Skip CUIs that we've already processed.
-	 */
-	cuip = container_of(lip, struct xfs_cui_log_item, cui_item);
-	if (test_bit(XFS_CUI_RECOVERED, &cuip->cui_flags))
-		return 0;
-
-	spin_unlock(&ailp->ail_lock);
-	error = xfs_cui_recover(parent_tp, cuip);
-	spin_lock(&ailp->ail_lock);
-
-	return error;
-}
-
-/* Release the CUI since we're cancelling everything. */
-STATIC void
-xlog_recover_cancel_cui(
-	struct xfs_mount		*mp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_cui_log_item		*cuip;
-
-	cuip = container_of(lip, struct xfs_cui_log_item, cui_item);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_cui_release(cuip);
-	spin_lock(&ailp->ail_lock);
-}
-
 /* Recover the BUI if necessary. */
 STATIC int
 xlog_recover_process_bui(
@@ -3065,12 +2905,10 @@ xlog_recover_process_intents(
 		switch (lip->li_type) {
 		case XFS_LI_EFI:
 		case XFS_LI_RUI:
+		case XFS_LI_CUI:
 			type = xlog_intent_for_type(lip->li_type);
 			error = type->process_intent(log, parent_tp, lip);
 			break;
-		case XFS_LI_CUI:
-			error = xlog_recover_process_cui(parent_tp, ailp, lip);
-			break;
 		case XFS_LI_BUI:
 			error = xlog_recover_process_bui(parent_tp, ailp, lip);
 			break;
@@ -3122,12 +2960,10 @@ xlog_recover_cancel_intents(
 		switch (lip->li_type) {
 		case XFS_LI_EFI:
 		case XFS_LI_RUI:
+		case XFS_LI_CUI:
 			type = xlog_intent_for_type(lip->li_type);
 			type->cancel_intent(log, lip);
 			break;
-		case XFS_LI_CUI:
-			xlog_recover_cancel_cui(log->l_mp, ailp, lip);
-			break;
 		case XFS_LI_BUI:
 			xlog_recover_cancel_bui(log->l_mp, ailp, lip);
 			break;
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 8eeed73928cd..a1dac3d60a2a 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -18,6 +18,8 @@
 #include "xfs_log.h"
 #include "xfs_refcount.h"
 #include "xfs_error.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 kmem_zone_t	*xfs_cui_zone;
 kmem_zone_t	*xfs_cud_zone;
@@ -27,7 +29,7 @@ static inline struct xfs_cui_log_item *CUI_ITEM(struct xfs_log_item *lip)
 	return container_of(lip, struct xfs_cui_log_item, cui_item);
 }
 
-void
+STATIC void
 xfs_cui_item_free(
 	struct xfs_cui_log_item	*cuip)
 {
@@ -44,7 +46,7 @@ xfs_cui_item_free(
  * committed vs unpin operations in bulk insert operations. Hence the reference
  * count to ensure only the last caller frees the CUI.
  */
-void
+STATIC void
 xfs_cui_release(
 	struct xfs_cui_log_item	*cuip)
 {
@@ -133,7 +135,7 @@ static const struct xfs_item_ops xfs_cui_item_ops = {
 /*
  * Allocate and initialize an cui item with the given number of extents.
  */
-struct xfs_cui_log_item *
+STATIC struct xfs_cui_log_item *
 xfs_cui_init(
 	struct xfs_mount		*mp,
 	uint				nextents)
@@ -443,7 +445,7 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
  * Process a refcount update intent item that was recovered from the log.
  * We need to update the refcountbt.
  */
-int
+STATIC int
 xfs_cui_recover(
 	struct xfs_trans		*parent_tp,
 	struct xfs_cui_log_item		*cuip)
@@ -590,3 +592,168 @@ xfs_cui_recover(
 	xfs_trans_cancel(tp);
 	return error;
 }
+
+/*
+ * Copy an CUI format buffer from the given buf, and into the destination
+ * CUI format structure.  The CUI/CUD items were designed not to need any
+ * special alignment handling.
+ */
+static int
+xfs_cui_copy_format(
+	struct xfs_log_iovec		*buf,
+	struct xfs_cui_log_format	*dst_cui_fmt)
+{
+	struct xfs_cui_log_format	*src_cui_fmt;
+	uint				len;
+
+	src_cui_fmt = buf->i_addr;
+	len = xfs_cui_log_format_sizeof(src_cui_fmt->cui_nextents);
+
+	if (buf->i_len == len) {
+		memcpy(dst_cui_fmt, src_cui_fmt, len);
+		return 0;
+	}
+	XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
+	return -EFSCORRUPTED;
+}
+
+/*
+ * This routine is called to create an in-core extent refcount update
+ * item from the cui format structure which was logged on disk.
+ * It allocates an in-core cui, copies the extents from the format
+ * structure into it, and adds the cui to the AIL with the given
+ * LSN.
+ */
+STATIC int
+xlog_recover_cui(
+	struct xlog			*log,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	int				error;
+	struct xfs_mount		*mp = log->l_mp;
+	struct xfs_cui_log_item		*cuip;
+	struct xfs_cui_log_format	*cui_formatp;
+
+	cui_formatp = item->ri_buf[0].i_addr;
+
+	cuip = xfs_cui_init(mp, cui_formatp->cui_nextents);
+	error = xfs_cui_copy_format(&item->ri_buf[0], &cuip->cui_format);
+	if (error) {
+		xfs_cui_item_free(cuip);
+		return error;
+	}
+	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
+
+	spin_lock(&log->l_ailp->ail_lock);
+	/*
+	 * The CUI has two references. One for the CUD and one for CUI to ensure
+	 * it makes it into the AIL. Insert the CUI into the AIL directly and
+	 * drop the CUI reference. Note that xfs_trans_ail_update() drops the
+	 * AIL lock.
+	 */
+	xfs_trans_ail_update(log->l_ailp, &cuip->cui_item, lsn);
+	xfs_cui_release(cuip);
+	return 0;
+}
+
+
+/*
+ * This routine is called when an CUD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding CUI if it
+ * was still in the log. To do this it searches the AIL for the CUI with an id
+ * equal to that in the CUD format structure. If we find it we drop the CUD
+ * reference, which removes the CUI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_cud(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	struct xfs_cud_log_format	*cud_formatp;
+	struct xfs_cui_log_item		*cuip = NULL;
+	struct xfs_log_item		*lip;
+	uint64_t			cui_id;
+	struct xfs_ail_cursor		cur;
+	struct xfs_ail			*ailp = log->l_ailp;
+
+	cud_formatp = item->ri_buf[0].i_addr;
+	if (item->ri_buf[0].i_len != sizeof(struct xfs_cud_log_format)) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
+		return -EFSCORRUPTED;
+	}
+	cui_id = cud_formatp->cud_cui_id;
+
+	/*
+	 * Search for the CUI with the id in the CUD format structure in the
+	 * AIL.
+	 */
+	spin_lock(&ailp->ail_lock);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	while (lip != NULL) {
+		if (lip->li_type == XFS_LI_CUI) {
+			cuip = (struct xfs_cui_log_item *)lip;
+			if (cuip->cui_format.cui_id == cui_id) {
+				/*
+				 * Drop the CUD reference to the CUI. This
+				 * removes the CUI from the AIL and frees it.
+				 */
+				spin_unlock(&ailp->ail_lock);
+				xfs_cui_release(cuip);
+				spin_lock(&ailp->ail_lock);
+				break;
+			}
+		}
+		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+	}
+
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+
+	return 0;
+}
+
+/* Recover the CUI if necessary. */
+STATIC int
+xlog_recover_process_cui(
+	struct xlog			*log,
+	struct xfs_trans		*parent_tp,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
+	int				error;
+
+	/*
+	 * Skip CUIs that we've already processed.
+	 */
+	if (test_bit(XFS_CUI_RECOVERED, &cuip->cui_flags))
+		return 0;
+
+	spin_unlock(&ailp->ail_lock);
+	error = xfs_cui_recover(parent_tp, cuip);
+	spin_lock(&ailp->ail_lock);
+
+	return error;
+}
+
+/* Release the CUI since we're cancelling everything. */
+STATIC void
+xlog_recover_cancel_cui(
+	struct xlog			*log,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
+
+	spin_unlock(&ailp->ail_lock);
+	xfs_cui_release(cuip);
+	spin_lock(&ailp->ail_lock);
+}
+
+const struct xlog_recover_intent_type xlog_recover_refcount_type = {
+	.recover_intent		= xlog_recover_cui,
+	.recover_done		= xlog_recover_cud,
+	.process_intent		= xlog_recover_process_cui,
+	.cancel_intent		= xlog_recover_cancel_cui,
+};
diff --git a/fs/xfs/xfs_refcount_item.h b/fs/xfs/xfs_refcount_item.h
index e47530f30489..cfaa857673a6 100644
--- a/fs/xfs/xfs_refcount_item.h
+++ b/fs/xfs/xfs_refcount_item.h
@@ -77,9 +77,4 @@ struct xfs_cud_log_item {
 extern struct kmem_zone	*xfs_cui_zone;
 extern struct kmem_zone	*xfs_cud_zone;
 
-struct xfs_cui_log_item *xfs_cui_init(struct xfs_mount *, uint);
-void xfs_cui_item_free(struct xfs_cui_log_item *);
-void xfs_cui_release(struct xfs_cui_log_item *);
-int xfs_cui_recover(struct xfs_trans *parent_tp, struct xfs_cui_log_item *cuip);
-
 #endif	/* __XFS_REFCOUNT_ITEM_H__ */


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 14/19] xfs: refactor BUI log item recovery dispatch
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (12 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 13/19] xfs: refactor CUI " Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-22  2:07 ` [PATCH 15/19] xfs: refactor releasing finished intents during log recovery Darrick J. Wong
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the bmap update intent and intent-done log recovery code into the
per-item source code files and use dispatch functions to call them.  We
do these one at a time because there's a lot of code to move.  No
functional changes.
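
For reference, the dispatch table that each log item type fills out looks
roughly like this.  This is a sketch pieced together from the hunks in this
series (the real header declares named typedefs for the members), so treat
the exact spellings as approximate:

struct xlog_recover_intent_type {
	/* Reconstruct the intent item and insert it into the AIL (pass2). */
	int	(*recover_intent)(struct xlog *log,
			struct xlog_recover_item *item, xfs_lsn_t lsn);
	/* An intent-done item was found; release the matching intent. */
	int	(*recover_done)(struct xlog *log,
			struct xlog_recover_item *item);
	/* Finish an intent left unfinished in the AIL after recovery. */
	int	(*process_intent)(struct xlog *log, struct xfs_trans *tp,
			struct xfs_log_item *lip);
	/* Release an unfinished intent when recovery is cancelled. */
	void	(*cancel_intent)(struct xlog *log, struct xfs_log_item *lip);
};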

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    1 
 fs/xfs/xfs_bmap_item.c          |  179 +++++++++++++++++++++++++++++++
 fs/xfs/xfs_bmap_item.h          |    5 -
 fs/xfs/xfs_log_recover.c        |  222 +++------------------------------------
 4 files changed, 193 insertions(+), 214 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 936da98ff370..5bdba7ee98c5 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -167,5 +167,6 @@ struct xlog_recover_intent_type {
 extern const struct xlog_recover_intent_type xlog_recover_extfree_type;
 extern const struct xlog_recover_intent_type xlog_recover_rmap_type;
 extern const struct xlog_recover_intent_type xlog_recover_refcount_type;
+extern const struct xlog_recover_intent_type xlog_recover_bmap_type;
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index ee6f4229cebc..b4fbc58b6906 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -22,6 +22,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_trans_space.h"
 #include "xfs_error.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
 
 kmem_zone_t	*xfs_bui_zone;
 kmem_zone_t	*xfs_bud_zone;
@@ -31,7 +33,7 @@ static inline struct xfs_bui_log_item *BUI_ITEM(struct xfs_log_item *lip)
 	return container_of(lip, struct xfs_bui_log_item, bui_item);
 }
 
-void
+STATIC void
 xfs_bui_item_free(
 	struct xfs_bui_log_item	*buip)
 {
@@ -45,7 +47,7 @@ xfs_bui_item_free(
  * committed vs unpin operations in bulk insert operations. Hence the reference
  * count to ensure only the last caller frees the BUI.
  */
-void
+STATIC void
 xfs_bui_release(
 	struct xfs_bui_log_item	*buip)
 {
@@ -134,7 +136,7 @@ static const struct xfs_item_ops xfs_bui_item_ops = {
 /*
  * Allocate and initialize an bui item with the given number of extents.
  */
-struct xfs_bui_log_item *
+STATIC struct xfs_bui_log_item *
 xfs_bui_init(
 	struct xfs_mount		*mp)
 
@@ -429,7 +431,7 @@ const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
  * Process a bmap update intent item that was recovered from the log.
  * We need to update some inode's bmbt.
  */
-int
+STATIC int
 xfs_bui_recover(
 	struct xfs_trans		*parent_tp,
 	struct xfs_bui_log_item		*buip)
@@ -563,3 +565,172 @@ xfs_bui_recover(
 	}
 	return error;
 }
+
+/*
+ * Copy an BUI format buffer from the given buf, and into the destination
+ * BUI format structure.  The BUI/BUD items were designed not to need any
+ * special alignment handling.
+ */
+static int
+xfs_bui_copy_format(
+	struct xfs_log_iovec		*buf,
+	struct xfs_bui_log_format	*dst_bui_fmt)
+{
+	struct xfs_bui_log_format	*src_bui_fmt;
+	uint				len;
+
+	src_bui_fmt = buf->i_addr;
+	len = xfs_bui_log_format_sizeof(src_bui_fmt->bui_nextents);
+
+	if (buf->i_len == len) {
+		memcpy(dst_bui_fmt, src_bui_fmt, len);
+		return 0;
+	}
+	XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
+	return -EFSCORRUPTED;
+}
+
+/*
+ * This routine is called to create an in-core extent bmap update
+ * item from the bui format structure which was logged on disk.
+ * It allocates an in-core bui, copies the extents from the format
+ * structure into it, and adds the bui to the AIL with the given
+ * LSN.
+ */
+STATIC int
+xlog_recover_bui(
+	struct xlog			*log,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	int				error;
+	struct xfs_mount		*mp = log->l_mp;
+	struct xfs_bui_log_item		*buip;
+	struct xfs_bui_log_format	*bui_formatp;
+
+	bui_formatp = item->ri_buf[0].i_addr;
+
+	if (bui_formatp->bui_nextents != XFS_BUI_MAX_FAST_EXTENTS) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
+		return -EFSCORRUPTED;
+	}
+	buip = xfs_bui_init(mp);
+	error = xfs_bui_copy_format(&item->ri_buf[0], &buip->bui_format);
+	if (error) {
+		xfs_bui_item_free(buip);
+		return error;
+	}
+	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
+
+	spin_lock(&log->l_ailp->ail_lock);
+	/*
+	 * The RUI has two references. One for the RUD and one for RUI to ensure
+	 * it makes it into the AIL. Insert the RUI into the AIL directly and
+	 * drop the RUI reference. Note that xfs_trans_ail_update() drops the
+	 * AIL lock.
+	 */
+	xfs_trans_ail_update(log->l_ailp, &buip->bui_item, lsn);
+	xfs_bui_release(buip);
+	return 0;
+}
+
+
+/*
+ * This routine is called when an BUD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding BUI if it
+ * was still in the log. To do this it searches the AIL for the BUI with an id
+ * equal to that in the BUD format structure. If we find it we drop the BUD
+ * reference, which removes the BUI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_bud(
+	struct xlog			*log,
+	struct xlog_recover_item	*item)
+{
+	struct xfs_bud_log_format	*bud_formatp;
+	struct xfs_bui_log_item		*buip = NULL;
+	struct xfs_log_item		*lip;
+	uint64_t			bui_id;
+	struct xfs_ail_cursor		cur;
+	struct xfs_ail			*ailp = log->l_ailp;
+
+	bud_formatp = item->ri_buf[0].i_addr;
+	if (item->ri_buf[0].i_len != sizeof(struct xfs_bud_log_format)) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
+		return -EFSCORRUPTED;
+	}
+	bui_id = bud_formatp->bud_bui_id;
+
+	/*
+	 * Search for the BUI with the id in the BUD format structure in the
+	 * AIL.
+	 */
+	spin_lock(&ailp->ail_lock);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	while (lip != NULL) {
+		if (lip->li_type == XFS_LI_BUI) {
+			buip = (struct xfs_bui_log_item *)lip;
+			if (buip->bui_format.bui_id == bui_id) {
+				/*
+				 * Drop the BUD reference to the BUI. This
+				 * removes the BUI from the AIL and frees it.
+				 */
+				spin_unlock(&ailp->ail_lock);
+				xfs_bui_release(buip);
+				spin_lock(&ailp->ail_lock);
+				break;
+			}
+		}
+		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+	}
+
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+
+	return 0;
+}
+
+/* Recover the BUI if necessary. */
+STATIC int
+xlog_recover_process_bui(
+	struct xlog			*log,
+	struct xfs_trans		*parent_tp,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
+	int				error;
+
+	/*
+	 * Skip BUIs that we've already processed.
+	 */
+	if (test_bit(XFS_BUI_RECOVERED, &buip->bui_flags))
+		return 0;
+
+	spin_unlock(&ailp->ail_lock);
+	error = xfs_bui_recover(parent_tp, buip);
+	spin_lock(&ailp->ail_lock);
+
+	return error;
+}
+
+/* Release the BUI since we're cancelling everything. */
+STATIC void
+xlog_recover_cancel_bui(
+	struct xlog			*log,
+	struct xfs_log_item		*lip)
+{
+	struct xfs_ail			*ailp = log->l_ailp;
+	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
+
+	spin_unlock(&ailp->ail_lock);
+	xfs_bui_release(buip);
+	spin_lock(&ailp->ail_lock);
+}
+
+const struct xlog_recover_intent_type xlog_recover_bmap_type = {
+	.recover_intent		= xlog_recover_bui,
+	.recover_done		= xlog_recover_bud,
+	.process_intent		= xlog_recover_process_bui,
+	.cancel_intent		= xlog_recover_cancel_bui,
+};
diff --git a/fs/xfs/xfs_bmap_item.h b/fs/xfs/xfs_bmap_item.h
index ad479cc73de8..44d06e62f8f9 100644
--- a/fs/xfs/xfs_bmap_item.h
+++ b/fs/xfs/xfs_bmap_item.h
@@ -74,9 +74,4 @@ struct xfs_bud_log_item {
 extern struct kmem_zone	*xfs_bui_zone;
 extern struct kmem_zone	*xfs_bud_zone;
 
-struct xfs_bui_log_item *xfs_bui_init(struct xfs_mount *);
-void xfs_bui_item_free(struct xfs_bui_log_item *);
-void xfs_bui_release(struct xfs_bui_log_item *);
-int xfs_bui_recover(struct xfs_trans *parent_tp, struct xfs_bui_log_item *buip);
-
 #endif	/* __XFS_BMAP_ITEM_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 8a112f22583b..08066fa32b80 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2012,130 +2012,6 @@ xlog_check_buffer_cancelled(
 	return 1;
 }
 
-/*
- * Copy an BUI format buffer from the given buf, and into the destination
- * BUI format structure.  The BUI/BUD items were designed not to need any
- * special alignment handling.
- */
-static int
-xfs_bui_copy_format(
-	struct xfs_log_iovec		*buf,
-	struct xfs_bui_log_format	*dst_bui_fmt)
-{
-	struct xfs_bui_log_format	*src_bui_fmt;
-	uint				len;
-
-	src_bui_fmt = buf->i_addr;
-	len = xfs_bui_log_format_sizeof(src_bui_fmt->bui_nextents);
-
-	if (buf->i_len == len) {
-		memcpy(dst_bui_fmt, src_bui_fmt, len);
-		return 0;
-	}
-	XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
-	return -EFSCORRUPTED;
-}
-
-/*
- * This routine is called to create an in-core extent bmap update
- * item from the bui format structure which was logged on disk.
- * It allocates an in-core bui, copies the extents from the format
- * structure into it, and adds the bui to the AIL with the given
- * LSN.
- */
-STATIC int
-xlog_recover_bui_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			lsn)
-{
-	int				error;
-	struct xfs_mount		*mp = log->l_mp;
-	struct xfs_bui_log_item		*buip;
-	struct xfs_bui_log_format	*bui_formatp;
-
-	bui_formatp = item->ri_buf[0].i_addr;
-
-	if (bui_formatp->bui_nextents != XFS_BUI_MAX_FAST_EXTENTS) {
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
-		return -EFSCORRUPTED;
-	}
-	buip = xfs_bui_init(mp);
-	error = xfs_bui_copy_format(&item->ri_buf[0], &buip->bui_format);
-	if (error) {
-		xfs_bui_item_free(buip);
-		return error;
-	}
-	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The RUI has two references. One for the RUD and one for RUI to ensure
-	 * it makes it into the AIL. Insert the RUI into the AIL directly and
-	 * drop the RUI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &buip->bui_item, lsn);
-	xfs_bui_release(buip);
-	return 0;
-}
-
-
-/*
- * This routine is called when an BUD format structure is found in a committed
- * transaction in the log. Its purpose is to cancel the corresponding BUI if it
- * was still in the log. To do this it searches the AIL for the BUI with an id
- * equal to that in the BUD format structure. If we find it we drop the BUD
- * reference, which removes the BUI from the AIL and frees it.
- */
-STATIC int
-xlog_recover_bud_pass2(
-	struct xlog			*log,
-	struct xlog_recover_item	*item)
-{
-	struct xfs_bud_log_format	*bud_formatp;
-	struct xfs_bui_log_item		*buip = NULL;
-	struct xfs_log_item		*lip;
-	uint64_t			bui_id;
-	struct xfs_ail_cursor		cur;
-	struct xfs_ail			*ailp = log->l_ailp;
-
-	bud_formatp = item->ri_buf[0].i_addr;
-	if (item->ri_buf[0].i_len != sizeof(struct xfs_bud_log_format)) {
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
-		return -EFSCORRUPTED;
-	}
-	bui_id = bud_formatp->bud_bui_id;
-
-	/*
-	 * Search for the BUI with the id in the BUD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_BUI) {
-			buip = (struct xfs_bui_log_item *)lip;
-			if (buip->bui_format.bui_id == bui_id) {
-				/*
-				 * Drop the BUD reference to the BUI. This
-				 * removes the BUI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_bui_release(buip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-
-	return 0;
-}
-
 STATIC int
 xlog_recover_commit_pass1(
 	struct xlog			*log,
@@ -2169,6 +2045,9 @@ xlog_intent_for_type(
 	case XFS_LI_CUI:
 	case XFS_LI_CUD:
 		return &xlog_recover_refcount_type;
+	case XFS_LI_BUI:
+	case XFS_LI_BUD:
+		return &xlog_recover_bmap_type;
 	default:
 		return NULL;
 	}
@@ -2182,6 +2061,7 @@ xlog_is_intent_done_item(
 	case XFS_LI_EFD:
 	case XFS_LI_RUD:
 	case XFS_LI_CUD:
+	case XFS_LI_BUD:
 		return true;
 	default:
 		return false;
@@ -2198,20 +2078,11 @@ xlog_recover_intent_pass2(
 	const struct xlog_recover_intent_type *type;
 
 	type = xlog_intent_for_type(ITEM_TYPE(item));
-	if (type) {
-		if (xlog_is_intent_done_item(item))
-			return type->recover_done(log, item);
-		return type->recover_intent(log, item, current_lsn);
-	}
-
-	switch (ITEM_TYPE(item)) {
-	case XFS_LI_BUI:
-		return xlog_recover_bui_pass2(log, item, current_lsn);
-	case XFS_LI_BUD:
-		return xlog_recover_bud_pass2(log, item);
-	}
-
-	return -EFSCORRUPTED;
+	if (!type)
+		return -EFSCORRUPTED;
+	if (xlog_is_intent_done_item(item))
+		return type->recover_done(log, item);
+	return type->recover_intent(log, item, current_lsn);
 }
 
 STATIC int
@@ -2738,46 +2609,6 @@ xlog_recover_process_data(
 	return 0;
 }
 
-/* Recover the BUI if necessary. */
-STATIC int
-xlog_recover_process_bui(
-	struct xfs_trans		*parent_tp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_bui_log_item		*buip;
-	int				error;
-
-	/*
-	 * Skip BUIs that we've already processed.
-	 */
-	buip = container_of(lip, struct xfs_bui_log_item, bui_item);
-	if (test_bit(XFS_BUI_RECOVERED, &buip->bui_flags))
-		return 0;
-
-	spin_unlock(&ailp->ail_lock);
-	error = xfs_bui_recover(parent_tp, buip);
-	spin_lock(&ailp->ail_lock);
-
-	return error;
-}
-
-/* Release the BUI since we're cancelling everything. */
-STATIC void
-xlog_recover_cancel_bui(
-	struct xfs_mount		*mp,
-	struct xfs_ail			*ailp,
-	struct xfs_log_item		*lip)
-{
-	struct xfs_bui_log_item		*buip;
-
-	buip = container_of(lip, struct xfs_bui_log_item, bui_item);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_bui_release(buip);
-	spin_lock(&ailp->ail_lock);
-}
-
 /* Is this log item a deferred action intent? */
 static inline bool xlog_item_is_intent(struct xfs_log_item *lip)
 {
@@ -2881,7 +2712,8 @@ xlog_recover_process_intents(
 		 * We're done when we see something other than an intent.
 		 * There should be no intents left in the AIL now.
 		 */
-		if (!xlog_item_is_intent(lip)) {
+		type = xlog_intent_for_type(lip->li_type);
+		if (!type) {
 #ifdef DEBUG
 			for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
 				ASSERT(!xlog_item_is_intent(lip));
@@ -2898,21 +2730,11 @@ xlog_recover_process_intents(
 
 		/*
 		 * NOTE: If your intent processing routine can create more
-		 * deferred ops, you /must/ attach them to the dfops in this
-		 * routine or else those subsequent intents will get
+		 * deferred ops, you /must/ attach them to the transaction in
+		 * this routine or else those subsequent intents will get
 		 * replayed in the wrong order!
 		 */
-		switch (lip->li_type) {
-		case XFS_LI_EFI:
-		case XFS_LI_RUI:
-		case XFS_LI_CUI:
-			type = xlog_intent_for_type(lip->li_type);
-			error = type->process_intent(log, parent_tp, lip);
-			break;
-		case XFS_LI_BUI:
-			error = xlog_recover_process_bui(parent_tp, ailp, lip);
-			break;
-		}
+		error = type->process_intent(log, parent_tp, lip);
 		if (error)
 			goto out;
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
@@ -2949,7 +2771,8 @@ xlog_recover_cancel_intents(
 		 * We're done when we see something other than an intent.
 		 * There should be no intents left in the AIL now.
 		 */
-		if (!xlog_item_is_intent(lip)) {
+		type = xlog_intent_for_type(lip->li_type);
+		if (!type) {
 #ifdef DEBUG
 			for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
 				ASSERT(!xlog_item_is_intent(lip));
@@ -2957,18 +2780,7 @@ xlog_recover_cancel_intents(
 			break;
 		}
 
-		switch (lip->li_type) {
-		case XFS_LI_EFI:
-		case XFS_LI_RUI:
-		case XFS_LI_CUI:
-			type = xlog_intent_for_type(lip->li_type);
-			type->cancel_intent(log, lip);
-			break;
-		case XFS_LI_BUI:
-			xlog_recover_cancel_bui(log->l_mp, ailp, lip);
-			break;
-		}
-
+		type->cancel_intent(log, lip);
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
 	}
 


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 15/19] xfs: refactor releasing finished intents during log recovery
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (13 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 14/19] xfs: refactor BUI " Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-25 18:34   ` Christoph Hellwig
  2020-04-22  2:07 ` [PATCH 16/19] xfs: refactor adding recovered intent items to the log Darrick J. Wong
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished.
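
The calling convention pairs one generic AIL walk with a small per-type
match-and-release callback; schematically, using the BUI case from the
hunks below (nothing here goes beyond what the patch itself adds):

typedef bool (*xlog_recover_release_intent_fn)(struct xlog *log,
		struct xfs_log_item *item, uint64_t intent_id);

/* each intent-done handler now boils down to a single call, e.g.: */
xlog_recover_release_intent(log, XFS_LI_BUI, bud_formatp->bud_bui_id,
		xlog_release_bui);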

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    5 +++
 fs/xfs/xfs_bmap_item.c          |   56 +++++++++++++++++----------------------
 fs/xfs/xfs_extfree_item.c       |   56 +++++++++++++++++----------------------
 fs/xfs/xfs_log_recover.c        |   27 +++++++++++++++++++
 fs/xfs/xfs_refcount_item.c      |   56 +++++++++++++++++----------------------
 fs/xfs/xfs_rmap_item.c          |   56 +++++++++++++++++----------------------
 6 files changed, 128 insertions(+), 128 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 5bdba7ee98c5..ac1adccc8451 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -169,4 +169,9 @@ extern const struct xlog_recover_intent_type xlog_recover_rmap_type;
 extern const struct xlog_recover_intent_type xlog_recover_refcount_type;
 extern const struct xlog_recover_intent_type xlog_recover_bmap_type;
 
+typedef bool (*xlog_recover_release_intent_fn)(struct xlog *log,
+		struct xfs_log_item *item, uint64_t intent_id);
+void xlog_recover_release_intent(struct xlog *log, unsigned short intent_type,
+		uint64_t intent_id, xlog_recover_release_intent_fn fn);
+
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index b4fbc58b6906..cd593b98f102 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -634,6 +634,28 @@ xlog_recover_bui(
 	return 0;
 }
 
+STATIC bool
+xlog_release_bui(
+	struct xlog		*log,
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	struct xfs_bui_log_item	*buip = BUI_ITEM(lip);
+	struct xfs_ail		*ailp = log->l_ailp;
+
+	if (buip->bui_format.bui_id == intent_id) {
+		/*
+		 * Drop the BUD reference to the BUI. This
+		 * removes the BUI from the AIL and frees it.
+		 */
+		spin_unlock(&ailp->ail_lock);
+		xfs_bui_release(buip);
+		spin_lock(&ailp->ail_lock);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * This routine is called when an BUD format structure is found in a committed
@@ -648,45 +670,15 @@ xlog_recover_bud(
 	struct xlog_recover_item	*item)
 {
 	struct xfs_bud_log_format	*bud_formatp;
-	struct xfs_bui_log_item		*buip = NULL;
-	struct xfs_log_item		*lip;
-	uint64_t			bui_id;
-	struct xfs_ail_cursor		cur;
-	struct xfs_ail			*ailp = log->l_ailp;
 
 	bud_formatp = item->ri_buf[0].i_addr;
 	if (item->ri_buf[0].i_len != sizeof(struct xfs_bud_log_format)) {
 		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
 		return -EFSCORRUPTED;
 	}
-	bui_id = bud_formatp->bud_bui_id;
-
-	/*
-	 * Search for the BUI with the id in the BUD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_BUI) {
-			buip = (struct xfs_bui_log_item *)lip;
-			if (buip->bui_format.bui_id == bui_id) {
-				/*
-				 * Drop the BUD reference to the BUI. This
-				 * removes the BUI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_bui_release(buip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
 
+	xlog_recover_release_intent(log, XFS_LI_BUI, bud_formatp->bud_bui_id,
+			 xlog_release_bui);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index cd2beb581b40..6a873309f3bc 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -695,6 +695,28 @@ xlog_recover_efi(
 	return 0;
 }
 
+STATIC bool
+xlog_release_efi(
+	struct xlog		*log,
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	struct xfs_efi_log_item	*efip = EFI_ITEM(lip);
+	struct xfs_ail		*ailp = log->l_ailp;
+
+	if (efip->efi_format.efi_id == intent_id) {
+		/*
+		 * Drop the EFD reference to the EFI. This
+		 * removes the EFI from the AIL and frees it.
+		 */
+		spin_unlock(&ailp->ail_lock);
+		xfs_efi_release(efip);
+		spin_lock(&ailp->ail_lock);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * This routine is called when an EFD format structure is found in a committed
@@ -708,46 +730,16 @@ xlog_recover_efd(
 	struct xlog			*log,
 	struct xlog_recover_item	*item)
 {
-	struct xfs_ail_cursor		cur;
 	struct xfs_efd_log_format	*efd_formatp;
-	struct xfs_efi_log_item		*efip = NULL;
-	struct xfs_log_item		*lip;
-	struct xfs_ail			*ailp = log->l_ailp;
-	uint64_t			efi_id;
 
 	efd_formatp = item->ri_buf[0].i_addr;
 	ASSERT((item->ri_buf[0].i_len == (sizeof(xfs_efd_log_format_32_t) +
 		((efd_formatp->efd_nextents - 1) * sizeof(xfs_extent_32_t)))) ||
 	       (item->ri_buf[0].i_len == (sizeof(xfs_efd_log_format_64_t) +
 		((efd_formatp->efd_nextents - 1) * sizeof(xfs_extent_64_t)))));
-	efi_id = efd_formatp->efd_efi_id;
-
-	/*
-	 * Search for the EFI with the id in the EFD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_EFI) {
-			efip = (xfs_efi_log_item_t *)lip;
-			if (efip->efi_format.efi_id == efi_id) {
-				/*
-				 * Drop the EFD reference to the EFI. This
-				 * removes the EFI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_efi_release(efip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
 
+	xlog_recover_release_intent(log, XFS_LI_EFI, efd_formatp->efd_efi_id,
+			 xlog_release_efi);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 08066fa32b80..460f836de963 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1770,6 +1770,33 @@ xlog_clear_stale_blocks(
 
 /* Log intent item dispatching. */
 
+/*
+ * Release the recovered intent item in the AIL that matches the given intent
+ * type and intent id.
+ */
+void
+xlog_recover_release_intent(
+	struct xlog		*log,
+	unsigned short		intent_type,
+	uint64_t		intent_id,
+	xlog_recover_release_intent_fn	fn)
+{
+	struct xfs_ail_cursor	cur;
+	struct xfs_log_item	*lip;
+	struct xfs_ail		*ailp = log->l_ailp;
+
+	spin_lock(&ailp->ail_lock);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	while (lip != NULL) {
+		if (lip->li_type == intent_type && fn(log, lip, intent_id))
+			break;
+		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+	}
+
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+}
+
 STATIC int xlog_recover_intent_pass2(struct xlog *log,
 		struct list_head *buffer_list, struct xlog_recover_item *item,
 		xfs_lsn_t current_lsn);
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index a1dac3d60a2a..6eef1523078c 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -657,6 +657,28 @@ xlog_recover_cui(
 	return 0;
 }
 
+STATIC bool
+xlog_release_cui(
+	struct xlog		*log,
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	struct xfs_cui_log_item	*cuip = CUI_ITEM(lip);
+	struct xfs_ail		*ailp = log->l_ailp;
+
+	if (cuip->cui_format.cui_id == intent_id) {
+		/*
+		 * Drop the CUD reference to the CUI. This
+		 * removes the CUI from the AIL and frees it.
+		 */
+		spin_unlock(&ailp->ail_lock);
+		xfs_cui_release(cuip);
+		spin_lock(&ailp->ail_lock);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * This routine is called when an CUD format structure is found in a committed
@@ -671,45 +693,15 @@ xlog_recover_cud(
 	struct xlog_recover_item	*item)
 {
 	struct xfs_cud_log_format	*cud_formatp;
-	struct xfs_cui_log_item		*cuip = NULL;
-	struct xfs_log_item		*lip;
-	uint64_t			cui_id;
-	struct xfs_ail_cursor		cur;
-	struct xfs_ail			*ailp = log->l_ailp;
 
 	cud_formatp = item->ri_buf[0].i_addr;
 	if (item->ri_buf[0].i_len != sizeof(struct xfs_cud_log_format)) {
 		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp);
 		return -EFSCORRUPTED;
 	}
-	cui_id = cud_formatp->cud_cui_id;
-
-	/*
-	 * Search for the CUI with the id in the CUD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_CUI) {
-			cuip = (struct xfs_cui_log_item *)lip;
-			if (cuip->cui_format.cui_id == cui_id) {
-				/*
-				 * Drop the CUD reference to the CUI. This
-				 * removes the CUI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_cui_release(cuip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
 
+	xlog_recover_release_intent(log, XFS_LI_CUI, cud_formatp->cud_cui_id,
+			 xlog_release_cui);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 1ef752563f37..b60fb141c22e 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -650,6 +650,28 @@ xlog_recover_rui(
 	return 0;
 }
 
+STATIC bool
+xlog_release_rui(
+	struct xlog		*log,
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	struct xfs_rui_log_item	*ruip = RUI_ITEM(lip);
+	struct xfs_ail		*ailp = log->l_ailp;
+
+	if (ruip->rui_format.rui_id == intent_id) {
+		/*
+		 * Drop the RUD reference to the RUI. This
+		 * removes the RUI from the AIL and frees it.
+		 */
+		spin_unlock(&ailp->ail_lock);
+		xfs_rui_release(ruip);
+		spin_lock(&ailp->ail_lock);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * This routine is called when an RUD format structure is found in a committed
@@ -664,42 +686,12 @@ xlog_recover_rud(
 	struct xlog_recover_item	*item)
 {
 	struct xfs_rud_log_format	*rud_formatp;
-	struct xfs_rui_log_item		*ruip = NULL;
-	struct xfs_log_item		*lip;
-	uint64_t			rui_id;
-	struct xfs_ail_cursor		cur;
-	struct xfs_ail			*ailp = log->l_ailp;
 
 	rud_formatp = item->ri_buf[0].i_addr;
 	ASSERT(item->ri_buf[0].i_len == sizeof(struct xfs_rud_log_format));
-	rui_id = rud_formatp->rud_rui_id;
-
-	/*
-	 * Search for the RUI with the id in the RUD format structure in the
-	 * AIL.
-	 */
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (lip->li_type == XFS_LI_RUI) {
-			ruip = (struct xfs_rui_log_item *)lip;
-			if (ruip->rui_format.rui_id == rui_id) {
-				/*
-				 * Drop the RUD reference to the RUI. This
-				 * removes the RUI from the AIL and frees it.
-				 */
-				spin_unlock(&ailp->ail_lock);
-				xfs_rui_release(ruip);
-				spin_lock(&ailp->ail_lock);
-				break;
-			}
-		}
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
 
+	xlog_recover_release_intent(log, XFS_LI_RUI, rud_formatp->rud_rui_id,
+			 xlog_release_rui);
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 16/19] xfs: refactor adding recovered intent items to the log
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (14 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 15/19] xfs: refactor releasing finished intents during log recovery Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-25 18:34   ` Christoph Hellwig
  2020-04-22  2:07 ` [PATCH 17/19] xfs: hoist the ail unlock/lock cycle when cancelling intents during recovery Darrick J. Wong
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

During recovery, every intent that we recover from the log has to be
added to the AIL.  Replace the open-coded addition with a helper.
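
With the helper in place, every intent recovery handler ends with the same
two-line pattern (BUI shown; the other types are identical apart from the
names):

	xlog_recover_insert_ail(log, &buip->bui_item, lsn);
	xfs_bui_release(buip);	/* drop the reference held for insertion */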

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    2 ++
 fs/xfs/xfs_bmap_item.c          |   10 +---------
 fs/xfs/xfs_extfree_item.c       |   10 +---------
 fs/xfs/xfs_log_recover.c        |   17 +++++++++++++++++
 fs/xfs/xfs_refcount_item.c      |   10 +---------
 fs/xfs/xfs_rmap_item.c          |   10 +---------
 6 files changed, 23 insertions(+), 36 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index ac1adccc8451..8cb38d8327ce 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -173,5 +173,7 @@ typedef bool (*xlog_recover_release_intent_fn)(struct xlog *log,
 		struct xfs_log_item *item, uint64_t intent_id);
 void xlog_recover_release_intent(struct xlog *log, unsigned short intent_type,
 		uint64_t intent_id, xlog_recover_release_intent_fn fn);
+void xlog_recover_insert_ail(struct xlog *log, struct xfs_log_item *lip,
+		xfs_lsn_t lsn);
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index cd593b98f102..df8155dfcc87 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -621,15 +621,7 @@ xlog_recover_bui(
 		return error;
 	}
 	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The RUI has two references. One for the RUD and one for RUI to ensure
-	 * it makes it into the AIL. Insert the RUI into the AIL directly and
-	 * drop the RUI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &buip->bui_item, lsn);
+	xlog_recover_insert_ail(log, &buip->bui_item, lsn);
 	xfs_bui_release(buip);
 	return 0;
 }
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 6a873309f3bc..3af9e37892f1 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -682,15 +682,7 @@ xlog_recover_efi(
 		return error;
 	}
 	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The EFI has two references. One for the EFD and one for EFI to ensure
-	 * it makes it into the AIL. Insert the EFI into the AIL directly and
-	 * drop the EFI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &efip->efi_item, lsn);
+	xlog_recover_insert_ail(log, &efip->efi_item, lsn);
 	xfs_efi_release(efip);
 	return 0;
 }
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 460f836de963..913eb9101110 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1797,6 +1797,23 @@ xlog_recover_release_intent(
 	spin_unlock(&ailp->ail_lock);
 }
 
+/* Insert a recovered intent item into the AIL. */
+void
+xlog_recover_insert_ail(
+	struct xlog		*log,
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+	/*
+	 * The intent has two references. One for the done item and one for the
+	 * intent to ensure it makes it into the AIL. Insert the intent into
+	 * the AIL directly and drop the intent reference. Note that
+	 * xfs_trans_ail_update() drops the AIL lock.
+	 */
+	spin_lock(&log->l_ailp->ail_lock);
+	xfs_trans_ail_update(log->l_ailp, lip, lsn);
+}
+
 STATIC int xlog_recover_intent_pass2(struct xlog *log,
 		struct list_head *buffer_list, struct xlog_recover_item *item,
 		xfs_lsn_t current_lsn);
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 6eef1523078c..ab786739ff7c 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -644,15 +644,7 @@ xlog_recover_cui(
 		return error;
 	}
 	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The CUI has two references. One for the CUD and one for CUI to ensure
-	 * it makes it into the AIL. Insert the CUI into the AIL directly and
-	 * drop the CUI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &cuip->cui_item, lsn);
+	xlog_recover_insert_ail(log, &cuip->cui_item, lsn);
 	xfs_cui_release(cuip);
 	return 0;
 }
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index b60fb141c22e..a83f86915c40 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -637,15 +637,7 @@ xlog_recover_rui(
 		return error;
 	}
 	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
-
-	spin_lock(&log->l_ailp->ail_lock);
-	/*
-	 * The RUI has two references. One for the RUD and one for RUI to ensure
-	 * it makes it into the AIL. Insert the RUI into the AIL directly and
-	 * drop the RUI reference. Note that xfs_trans_ail_update() drops the
-	 * AIL lock.
-	 */
-	xfs_trans_ail_update(log->l_ailp, &ruip->rui_item, lsn);
+	xlog_recover_insert_ail(log, &ruip->rui_item, lsn);
 	xfs_rui_release(ruip);
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 17/19] xfs: hoist the ail unlock/lock cycle when cancelling intents during recovery
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (15 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 16/19] xfs: refactor adding recovered intent items to the log Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-25 18:35   ` Christoph Hellwig
  2020-04-22  2:07 ` [PATCH 18/19] xfs: remove xlog_item_is_intent Darrick J. Wong
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the spin_unlock/spin_lock of the ail lock up to
xlog_recover_cancel_intents so that the individual ->cancel_intent
functions don't have to do that anymore.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |    3 +--
 fs/xfs/xfs_bmap_item.c          |    8 +-------
 fs/xfs/xfs_extfree_item.c       |    8 +-------
 fs/xfs/xfs_log_recover.c        |    4 +++-
 fs/xfs/xfs_refcount_item.c      |    8 +-------
 fs/xfs/xfs_rmap_item.c          |    8 +-------
 6 files changed, 8 insertions(+), 31 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 8cb38d8327ce..5c37940386d6 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -132,8 +132,7 @@ typedef int (*xlog_recover_done_fn)(struct xlog *xlog,
 		struct xlog_recover_item *item);
 typedef int (*xlog_recover_process_intent_fn)(struct xlog *log,
 		struct xfs_trans *tp, struct xfs_log_item *lip);
-typedef void (*xlog_recover_cancel_intent_fn)(struct xlog *log,
-		struct xfs_log_item *lip);
+typedef void (*xlog_recover_cancel_intent_fn)(struct xfs_log_item *lip);
 
 struct xlog_recover_intent_type {
 	/*
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index df8155dfcc87..53160172c36b 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -701,15 +701,9 @@ xlog_recover_process_bui(
 /* Release the BUI since we're cancelling everything. */
 STATIC void
 xlog_recover_cancel_bui(
-	struct xlog			*log,
 	struct xfs_log_item		*lip)
 {
-	struct xfs_ail			*ailp = log->l_ailp;
-	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_bui_release(buip);
-	spin_lock(&ailp->ail_lock);
+	xfs_bui_release(BUI_ITEM(lip));
 }
 
 const struct xlog_recover_intent_type xlog_recover_bmap_type = {
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 3af9e37892f1..a15ede29244a 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -762,15 +762,9 @@ xlog_recover_process_efi(
 /* Release the EFI since we're cancelling everything. */
 STATIC void
 xlog_recover_cancel_efi(
-	struct xlog			*log,
 	struct xfs_log_item		*lip)
 {
-	struct xfs_ail			*ailp = log->l_ailp;
-	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_efi_release(efip);
-	spin_lock(&ailp->ail_lock);
+	xfs_efi_release(EFI_ITEM(lip));
 }
 
 const struct xlog_recover_intent_type xlog_recover_extfree_type = {
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 913eb9101110..51a7d4b963cd 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2824,7 +2824,9 @@ xlog_recover_cancel_intents(
 			break;
 		}
 
-		type->cancel_intent(log, lip);
+		spin_unlock(&ailp->ail_lock);
+		type->cancel_intent(lip);
+		spin_lock(&ailp->ail_lock);
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
 	}
 
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index ab786739ff7c..01a393727a1e 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -724,15 +724,9 @@ xlog_recover_process_cui(
 /* Release the CUI since we're cancelling everything. */
 STATIC void
 xlog_recover_cancel_cui(
-	struct xlog			*log,
 	struct xfs_log_item		*lip)
 {
-	struct xfs_ail			*ailp = log->l_ailp;
-	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_cui_release(cuip);
-	spin_lock(&ailp->ail_lock);
+	xfs_cui_release(CUI_ITEM(lip));
 }
 
 const struct xlog_recover_intent_type xlog_recover_refcount_type = {
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index a83f86915c40..69a2d23eedda 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -714,15 +714,9 @@ xlog_recover_process_rui(
 /* Release the RUI since we're cancelling everything. */
 STATIC void
 xlog_recover_cancel_rui(
-	struct xlog			*log,
 	struct xfs_log_item		*lip)
 {
-	struct xfs_ail			*ailp = log->l_ailp;
-	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
-
-	spin_unlock(&ailp->ail_lock);
-	xfs_rui_release(ruip);
-	spin_lock(&ailp->ail_lock);
+	xfs_rui_release(RUI_ITEM(lip));
 }
 
 const struct xlog_recover_intent_type xlog_recover_rmap_type = {


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 18/19] xfs: remove xlog_item_is_intent
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (16 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 17/19] xfs: hoist the ail unlock/lock cycle when cancelling intents during recovery Darrick J. Wong
@ 2020-04-22  2:07 ` Darrick J. Wong
  2020-04-22  2:08 ` [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file Darrick J. Wong
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that we've parameterized all of the log intent types, get rid of
this redundant predicate.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |   18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 51a7d4b963cd..250f04419035 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2653,20 +2653,6 @@ xlog_recover_process_data(
 	return 0;
 }
 
-/* Is this log item a deferred action intent? */
-static inline bool xlog_item_is_intent(struct xfs_log_item *lip)
-{
-	switch (lip->li_type) {
-	case XFS_LI_EFI:
-	case XFS_LI_RUI:
-	case XFS_LI_CUI:
-	case XFS_LI_BUI:
-		return true;
-	default:
-		return false;
-	}
-}
-
 /* Take all the collected deferred ops and finish them in order. */
 static int
 xlog_finish_defer_ops(
@@ -2760,7 +2746,7 @@ xlog_recover_process_intents(
 		if (!type) {
 #ifdef DEBUG
 			for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
-				ASSERT(!xlog_item_is_intent(lip));
+				ASSERT(!xlog_intent_for_type(lip->li_type));
 #endif
 			break;
 		}
@@ -2819,7 +2805,7 @@ xlog_recover_cancel_intents(
 		if (!type) {
 #ifdef DEBUG
 			for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
-				ASSERT(!xlog_item_is_intent(lip));
+				ASSERT(!xlog_intent_for_type(lip->li_type));
 #endif
 			break;
 		}


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (17 preceding siblings ...)
  2020-04-22  2:07 ` [PATCH 18/19] xfs: remove xlog_item_is_intent Darrick J. Wong
@ 2020-04-22  2:08 ` Darrick J. Wong
  2020-04-25 18:36   ` Christoph Hellwig
  2020-04-22 16:18 ` [PATCH 00/19] xfs: refactor log recovery Brian Foster
  2020-04-28  6:22 ` Christoph Hellwig
  20 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-22  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move xlog_recover_intent_pass2 up in the file so that we don't have to
have a forward declaration of a static function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |  110 ++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 57 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 250f04419035..5a20a95c5dad 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1814,9 +1814,59 @@ xlog_recover_insert_ail(
 	xfs_trans_ail_update(log->l_ailp, lip, lsn);
 }
 
-STATIC int xlog_recover_intent_pass2(struct xlog *log,
-		struct list_head *buffer_list, struct xlog_recover_item *item,
-		xfs_lsn_t current_lsn);
+static inline const struct xlog_recover_intent_type *
+xlog_intent_for_type(
+	unsigned short			type)
+{
+	switch (type) {
+	case XFS_LI_EFD:
+	case XFS_LI_EFI:
+		return &xlog_recover_extfree_type;
+	case XFS_LI_RUD:
+	case XFS_LI_RUI:
+		return &xlog_recover_rmap_type;
+	case XFS_LI_CUI:
+	case XFS_LI_CUD:
+		return &xlog_recover_refcount_type;
+	case XFS_LI_BUI:
+	case XFS_LI_BUD:
+		return &xlog_recover_bmap_type;
+	default:
+		return NULL;
+	}
+}
+
+static inline bool
+xlog_is_intent_done_item(
+	struct xlog_recover_item	*item)
+{
+	switch (ITEM_TYPE(item)) {
+	case XFS_LI_EFD:
+	case XFS_LI_RUD:
+	case XFS_LI_CUD:
+	case XFS_LI_BUD:
+		return true;
+	default:
+		return false;
+	}
+}
+
+STATIC int
+xlog_recover_intent_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
+{
+	const struct xlog_recover_intent_type *type;
+
+	type = xlog_intent_for_type(ITEM_TYPE(item));
+	if (!type)
+		return -EFSCORRUPTED;
+	if (xlog_is_intent_done_item(item))
+		return type->recover_done(log, item);
+	return type->recover_intent(log, item, current_lsn);
+}
 
 const struct xlog_recover_item_type xlog_intent_item_type = {
 	.reorder		= XLOG_REORDER_INODE_LIST,
@@ -2075,60 +2125,6 @@ xlog_recover_commit_pass1(
 	return item->ri_type->commit_pass1_fn(log, item);
 }
 
-static inline const struct xlog_recover_intent_type *
-xlog_intent_for_type(
-	unsigned short			type)
-{
-	switch (type) {
-	case XFS_LI_EFD:
-	case XFS_LI_EFI:
-		return &xlog_recover_extfree_type;
-	case XFS_LI_RUD:
-	case XFS_LI_RUI:
-		return &xlog_recover_rmap_type;
-	case XFS_LI_CUI:
-	case XFS_LI_CUD:
-		return &xlog_recover_refcount_type;
-	case XFS_LI_BUI:
-	case XFS_LI_BUD:
-		return &xlog_recover_bmap_type;
-	default:
-		return NULL;
-	}
-}
-
-static inline bool
-xlog_is_intent_done_item(
-	struct xlog_recover_item	*item)
-{
-	switch (ITEM_TYPE(item)) {
-	case XFS_LI_EFD:
-	case XFS_LI_RUD:
-	case XFS_LI_CUD:
-	case XFS_LI_BUD:
-		return true;
-	default:
-		return false;
-	}
-}
-
-STATIC int
-xlog_recover_intent_pass2(
-	struct xlog			*log,
-	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item,
-	xfs_lsn_t			current_lsn)
-{
-	const struct xlog_recover_intent_type *type;
-
-	type = xlog_intent_for_type(ITEM_TYPE(item));
-	if (!type)
-		return -EFSCORRUPTED;
-	if (xlog_is_intent_done_item(item))
-		return type->recover_done(log, item);
-	return type->recover_intent(log, item, current_lsn);
-}
-
 STATIC int
 xlog_recover_commit_pass2(
 	struct xlog			*log,


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 01/19] xfs: complain when we don't recognize the log item type
  2020-04-22  2:06 ` [PATCH 01/19] xfs: complain when we don't recognize the log item type Darrick J. Wong
@ 2020-04-22 16:17   ` Brian Foster
  2020-04-25 17:42   ` Christoph Hellwig
  1 sibling, 0 replies; 56+ messages in thread
From: Brian Foster @ 2020-04-22 16:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:06:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're sorting recovered log items ahead of recovering them and
> encounter a log item of unknown type, actually print the type code when
> we're rejecting the whole transaction to aid in debugging.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_log_recover.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 11c3502b07b1..5f803083ddc3 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1887,8 +1887,8 @@ xlog_recover_reorder_trans(
>  			break;
>  		default:
>  			xfs_warn(log->l_mp,
> -				"%s: unrecognized type of log operation",
> -				__func__);
> +				"%s: unrecognized type of log operation (%d)",
> +				__func__, ITEM_TYPE(item));
>  			ASSERT(0);
>  			/*
>  			 * return the remaining items back to the transaction
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (18 preceding siblings ...)
  2020-04-22  2:08 ` [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file Darrick J. Wong
@ 2020-04-22 16:18 ` Brian Foster
  2020-04-28  6:12   ` Christoph Hellwig
  2020-04-28  6:22 ` Christoph Hellwig
  20 siblings, 1 reply; 56+ messages in thread
From: Brian Foster @ 2020-04-22 16:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:06:03PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> This series refactors log recovery by moving recovery code for each log
> item type into the source code for the rest of that log item type and
> using dispatch function pointers to virtualize the interactions.  This
> dramatically reduces the amount of code in xfs_log_recover.c and
> increases cohesion throughout the log code.
> 

So I realized pretty quickly after looking at patch 2 that I probably
need to step back and look at the big picture before getting into the
details of the patches. After skimming through the end result, my
preliminary reaction is that the concept looks interesting for intents,
but I'm not so sure about abstracting so aggressively across the core
recovery implementation simply because I don't think the current design
lends itself to it. Some high level thoughts on particular areas...

- Transaction reorder

Virtualizing the transaction reorder across several files/types
strikes me as overkill for several reasons. From a code standpoint,
we've created a new type enumeration and a couple fields (enum type and
a function) in a generic structure to essentially abstract out the
buffer handling into a function. The latter checks another couple of blf
flags, which appears to be the only real "type specific" logic in the
whole sequence. From a complexity standpoint, the reorder operation is a
fairly low level and internal recovery operation. We have this huge
comment just to explain exactly what's happening and why certain items
have to be ordered as such, or some treated like others, etc. TBH it's
not terribly clear even with that documentation, so I don't know that
splitting the associated mapping logic off into separate files is
helpful.

- Readahead

We end up with readahead callouts for only the types that translate to
buffers (so buffers, inode, dquots), and then those callouts do some
type specific mapping (that is duplicated within the specific type
handlers) and issue a readahead (which is duplicated across each ra_pass2
call). I wonder if this would be better abstracted by a ->bmap() like
call that simply maps the item to a [block,length] and returns a
non-zero length if the core recovery code should invoke readahead (after
checking for cancellation). It looks like the underlying implementation
of those bmap calls could be further factored into helpers that
translate from the raw record data into the type specific format
structures, and that could reduce duplication between the readahead
calls and the pass2 calls in a couple cases. (The more I think about it,
the more I think we should introduce those kinds of cleanups before
getting into the need for function pointers.)
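
Roughly what I have in mind, as a sketch only (the mapping struct, the
bmap_fn member and the helpers below are invented here for illustration;
none of them exist in the posted series):

	/* hypothetical mapping result handed back to the core code */
	struct xlog_recover_bmap {
		xfs_daddr_t			blkno;
		uint				len;
		unsigned short			flags;
		const struct xfs_buf_ops	*ops;
	};

	/* per-type callout: fill in the mapping, return nonzero len if any */
	static uint
	xlog_recover_buffer_bmap(
		struct xlog_recover_item	*item,
		struct xlog_recover_bmap	*map)
	{
		struct xfs_buf_log_format	*buf_f = item->ri_buf[0].i_addr;

		map->blkno = buf_f->blf_blkno;
		map->len = buf_f->blf_len;
		map->flags = buf_f->blf_flags;
		map->ops = NULL;
		return map->len;
	}

	/*
	 * ...and the core pass2 code checks cancellation and issues the
	 * readahead in exactly one place:
	 */
	static void
	xlog_recover_ra_pass2(
		struct xlog			*log,
		struct xlog_recover_item	*item)
	{
		struct xlog_recover_bmap	map = { };

		if (!item->ri_type->bmap_fn ||
		    !item->ri_type->bmap_fn(item, &map))
			return;
		if (xlog_peek_buffer_cancelled(log, map.blkno, map.len,
				map.flags))
			return;
		xfs_buf_readahead(log->l_mp->m_ddev_targp, map.blkno,
				map.len, map.ops);
	}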

- Recovery (pass1/pass2)

The core recovery bits seem more reasonable to factor out in general.
That said, we only have two pass1 callbacks (buffers and quotaoff). The
buffer callback does cancellation management and the quotaoff sets some
flags, so I wonder why those couldn't just remain as direct function
calls (even if we move the functions out of xfs_log_recover.c). There
are more callbacks for pass2 so the function pointers make a bit more
sense there, but at the same time it looks like the various intents are
further abstracted behind a single "intent type" pass2 call (which has a
hardcoded XLOG_REORDER_INODE_LIST reorder value and is about as clear as
mud in that context, getting to my earlier point).

Given all that, ISTM that the virtualization pattern is perhaps best
suited for the intent bits since we'll probably grow more
xlog_recover_intent_type users over time as we defer-enable more
operations, and the interfaces therein are more broadly used across the
types. I'm much more on the fence about the whole xlog_recover_item_type
thing. I've no objection to lifting more type-specific code out of
xfs_log_recover.c, but if we're going as far as creating a virt
interface then I'd probably rather see us refactor things more first and
design a cleaner interface rather than simply spread the order,
readahead, pass1, pass2 recovery sequence logic around through multiple
source files, particularly where several of the virt interfaces are used
relatively sparsely (and the landing spots for that code are shared with
other functional components...)

Which also makes me wonder if the _item.c files are the right place to
dump functional recovery code. I've always considered those files mostly
related to xfs_log_item management so it looks a little off to see some
of the functional recovery code in there. Perhaps we should split log
recovery into its own set of per-type source files and leave any common
log item/format bits in the _item.c files..? That certainly might make
more sense to me if we do go with such a low level recovery interface...

Brian

> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
> 
> This is an extraordinary way to destroy everything.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-log-recovery
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 01/19] xfs: complain when we don't recognize the log item type
  2020-04-22  2:06 ` [PATCH 01/19] xfs: complain when we don't recognize the log item type Darrick J. Wong
  2020-04-22 16:17   ` Brian Foster
@ 2020-04-25 17:42   ` Christoph Hellwig
  2020-04-27 17:55     ` Darrick J. Wong
  1 sibling, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 17:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:06:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're sorting recovered log items ahead of recovering them and
> encounter a log item of unknown type, actually print the type code when
> we're rejecting the whole transaction to aid in debugging.

The subject seems wrong - we already complain, we are just not very
specific about what we complain about, as you mention in the actual commit log.

With a fixed subject:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure
  2020-04-22  2:06 ` [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure Darrick J. Wong
@ 2020-04-25 18:13   ` Christoph Hellwig
  2020-04-27 22:04     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> +
> +/* Sorting hat for log items as they're read in. */
> +enum xlog_recover_reorder {
> +	XLOG_REORDER_UNKNOWN,
> +	XLOG_REORDER_BUFFER_LIST,
> +	XLOG_REORDER_CANCEL_LIST,
> +	XLOG_REORDER_INODE_BUFFER_LIST,
> +	XLOG_REORDER_INODE_LIST,

XLOG_REORDER_INODE_LIST seems a bit misnamed as it really is the
"misc" or "no reorder" list.  I guess the naming comes from the
local inode_list variable, but maybe we need to fix that as well?

> +typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
> +		struct xlog_recover_item *item);

This typedef doesn't actually seem to help with anything (neither
with just this patch nor the final tree).

> +
> +struct xlog_recover_item_type {
> +	/*
> +	 * These two items decide how to sort recovered log items during
> +	 * recovery.  If reorder_fn is non-NULL it will be called; otherwise,
> +	 * reorder will be used to decide.  See the comment above
> +	 * xlog_recover_reorder_trans for more details about what the values
> +	 * mean.
> +	 */
> +	enum xlog_recover_reorder	reorder;
> +	xlog_recover_reorder_fn		reorder_fn;

I'd just use reorder_fn and skip the simple field.  Just one way to do
things even if it adds a tiny amount of boilerplate code.
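
I.e. a type without any real sorting logic would just supply a trivial
callback, something like this (sketch only; it assumes dquot items keep
going onto the plain item list as they do today):

	static enum xlog_recover_reorder
	xlog_recover_dquot_reorder(
		struct xlog_recover_item	*item)
	{
		/* dquot items need no special treatment during sorting */
		return XLOG_REORDER_INODE_LIST;
	}

	const struct xlog_recover_item_type xlog_dquot_item_type = {
		.reorder_fn	= xlog_recover_dquot_reorder,
		/* the other callouts stay as in the patch */
	};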

> +	case XFS_LI_INODE:
> +		return &xlog_inode_item_type;
> +	case XFS_LI_DQUOT:
> +		return &xlog_dquot_item_type;
> +	case XFS_LI_QUOTAOFF:
> +		return &xlog_quotaoff_item_type;
> +	case XFS_LI_IUNLINK:
> +		/* Not implemented? */

Not implemented!  I think we need a prep patch to remove this first.

> @@ -1851,41 +1890,34 @@ xlog_recover_reorder_trans(
>  
>  	list_splice_init(&trans->r_itemq, &sort_list);
>  	list_for_each_entry_safe(item, n, &sort_list, ri_list) {
> -		xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
> +		enum xlog_recover_reorder	fate = XLOG_REORDER_UNKNOWN;
> +
> +		item->ri_type = xlog_item_for_type(ITEM_TYPE(item));

I wonder if just passing the whole item to xlog_item_for_type would
make more sense.  It would then need a different name, of course.

> +		if (item->ri_type) {
> +			if (item->ri_type->reorder_fn)
> +				fate = item->ri_type->reorder_fn(item);
> +			else
> +				fate = item->ri_type->reorder;
> +		}

I think for the !item->ri_type we should immediately jump to what
currently is the XLOG_REORDER_UNKNOWN case, and thus avoid even
adding XLOG_REORDER_UNKNOWN to the enum.  The added benefit is that
any item without a reorder_fn could then be treated as on what
currently is the inode_list, but needs a better name.
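
I.e. roughly (untested; the bail-out label is only sketched here and
would be today's default: handling):

	item->ri_type = xlog_item_for_type(ITEM_TYPE(item));
	if (!item->ri_type)
		goto unrecognized;	/* reject the transaction as today */
	if (item->ri_type->reorder_fn)
		fate = item->ri_type->reorder_fn(item);
	else
		fate = XLOG_REORDER_INODE_LIST;	/* the plain item list */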

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readhead functions
  2020-04-22  2:06 ` [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readhead functions Darrick J. Wong
@ 2020-04-25 18:19   ` Christoph Hellwig
  2020-04-28 20:54     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:06:21PM -0700, Darrick J. Wong wrote:
>  typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
>  		struct xlog_recover_item *item);
> +typedef void (*xlog_recover_ra_pass2_fn)(struct xlog *log,
> +		struct xlog_recover_item *item);

Same comment for this one - or the other two following.

> +/*
> + * This structure is used during recovery to record the buf log items which
> + * have been canceled and should not be replayed.
> + */
> +struct xfs_buf_cancel {
> +	xfs_daddr_t		bc_blkno;
> +	uint			bc_len;
> +	int			bc_refcount;
> +	struct list_head	bc_list;
> +};
> +
> +struct xfs_buf_cancel *xlog_peek_buffer_cancelled(struct xlog *log,
> +		xfs_daddr_t blkno, uint len, unsigned short flags);

None of the callers moved in this patch even needs the xfs_buf_cancel
structure, it just uses the return value as a boolean.  I think they
all should be switched to use xlog_check_buffer_cancelled instead, which
means that struct xfs_buf_cancel and xlog_peek_buffer_cancelled can stay
private in xfs_log_recover.c (and later xfs_buf_item.c).

> +STATIC void
> +xlog_recover_buffer_ra_pass2(
> +	struct xlog                     *log,
> +	struct xlog_recover_item        *item)
> +{
> +	struct xfs_buf_log_format	*buf_f = item->ri_buf[0].i_addr;
> +	struct xfs_mount		*mp = log->l_mp;
> +
> +	if (xlog_peek_buffer_cancelled(log, buf_f->blf_blkno,
> +			buf_f->blf_len, buf_f->blf_flags)) {
> +		return;
> +	}
> +
> +	xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
> +				buf_f->blf_len, NULL);

Why not:

	if (!xlog_peek_buffer_cancelled(log, buf_f->blf_blkno,
			buf_f->blf_len, buf_f->blf_flags)) {
		xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
				buf_f->blf_len, NULL);
	}

> +STATIC void
> +xlog_recover_dquot_ra_pass2(
> +	struct xlog			*log,
> +	struct xlog_recover_item	*item)
> +{
> +	struct xfs_mount	*mp = log->l_mp;
> +	struct xfs_disk_dquot	*recddq;
> +	struct xfs_dq_logformat	*dq_f;
> +	uint			type;
> +	int			len;
> +
> +
> +	if (mp->m_qflags == 0)

Double empty line above.

> +	if (xlog_peek_buffer_cancelled(log, dq_f->qlf_blkno, len, 0))
> +		return;
> +
> +	xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, len,
> +			  &xfs_dquot_buf_ra_ops);

Same comment as above, no real need for the early return here.

> +	if (xlog_peek_buffer_cancelled(log, ilfp->ilf_blkno, ilfp->ilf_len, 0))
> +		return;
> +
> +	xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
> +				ilfp->ilf_len, &xfs_inode_buf_ra_ops);

Here again.

> -	unsigned short			flags)
> +	unsigned short		flags)

Spurious whitespace change.

>  		case XLOG_RECOVER_PASS2:
> -			xlog_recover_ra_pass2(log, item);
> +			if (item->ri_type && item->ri_type->ra_pass2_fn)
> +				item->ri_type->ra_pass2_fn(log, item);

Shouldn't we ensure early on that we always have a valid ->ri_type?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 04/19] xfs: refactor log recovery item dispatch for pass1 commit functions
  2020-04-22  2:06 ` [PATCH 04/19] xfs: refactor log recovery item dispatch for pass1 commit functions Darrick J. Wong
@ 2020-04-25 18:20   ` Christoph Hellwig
  0 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> +	xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;

Please switch to the struct types instead of typedefs for all moved
code.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 07/19] xfs: refactor log recovery intent item dispatch for pass2 commit functions
  2020-04-22  2:06 ` [PATCH 07/19] xfs: refactor log recovery intent " Darrick J. Wong
@ 2020-04-25 18:24   ` Christoph Hellwig
  2020-04-28 22:42     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:06:48PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move the log intent item pass2 commit code into the per-item source code
> files and use the dispatch function to call it.  We do these one at a
> time because there's a lot of code to move.  No functional changes.

This commit log doesn't really match what is going on, as no move
between files happens.  And the changes themselves look pretty odd
as well.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-22  2:07 ` [PATCH 11/19] xfs: refactor EFI log item recovery dispatch Darrick J. Wong
@ 2020-04-25 18:28   ` Christoph Hellwig
  2020-04-28 22:41     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:07:13PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move the extent free intent and intent-done log recovery code into the
> per-item source code files and use dispatch functions to call them.  We
> do these one at a time because there's a lot of code to move.  No
> functional changes.

What is the reason for splitting xlog_recover_item_type vs
xlog_recover_intent_type?  To me it would seem more logical to have
one operation vector, with some ops only set for intents.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 12/19] xfs: refactor RUI log item recovery dispatch
  2020-04-22  2:07 ` [PATCH 12/19] xfs: refactor RUI " Darrick J. Wong
@ 2020-04-25 18:28   ` Christoph Hellwig
  2020-04-28 22:40     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:07:19PM -0700, Darrick J. Wong wrote:
> +const struct xlog_recover_intent_type xlog_recover_rmap_type = {
> +	.recover_intent		= xlog_recover_rui,
> +	.recover_done		= xlog_recover_rud,
> +	.process_intent		= xlog_recover_process_rui,
> +	.cancel_intent		= xlog_recover_cancel_rui,

Can you make sure your method instance names are always
someprefix_actual_method_name?  That makes life so much easier when
looking for all instances.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/19] xfs: refactor releasing finished intents during log recovery
  2020-04-22  2:07 ` [PATCH 15/19] xfs: refactor releasing finished intents during log recovery Darrick J. Wong
@ 2020-04-25 18:34   ` Christoph Hellwig
  2020-04-28 22:40     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> +STATIC bool
> +xlog_release_bui(
> +	struct xlog		*log,
> +	struct xfs_log_item	*lip,
> +	uint64_t		intent_id)
> +{
> +	struct xfs_bui_log_item	*buip = BUI_ITEM(lip);
> +	struct xfs_ail		*ailp = log->l_ailp;
> +
> +	if (buip->bui_format.bui_id == intent_id) {
> +		/*
> +		 * Drop the BUD reference to the BUI. This
> +		 * removes the BUI from the AIL and frees it.
> +		 */
> +		spin_unlock(&ailp->ail_lock);
> +		xfs_bui_release(buip);
> +		spin_lock(&ailp->ail_lock);
> +		return true;
> +	}
> +
> +	return false;

Early returns for the no match case would seem a little more clear for
all the callbacks.  Also the boilerplate comments in all the callbacks
seem rather pointless.  If you think they have enough value I'd
consolidate that information once in the xlog_recover_release_intent
description.

> +	spin_lock(&ailp->ail_lock);
> +	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> +	while (lip != NULL) {
> +		if (lip->li_type == intent_type && fn(log, lip, intent_id))
> +			break;
> +		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> +	}

What about:

	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip;
	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
		if (lip->li_type != intent_type)
			continue;
		if (fn(log, lip, intent_id))
			break;
	}

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 16/19] xfs: refactor adding recovered intent items to the log
  2020-04-22  2:07 ` [PATCH 16/19] xfs: refactor adding recovered intent items to the log Darrick J. Wong
@ 2020-04-25 18:34   ` Christoph Hellwig
  0 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:07:46PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> During recovery, every intent that we recover from the log has to be
> added to the AIL.  Replace the open-coded addition with a helper.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 17/19] xfs: hoist the ail unlock/lock cycle when cancelling intents during recovery
  2020-04-22  2:07 ` [PATCH 17/19] xfs: hoist the ail unlock/lock cycle when cancelling intents during recovery Darrick J. Wong
@ 2020-04-25 18:35   ` Christoph Hellwig
  0 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:07:52PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move the spin_unlock/spin_lock of the ail lock up to
> xlog_recover_cancel_intents so that the individual ->cancel_intent
> functions don't have to do that anymore.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file
  2020-04-22  2:08 ` [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file Darrick J. Wong
@ 2020-04-25 18:36   ` Christoph Hellwig
  2020-04-28 22:38     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-25 18:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 07:08:05PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move xlog_recover_intent_pass2 up in the file so that we don't have to
> have a forward declaration of a static function.

FYI, I'd expect IS_INTENT and IS_INTENT_DONE to simply be flags in the
xlog_recover_item_type structure.
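
Something like this, as a sketch (the flag and helper names are made up
here, and it assumes a flags word gets added to xlog_recover_item_type):

	#define XLOG_RECOVER_ITEM_INTENT	(1U << 0)
	#define XLOG_RECOVER_ITEM_INTENT_DONE	(1U << 1)

	static inline bool
	xlog_item_is_intent(struct xlog_recover_item *item)
	{
		return item->ri_type->flags & XLOG_RECOVER_ITEM_INTENT;
	}

	static inline bool
	xlog_item_is_intent_done(struct xlog_recover_item *item)
	{
		return item->ri_type->flags & XLOG_RECOVER_ITEM_INTENT_DONE;
	}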

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 01/19] xfs: complain when we don't recognize the log item type
  2020-04-25 17:42   ` Christoph Hellwig
@ 2020-04-27 17:55     ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-27 17:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 10:42:10AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 21, 2020 at 07:06:09PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we're sorting recovered log items ahead of recovering them and
> > encounter a log item of unknown type, actually print the type code when
> > we're rejecting the whole transaction to aid in debugging.
> 
> The subject seems wrong - we already complain, we are just not very
> specific about what we complain about, as you mention in the actual commit log.
> 
> With a fixed subject:

ok, changed to:
"xfs: report unrecognized log item type codes during recovery"

--D

> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure
  2020-04-25 18:13   ` Christoph Hellwig
@ 2020-04-27 22:04     ` Darrick J. Wong
  2020-04-28  5:11       ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-27 22:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:13:07AM -0700, Christoph Hellwig wrote:
> > +
> > +/* Sorting hat for log items as they're read in. */
> > +enum xlog_recover_reorder {
> > +	XLOG_REORDER_UNKNOWN,
> > +	XLOG_REORDER_BUFFER_LIST,
> > +	XLOG_REORDER_CANCEL_LIST,
> > +	XLOG_REORDER_INODE_BUFFER_LIST,
> > +	XLOG_REORDER_INODE_LIST,
> 
> XLOG_REORDER_INODE_LIST seems a bit misnamed as it really is the
> "misc" or "no reorder" list.  I guess the naming comes from the
> local inode_list variable, but maybe we need to fix that as well?

Yes, thanks for the series fixing that.

> > +typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
> > +		struct xlog_recover_item *item);
> 
> This typedef doesn't actually seem to help with anything (neither
> with just this patch nor the final tree).

Fair enough.

> > +
> > +struct xlog_recover_item_type {
> > +	/*
> > +	 * These two items decide how to sort recovered log items during
> > +	 * recovery.  If reorder_fn is non-NULL it will be called; otherwise,
> > +	 * reorder will be used to decide.  See the comment above
> > +	 * xlog_recover_reorder_trans for more details about what the values
> > +	 * mean.
> > +	 */
> > +	enum xlog_recover_reorder	reorder;
> > +	xlog_recover_reorder_fn		reorder_fn;
> 
> I'd just use reorder_fn and skip the simple field.  Just one way to do
> things even if it adds a tiny amount of boilerplate code.

<nod>

> > +	case XFS_LI_INODE:
> > +		return &xlog_inode_item_type;
> > +	case XFS_LI_DQUOT:
> > +		return &xlog_dquot_item_type;
> > +	case XFS_LI_QUOTAOFF:
> > +		return &xlog_quotaoff_item_type;
> > +	case XFS_LI_IUNLINK:
> > +		/* Not implemented? */
> 
> Not implemented!  I think we need a prep patch to remove this first.

The thing I can't tell is if XFS_LI_IUNLINK is a code point reserved
from some long-ago log item that fell out, or reserved for some future
project?

Either way, this case doesn't need to be there.

> > @@ -1851,41 +1890,34 @@ xlog_recover_reorder_trans(
> >  
> >  	list_splice_init(&trans->r_itemq, &sort_list);
> >  	list_for_each_entry_safe(item, n, &sort_list, ri_list) {
> > -		xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
> > +		enum xlog_recover_reorder	fate = XLOG_REORDER_UNKNOWN;
> > +
> > +		item->ri_type = xlog_item_for_type(ITEM_TYPE(item));
> 
> I wonder if just passing the whole item to xlog_item_for_type would
> make more sense.  It would then need a different name, of course.

xlog_set_item_type(item); yes.

> > +		if (item->ri_type) {
> > +			if (item->ri_type->reorder_fn)
> > +				fate = item->ri_type->reorder_fn(item);
> > +			else
> > +				fate = item->ri_type->reorder;
> > +		}
> 
> I think for the !item->ri_type we should immediately jump to what
> currently is the XLOG_REORDER_UNKNOWN case, and thus avoid even
> adding XLOG_REORDER_UNKNOWN to the enum.  The added benefit is that
> any item without a reorder_fn could then be treated as on what
> currently is the inode_list, but needs a better name.

Ok.

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure
  2020-04-27 22:04     ` Darrick J. Wong
@ 2020-04-28  5:11       ` Christoph Hellwig
  2020-04-28 20:46         ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-28  5:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Mon, Apr 27, 2020 at 03:04:12PM -0700, Darrick J. Wong wrote:
> > > +	case XFS_LI_INODE:
> > > +		return &xlog_inode_item_type;
> > > +	case XFS_LI_DQUOT:
> > > +		return &xlog_dquot_item_type;
> > > +	case XFS_LI_QUOTAOFF:
> > > +		return &xlog_quotaoff_item_type;
> > > +	case XFS_LI_IUNLINK:
> > > +		/* Not implemented? */
> > 
> > Not implemented!  I think we need a prep patch to remove this first.
> 
> The thing I can't tell is if XFS_LI_IUNLINK is a code point reserved
> from some long-ago log item that fell out, or reserved for some future
> project?
> 
> Either way, this case doesn't need to be there.

The addition goes back to:

commit 1588194c4a13b36badf2f55c022fc4487633f746
Author: Adam Sweeney <ajs@sgi.com>
Date:   Fri Feb 25 02:02:01 1994 +0000

    Add new prototypes and log item types.


In the imported tree, which only added the definition, and no code
related to it.  I can't find any code related to it either in random
checkpoints.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-22 16:18 ` [PATCH 00/19] xfs: refactor log recovery Brian Foster
@ 2020-04-28  6:12   ` Christoph Hellwig
  2020-04-28 12:43     ` Brian Foster
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-28  6:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, linux-xfs

On Wed, Apr 22, 2020 at 12:18:54PM -0400, Brian Foster wrote:
> - Transaction reorder
> 
> Virtualizing the transaction reorder across several files/types
> strikes me as overkill for several reasons. From a code standpoint,
> we've created a new type enumeration and a couple fields (enum type and
> a function) in a generic structure to essentially abstract out the
> buffer handling into a function. The latter checks another couple of blf
> flags, which appears to be the only real "type specific" logic in the
> whole sequence. From a complexity standpoint, the reorder operation is a
> fairly low level and internal recovery operation. We have this huge
> comment just to explain exactly what's happening and why certain items
> have to be ordered as such, or some treated like others, etc. TBH it's
> not terribly clear even with that documentation, so I don't know that
> splitting the associated mapping logic off into separate files is
> helpful.

I actually very much like the idea of moving any knowledge of the individual
item types out of xfs_log_recover.c.  In reply to the patch I've
suggested an idea for how to kill the knowledge for all but the buffer and
icreate items, which should make this a little more sensible.

I actually think we should go further in one aspect - instead of having
the item type to ops mapping in a single function in xfs_log_recover.c
we should have a table that the items can just add themselves to.
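
E.g. something along these lines instead of the switch statement (sketch
only; it assumes the type structure grows an item_type member so each
entry is self-describing):

	static const struct xlog_recover_item_type *xlog_recover_item_types[] = {
		&xlog_inode_item_type,
		&xlog_dquot_item_type,
		&xlog_quotaoff_item_type,
		/* ...one entry per log item type... */
	};

	static const struct xlog_recover_item_type *
	xlog_find_item_type(
		struct xlog_recover_item	*item)
	{
		int				i;

		for (i = 0; i < ARRAY_SIZE(xlog_recover_item_types); i++) {
			if (ITEM_TYPE(item) ==
			    xlog_recover_item_types[i]->item_type)
				return xlog_recover_item_types[i];
		}
		return NULL;
	}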

> - Readahead
> 
> We end up with readahead callouts for only the types that translate to
> buffers (so buffers, inode, dquots), and then those callouts do some
> type specific mapping (that is duplicated within the specific type
> handlers) and issue a readahead (which is duplicated across each ra_pass2
> call). I wonder if this would be better abstracted by a ->bmap() like
> call that simply maps the item to a [block,length] and returns a
> non-zero length if the core recovery code should invoke readahead (after
> checking for cancellation). It looks like the underlying implementation
> of those bmap calls could be further factored into helpers that
> translate from the raw record data into the type specific format
> structures, and that could reduce duplication between the readahead
> calls and the pass2 calls in a couple cases. (The more I think about it,
> the more I think we should introduce those kinds of cleanups before
> getting into the need for function pointers.)

That sounds more complicated than what we have right now, and even more so
with my little xlog_buf_readahead helper.  Yes, the methods will all
just call xlog_buf_readahead, but they are trivial two-liners that are
easy to understand.  Much easier than a complicated calling convention
to pass the blkno, len and buf ops back.
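
For reference, the helper really is as dumb as it sounds; roughly this
(the exact signature may differ from what ends up in my tree, and the
caller's local variable setup is abbreviated):

	void
	xlog_buf_readahead(
		struct xlog			*log,
		xfs_daddr_t			blkno,
		uint				len,
		const struct xfs_buf_ops	*ops)
	{
		if (!xlog_peek_buffer_cancelled(log, blkno, len, 0))
			xfs_buf_readahead(log->l_mp->m_ddev_targp,
					blkno, len, ops);
	}

	/* ...which turns e.g. the inode readahead callout into: */
	STATIC void
	xlog_recover_inode_ra_pass2(
		struct xlog			*log,
		struct xlog_recover_item	*item)
	{
		struct xfs_inode_log_format	*ilfp = item->ri_buf[0].i_addr;

		xlog_buf_readahead(log, ilfp->ilf_blkno, ilfp->ilf_len,
				&xfs_inode_buf_ra_ops);
	}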

> - Recovery (pass1/pass2)
> 
> The core recovery bits seem more reasonable to factor out in general.
> That said, we only have two pass1 callbacks (buffers and quotaoff). The
> buffer callback does cancellation management and the quotaoff sets some
> flags, so I wonder why those couldn't just remain as direct function
> calls (even if we move the functions out of xfs_log_recover.c). There
> are more callbacks for pass2 so the function pointers make a bit more
> sense there, but at the same time it looks like the various intents are
> further abstracted behind a single "intent type" pass2 call (which has a
> hardcoded XLOG_REORDER_INODE_LIST reorder value and is about as clear as
> mud in that context, getting to my earlier point).

Again I actually like the callouts, mostly because they make it pretty
clear what is going on.  I also really like the fact that the recovery
code is close to the code actually writing the log items.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
                   ` (19 preceding siblings ...)
  2020-04-22 16:18 ` [PATCH 00/19] xfs: refactor log recovery Brian Foster
@ 2020-04-28  6:22 ` Christoph Hellwig
  2020-04-28 22:28   ` Darrick J. Wong
  20 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-28  6:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

With all patches applied we can also drop several includes in
xfs_log_recover.c:

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index b210457d6ba23..9250c29193a71 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -18,21 +18,13 @@
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
-#include "xfs_inode_item.h"
-#include "xfs_extfree_item.h"
 #include "xfs_trans_priv.h"
 #include "xfs_alloc.h"
 #include "xfs_ialloc.h"
-#include "xfs_quota.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
-#include "xfs_bmap_btree.h"
 #include "xfs_error.h"
-#include "xfs_dir2.h"
-#include "xfs_rmap_item.h"
 #include "xfs_buf_item.h"
-#include "xfs_refcount_item.h"
-#include "xfs_bmap_item.h"
 
 #define BLK_AVG(blk1, blk2)	((blk1+blk2) >> 1)
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-28  6:12   ` Christoph Hellwig
@ 2020-04-28 12:43     ` Brian Foster
  2020-04-28 22:34       ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Brian Foster @ 2020-04-28 12:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs

On Mon, Apr 27, 2020 at 11:12:08PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 22, 2020 at 12:18:54PM -0400, Brian Foster wrote:
> > - Transaction reorder
> > 
> > Virtualizing the transaction reorder across several files/types
> > strikes me as overkill for several reasons. From a code standpoint,
> > we've created a new type enumeration and a couple fields (enum type and
> > a function) in a generic structure to essentially abstract out the
> > buffer handling into a function. The latter checks another couple of blf
> > flags, which appears to be the only real "type specific" logic in the
> > whole sequence. From a complexity standpoint, the reorder operation is a
> > fairly low level and internal recovery operation. We have this huge
> > comment just to explain exactly what's happening and why certain items
> > have to be ordered as such, or some treated like others, etc. TBH it's
> > not terribly clear even with that documentation, so I don't know that
> > splitting the associated mapping logic off into separate files is
> > helpful.
> 
> I actually very much like the idea of moving any knowledge of the individual
> item types out of xfs_log_recover.c.  In reply to the patch I've
> suggested an idea for how to kill the knowledge for all but the buffer and
> icreate items, which should make this a little more sensible.
> 

I mentioned to Darrick the other day briefly on IRC that I don't
fundamentally object to splitting up xfs_log_recover.c. I just think
this mechanical split out of the existing code includes too much of the
implementation details of recovery and perhaps abstracts a bit too much.
I find the general idea much more acceptable with preliminary cleanups
and a more simple interface.

> I actually think we should go further in one aspect - instead of having
> the item type to ops mapping in a single function in xfs_log_recover.c
> we should have a table that the items can just add themselves to.
> 

That sounds reasonable, but that's more about abstraction mechanism than
defining the interface. I was more focused on simplifying the latter in
my previous comments.

> > - Readahead
> > 
> > We end up with readahead callouts for only the types that translate to
> > buffers (so buffers, inode, dquots), and then those callouts do some
> > type specific mapping (that is duplicated within the specific type
> > handlers) and issue a readahead (which is duplicated across each ra_pass2
> > call). I wonder if this would be better abstracted by a ->bmap() like
> > call that simply maps the item to a [block,length] and returns a
> > non-zero length if the core recovery code should invoke readahead (after
> > checking for cancellation). It looks like the underlying implementation
> > of those bmap calls could be further factored into helpers that
> > translate from the raw record data into the type specific format
> > structures, and that could reduce duplication between the readahead
> > calls and the pass2 calls in a couple cases. (The more I think about it,
> > the more I think we should introduce those kinds of cleanups before
> > getting into the need for function pointers.)
> 
> That sounds more complicated than what we have right now, and even more so
> with my little xlog_buf_readahead helper.  Yes, the methods will all
> just call xlog_buf_readahead, but they are trivial two-liners that are
> easy to understand.  Much easier than a complicated calling convention
> to pass the blkno, len and buf ops back.
> 

Ok. The above was just an idea to simplify things vs. duplicating
readahead code and recovery logic N times. I haven't seen your
idea/code, but if that problem is addressed with a helper vs. a
different interface then that seems just as reasonable to me.

> > - Recovery (pass1/pass2)
> > 
> > The core recovery bits seem more reasonable to factor out in general.
> > That said, we only have two pass1 callbacks (buffers and quotaoff). The
> > buffer callback does cancellation management and the quotaoff sets some
> > flags, so I wonder why those couldn't just remain as direct function
> > calls (even if we move the functions out of xfs_log_recover.c). There
> > are more callbacks for pass2 so the function pointers make a bit more
> > sense there, but at the same time it looks like the various intents are
> > further abstracted behind a single "intent type" pass2 call (which has a
> > hardcoded XLOG_REORDER_INODE_LIST reorder value and is about as clear as
> > mud in that context, getting to my earlier point).
> 
> Again I actually like the callouts, mostly because they make it pretty
> clear what is going on.  I also really like the fact that the recovery
> code is close to the code actually writing the log items.
> 

I find both the runtime logging and recovery code to be complex enough
individually that I prefer not to stuff them together, but there is
already precedent with dfops and such so that's not the biggest deal to
me if the interface is simplified (and hopefully amount of code
reduced).

Brian


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure
  2020-04-28  5:11       ` Christoph Hellwig
@ 2020-04-28 20:46         ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Apr 27, 2020 at 10:11:54PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 27, 2020 at 03:04:12PM -0700, Darrick J. Wong wrote:
> > > > +	case XFS_LI_INODE:
> > > > +		return &xlog_inode_item_type;
> > > > +	case XFS_LI_DQUOT:
> > > > +		return &xlog_dquot_item_type;
> > > > +	case XFS_LI_QUOTAOFF:
> > > > +		return &xlog_quotaoff_item_type;
> > > > +	case XFS_LI_IUNLINK:
> > > > +		/* Not implemented? */
> > > 
> > > Not implemented!  I think we need a prep patch to remove this first.
> > 
> > The thing I can't tell is if XFS_LI_IUNLINK is a code point reserved
> > from some long-ago log item that fell out, or reserved for some future
> > project?
> > 
> > Either way, this case doesn't need to be there.
> 
> The addition goes back to:
> 
> commit 1588194c4a13b36badf2f55c022fc4487633f746
> Author: Adam Sweeney <ajs@sgi.com>
> Date:   Fri Feb 25 02:02:01 1994 +0000
> 
>     Add new prototypes and log item types.
> 
> 
> In the imported tree, which only added the definition, and no code
> related to it.  I can't find any code related to it either in random
> checkpoints.

<nod> Dave told me on IRC that it was added in 1994 but never used, so
it's safe to get rid of it entirely.

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readhead functions
  2020-04-25 18:19   ` Christoph Hellwig
@ 2020-04-28 20:54     ` Darrick J. Wong
  2020-04-29  6:07       ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:19:29AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 21, 2020 at 07:06:21PM -0700, Darrick J. Wong wrote:
> >  typedef enum xlog_recover_reorder (*xlog_recover_reorder_fn)(
> >  		struct xlog_recover_item *item);
> > +typedef void (*xlog_recover_ra_pass2_fn)(struct xlog *log,
> > +		struct xlog_recover_item *item);
> 
> Same comment for this one - or the other two following.

<nod> I'll skip the typedefs.

> > +/*
> > + * This structure is used during recovery to record the buf log items which
> > + * have been canceled and should not be replayed.
> > + */
> > +struct xfs_buf_cancel {
> > +	xfs_daddr_t		bc_blkno;
> > +	uint			bc_len;
> > +	int			bc_refcount;
> > +	struct list_head	bc_list;
> > +};
> > +
> > +struct xfs_buf_cancel *xlog_peek_buffer_cancelled(struct xlog *log,
> > +		xfs_daddr_t blkno, uint len, unsigned short flags);
> 
> None of the callers moved in this patch even needs the xfs_buf_cancel
> structure, it just uses the return value as a boolean.  I think they
> all should be switched to use xlog_check_buffer_cancelled instead, which
> means that struct xfs_buf_cancel and xlog_peek_buffer_cancelled can stay
> private in xfs_log_recovery.c (and later xfs_buf_item.c).

They now all use your new xlog_buf_readahead function.

> > +STATIC void
> > +xlog_recover_buffer_ra_pass2(
> > +	struct xlog                     *log,
> > +	struct xlog_recover_item        *item)
> > +{
> > +	struct xfs_buf_log_format	*buf_f = item->ri_buf[0].i_addr;
> > +	struct xfs_mount		*mp = log->l_mp;
> > +
> > +	if (xlog_peek_buffer_cancelled(log, buf_f->blf_blkno,
> > +			buf_f->blf_len, buf_f->blf_flags)) {
> > +		return;
> > +	}
> > +
> > +	xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
> > +				buf_f->blf_len, NULL);
> 
> Why not:
> 
> 	if (!xlog_peek_buffer_cancelled(log, buf_f->blf_blkno,
> 			buf_f->blf_len, buf_f->blf_flags)) {
> 		xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
> 				buf_f->blf_len, NULL);
> 	}
> 
> > +STATIC void
> > +xlog_recover_dquot_ra_pass2(
> > +	struct xlog			*log,
> > +	struct xlog_recover_item	*item)
> > +{
> > +	struct xfs_mount	*mp = log->l_mp;
> > +	struct xfs_disk_dquot	*recddq;
> > +	struct xfs_dq_logformat	*dq_f;
> > +	uint			type;
> > +	int			len;
> > +
> > +
> > +	if (mp->m_qflags == 0)
> 
> Double empty line above.
> 
> > +	if (xlog_peek_buffer_cancelled(log, dq_f->qlf_blkno, len, 0))
> > +		return;
> > +
> > +	xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, len,
> > +			  &xfs_dquot_buf_ra_ops);
> 
> Same comment as above, no real need for the early return here.
> 
> > +	if (xlog_peek_buffer_cancelled(log, ilfp->ilf_blkno, ilfp->ilf_len, 0))
> > +		return;
> > +
> > +	xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
> > +				ilfp->ilf_len, &xfs_inode_buf_ra_ops);
> 
> Here again.

Changed to xlog_buf_readahead.

> 
> > -	unsigned short			flags)
> > +	unsigned short		flags)
> 
> Spurious whitespace change.
> 
> >  		case XLOG_RECOVER_PASS2:
> > -			xlog_recover_ra_pass2(log, item);
> > +			if (item->ri_type && item->ri_type->ra_pass2_fn)
> > +				item->ri_type->ra_pass2_fn(log, item);
> 
> > Shouldn't we ensure early on that we always have a valid ->ri_type?

Item sorting will bail out on unrecognized log item types, in which case
we will discard the transaction (and all its items) and abort the mount,
right?  That should suffice to ensure that we always have a valid
ri_type long before we get to starting readahead for pass 2.

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-28  6:22 ` Christoph Hellwig
@ 2020-04-28 22:28   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Apr 27, 2020 at 11:22:42PM -0700, Christoph Hellwig wrote:
> With all patches applied we can also drop several includes in
> xfs_log_recover.c:

Heh, neat, I'll add that somewhere.

--D

> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index b210457d6ba23..9250c29193a71 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -18,21 +18,13 @@
>  #include "xfs_log.h"
>  #include "xfs_log_priv.h"
>  #include "xfs_log_recover.h"
> -#include "xfs_inode_item.h"
> -#include "xfs_extfree_item.h"
>  #include "xfs_trans_priv.h"
>  #include "xfs_alloc.h"
>  #include "xfs_ialloc.h"
> -#include "xfs_quota.h"
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
> -#include "xfs_bmap_btree.h"
>  #include "xfs_error.h"
> -#include "xfs_dir2.h"
> -#include "xfs_rmap_item.h"
>  #include "xfs_buf_item.h"
> -#include "xfs_refcount_item.h"
> -#include "xfs_bmap_item.h"
>  
>  #define BLK_AVG(blk1, blk2)	((blk1+blk2) >> 1)
>  

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-28 12:43     ` Brian Foster
@ 2020-04-28 22:34       ` Darrick J. Wong
  2020-04-29  6:09         ` Christoph Hellwig
  2020-04-29 11:52         ` Brian Foster
  0 siblings, 2 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Tue, Apr 28, 2020 at 08:43:42AM -0400, Brian Foster wrote:
> On Mon, Apr 27, 2020 at 11:12:08PM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 22, 2020 at 12:18:54PM -0400, Brian Foster wrote:
> > > - Transaction reorder
> > > 
> > > Virtualizing the transaction reorder across several files/types
> > > strikes me as overkill for several reasons. From a code standpoint,
> > > we've created a new type enumeration and a couple fields (enum type and
> > > a function) in a generic structure to essentially abstract out the
> > > buffer handling into a function. The latter checks another couple of blf
> > > flags, which appears to be the only real "type specific" logic in the
> > > whole sequence. From a complexity standpoint, the reorder operation is a
> > > fairly low level and internal recovery operation. We have this huge
> > > comment just to explain exactly what's happening and why certain items
> > > have to be ordered as such, or some treated like others, etc. TBH it's
> > > not terribly clear even with that documentation, so I don't know that
> > > splitting the associated mapping logic off into separate files is
> > > helpful.
> > 
> > I actually very much like the idea of moving any knowledge of the individual
> > item types out of xfs_log_recover.c.  In reply to the patch I've
> > suggested an idea for how to kill the knowledge for all but the buffer and
> > icreate items, which should make this a little more sensible.
> > 
> 
> I mentioned to Darrick the other day briefly on IRC that I don't
> fundamentally object to splitting up xfs_log_recover.c. I just think
> this mechanical split out of the existing code includes too much of the
> implementation details of recovery and perhaps abstracts a bit too much.
> I find the general idea much more acceptable with preliminary cleanups
> and a more simple interface.

It's cleaned up considerably with hch's cleanup patches 1-5 of 2. ;)

> > I actually think we should go further in one aspect - instead of having
> > the item type to ops mapping in a single function in xfs_log_recover.c
> > we should have a table that the items can just add themselves to.
> > 
> 
> That sounds reasonable, but that's more about abstraction mechanism than
> defining the interface. I was more focused on simplifying the latter in
> my previous comments.

<nod>

> > > - Readahead
> > > 
> > > We end up with readahead callouts for only the types that translate to
> > > buffers (so buffers, inode, dquots), and then those callouts do some
> > > type specific mapping (that is duplicated within the specific type
> > > handlers) and issue a readahead (which is duplicated across each ra_pass2
> > > call). I wonder if this would be better abstracted by a ->bmap() like
> > > call that simply maps the item to a [block,length] and returns a
> > > non-zero length if the core recovery code should invoke readahead (after
> > > checking for cancellation). It looks like the underlying implementation
> > > of those bmap calls could be further factored into helpers that
> > > translate from the raw record data into the type specific format
> > > structures, and that could reduce duplication between the readahead
> > > calls and the pass2 calls in a couple cases. (The more I think about it,
> > > the more I think we should introduce those kinds of cleanups before
> > > getting into the need for function pointers.)
> > 
> > That sounds more complicated than what we have right now, and even more so
> > with my little xlog_buf_readahead helper.  Yes, the methods will all
> > just call xlog_buf_readahead, but they are trivial two-liners that are
> > easy to understand.  Much easier than a complicated calling convention
> > to pass the blkno, len and buf ops back.
> > 
> 
> Ok. The above was just an idea to simplify things vs. duplicating
> readahead code and recovery logic N times. I haven't seen your
> idea/code, but if that problem is addressed with a helper vs. a
> different interface then that seems just as reasonable to me.
> 
> > > - Recovery (pass1/pass2)
> > > 
> > > The core recovery bits seem more reasonable to factor out in general.
> > > That said, we only have two pass1 callbacks (buffers and quotaoff). The
> > > buffer callback does cancellation management and the quotaoff sets some
> > > flags, so I wonder why those couldn't just remain as direct function
> > > calls (even if we move the functions out of xfs_log_recover.c). There
> > > are more callbacks for pass2 so the function pointers make a bit more
> > > sense there, but at the same time it looks like the various intents are
> > > further abstracted behind a single "intent type" pass2 call (which has a
> > > hardcoded XLOG_REORDER_INODE_LIST reorder value and is about as clear as
> > > mud in that context, getting to my earlier point).
> > 
> > Again I actually like the callouts, mostly because they make it pretty
> > clear what is going on.  I also really like the fact that the recovery
> > code is close to the code actually writing the log items.

Looking back at that, I realize that (provided nobody minds having
function dispatch structures that are sort of sparse) there's no reason
why we need to have separate xlog_recover_intent_type and
xlog_recover_item_type structures.

> I find both the runtime logging and recovery code to be complex enough
> individually that I prefer not to stuff them together, but there is
> already precedent with dfops and such so that's not the biggest deal to
> me if the interface is simplified (and hopefully amount of code
> reduced).

I combined them largely on the observation that with the exception of
buffers, log item recovery code is generally short and not worth
creating even more files.  224 is enough.

--D

> Brian
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file
  2020-04-25 18:36   ` Christoph Hellwig
@ 2020-04-28 22:38     ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:36:44AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 21, 2020 at 07:08:05PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Move xlog_recover_intent_pass2 up in the file so that we don't have to
> > have a forward declaration of a static function.
> 
> FYI, I'd expect IS_INTENT and IS_INTENT_DONE to simply be flags in the
> xlog_recover_item_type structure.

If I simply gave every XFS_LI_* type code its own
xlog_recover_item_type, there wouldn't even be a need for that.

Though I dunno, an entire xlog_recover_item_type for each of the done
item types might be a lot of NULL space.  <shrug> Maybe I'm stressing
out too much over 8 * 6 pointers, i.e. ~384 bytes.
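
I.e. each done-item type would end up with an almost empty vector along
these lines (just a sketch; the commit_pass2_fn member and the function
name here are made up):

	const struct xlog_recover_item_type xlog_efd_item_type = {
		.commit_pass2_fn	= xlog_recover_efd_pass2,
		/* ra_pass2_fn, commit_pass1_fn, etc. all stay NULL */
	};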

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/19] xfs: refactor releasing finished intents during log recovery
  2020-04-25 18:34   ` Christoph Hellwig
@ 2020-04-28 22:40     ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:34:10AM -0700, Christoph Hellwig wrote:
> > +STATIC bool
> > +xlog_release_bui(
> > +	struct xlog		*log,
> > +	struct xfs_log_item	*lip,
> > +	uint64_t		intent_id)
> > +{
> > +	struct xfs_bui_log_item	*buip = BUI_ITEM(lip);
> > +	struct xfs_ail		*ailp = log->l_ailp;
> > +
> > +	if (buip->bui_format.bui_id == intent_id) {
> > +		/*
> > +		 * Drop the BUD reference to the BUI. This
> > +		 * removes the BUI from the AIL and frees it.
> > +		 */
> > +		spin_unlock(&ailp->ail_lock);
> > +		xfs_bui_release(buip);
> > +		spin_lock(&ailp->ail_lock);
> > +		return true;
> > +	}
> > +
> > +	return false;
> 
> Early returns for the no match case would seem a little more clear for
> all the callbacks.

Done.

> Also the boilerplate comments in all the callbacks
> seem rather pointless.  If you think they have enough value I'd
> consolidate that information once in the xlog_recover_release_intent
> description.

Yeah, I'll move it.

> 
> > +	spin_lock(&ailp->ail_lock);
> > +	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> > +	while (lip != NULL) {
> > +		if (lip->li_type == intent_type && fn(log, lip, intent_id))
> > +			break;
> > +		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> > +	}
> 
> What about:
> 
> 	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip;
> 	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
> 		if (lip->li_type != intent_type)
> 			continue;
> 		if (fn(log, lip, intent_id))
> 			break;
> 	}

Done.

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 12/19] xfs: refactor RUI log item recovery dispatch
  2020-04-25 18:28   ` Christoph Hellwig
@ 2020-04-28 22:40     ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:28:54AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 21, 2020 at 07:07:19PM -0700, Darrick J. Wong wrote:
> > +const struct xlog_recover_intent_type xlog_recover_rmap_type = {
> > +	.recover_intent		= xlog_recover_rui,
> > +	.recover_done		= xlog_recover_rud,
> > +	.process_intent		= xlog_recover_process_rui,
> > +	.cancel_intent		= xlog_recover_cancel_rui,
> 
> Can you make sure your method instance names are always
> someprefix_actual_method_name?  That makes life so much easier when
> looking for all instances.

Will do.

--D


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-25 18:28   ` Christoph Hellwig
@ 2020-04-28 22:41     ` Darrick J. Wong
  2020-04-28 23:45       ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:28:01AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 21, 2020 at 07:07:13PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Move the extent free intent and intent-done log recovery code into the
> > per-item source code files and use dispatch functions to call them.  We
> > do these one at a time because there's a lot of code to move.  No
> > functional changes.
> 
> What is the reason for splitting xlog_recover_item_type vs
> xlog_recover_intent_type?  To me it would seem more logical to have
> one operation vector, with some ops only set for intents.

Partly because I started by refactoring only the intent items, and then
decided to prepend a series to do everything; and partly to be stingy
with bytes. :P

That said, I like your suggestion that every XFS_LI_* code gets its own
xlog_recover_item_type so I'll go do that.

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 07/19] xfs: refactor log recovery intent item dispatch for pass2 commit functions
  2020-04-25 18:24   ` Christoph Hellwig
@ 2020-04-28 22:42     ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 22:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sat, Apr 25, 2020 at 11:24:17AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 21, 2020 at 07:06:48PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Move the log intent item pass2 commit code into the per-item source code
> > files and use the dispatch function to call it.  We do these one at a
> > time because there's a lot of code to move.  No functional changes.
> 
> This commit log doesn't really match what is going on, as no move
> between files happens.  And the changes themselves look pretty odd
> as well.

Yeah.  This patch will look very different once I get rid of
xlog_recover_intent_type.

(Fair warning: I was replying to these two threads in reverse order.)

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-28 22:41     ` Darrick J. Wong
@ 2020-04-28 23:45       ` Darrick J. Wong
  2020-04-29  7:09         ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-28 23:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Apr 28, 2020 at 03:41:32PM -0700, Darrick J. Wong wrote:
> On Sat, Apr 25, 2020 at 11:28:01AM -0700, Christoph Hellwig wrote:
> > On Tue, Apr 21, 2020 at 07:07:13PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Move the extent free intent and intent-done log recovery code into the
> > > per-item source code files and use dispatch functions to call them.  We
> > > do these one at a time because there's a lot of code to move.  No
> > > functional changes.
> > 
> > What is the reason for splitting xlog_recover_item_type vs
> > xlog_recover_intent_type?  To me it would seem more logical to have
> > one operation vector, with some ops only set for intents.
> 
> Partly because I started by refactoring only the intent items, and then
> decided to prepend a series to do everything; and partly to be stingy
> with bytes. :P
> 
> That said, I like your suggestion of every XFS_LI_* code gets its own
> xlog_recover_item_type so I'll go do that.

Aha, now I remember why those two are separate types -- the
process_intent and cancel_intent functions operate on the xfs_log_item
that gets created from the xlog_recover_item that we pulled out of the
log, whereas the other functions are called directly on the
xlog_recover_item.  There's no direct link between the log item and the
recovery log item, nor is there a good way to link through their
dispatch functions.

The recover_intent and recover_done functions can certainly become
commit_pass2 functions of various xlog_recover_item_type structures, but
that doesn't totally eliminate the need for xlog_recover_intent_type.
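
To make that concrete, the shape of the two vectors is roughly this,
with the signatures trimmed and the names approximate:

/* ops applied to the xlog_recover_item parsed out of the log */
struct xlog_recover_item_type {
	void	(*ra_pass2_fn)(struct xlog *log,
			       struct xlog_recover_item *item);
	int	(*commit_pass1_fn)(struct xlog *log,
				   struct xlog_recover_item *item);
	int	(*commit_pass2_fn)(struct xlog *log,
				   struct xlog_recover_item *item);
};

/* ops applied to the live xfs_log_item created during pass2 commit */
struct xlog_recover_intent_type {
	int	(*process_intent)(struct xlog *log,
				  struct xfs_log_item *lip);
	void	(*cancel_intent)(struct xlog *log,
				 struct xfs_log_item *lip);
};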

--D

> --D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readhead functions
  2020-04-28 20:54     ` Darrick J. Wong
@ 2020-04-29  6:07       ` Christoph Hellwig
  0 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-29  6:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Apr 28, 2020 at 01:54:24PM -0700, Darrick J. Wong wrote:
> > > +			if (item->ri_type && item->ri_type->ra_pass2_fn)
> > > +				item->ri_type->ra_pass2_fn(log, item);
> > 
> > Shouldn't we ensure early on that we always have a valid ->ri_type?
> 
> Item sorting will bail out on unrecognized log item types, in which case
> we will discard the transaction (and all its items) and abort the mount,
> right?  That should suffice to ensure that we always have a valid
> ri_type long before we get to starting readahead for pass 2.

Yes, I think so.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-28 22:34       ` Darrick J. Wong
@ 2020-04-29  6:09         ` Christoph Hellwig
  2020-04-29 11:52         ` Brian Foster
  1 sibling, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-29  6:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Christoph Hellwig, linux-xfs

On Tue, Apr 28, 2020 at 03:34:22PM -0700, Darrick J. Wong wrote:
> > I find both the runtime logging and recovery code to be complex enough
> > individually that I prefer not to stuff them together, but there is
> > already precedent with dfops and such so that's not the biggest deal to
> > me if the interface is simplified (and hopefully amount of code
> > reduced).
> 
> I combined them largely on the observation that with the exception of
> buffers, log item recovery code is generally short and not worth
> creating even more files.  224 is enough.

True, the buffer items are a fair amount of code, and that code is
also called into from, say, the inode buffers.  Maybe we need a new
xfs_buf_item_recovery.c file in that case?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-28 23:45       ` Darrick J. Wong
@ 2020-04-29  7:09         ` Christoph Hellwig
  2020-04-29  7:18           ` Christoph Hellwig
  2020-04-29 14:20           ` Darrick J. Wong
  0 siblings, 2 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-29  7:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Apr 28, 2020 at 04:45:57PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 28, 2020 at 03:41:32PM -0700, Darrick J. Wong wrote:
> > On Sat, Apr 25, 2020 at 11:28:01AM -0700, Christoph Hellwig wrote:
> > > On Tue, Apr 21, 2020 at 07:07:13PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Move the extent free intent and intent-done log recovery code into the
> > > > per-item source code files and use dispatch functions to call them.  We
> > > > do these one at a time because there's a lot of code to move.  No
> > > > functional changes.
> > > 
> > > What is the reason for splitting xlog_recover_item_type vs
> > > xlog_recover_intent_type?  To me it would seem more logical to have
> > > one operation vector, with some ops only set for intents.
> > 
> > Partly because I started by refactoring only the intent items, and then
> > decided to prepend a series to do everything; and partly to be stingy
> > with bytes. :P
> > 
> > That said, I like your suggestion of every XFS_LI_* code gets its own
> > xlog_recover_item_type so I'll go do that.
> 
> Aha, now I remember why those two are separate types -- the
> process_intent and cancel_intent functions operate on the xfs_log_item
> that gets created from the xlog_recover_item that we pulled out of the
> log, whereas the other functions are called directly on the
> xlog_recovery_item.  There's no direct link between the log item and the
> recovery log item, nor is there a good way to link through their
> dispatch functions.

Maybe those should move to xfs_item_ops as they operate on a "live"
xfs_log_item? (they'd need to grow names clearly related to recovery
of course).  In fact, except for a slightly different calling convention,
->cancel_intent already seems to be identical to ->abort_intent in
xfs_item_ops, so that would be one off the list.

Btw, it seems like we should drop the ail_lock before calling
->process_intent as all instances do that anyway, and it would keep
the locking a little more centralized, and it will allow killing
one pointless wrapper in each instance.  Maybe we can also move
the recovered flag to the generic log item flags?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-29  7:09         ` Christoph Hellwig
@ 2020-04-29  7:18           ` Christoph Hellwig
  2020-04-29 14:20           ` Darrick J. Wong
  1 sibling, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2020-04-29  7:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Wed, Apr 29, 2020 at 12:09:16AM -0700, Christoph Hellwig wrote:
> Maybe those should move to xfs_item_ops as they operate on a "live"
> xfs_log_item? (they'd need to grow names clearly related to recovery
> of course).  In fact except for slightly different calling convention
> ->cancel_intent already seems to be identical to ->abort_intent in
> xfs_item_ops, so that would be one off the list.

Actually abort_intent is in xfs_defer_op_type, so we can't share it
easily.  But at least in another step we could refactor them to have
the same prototype :)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-28 22:34       ` Darrick J. Wong
  2020-04-29  6:09         ` Christoph Hellwig
@ 2020-04-29 11:52         ` Brian Foster
  2020-04-29 14:22           ` Darrick J. Wong
  1 sibling, 1 reply; 56+ messages in thread
From: Brian Foster @ 2020-04-29 11:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Apr 28, 2020 at 03:34:22PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 28, 2020 at 08:43:42AM -0400, Brian Foster wrote:
> > On Mon, Apr 27, 2020 at 11:12:08PM -0700, Christoph Hellwig wrote:
> > > On Wed, Apr 22, 2020 at 12:18:54PM -0400, Brian Foster wrote:
> > > > - Transaction reorder
> > > > 
> > > > Virtualizing the transaction reorder across several files/types
> > > > strikes me as overkill for several reasons. From a code standpoint,
> > > > we've created a new type enumeration and a couple fields (enum type and
> > > > a function) in a generic structure to essentially abstract out the
> > > > buffer handling into a function. The latter checks another couple of blf
> > > > flags, which appears to be the only real "type specific" logic in the
> > > > whole sequence. From a complexity standpoint, the reorder operation is a
> > > > fairly low level and internal recovery operation. We have this huge
> > > > comment just to explain exactly what's happening and why certain items
> > > > have to be ordered as such, or some treated like others, etc. TBH it's
> > > > not terribly clear even with that documentation, so I don't know that
> > > > splitting the associated mapping logic off into separate files is
> > > > helpful.
> > > 
> > > I actually very much like the idea of moving any knowledge of the individual
> > > item types out of xfs_log_recover.c.  In reply to the patch I've
> > > suggested an idea of how to kill the knowledge for all but the buffer and
> > > icreate items, which should make this a little more sensible.
> > > 
> > 
> > I mentioned to Darrick the other day briefly on IRC that I don't
> > fundamentally object to splitting up xfs_log_recover.c. I just think
> > this mechanical split out of the existing code includes too much of the
> > implementation details of recovery and perhaps abstracts a bit too much.
> > I find the general idea much more acceptable with preliminary cleanups
> > and a more simple interface.
> 
> It's cleaned up considerably with hch's cleanup patches 1-5 of 2. ;)
> 
> > > I actually think we should go further in one aspect - instead of having
> > > the item type to ops mapping in a single function in xfs_log_recovery.c
> > > we should have a table that the items can just add themselves to.
> > > 
> > 
> > That sounds reasonable, but that's more about abstraction mechanism than
> > defining the interface. I was more focused on simplifying the latter in
> > my previous comments.
> 
> <nod>
> 
> > > > - Readahead
> > > > 
> > > > We end up with readahead callouts for only the types that translate to
> > > > buffers (so buffers, inode, dquots), and then those callouts do some
> > > > type specific mapping (that is duplicated within the specific type
> > > > handlers) and issue a readahead (which is duplicated across each ra_pass2
> > > > call). I wonder if this would be better abstracted by a ->bmap() like
> > > > call that simply maps the item to a [block,length] and returns a
> > > > non-zero length if the core recovery code should invoke readahead (after
> > > > checking for cancellation). It looks like the underlying implementation
> > > > of those bmap calls could be further factored into helpers that
> > > > translate from the raw record data into the type specific format
> > > > structures, and that could reduce duplication between the readahead
> > > > calls and the pass2 calls in a couple cases. (The more I think about it,
> > > > the more I think we should introduce those kind of cleanups before
> > > > getting into the need for function pointers.)
> > > 
> > > That sounds more complicated than what we have right now, and even more so
> > > with my little xlog_buf_readahead helper.  Yes, the methods will all
> > > just call xlog_buf_readahead, but they are trivial two-liners that are
> > > easy to understand.  Much easier than a complicated calling convention
> > > to pass the blkno, len and buf ops back.
> > > 
> > 
> > Ok. The above was just an idea to simplify things vs. duplicating
> > readahead code and recovery logic N times. I haven't seen your
> > idea/code, but if that problem is addressed with a helper vs. a
> > different interface then that seems just as reasonable to me.
> > 
> > > > - Recovery (pass1/pass2)
> > > > 
> > > > The core recovery bits seem more reasonable to factor out in general.
> > > > That said, we only have two pass1 callbacks (buffers and quotaoff). The
> > > > buffer callback does cancellation management and the quotaoff sets some
> > > > flags, so I wonder why those couldn't just remain as direct function
> > > > calls (even if we move the functions out of xfs_log_recover.c). There
> > > > are more callbacks for pass2 so the function pointers make a bit more
> > > > sense there, but at the same time it looks like the various intents are
> > > > further abstracted behind a single "intent type" pass2 call (which has a
> > > > hardcoded XLOG_REORDER_INODE_LIST reorder value and is about as clear as
> > > > mud in that context, getting to my earlier point).
> > > 
> > > Again I actually like the callouts, mostly because they make it pretty
> > > clear what is going on.  I also really like the fact that the recovery
> > > code is close to the code actually writing the log items.
> 
> Looking back at that, I realize that (provided nobody minds having
> function dispatch structures that are sort of sparse) there's no reason
> why we need to have separate xlog_recover_intent_type and
> xlog_recover_item_type structures.
> 

The sparseness doesn't bother me provided the underlying interfaces are
simple/generic enough to understand from the structure definition and
are broadly (if not universally) used. I'm still not convinced the
transaction reorder thing should be distributed at all, but I'll wait to
see what the next iteration looks like.

> > I find both the runtime logging and recovery code to be complex enough
> > individually that I prefer not to stuff them together, but there is
> > already precedent with dfops and such so that's not the biggest deal to
> > me if the interface is simplified (and hopefully amount of code
> > reduced).
> 
> I combined them largely on the observation that with the exception of
> buffers, log item recovery code is generally short and not worth
> creating even more files.  224 is enough.
> 

I like Christoph's idea of selectively separating out the particularly
large (i.e. buffers) bits.

Brian

> --D
> 
> > Brian
> > 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/19] xfs: refactor EFI log item recovery dispatch
  2020-04-29  7:09         ` Christoph Hellwig
  2020-04-29  7:18           ` Christoph Hellwig
@ 2020-04-29 14:20           ` Darrick J. Wong
  1 sibling, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-29 14:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Apr 29, 2020 at 12:09:16AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 28, 2020 at 04:45:57PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 28, 2020 at 03:41:32PM -0700, Darrick J. Wong wrote:
> > > On Sat, Apr 25, 2020 at 11:28:01AM -0700, Christoph Hellwig wrote:
> > > > On Tue, Apr 21, 2020 at 07:07:13PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > Move the extent free intent and intent-done log recovery code into the
> > > > > per-item source code files and use dispatch functions to call them.  We
> > > > > do these one at a time because there's a lot of code to move.  No
> > > > > functional changes.
> > > > 
> > > > What is the reason for splitting xlog_recover_item_type vs
> > > > xlog_recover_intent_type?  To me it would seem more logical to have
> > > > one operation vector, with some ops only set for intents.
> > > 
> > > Partly because I started by refactoring only the intent items, and then
> > > decided to prepend a series to do everything; and partly to be stingy
> > > with bytes. :P
> > > 
> > > That said, I like your suggestion of every XFS_LI_* code gets its own
> > > xlog_recover_item_type so I'll go do that.
> > 
> > Aha, now I remember why those two are separate types -- the
> > process_intent and cancel_intent functions operate on the xfs_log_item
> > that gets created from the xlog_recover_item that we pulled out of the
> > log, whereas the other functions are called directly on the
> > xlog_recovery_item.  There's no direct link between the log item and the
> > recovery log item, nor is there a good way to link through their
> > dispatch functions.
> 
> Maybe those should move to xfs_item_ops as they operate on a "live"
> xfs_log_item? (they'd need to grow names clearly related to recovery
> of course).  In fact except for slightly different calling convention
> ->cancel_intent already seems to be identical to ->abort_intent in
> xfs_item_ops, so that would be one off the list.

Hmm, yes, that's a better way out.  Trees, meet forest. ;)

> Btw, it seems like we should drop the ail_lock before calling
> ->process_intent as all instances do that anyway, and it would keep
> the locking a little more centralized, and it will allow killing
> one pointless wrapper in each instance.  Maybe we can also move
> the recovered flag to the generic log item flags?

Yeah, I was working on adding that to the patchset too.
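
The intent-processing loop then boils down to a fragment like this;
the hook and variable names are illustrative only:

	/*
	 * Sketch: cycle the AIL lock here instead of inside every
	 * ->process_intent implementation.
	 */
	spin_unlock(&ailp->ail_lock);
	error = ops->process_intent(log, lip);
	spin_lock(&ailp->ail_lock);
	if (error)
		break;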

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/19] xfs: refactor log recovery
  2020-04-29 11:52         ` Brian Foster
@ 2020-04-29 14:22           ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2020-04-29 14:22 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Apr 29, 2020 at 07:52:05AM -0400, Brian Foster wrote:
> On Tue, Apr 28, 2020 at 03:34:22PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 28, 2020 at 08:43:42AM -0400, Brian Foster wrote:
> > > On Mon, Apr 27, 2020 at 11:12:08PM -0700, Christoph Hellwig wrote:
> > > > On Wed, Apr 22, 2020 at 12:18:54PM -0400, Brian Foster wrote:
> > > > > - Transaction reorder
> > > > > 
> > > > > Virtualizing the transaction reorder across several files/types
> > > > > strikes me as overkill for several reasons. From a code standpoint,
> > > > > we've created a new type enumeration and a couple fields (enum type and
> > > > > a function) in a generic structure to essentially abstract out the
> > > > > buffer handling into a function. The latter checks another couple of blf
> > > > > flags, which appears to be the only real "type specific" logic in the
> > > > > whole sequence. From a complexity standpoint, the reorder operation is a
> > > > > fairly low level and internal recovery operation. We have this huge
> > > > > comment just to explain exactly what's happening and why certain items
> > > > > have to be ordered as such, or some treated like others, etc. TBH it's
> > > > > not terribly clear even with that documentation, so I don't know that
> > > > > splitting the associated mapping logic off into separate files is
> > > > > helpful.
> > > > 
> > > > I actually very much like the idea of moving any knowledge of the individual
> > > > item types out of xfs_log_recover.c.  In reply to the patch I've
> > > > suggested an idea of how to kill the knowledge for all but the buffer and
> > > > icreate items, which should make this a little more sensible.
> > > > 
> > > 
> > > I mentioned to Darrick the other day briefly on IRC that I don't
> > > fundamentally object to splitting up xfs_log_recover.c. I just think
> > > this mechanical split out of the existing code includes too much of the
> > > implementation details of recovery and perhaps abstracts a bit too much.
> > > I find the general idea much more acceptable with preliminary cleanups
> > > and a more simple interface.
> > 
> > It's cleaned up considerably with hch's cleanup patches 1-5 of 2. ;)
> > 
> > > > I actually think we should go further in one aspect - instead of having
> > > > the item type to ops mapping in a single function in xfs_log_recovery.c
> > > > we should have a table that the items can just add themselves to.
> > > > 
> > > 
> > > That sounds reasonable, but that's more about abstraction mechanism than
> > > defining the interface. I was more focused on simplifying the latter in
> > > my previous comments.
> > 
> > <nod>
> > 
> > > > > - Readahead
> > > > > 
> > > > > We end up with readahead callouts for only the types that translate to
> > > > > buffers (so buffers, inode, dquots), and then those callouts do some
> > > > > type specific mapping (that is duplicated within the specific type
> > > > > handlers) and issue a readahead (which is duplicated across each ra_pass2
> > > > > call). I wonder if this would be better abstracted by a ->bmap() like
> > > > > call that simply maps the item to a [block,length] and returns a
> > > > > non-zero length if the core recovery code should invoke readahead (after
> > > > > checking for cancellation). It looks like the underlying implementation
> > > > > of those bmap calls could be further factored into helpers that
> > > > > translate from the raw record data into the type specific format
> > > > > structures, and that could reduce duplication between the readahead
> > > > > calls and the pass2 calls in a couple cases. (The more I think about it,
> > > > > the more I think we should introduce those kind of cleanups before
> > > > > getting into the need for function pointers.)
> > > > 
> > > > That sounds more complicated than what we have right now, and even more so
> > > > with my little xlog_buf_readahead helper.  Yes, the methods will all
> > > > just call xlog_buf_readahead, but they are trivial two-liners that are
> > > > easy to understand.  Much easier than a complicated calling convention
> > > > to pass the blkno, len and buf ops back.
> > > > 
> > > 
> > > Ok. The above was just an idea to simplify things vs. duplicating
> > > readahead code and recovery logic N times. I haven't seen your
> > > idea/code, but if that problem is addressed with a helper vs. a
> > > different interface then that seems just as reasonable to me.
> > > 
> > > > > - Recovery (pass1/pass2)
> > > > > 
> > > > > The core recovery bits seem more reasonable to factor out in general.
> > > > > That said, we only have two pass1 callbacks (buffers and quotaoff). The
> > > > > buffer callback does cancellation management and the quotaoff sets some
> > > > > flags, so I wonder why those couldn't just remain as direct function
> > > > > calls (even if we move the functions out of xfs_log_recover.c). There
> > > > > are more callbacks for pass2 so the function pointers make a bit more
> > > > > sense there, but at the same time it looks like the various intents are
> > > > > further abstracted behind a single "intent type" pass2 call (which has a
> > > > > hardcoded XLOG_REORDER_INODE_LIST reorder value and is about as clear as
> > > > > mud in that context, getting to my earlier point).
> > > > 
> > > > Again I actually like the callouts, mostly because they make it pretty
> > > > clear what is going on.  I also really like the fact that the recovery
> > > > code is close to the code actually writing the log items.
> > 
> > Looking back at that, I realize that (provided nobody minds having
> > function dispatch structures that are sort of sparse) there's no reason
> > why we need to have separate xlog_recover_intent_type and
> > xlog_recover_item_type structures.
> > 
> 
> The sparseness doesn't bother me provided the underlying interfaces are
> simple/generic enough to understand from the structure definition and
> are broadly (if not universally) used. I'm still not convinced the
> transaction reorder thing should be distributed at all, but I'll wait to
> see what the next iteration looks like.

There's a lot less of it, since we assume ITEM_LIST unless the item
type explicitly overrides it via a function.
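
For example, the sorting loop can reduce to something like this, where
the reorder hook and the exact spelling of the ITEM_LIST constant are
guesses on my part:

	/* sketch: assume the plain item list unless the type overrides it */
	enum xlog_recover_reorder	fate = XLOG_REORDER_ITEM_LIST;

	if (item->ri_type && item->ri_type->reorder)
		fate = item->ri_type->reorder(item);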

> > > I find both the runtime logging and recovery code to be complex enough
> > > individually that I prefer not to stuff them together, but there is
> > > already precedent with dfops and such so that's not the biggest deal to
> > > me if the interface is simplified (and hopefully amount of code
> > > reduced).
> > 
> > I combined them largely on the observation that with the exception of
> > buffers, log item recovery code is generally short and not worth
> > creating even more files.  224 is enough.
> > 
> 
> I like Christoph's idea of selectively separating out the particularly
> large (i.e. buffers) bits.

Ok, I'll start with xfs_buf_item.c.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread

Thread overview: 56+ messages
2020-04-22  2:06 [PATCH 00/19] xfs: refactor log recovery Darrick J. Wong
2020-04-22  2:06 ` [PATCH 01/19] xfs: complain when we don't recognize the log item type Darrick J. Wong
2020-04-22 16:17   ` Brian Foster
2020-04-25 17:42   ` Christoph Hellwig
2020-04-27 17:55     ` Darrick J. Wong
2020-04-22  2:06 ` [PATCH 02/19] xfs: refactor log recovery item sorting into a generic dispatch structure Darrick J. Wong
2020-04-25 18:13   ` Christoph Hellwig
2020-04-27 22:04     ` Darrick J. Wong
2020-04-28  5:11       ` Christoph Hellwig
2020-04-28 20:46         ` Darrick J. Wong
2020-04-22  2:06 ` [PATCH 03/19] xfs: refactor log recovery item dispatch for pass2 readhead functions Darrick J. Wong
2020-04-25 18:19   ` Christoph Hellwig
2020-04-28 20:54     ` Darrick J. Wong
2020-04-29  6:07       ` Christoph Hellwig
2020-04-22  2:06 ` [PATCH 04/19] xfs: refactor log recovery item dispatch for pass1 commit functions Darrick J. Wong
2020-04-25 18:20   ` Christoph Hellwig
2020-04-22  2:06 ` [PATCH 05/19] xfs: refactor log recovery buffer item dispatch for pass2 " Darrick J. Wong
2020-04-22  2:06 ` [PATCH 06/19] xfs: refactor log recovery inode " Darrick J. Wong
2020-04-22  2:06 ` [PATCH 07/19] xfs: refactor log recovery intent " Darrick J. Wong
2020-04-25 18:24   ` Christoph Hellwig
2020-04-28 22:42     ` Darrick J. Wong
2020-04-22  2:06 ` [PATCH 08/19] xfs: refactor log recovery dquot " Darrick J. Wong
2020-04-22  2:07 ` [PATCH 09/19] xfs: refactor log recovery icreate " Darrick J. Wong
2020-04-22  2:07 ` [PATCH 10/19] xfs: refactor log recovery quotaoff " Darrick J. Wong
2020-04-22  2:07 ` [PATCH 11/19] xfs: refactor EFI log item recovery dispatch Darrick J. Wong
2020-04-25 18:28   ` Christoph Hellwig
2020-04-28 22:41     ` Darrick J. Wong
2020-04-28 23:45       ` Darrick J. Wong
2020-04-29  7:09         ` Christoph Hellwig
2020-04-29  7:18           ` Christoph Hellwig
2020-04-29 14:20           ` Darrick J. Wong
2020-04-22  2:07 ` [PATCH 12/19] xfs: refactor RUI " Darrick J. Wong
2020-04-25 18:28   ` Christoph Hellwig
2020-04-28 22:40     ` Darrick J. Wong
2020-04-22  2:07 ` [PATCH 13/19] xfs: refactor CUI " Darrick J. Wong
2020-04-22  2:07 ` [PATCH 14/19] xfs: refactor BUI " Darrick J. Wong
2020-04-22  2:07 ` [PATCH 15/19] xfs: refactor releasing finished intents during log recovery Darrick J. Wong
2020-04-25 18:34   ` Christoph Hellwig
2020-04-28 22:40     ` Darrick J. Wong
2020-04-22  2:07 ` [PATCH 16/19] xfs: refactor adding recovered intent items to the log Darrick J. Wong
2020-04-25 18:34   ` Christoph Hellwig
2020-04-22  2:07 ` [PATCH 17/19] xfs: hoist the ail unlock/lock cycle when cancelling intents during recovery Darrick J. Wong
2020-04-25 18:35   ` Christoph Hellwig
2020-04-22  2:07 ` [PATCH 18/19] xfs: remove xlog_item_is_intent Darrick J. Wong
2020-04-22  2:08 ` [PATCH 19/19] xfs: move xlog_recover_intent_pass2 up in the file Darrick J. Wong
2020-04-25 18:36   ` Christoph Hellwig
2020-04-28 22:38     ` Darrick J. Wong
2020-04-22 16:18 ` [PATCH 00/19] xfs: refactor log recovery Brian Foster
2020-04-28  6:12   ` Christoph Hellwig
2020-04-28 12:43     ` Brian Foster
2020-04-28 22:34       ` Darrick J. Wong
2020-04-29  6:09         ` Christoph Hellwig
2020-04-29 11:52         ` Brian Foster
2020-04-29 14:22           ` Darrick J. Wong
2020-04-28  6:22 ` Christoph Hellwig
2020-04-28 22:28   ` Darrick J. Wong
