* [PATCH v3 0/3] xfs: fix inode use-after-free during log recovery
@ 2020-09-29 17:43 Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 1/3] xfs: clean up bmap intent item recovery checking Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david, hch

Hi all,

In this second series, I try to fix a use-after-free that I discovered
while developing the dfops freezer: BUI recovery releases the inode
even when it requeues itself for more work.  If the inode gets
reclaimed in the meantime, the fs corrupts memory and explodes.  The
fix is to make the dfops capture struct take over ownership of the
inodes whenever there's more work to be done.  This is a bit clunky,
but it's a simpler mechanism than saving both inode pointers and inode
numbers and introducing tagged structures to distinguish one from the
other.
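
The shape of the change is roughly the following (just a sketch; patch
3 below has the real version):

struct xfs_defer_capture {
        struct list_head        dfc_list;       /* link in capture_list */
        struct list_head        dfc_dfops;      /* captured deferred work */
        /* ...saved transaction reservation state... */
        struct xfs_inode        *dfc_ip;        /* inode kept alive for the work */
};

/* Dropping a capture structure also drops its inode reference. */
void
xfs_defer_ops_release(
        struct xfs_mount                *mp,
        struct xfs_defer_capture        *dfc)
{
        xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
        if (dfc->dfc_ip)
                xfs_irele(dfc->dfc_ip);
        kmem_free(dfc);
}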

v2: rebase atop the new defer capture code
v3: only capture one inode, move as much of the defer capture code to
xfs_defer.c as we can

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-bmap-intent-recovery-5.10
---
 fs/xfs/libxfs/xfs_defer.c  |   55 +++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_defer.h  |   11 +++++-
 fs/xfs/xfs_bmap_item.c     |   78 +++++++++++++++++---------------------------
 fs/xfs/xfs_extfree_item.c  |    2 +
 fs/xfs/xfs_log_recover.c   |    7 +++-
 fs/xfs/xfs_refcount_item.c |    2 +
 fs/xfs/xfs_rmap_item.c     |    2 +
 7 files changed, 96 insertions(+), 61 deletions(-)



* [PATCH 1/3] xfs: clean up bmap intent item recovery checking
  2020-09-29 17:43 [PATCH v3 0/3] xfs: fix inode use-after-free during log recovery Darrick J. Wong
@ 2020-09-29 17:43 ` Darrick J. Wong
  2020-09-29 17:44 ` [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering Darrick J. Wong
  2020-09-29 17:44 ` [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover Darrick J. Wong
  2 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

The bmap intent item checking code in xfs_bui_item_recover is spread all
over the function.  We should check the recovered log item at the top
before we allocate any resources or do anything else, so do that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_item.c |   38 ++++++++++++--------------------------
 1 file changed, 12 insertions(+), 26 deletions(-)


diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 126df48dae5f..c1f2cc3c42cb 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -437,8 +437,6 @@ xfs_bui_item_recover(
 	xfs_fsblock_t			inode_fsb;
 	xfs_filblks_t			count;
 	xfs_exntst_t			state;
-	enum xfs_bmap_intent_type	type;
-	bool				op_ok;
 	unsigned int			bui_type;
 	int				whichfork;
 	int				error = 0;
@@ -456,16 +454,19 @@ xfs_bui_item_recover(
 			   XFS_FSB_TO_DADDR(mp, bmap->me_startblock));
 	inode_fsb = XFS_BB_TO_FSB(mp, XFS_FSB_TO_DADDR(mp,
 			XFS_INO_TO_FSB(mp, bmap->me_owner)));
-	switch (bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK) {
+	state = (bmap->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
+			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+	whichfork = (bmap->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
+			XFS_ATTR_FORK : XFS_DATA_FORK;
+	bui_type = bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
+	switch (bui_type) {
 	case XFS_BMAP_MAP:
 	case XFS_BMAP_UNMAP:
-		op_ok = true;
 		break;
 	default:
-		op_ok = false;
-		break;
+		return -EFSCORRUPTED;
 	}
-	if (!op_ok || startblock_fsb == 0 ||
+	if (startblock_fsb == 0 ||
 	    bmap->me_len == 0 ||
 	    inode_fsb == 0 ||
 	    startblock_fsb >= mp->m_sb.sb_dblocks ||
@@ -493,32 +494,17 @@ xfs_bui_item_recover(
 	if (VFS_I(ip)->i_nlink == 0)
 		xfs_iflags_set(ip, XFS_IRECOVERY);
 
-	/* Process deferred bmap item. */
-	state = (bmap->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
-			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
-	whichfork = (bmap->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
-			XFS_ATTR_FORK : XFS_DATA_FORK;
-	bui_type = bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
-	switch (bui_type) {
-	case XFS_BMAP_MAP:
-	case XFS_BMAP_UNMAP:
-		type = bui_type;
-		break;
-	default:
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
-		error = -EFSCORRUPTED;
-		goto err_inode;
-	}
 	xfs_trans_ijoin(tp, ip, 0);
 
 	count = bmap->me_len;
-	error = xfs_trans_log_finish_bmap_update(tp, budp, type, ip, whichfork,
-			bmap->me_startoff, bmap->me_startblock, &count, state);
+	error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip,
+			whichfork, bmap->me_startoff, bmap->me_startblock,
+			&count, state);
 	if (error)
 		goto err_inode;
 
 	if (count > 0) {
-		ASSERT(type == XFS_BMAP_UNMAP);
+		ASSERT(bui_type == XFS_BMAP_UNMAP);
 		irec.br_startblock = bmap->me_startblock;
 		irec.br_blockcount = count;
 		irec.br_startoff = bmap->me_startoff;



* [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  2020-09-29 17:43 [PATCH v3 0/3] xfs: fix inode use-after-free during log recovery Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 1/3] xfs: clean up bmap intent item recovery checking Darrick J. Wong
@ 2020-09-29 17:44 ` Darrick J. Wong
  2020-10-02 16:27   ` Brian Foster
  2020-10-04 19:09   ` [PATCH v3.2 " Darrick J. Wong
  2020-09-29 17:44 ` [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover Darrick J. Wong
  2 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:44 UTC (permalink / raw)
  To: darrick.wong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

In most places in XFS, we have a specific order in which we gather
resources: grab the inode, allocate a transaction, then lock the inode.
xfs_bui_item_recover doesn't do it in that order, so fix it to be more
consistent.  This also makes the error bailout code a bit less weird.
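
In other words, the goal is the usual shape (sketched with placeholder
arguments and the error paths trimmed):

        /* 1. Grab the inode, unlocked. */
        error = xfs_iget(mp, NULL, ino, 0, 0, &ip);

        /* 2. Allocate the transaction. */
        error = xfs_trans_alloc(mp, resv, blocks, 0, 0, &tp);

        /* 3. Lock the inode and join it to the transaction. */
        xfs_ilock(ip, XFS_ILOCK_EXCL);
        xfs_trans_ijoin(tp, ip, 0);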

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_item.c |   42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index c1f2cc3c42cb..1c9cb5a04bb5 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -475,25 +475,26 @@ xfs_bui_item_recover(
 	    (bmap->me_flags & ~XFS_BMAP_EXTENT_FLAGS))
 		return -EFSCORRUPTED;
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
-			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
-	if (error)
-		return error;
-
-	budp = xfs_trans_get_bud(tp, buip);
-
 	/* Grab the inode. */
-	error = xfs_iget(mp, tp, bmap->me_owner, 0, XFS_ILOCK_EXCL, &ip);
+	error = xfs_iget(mp, NULL, bmap->me_owner, 0, 0, &ip);
 	if (error)
-		goto err_inode;
+		return error;
 
-	error = xfs_qm_dqattach_locked(ip, false);
+	error = xfs_qm_dqattach(ip);
 	if (error)
-		goto err_inode;
+		goto err_rele;
 
 	if (VFS_I(ip)->i_nlink == 0)
 		xfs_iflags_set(ip, XFS_IRECOVERY);
 
+	/* Allocate transaction and do the work. */
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
+			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
+	if (error)
+		goto err_rele;
+
+	budp = xfs_trans_get_bud(tp, buip);
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
 	count = bmap->me_len;
@@ -501,7 +502,7 @@ xfs_bui_item_recover(
 			whichfork, bmap->me_startoff, bmap->me_startblock,
 			&count, state);
 	if (error)
-		goto err_inode;
+		goto err_cancel;
 
 	if (count > 0) {
 		ASSERT(bui_type == XFS_BMAP_UNMAP);
@@ -512,18 +513,19 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
+	/* Commit transaction, which frees tp. */
 	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
+	if (error)
+		goto err_unlock;
+	return 0;
+
+err_cancel:
+	xfs_trans_cancel(tp);
+err_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+err_rele:
 	xfs_irele(ip);
 	return error;
-
-err_inode:
-	xfs_trans_cancel(tp);
-	if (ip) {
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_irele(ip);
-	}
-	return error;
 }
 
 STATIC bool



* [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-09-29 17:43 [PATCH v3 0/3] xfs: fix inode use-after-free during log recovery Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 1/3] xfs: clean up bmap intent item recovery checking Darrick J. Wong
  2020-09-29 17:44 ` [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering Darrick J. Wong
@ 2020-09-29 17:44 ` Darrick J. Wong
  2020-10-02  4:22   ` [PATCH v5.2 " Darrick J. Wong
  2020-10-04 19:11   ` [PATCH v3.3 " Darrick J. Wong
  2 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:44 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation.  If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode.  If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up.  Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.
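
Condensed, the lifecycle looks like this (excerpted from the diff
below):

        /* xfs_bui_item_recover: commit and hand the locked inode over */
        return xfs_defer_ops_capture_and_commit(tp, ip, capture_list);

        /* ...later, xlog_finish_defer_ops finishes the captured work... */
        xfs_defer_ops_continue(dfc, tp, &ip);
        error = xfs_trans_commit(tp);
        if (ip) {
                xfs_iunlock(ip, XFS_ILOCK_EXCL);
                xfs_irele(ip);
        }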

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c  |   55 ++++++++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_defer.h  |   11 +++++++--
 fs/xfs/xfs_bmap_item.c     |    8 ++----
 fs/xfs/xfs_extfree_item.c  |    2 +-
 fs/xfs/xfs_log_recover.c   |    7 +++++-
 fs/xfs/xfs_refcount_item.c |    2 +-
 fs/xfs/xfs_rmap_item.c     |    2 +-
 7 files changed, 67 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 4caaf5527403..c466a3177acc 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -16,6 +16,7 @@
 #include "xfs_inode.h"
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
+#include "xfs_icache.h"
 
 /*
  * Deferred Operations in XFS
@@ -553,10 +554,14 @@ xfs_defer_move(
  * deferred ops state is transferred to the capture structure and the
  * transaction is then ready for the caller to commit it.  If there are no
  * intent items to capture, this function returns NULL.
+ *
+ * If inodes are passed in and this function returns a capture structure, the
+ * inodes are now owned by the capture structure.
  */
 static struct xfs_defer_capture *
 xfs_defer_ops_capture(
-	struct xfs_trans		*tp)
+	struct xfs_trans		*tp,
+	struct xfs_inode		*ip)
 {
 	struct xfs_defer_capture	*dfc;
 
@@ -588,6 +593,12 @@ xfs_defer_ops_capture(
 	dfc->dfc_tres.tr_logcount = 1;
 	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
 
+	/*
+	 * Transfer responsibility for unlocking and releasing the inodes to
+	 * the capture structure.
+	 */
+	dfc->dfc_ip = ip;
+
 	return dfc;
 }
 
@@ -598,29 +609,49 @@ xfs_defer_ops_release(
 	struct xfs_defer_capture	*dfc)
 {
 	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
+	if (dfc->dfc_ip)
+		xfs_irele(dfc->dfc_ip);
 	kmem_free(dfc);
 }
 
 /*
  * Capture any deferred ops and commit the transaction.  This is the last step
- * needed to finish a log intent item that we recovered from the log.
+ * needed to finish a log intent item that we recovered from the log.  If any
+ * of the deferred ops operate on an inode, the caller must pass in that inode
+ * so that the reference can be transferred to the capture structure.  The
+ * caller must hold ILOCK_EXCL on the inode, and must not touch the inode after
+ * this call returns.
  */
 int
 xfs_defer_ops_capture_and_commit(
 	struct xfs_trans		*tp,
+	struct xfs_inode		*ip,
 	struct list_head		*capture_list)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_defer_capture	*dfc;
 	int				error;
 
+	ASSERT(ip == NULL || xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
 	/* If we don't capture anything, commit transaction and exit. */
-	dfc = xfs_defer_ops_capture(tp);
-	if (!dfc)
-		return xfs_trans_commit(tp);
+	dfc = xfs_defer_ops_capture(tp, ip);
+	if (!dfc) {
+		error = xfs_trans_commit(tp);
+		if (ip) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_irele(ip);
+		}
+		return error;
+	}
 
-	/* Commit the transaction and add the capture structure to the list. */
+	/*
+	 * Commit the transaction and add the capture structure to the list.
+	 * Once that's done, we can unlock the inode.
+	 */
 	error = xfs_trans_commit(tp);
+	if (ip)
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error) {
 		xfs_defer_ops_release(mp, dfc);
 		return error;
@@ -632,16 +663,24 @@ xfs_defer_ops_capture_and_commit(
 
 /*
  * Attach a chain of captured deferred ops to a new transaction and free the
- * capture structure.
+ * capture structure.  A captured inode will be passed back to the caller.
  */
 void
 xfs_defer_ops_continue(
 	struct xfs_defer_capture	*dfc,
-	struct xfs_trans		*tp)
+	struct xfs_trans		*tp,
+	struct xfs_inode		**ipp)
 {
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
+	/* Lock and join the captured inode to the new transaction. */
+	if (dfc->dfc_ip) {
+		xfs_ilock(dfc->dfc_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, dfc->dfc_ip, 0);
+	}
+	*ipp = dfc->dfc_ip;
+
 	/* Move captured dfops chain and state to the transaction. */
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index c447c79bbe74..3aaf702d4445 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -77,15 +77,22 @@ struct xfs_defer_capture {
 	unsigned int		dfc_tpflags;
 	unsigned int		dfc_blkres;
 	struct xfs_trans_res	dfc_tres;
+
+	/*
+	 * An inode reference that must be maintained to complete the deferred
+	 * work.
+	 */
+	struct xfs_inode	*dfc_ip;
 };
 
 /*
  * Functions to capture a chain of deferred operations and continue them later.
  * This doesn't normally happen except log recovery.
  */
-int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
+int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp, struct xfs_inode *ip,
 		struct list_head *capture_list);
-void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp);
+void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
+		struct xfs_inode **ipp);
 void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 1c9cb5a04bb5..0ffbc75bafe1 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -513,15 +513,11 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
-	/* Commit transaction, which frees tp. */
-	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
-	if (error)
-		goto err_unlock;
-	return 0;
+	/* Commit transaction, which frees the transaction and the inode. */
+	return xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
 
 err_cancel:
 	xfs_trans_cancel(tp);
-err_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 err_rele:
 	xfs_irele(ip);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 17d36fe5cfd0..3920542f5736 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -627,7 +627,7 @@ xfs_efi_item_recover(
 
 	}
 
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 46e750279634..90ad2bfdfa48 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2439,6 +2439,7 @@ xlog_finish_defer_ops(
 {
 	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
+	struct xfs_inode	*ip;
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
@@ -2449,9 +2450,13 @@ xlog_finish_defer_ops(
 
 		/* Transfer all collected dfops to this transaction. */
 		list_del_init(&dfc->dfc_list);
-		xfs_defer_ops_continue(dfc, tp);
+		xfs_defer_ops_continue(dfc, tp, &ip);
 
 		error = xfs_trans_commit(tp);
+		if (ip) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_irele(ip);
+		}
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 0478374add64..ad895b48f365 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -544,7 +544,7 @@ xfs_cui_item_recover(
 	}
 
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 0d8fa707f079..1163f32c3e62 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -567,7 +567,7 @@ xfs_rui_item_recover(
 	}
 
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);



* [PATCH v5.2 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-09-29 17:44 ` [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover Darrick J. Wong
@ 2020-10-02  4:22   ` Darrick J. Wong
  2020-10-02  7:30     ` Christoph Hellwig
  2020-10-04 19:11   ` [PATCH v3.3 " Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-10-02  4:22 UTC (permalink / raw)
  To: linux-xfs, david, hch, Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation.  If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode.  If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up.  Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v5.2: rebase on updated defer capture patches
---
 fs/xfs/libxfs/xfs_defer.c  |   55 ++++++++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_defer.h  |   11 +++++++--
 fs/xfs/xfs_bmap_item.c     |    8 ++----
 fs/xfs/xfs_extfree_item.c  |    2 +-
 fs/xfs/xfs_log_recover.c   |    7 +++++-
 fs/xfs/xfs_refcount_item.c |    2 +-
 fs/xfs/xfs_rmap_item.c     |    2 +-
 7 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index e19dc1ced7e6..4af5752f9830 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -16,6 +16,7 @@
 #include "xfs_inode.h"
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
+#include "xfs_icache.h"
 
 /*
  * Deferred Operations in XFS
@@ -553,10 +554,14 @@ xfs_defer_move(
  * deferred ops state is transferred to the capture structure and the
  * transaction is then ready for the caller to commit it.  If there are no
  * intent items to capture, this function returns NULL.
+ *
+ * If inodes are passed in and this function returns a capture structure, the
+ * inodes are now owned by the capture structure.
  */
 static struct xfs_defer_capture *
 xfs_defer_ops_capture(
-	struct xfs_trans		*tp)
+	struct xfs_trans		*tp,
+	struct xfs_inode		*ip)
 {
 	struct xfs_defer_capture	*dfc;
 
@@ -582,6 +587,12 @@ xfs_defer_ops_capture(
 	/* Preserve the log reservation size. */
 	dfc->dfc_logres = tp->t_log_res;
 
+	/*
+	 * Transfer responsibility for unlocking and releasing the inodes to
+	 * the capture structure.
+	 */
+	dfc->dfc_ip = ip;
+
 	return dfc;
 }
 
@@ -592,29 +603,49 @@ xfs_defer_ops_release(
 	struct xfs_defer_capture	*dfc)
 {
 	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
+	if (dfc->dfc_ip)
+		xfs_irele(dfc->dfc_ip);
 	kmem_free(dfc);
 }
 
 /*
  * Capture any deferred ops and commit the transaction.  This is the last step
- * needed to finish a log intent item that we recovered from the log.
+ * needed to finish a log intent item that we recovered from the log.  If any
+ * of the deferred ops operate on an inode, the caller must pass in that inode
+ * so that the reference can be transferred to the capture structure.  The
+ * caller must hold ILOCK_EXCL on the inode, and must not touch the inode after
+ * this call returns.
  */
 int
 xfs_defer_ops_capture_and_commit(
 	struct xfs_trans		*tp,
+	struct xfs_inode		*ip,
 	struct list_head		*capture_list)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_defer_capture	*dfc;
 	int				error;
 
+	ASSERT(ip == NULL || xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
 	/* If we don't capture anything, commit transaction and exit. */
-	dfc = xfs_defer_ops_capture(tp);
-	if (!dfc)
-		return xfs_trans_commit(tp);
+	dfc = xfs_defer_ops_capture(tp, ip);
+	if (!dfc) {
+		error = xfs_trans_commit(tp);
+		if (ip) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_irele(ip);
+		}
+		return error;
+	}
 
-	/* Commit the transaction and add the capture structure to the list. */
+	/*
+	 * Commit the transaction and add the capture structure to the list.
+	 * Once that's done, we can unlock the inode.
+	 */
 	error = xfs_trans_commit(tp);
+	if (ip)
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error) {
 		xfs_defer_ops_release(mp, dfc);
 		return error;
@@ -626,16 +657,24 @@ xfs_defer_ops_capture_and_commit(
 
 /*
  * Attach a chain of captured deferred ops to a new transaction and free the
- * capture structure.
+ * capture structure.  A captured inode will be passed back to the caller.
  */
 void
 xfs_defer_ops_continue(
 	struct xfs_defer_capture	*dfc,
-	struct xfs_trans		*tp)
+	struct xfs_trans		*tp,
+	struct xfs_inode		**ipp)
 {
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
+	/* Lock and join the captured inode to the new transaction. */
+	if (dfc->dfc_ip) {
+		xfs_ilock(dfc->dfc_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, dfc->dfc_ip, 0);
+	}
+	*ipp = dfc->dfc_ip;
+
 	/* Move captured dfops chain and state to the transaction. */
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 6cde6f0713f7..04d4def50b19 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -82,15 +82,22 @@ struct xfs_defer_capture {
 
 	/* Log reservation saved from the transaction. */
 	unsigned int		dfc_logres;
+
+	/*
+	 * An inode reference that must be maintained to complete the deferred
+	 * work.
+	 */
+	struct xfs_inode	*dfc_ip;
 };
 
 /*
  * Functions to capture a chain of deferred operations and continue them later.
  * This doesn't normally happen except log recovery.
  */
-int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
+int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp, struct xfs_inode *ip,
 		struct list_head *capture_list);
-void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp);
+void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
+		struct xfs_inode **ipp);
 void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 1c9cb5a04bb5..0ffbc75bafe1 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -513,15 +513,11 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
-	/* Commit transaction, which frees tp. */
-	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
-	if (error)
-		goto err_unlock;
-	return 0;
+	/* Commit transaction, which frees the transaction and the inode. */
+	return xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
 
 err_cancel:
 	xfs_trans_cancel(tp);
-err_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 err_rele:
 	xfs_irele(ip);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 17d36fe5cfd0..3920542f5736 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -627,7 +627,7 @@ xfs_efi_item_recover(
 
 	}
 
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 001e1585ddc6..a8289adc1b29 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2439,6 +2439,7 @@ xlog_finish_defer_ops(
 {
 	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
+	struct xfs_inode	*ip;
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
@@ -2464,9 +2465,13 @@ xlog_finish_defer_ops(
 		 * from recovering a single intent item.
 		 */
 		list_del_init(&dfc->dfc_list);
-		xfs_defer_ops_continue(dfc, tp);
+		xfs_defer_ops_continue(dfc, tp, &ip);
 
 		error = xfs_trans_commit(tp);
+		if (ip) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_irele(ip);
+		}
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 0478374add64..ad895b48f365 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -544,7 +544,7 @@ xfs_cui_item_recover(
 	}
 
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 0d8fa707f079..1163f32c3e62 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -567,7 +567,7 @@ xfs_rui_item_recover(
 	}
 
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);


* Re: [PATCH v5.2 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-10-02  4:22   ` [PATCH v5.2 " Darrick J. Wong
@ 2020-10-02  7:30     ` Christoph Hellwig
  2020-10-02 16:29       ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2020-10-02  7:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch, Brian Foster

On Thu, Oct 01, 2020 at 09:22:36PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In xfs_bui_item_recover, there exists a use-after-free bug with regards
> to the inode that is involved in the bmap replay operation.  If the
> mapping operation does not complete, we call xfs_bmap_unmap_extent to
> create a deferred op to finish the unmapping work, and we retain a
> pointer to the incore inode.
> 
> Unfortunately, the very next thing we do is commit the transaction and
> drop the inode.  If reclaim tears down the inode before we try to finish
> the defer ops, we dereference garbage and blow up.  Therefore, create a
> way to join inodes to the defer ops freezer so that we can maintain the
> xfs_inode reference until we're done with the inode.
> 
> Note: This imposes the requirement that there be enough memory to keep
> every incore inode in memory throughout recovery.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v5.2: rebase on updated defer capture patches
> ---
>  fs/xfs/libxfs/xfs_defer.c  |   55 ++++++++++++++++++++++++++++++++++++++------
>  fs/xfs/libxfs/xfs_defer.h  |   11 +++++++--
>  fs/xfs/xfs_bmap_item.c     |    8 ++----
>  fs/xfs/xfs_extfree_item.c  |    2 +-
>  fs/xfs/xfs_log_recover.c   |    7 +++++-
>  fs/xfs/xfs_refcount_item.c |    2 +-
>  fs/xfs/xfs_rmap_item.c     |    2 +-
>  7 files changed, 67 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index e19dc1ced7e6..4af5752f9830 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -16,6 +16,7 @@
>  #include "xfs_inode.h"
>  #include "xfs_inode_item.h"
>  #include "xfs_trace.h"
> +#include "xfs_icache.h"
>  
>  /*
>   * Deferred Operations in XFS
> @@ -553,10 +554,14 @@ xfs_defer_move(
>   * deferred ops state is transferred to the capture structure and the
>   * transaction is then ready for the caller to commit it.  If there are no
>   * intent items to capture, this function returns NULL.
> + *
> + * If inodes are passed in and this function returns a capture structure, the
> + * inodes are now owned by the capture structure.
>   */
>  static struct xfs_defer_capture *
>  xfs_defer_ops_capture(
> -	struct xfs_trans		*tp)
> +	struct xfs_trans		*tp,
> +	struct xfs_inode		*ip)
>  {
>  	struct xfs_defer_capture	*dfc;
>  
> @@ -582,6 +587,12 @@ xfs_defer_ops_capture(
>  	/* Preserve the log reservation size. */
>  	dfc->dfc_logres = tp->t_log_res;
>  
> +	/*
> +	 * Transfer responsibility for unlocking and releasing the inodes to
> +	 * the capture structure.
> +	 */
> +	dfc->dfc_ip = ip;
> +

Maybe rename ip to capture_ip?

> +	ASSERT(ip == NULL || xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +
>  	/* If we don't capture anything, commit transaction and exit. */
> +	dfc = xfs_defer_ops_capture(tp, ip);
> +	if (!dfc) {
> +		error = xfs_trans_commit(tp);
> +		if (ip) {
> +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +			xfs_irele(ip);
> +		}
> +		return error;
> +	}

Instead of coming up with our own inode unlocking and release schemes,
can't we just require that the inode is joined by passing the lock
flags to xfs_trans_ijoin, and piggyback on xfs_trans_commit unlocking
it in that case?
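
Something like this, roughly (untested sketch):

        xfs_ilock(ip, XFS_ILOCK_EXCL);
        xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
        ...
        /* commit (or cancel) now drops the ILOCK for us */
        error = xfs_trans_commit(tp);
        xfs_irele(ip);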


* Re: [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  2020-09-29 17:44 ` [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering Darrick J. Wong
@ 2020-10-02 16:27   ` Brian Foster
  2020-10-02 16:30     ` Darrick J. Wong
  2020-10-04 19:09   ` [PATCH v3.2 " Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Foster @ 2020-10-02 16:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david

On Tue, Sep 29, 2020 at 10:44:00AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In most places in XFS, we have a specific order in which we gather
> resources: grab the inode, allocate a transaction, then lock the inode.
> xfs_bui_item_recover doesn't do it in that order, so fix it to be more
> consistent.  This also makes the error bailout code a bit less weird.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_bmap_item.c |   42 ++++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 20 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index c1f2cc3c42cb..1c9cb5a04bb5 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
...
> @@ -512,18 +513,19 @@ xfs_bui_item_recover(
>  		xfs_bmap_unmap_extent(tp, ip, &irec);
>  	}
>  
> +	/* Commit transaction, which frees tp. */
>  	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> +	if (error)
> +		goto err_unlock;
> +	return 0;
> +
> +err_cancel:
> +	xfs_trans_cancel(tp);
> +err_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +err_rele:
>  	xfs_irele(ip);

What happened to the unlock and irele in the non-error path?

Brian

>  	return error;
> -
> -err_inode:
> -	xfs_trans_cancel(tp);
> -	if (ip) {
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -		xfs_irele(ip);
> -	}
> -	return error;
>  }
>  
>  STATIC bool
> 



* Re: [PATCH v5.2 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-10-02  7:30     ` Christoph Hellwig
@ 2020-10-02 16:29       ` Darrick J. Wong
  2020-10-05  6:25         ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-10-02 16:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, david, Brian Foster

On Fri, Oct 02, 2020 at 09:30:06AM +0200, Christoph Hellwig wrote:
> On Thu, Oct 01, 2020 at 09:22:36PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In xfs_bui_item_recover, there exists a use-after-free bug with regards
> > to the inode that is involved in the bmap replay operation.  If the
> > mapping operation does not complete, we call xfs_bmap_unmap_extent to
> > create a deferred op to finish the unmapping work, and we retain a
> > pointer to the incore inode.
> > 
> > Unfortunately, the very next thing we do is commit the transaction and
> > drop the inode.  If reclaim tears down the inode before we try to finish
> > the defer ops, we dereference garbage and blow up.  Therefore, create a
> > way to join inodes to the defer ops freezer so that we can maintain the
> > xfs_inode reference until we're done with the inode.
> > 
> > Note: This imposes the requirement that there be enough memory to keep
> > every incore inode in memory throughout recovery.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v5.2: rebase on updated defer capture patches
> > ---
> >  fs/xfs/libxfs/xfs_defer.c  |   55 ++++++++++++++++++++++++++++++++++++++------
> >  fs/xfs/libxfs/xfs_defer.h  |   11 +++++++--
> >  fs/xfs/xfs_bmap_item.c     |    8 ++----
> >  fs/xfs/xfs_extfree_item.c  |    2 +-
> >  fs/xfs/xfs_log_recover.c   |    7 +++++-
> >  fs/xfs/xfs_refcount_item.c |    2 +-
> >  fs/xfs/xfs_rmap_item.c     |    2 +-
> >  7 files changed, 67 insertions(+), 20 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index e19dc1ced7e6..4af5752f9830 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -16,6 +16,7 @@
> >  #include "xfs_inode.h"
> >  #include "xfs_inode_item.h"
> >  #include "xfs_trace.h"
> > +#include "xfs_icache.h"
> >  
> >  /*
> >   * Deferred Operations in XFS
> > @@ -553,10 +554,14 @@ xfs_defer_move(
> >   * deferred ops state is transferred to the capture structure and the
> >   * transaction is then ready for the caller to commit it.  If there are no
> >   * intent items to capture, this function returns NULL.
> > + *
> > + * If inodes are passed in and this function returns a capture structure, the
> > + * inodes are now owned by the capture structure.
> >   */
> >  static struct xfs_defer_capture *
> >  xfs_defer_ops_capture(
> > -	struct xfs_trans		*tp)
> > +	struct xfs_trans		*tp,
> > +	struct xfs_inode		*ip)
> >  {
> >  	struct xfs_defer_capture	*dfc;
> >  
> > @@ -582,6 +587,12 @@ xfs_defer_ops_capture(
> >  	/* Preserve the log reservation size. */
> >  	dfc->dfc_logres = tp->t_log_res;
> >  
> > +	/*
> > +	 * Transfer responsibility for unlocking and releasing the inodes to
> > +	 * the capture structure.
> > +	 */
> > +	dfc->dfc_ip = ip;
> > +
> 
> Maybe rename ip to capture_ip?

Ok.

> > +	ASSERT(ip == NULL || xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > +
> >  	/* If we don't capture anything, commit transaction and exit. */
> > +	dfc = xfs_defer_ops_capture(tp, ip);
> > +	if (!dfc) {
> > +		error = xfs_trans_commit(tp);
> > +		if (ip) {
> > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +			xfs_irele(ip);
> > +		}
> > +		return error;
> > +	}
> 
> Instead of coming up with our own inode unlocking and release schemes,
> can't we just require that the inode is joined by passing the lock
> flags to xfs_trans_ijoin, and piggyback on xfs_trans_commit unlocking
> it in that case?

Yes, and let's also xfs_iget(capture_ip->i_ino) to increase the incore
inode's refcount, which would make it so that the caller would still
unlock and rele the reference that they got.
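
Roughly this, in xfs_defer_ops_capture (sketch; error handling elided):

        /* Take our own reference instead of stealing the caller's. */
        if (capture_ip) {
                error = xfs_iget(tp->t_mountp, NULL, capture_ip->i_ino,
                                0, 0, &dfc->dfc_capture_ip);
                /* error handling elided in this sketch */
        }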

--D


* Re: [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  2020-10-02 16:27   ` Brian Foster
@ 2020-10-02 16:30     ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-10-02 16:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david

On Fri, Oct 02, 2020 at 12:27:54PM -0400, Brian Foster wrote:
> On Tue, Sep 29, 2020 at 10:44:00AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In most places in XFS, we have a specific order in which we gather
> > resources: grab the inode, allocate a transaction, then lock the inode.
> > xfs_bui_item_recover doesn't do it in that order, so fix it to be more
> > consistent.  This also makes the error bailout code a bit less weird.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_bmap_item.c |   42 ++++++++++++++++++++++--------------------
> >  1 file changed, 22 insertions(+), 20 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > index c1f2cc3c42cb..1c9cb5a04bb5 100644
> > --- a/fs/xfs/xfs_bmap_item.c
> > +++ b/fs/xfs/xfs_bmap_item.c
> ...
> > @@ -512,18 +513,19 @@ xfs_bui_item_recover(
> >  		xfs_bmap_unmap_extent(tp, ip, &irec);
> >  	}
> >  
> > +	/* Commit transaction, which frees tp. */
> >  	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> > +	if (error)
> > +		goto err_unlock;
> > +	return 0;
> > +
> > +err_cancel:
> > +	xfs_trans_cancel(tp);
> > +err_unlock:
> >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +err_rele:
> >  	xfs_irele(ip);
> 
> What happened to the unlock and irele in the non-error path?

xfs_defer_capture_and_consume did that, but see Christoph's reply.

--D

> Brian
> 
> >  	return error;
> > -
> > -err_inode:
> > -	xfs_trans_cancel(tp);
> > -	if (ip) {
> > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > -		xfs_irele(ip);
> > -	}
> > -	return error;
> >  }
> >  
> >  STATIC bool
> > 
> 


* [PATCH v3.2 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  2020-09-29 17:44 ` [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering Darrick J. Wong
  2020-10-02 16:27   ` Brian Foster
@ 2020-10-04 19:09   ` Darrick J. Wong
  2020-10-05 16:19     ` Brian Foster
  1 sibling, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-10-04 19:09 UTC (permalink / raw)
  To: Dave Chinner, Christoph Hellwig, linux-xfs, david, Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

In most places in XFS, we have a specific order in which we gather
resources: grab the inode, allocate a transaction, then lock the inode.
xfs_bui_item_recover doesn't do it in that order, so fix it to be more
consistent.  This also makes the error bailout code a bit less weird.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v3.2: don't remove the iunlock/irele if the defer commit succeeds
---
 fs/xfs/xfs_bmap_item.c |   41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index c1f2cc3c42cb..852411568d14 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -475,25 +475,26 @@ xfs_bui_item_recover(
 	    (bmap->me_flags & ~XFS_BMAP_EXTENT_FLAGS))
 		return -EFSCORRUPTED;
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
-			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
-	if (error)
-		return error;
-
-	budp = xfs_trans_get_bud(tp, buip);
-
 	/* Grab the inode. */
-	error = xfs_iget(mp, tp, bmap->me_owner, 0, XFS_ILOCK_EXCL, &ip);
+	error = xfs_iget(mp, NULL, bmap->me_owner, 0, 0, &ip);
 	if (error)
-		goto err_inode;
+		return error;
 
-	error = xfs_qm_dqattach_locked(ip, false);
+	error = xfs_qm_dqattach(ip);
 	if (error)
-		goto err_inode;
+		goto err_rele;
 
 	if (VFS_I(ip)->i_nlink == 0)
 		xfs_iflags_set(ip, XFS_IRECOVERY);
 
+	/* Allocate transaction and do the work. */
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
+			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
+	if (error)
+		goto err_rele;
+
+	budp = xfs_trans_get_bud(tp, buip);
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
 	count = bmap->me_len;
@@ -501,7 +502,7 @@ xfs_bui_item_recover(
 			whichfork, bmap->me_startoff, bmap->me_startblock,
 			&count, state);
 	if (error)
-		goto err_inode;
+		goto err_cancel;
 
 	if (count > 0) {
 		ASSERT(bui_type == XFS_BMAP_UNMAP);
@@ -512,17 +513,21 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
+	/* Commit transaction, which frees the transaction. */
 	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
+	if (error)
+		goto err_unlock;
+
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
-	return error;
+	return 0;
 
-err_inode:
+err_cancel:
 	xfs_trans_cancel(tp);
-	if (ip) {
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_irele(ip);
-	}
+err_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+err_rele:
+	xfs_irele(ip);
 	return error;
 }
 


* [PATCH v3.3 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-09-29 17:44 ` [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover Darrick J. Wong
  2020-10-02  4:22   ` [PATCH v5.2 " Darrick J. Wong
@ 2020-10-04 19:11   ` Darrick J. Wong
  2020-10-05 16:20     ` Brian Foster
  1 sibling, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-10-04 19:11 UTC (permalink / raw)
  To: linux-xfs, david, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation.  If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode.  If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up.  Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v3.3: ihold the captured inode and let callers iunlock/irele their own
reference
v3.2: rebase on updated defer capture patches
---
 fs/xfs/libxfs/xfs_defer.c  |   43 ++++++++++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_defer.h  |   11 +++++++++--
 fs/xfs/xfs_bmap_item.c     |    7 +++++--
 fs/xfs/xfs_extfree_item.c  |    2 +-
 fs/xfs/xfs_inode.c         |    8 ++++++++
 fs/xfs/xfs_inode.h         |    2 ++
 fs/xfs/xfs_log_recover.c   |    7 ++++++-
 fs/xfs/xfs_refcount_item.c |    2 +-
 fs/xfs/xfs_rmap_item.c     |    2 +-
 9 files changed, 71 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index e19dc1ced7e6..00696c23670c 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -16,6 +16,7 @@
 #include "xfs_inode.h"
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
+#include "xfs_icache.h"
 
 /*
  * Deferred Operations in XFS
@@ -553,10 +554,14 @@ xfs_defer_move(
  * deferred ops state is transferred to the capture structure and the
  * transaction is then ready for the caller to commit it.  If there are no
  * intent items to capture, this function returns NULL.
+ *
+ * If capture_ip is not NULL, the capture structure will obtain an extra
+ * reference to the inode.
  */
 static struct xfs_defer_capture *
 xfs_defer_ops_capture(
-	struct xfs_trans		*tp)
+	struct xfs_trans		*tp,
+	struct xfs_inode		*capture_ip)
 {
 	struct xfs_defer_capture	*dfc;
 
@@ -582,6 +587,15 @@ xfs_defer_ops_capture(
 	/* Preserve the log reservation size. */
 	dfc->dfc_logres = tp->t_log_res;
 
+	/*
+	 * Grab an extra reference to this inode and attach it to the capture
+	 * structure.
+	 */
+	if (capture_ip) {
+		xfs_ihold(capture_ip);
+		dfc->dfc_capture_ip = capture_ip;
+	}
+
 	return dfc;
 }
 
@@ -592,24 +606,33 @@ xfs_defer_ops_release(
 	struct xfs_defer_capture	*dfc)
 {
 	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
+	if (dfc->dfc_capture_ip)
+		xfs_irele(dfc->dfc_capture_ip);
 	kmem_free(dfc);
 }
 
 /*
  * Capture any deferred ops and commit the transaction.  This is the last step
- * needed to finish a log intent item that we recovered from the log.
+ * needed to finish a log intent item that we recovered from the log.  If any
+ * of the deferred ops operate on an inode, the caller must pass in that inode
+ * so that the reference can be transferred to the capture structure.  The
+ * caller must hold ILOCK_EXCL on the inode, and must unlock it before calling
+ * xfs_defer_ops_continue.
  */
 int
 xfs_defer_ops_capture_and_commit(
 	struct xfs_trans		*tp,
+	struct xfs_inode		*capture_ip,
 	struct list_head		*capture_list)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_defer_capture	*dfc;
 	int				error;
 
+	ASSERT(!capture_ip || xfs_isilocked(capture_ip, XFS_ILOCK_EXCL));
+
 	/* If we don't capture anything, commit transaction and exit. */
-	dfc = xfs_defer_ops_capture(tp);
+	dfc = xfs_defer_ops_capture(tp, capture_ip);
 	if (!dfc)
 		return xfs_trans_commit(tp);
 
@@ -626,16 +649,26 @@ xfs_defer_ops_capture_and_commit(
 
 /*
  * Attach a chain of captured deferred ops to a new transaction and free the
- * capture structure.
+ * capture structure.  If an inode was captured, it will be passed back to the
+ * caller with ILOCK_EXCL held and joined to the transaction with lockflags==0.
+ * The caller now owns the inode reference.
  */
 void
 xfs_defer_ops_continue(
 	struct xfs_defer_capture	*dfc,
-	struct xfs_trans		*tp)
+	struct xfs_trans		*tp,
+	struct xfs_inode		**captured_ipp)
 {
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
+	/* Lock and join the captured inode to the new transaction. */
+	if (dfc->dfc_capture_ip) {
+		xfs_ilock(dfc->dfc_capture_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, dfc->dfc_capture_ip, 0);
+	}
+	*captured_ipp = dfc->dfc_capture_ip;
+
 	/* Move captured dfops chain and state to the transaction. */
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 6cde6f0713f7..05472f71fffe 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -82,6 +82,12 @@ struct xfs_defer_capture {
 
 	/* Log reservation saved from the transaction. */
 	unsigned int		dfc_logres;
+
+	/*
+	 * An inode reference that must be maintained to complete the deferred
+	 * work.
+	 */
+	struct xfs_inode	*dfc_capture_ip;
 };
 
 /*
@@ -89,8 +95,9 @@ struct xfs_defer_capture {
  * This doesn't normally happen except log recovery.
  */
 int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
-		struct list_head *capture_list);
-void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp);
+		struct xfs_inode *capture_ip, struct list_head *capture_list);
+void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
+		struct xfs_inode **captured_ipp);
 void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 852411568d14..4570da07eb06 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -513,8 +513,11 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
-	/* Commit transaction, which frees the transaction. */
-	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
+	/*
+	 * Commit transaction, which frees the transaction and saves the inode
+	 * for later replay activities.
+	 */
+	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
 	if (error)
 		goto err_unlock;
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 17d36fe5cfd0..3920542f5736 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -627,7 +627,7 @@ xfs_efi_item_recover(
 
 	}
 
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2bfbcf28b1bd..24b1e2244905 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3813,3 +3813,11 @@ xfs_iunlock2_io_mmap(
 	if (!same_inode)
 		inode_unlock(VFS_I(ip1));
 }
+
+/* Grab an extra reference to the VFS inode. */
+void
+xfs_ihold(
+	struct xfs_inode	*ip)
+{
+	ihold(VFS_I(ip));
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 751a3d1d7d84..e9b0186b594c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -476,4 +476,6 @@ void xfs_end_io(struct work_struct *work);
 int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 
+void xfs_ihold(struct xfs_inode *ip);
+
 #endif	/* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 001e1585ddc6..a8289adc1b29 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2439,6 +2439,7 @@ xlog_finish_defer_ops(
 {
 	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
+	struct xfs_inode	*ip;
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
@@ -2464,9 +2465,13 @@ xlog_finish_defer_ops(
 		 * from recovering a single intent item.
 		 */
 		list_del_init(&dfc->dfc_list);
-		xfs_defer_ops_continue(dfc, tp);
+		xfs_defer_ops_continue(dfc, tp, &ip);
 
 		error = xfs_trans_commit(tp);
+		if (ip) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_irele(ip);
+		}
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 0478374add64..ad895b48f365 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -544,7 +544,7 @@ xfs_cui_item_recover(
 	}
 
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 0d8fa707f079..1163f32c3e62 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -567,7 +567,7 @@ xfs_rui_item_recover(
 	}
 
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
 
 abort_error:
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);


* Re: [PATCH v5.2 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-10-02 16:29       ` Darrick J. Wong
@ 2020-10-05  6:25         ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2020-10-05  6:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, david, Brian Foster

On Fri, Oct 02, 2020 at 09:29:58AM -0700, Darrick J. Wong wrote:
> > Instead of coming up with our own inode unlocking and release schemes,
> > can't we just require that the inode is joinged by passing the lock
> > flags to xfs_trans_ijoin, and piggy back on xfs_trans_commit unlocking
> > it in that case?
> 
> Yes, and let's also xfs_iget(capture_ip->i_ino) to increase the incore
> inode's refcount, which would make it so that the caller would still
> unlock and rele the reference that they got.

Please use ihold(VFS_I(capture_ip)) as that is a lot more efficient.

Can you resend the whole two series?  I'm lost with all the incremental
updates for individual patches.


* Re: [PATCH v3.2 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  2020-10-04 19:09   ` [PATCH v3.2 " Darrick J. Wong
@ 2020-10-05 16:19     ` Brian Foster
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-10-05 16:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david

On Sun, Oct 04, 2020 at 12:09:39PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In most places in XFS, we have a specific order in which we gather
> resources: grab the inode, allocate a transaction, then lock the inode.
> xfs_bui_item_recover doesn't do it in that order, so fix it to be more
> consistent.  This also makes the error bailout code a bit less weird.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> v3.2: don't remove the iunlock/irele if the defer commit succeeds
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_bmap_item.c |   41 +++++++++++++++++++++++------------------
>  1 file changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index c1f2cc3c42cb..852411568d14 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -475,25 +475,26 @@ xfs_bui_item_recover(
>  	    (bmap->me_flags & ~XFS_BMAP_EXTENT_FLAGS))
>  		return -EFSCORRUPTED;
>  
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
> -			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
> -	if (error)
> -		return error;
> -
> -	budp = xfs_trans_get_bud(tp, buip);
> -
>  	/* Grab the inode. */
> -	error = xfs_iget(mp, tp, bmap->me_owner, 0, XFS_ILOCK_EXCL, &ip);
> +	error = xfs_iget(mp, NULL, bmap->me_owner, 0, 0, &ip);
>  	if (error)
> -		goto err_inode;
> +		return error;
>  
> -	error = xfs_qm_dqattach_locked(ip, false);
> +	error = xfs_qm_dqattach(ip);
>  	if (error)
> -		goto err_inode;
> +		goto err_rele;
>  
>  	if (VFS_I(ip)->i_nlink == 0)
>  		xfs_iflags_set(ip, XFS_IRECOVERY);
>  
> +	/* Allocate transaction and do the work. */
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
> +			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
> +	if (error)
> +		goto err_rele;
> +
> +	budp = xfs_trans_get_bud(tp, buip);
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  
>  	count = bmap->me_len;
> @@ -501,7 +502,7 @@ xfs_bui_item_recover(
>  			whichfork, bmap->me_startoff, bmap->me_startblock,
>  			&count, state);
>  	if (error)
> -		goto err_inode;
> +		goto err_cancel;
>  
>  	if (count > 0) {
>  		ASSERT(bui_type == XFS_BMAP_UNMAP);
> @@ -512,17 +513,21 @@ xfs_bui_item_recover(
>  		xfs_bmap_unmap_extent(tp, ip, &irec);
>  	}
>  
> +	/* Commit transaction, which frees the transaction. */
>  	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> +	if (error)
> +		goto err_unlock;
> +
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	xfs_irele(ip);
> -	return error;
> +	return 0;
>  
> -err_inode:
> +err_cancel:
>  	xfs_trans_cancel(tp);
> -	if (ip) {
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -		xfs_irele(ip);
> -	}
> +err_unlock:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +err_rele:
> +	xfs_irele(ip);
>  	return error;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3.3 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-10-04 19:11   ` [PATCH v3.3 " Darrick J. Wong
@ 2020-10-05 16:20     ` Brian Foster
  2020-10-05 17:01       ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2020-10-05 16:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch

On Sun, Oct 04, 2020 at 12:11:27PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In xfs_bui_item_recover, there exists a use-after-free bug with regard
> to the inode that is involved in the bmap replay operation.  If the
> mapping operation does not complete, we call xfs_bmap_unmap_extent to
> create a deferred op to finish the unmapping work, and we retain a
> pointer to the incore inode.
> 
> Unfortunately, the very next thing we do is commit the transaction and
> drop the inode.  If reclaim tears down the inode before we try to finish
> the defer ops, we dereference garbage and blow up.  Therefore, create a
> way to join inodes to the defer ops freezer so that we can maintain the
> xfs_inode reference until we're done with the inode.
> 
> Note: This imposes the requirement that there be enough memory to keep
> every incore inode in memory throughout recovery.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v3.3: ihold the captured inode and let callers iunlock/irele their own
> reference
> v3.2: rebase on updated defer capture patches
> ---
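
The "join inodes to the defer ops freezer" mechanism described above might
look roughly like this on the continue side, where recovery hands the pinned
inode back to the caller for unlock/release.  This is a sketch only -- the
actual xfs_defer.c hunk is elided from the quoted patch below, and the dfc_*
field names are assumptions based on the rest of this thread:

	/* Attach the captured dfops chain and pinned inode to a new transaction. */
	void
	xfs_defer_ops_continue(
		struct xfs_defer_capture	*dfc,
		struct xfs_trans		*tp,
		struct xfs_inode		**captured_ipp)
	{
		/* Lock and join the captured inode to the new transaction. */
		if (dfc->dfc_capture_ip) {
			xfs_ilock(dfc->dfc_capture_ip, XFS_ILOCK_EXCL);
			xfs_trans_ijoin(tp, dfc->dfc_capture_ip, 0);
		}
		*captured_ipp = dfc->dfc_capture_ip;

		/* Move the deferred ops and their state over to @tp. */
		list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
		tp->t_flags |= dfc->dfc_tpflags;

		kmem_free(dfc);
	}

The extra reference taken at capture time is what keeps reclaim from tearing
down the inode between the two transactions; xlog_finish_defer_ops then drops
the ILOCK and the reference after the commit, as shown further down in the
xfs_log_recover.c hunk.
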
>  fs/xfs/libxfs/xfs_defer.c  |   43 ++++++++++++++++++++++++++++++++++++++-----
>  fs/xfs/libxfs/xfs_defer.h  |   11 +++++++++--
>  fs/xfs/xfs_bmap_item.c     |    7 +++++--
>  fs/xfs/xfs_extfree_item.c  |    2 +-
>  fs/xfs/xfs_inode.c         |    8 ++++++++
>  fs/xfs/xfs_inode.h         |    2 ++
>  fs/xfs/xfs_log_recover.c   |    7 ++++++-
>  fs/xfs/xfs_refcount_item.c |    2 +-
>  fs/xfs/xfs_rmap_item.c     |    2 +-
>  9 files changed, 71 insertions(+), 13 deletions(-)
> 
...
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 2bfbcf28b1bd..24b1e2244905 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3813,3 +3813,11 @@ xfs_iunlock2_io_mmap(
>  	if (!same_inode)
>  		inode_unlock(VFS_I(ip1));
>  }
> +
> +/* Grab an extra reference to the VFS inode. */
> +void
> +xfs_ihold(
> +	struct xfs_inode	*ip)
> +{
> +	ihold(VFS_I(ip));
> +}

It looks to me like the only reason xfs_irele() exists is for a
tracepoint. We don't have that here, so what's the purpose of the
helper?

Otherwise the patch looks good to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 751a3d1d7d84..e9b0186b594c 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -476,4 +476,6 @@ void xfs_end_io(struct work_struct *work);
>  int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
>  void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
>  
> +void xfs_ihold(struct xfs_inode *ip);
> +
>  #endif	/* __XFS_INODE_H__ */
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 001e1585ddc6..a8289adc1b29 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2439,6 +2439,7 @@ xlog_finish_defer_ops(
>  {
>  	struct xfs_defer_capture *dfc, *next;
>  	struct xfs_trans	*tp;
> +	struct xfs_inode	*ip;
>  	int			error = 0;
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> @@ -2464,9 +2465,13 @@ xlog_finish_defer_ops(
>  		 * from recovering a single intent item.
>  		 */
>  		list_del_init(&dfc->dfc_list);
> -		xfs_defer_ops_continue(dfc, tp);
> +		xfs_defer_ops_continue(dfc, tp, &ip);
>  
>  		error = xfs_trans_commit(tp);
> +		if (ip) {
> +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +			xfs_irele(ip);
> +		}
>  		if (error)
>  			return error;
>  	}
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 0478374add64..ad895b48f365 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -544,7 +544,7 @@ xfs_cui_item_recover(
>  	}
>  
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> -	return xfs_defer_ops_capture_and_commit(tp, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>  
>  abort_error:
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 0d8fa707f079..1163f32c3e62 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -567,7 +567,7 @@ xfs_rui_item_recover(
>  	}
>  
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> -	return xfs_defer_ops_capture_and_commit(tp, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>  
>  abort_error:
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3.3 3/3] xfs: fix an incore inode UAF in xfs_bui_recover
  2020-10-05 16:20     ` Brian Foster
@ 2020-10-05 17:01       ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-10-05 17:01 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, hch

On Mon, Oct 05, 2020 at 12:20:14PM -0400, Brian Foster wrote:
> On Sun, Oct 04, 2020 at 12:11:27PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In xfs_bui_item_recover, there exists a use-after-free bug with regard
> > to the inode that is involved in the bmap replay operation.  If the
> > mapping operation does not complete, we call xfs_bmap_unmap_extent to
> > create a deferred op to finish the unmapping work, and we retain a
> > pointer to the incore inode.
> > 
> > Unfortunately, the very next thing we do is commit the transaction and
> > drop the inode.  If reclaim tears down the inode before we try to finish
> > the defer ops, we dereference garbage and blow up.  Therefore, create a
> > way to join inodes to the defer ops freezer so that we can maintain the
> > xfs_inode reference until we're done with the inode.
> > 
> > Note: This imposes the requirement that there be enough memory to keep
> > every incore inode in memory throughout recovery.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v3.3: ihold the captured inode and let callers iunlock/irele their own
> > reference
> > v3.2: rebase on updated defer capture patches
> > ---
> >  fs/xfs/libxfs/xfs_defer.c  |   43 ++++++++++++++++++++++++++++++++++++++-----
> >  fs/xfs/libxfs/xfs_defer.h  |   11 +++++++++--
> >  fs/xfs/xfs_bmap_item.c     |    7 +++++--
> >  fs/xfs/xfs_extfree_item.c  |    2 +-
> >  fs/xfs/xfs_inode.c         |    8 ++++++++
> >  fs/xfs/xfs_inode.h         |    2 ++
> >  fs/xfs/xfs_log_recover.c   |    7 ++++++-
> >  fs/xfs/xfs_refcount_item.c |    2 +-
> >  fs/xfs/xfs_rmap_item.c     |    2 +-
> >  9 files changed, 71 insertions(+), 13 deletions(-)
> > 
> ...
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 2bfbcf28b1bd..24b1e2244905 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3813,3 +3813,11 @@ xfs_iunlock2_io_mmap(
> >  	if (!same_inode)
> >  		inode_unlock(VFS_I(ip1));
> >  }
> > +
> > +/* Grab an extra reference to the VFS inode. */
> > +void
> > +xfs_ihold(
> > +	struct xfs_inode	*ip)
> > +{
> > +	ihold(VFS_I(ip));
> > +}
> 
> It looks to me like the only reason xfs_irele() exists is for a
> tracepoint. We don't have that here, so what's the purpose of the
> helper?

Wellll... ihold() is a VFS inode function, and I didn't want to force
libxfs to have yet another direct dependency on a VFS function that we'd
then have to port to userspace.

OTOH, userspace totally lacks the concept of refcounting the incore
inodes (and indeed it even seems to allow for aliasing inodes!) so maybe
I'll just do it...
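
Roughly, the split I had in mind -- this is a sketch, and the userspace stub
below is hypothetical:

	/* libxfs/xfs_defer.c stays free of direct VFS calls: */
	if (capture_ip) {
		xfs_ihold(capture_ip);
		dfc->dfc_capture_ip = capture_ip;
	}

	/*
	 * The kernel defines xfs_ihold() as a trivial wrapper around
	 * ihold(VFS_I(ip)), as in the hunk quoted above.  xfsprogs could
	 * carry its own stub, e.g.
	 *
	 *	void xfs_ihold(struct xfs_inode *ip) { }
	 *
	 * because its incore inode cache doesn't refcount inodes at all.
	 */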

> Otherwise the patch looks good to me:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > index 751a3d1d7d84..e9b0186b594c 100644
> > --- a/fs/xfs/xfs_inode.h
> > +++ b/fs/xfs/xfs_inode.h
> > @@ -476,4 +476,6 @@ void xfs_end_io(struct work_struct *work);
> >  int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> >  void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> >  
> > +void xfs_ihold(struct xfs_inode *ip);
> > +
> >  #endif	/* __XFS_INODE_H__ */
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 001e1585ddc6..a8289adc1b29 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -2439,6 +2439,7 @@ xlog_finish_defer_ops(
> >  {
> >  	struct xfs_defer_capture *dfc, *next;
> >  	struct xfs_trans	*tp;
> > +	struct xfs_inode	*ip;
> >  	int			error = 0;
> >  
> >  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > @@ -2464,9 +2465,13 @@ xlog_finish_defer_ops(
> >  		 * from recovering a single intent item.
> >  		 */
> >  		list_del_init(&dfc->dfc_list);
> > -		xfs_defer_ops_continue(dfc, tp);
> > +		xfs_defer_ops_continue(dfc, tp, &ip);
> >  
> >  		error = xfs_trans_commit(tp);
> > +		if (ip) {
> > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +			xfs_irele(ip);
> > +		}
> >  		if (error)
> >  			return error;
> >  	}
> > diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> > index 0478374add64..ad895b48f365 100644
> > --- a/fs/xfs/xfs_refcount_item.c
> > +++ b/fs/xfs/xfs_refcount_item.c
> > @@ -544,7 +544,7 @@ xfs_cui_item_recover(
> >  	}
> >  
> >  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> > -	return xfs_defer_ops_capture_and_commit(tp, capture_list);
> > +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> >  
> >  abort_error:
> >  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> > diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> > index 0d8fa707f079..1163f32c3e62 100644
> > --- a/fs/xfs/xfs_rmap_item.c
> > +++ b/fs/xfs/xfs_rmap_item.c
> > @@ -567,7 +567,7 @@ xfs_rui_item_recover(
> >  	}
> >  
> >  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> > -	return xfs_defer_ops_capture_and_commit(tp, capture_list);
> > +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> >  
> >  abort_error:
> >  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> > 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-10-05 17:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-29 17:43 [PATCH v3 0/3] xfs: fix inode use-after-free during log recovery Darrick J. Wong
2020-09-29 17:43 ` [PATCH 1/3] xfs: clean up bmap intent item recovery checking Darrick J. Wong
2020-09-29 17:44 ` [PATCH 2/3] xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering Darrick J. Wong
2020-10-02 16:27   ` Brian Foster
2020-10-02 16:30     ` Darrick J. Wong
2020-10-04 19:09   ` [PATCH v3.2 " Darrick J. Wong
2020-10-05 16:19     ` Brian Foster
2020-09-29 17:44 ` [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover Darrick J. Wong
2020-10-02  4:22   ` [PATCH v5.2 " Darrick J. Wong
2020-10-02  7:30     ` Christoph Hellwig
2020-10-02 16:29       ` Darrick J. Wong
2020-10-05  6:25         ` Christoph Hellwig
2020-10-04 19:11   ` [PATCH v3.3 " Darrick J. Wong
2020-10-05 16:20     ` Brian Foster
2020-10-05 17:01       ` Darrick J. Wong
