[PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck
@ 2022-10-02 18:20 Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

This patchset fixes a handful of problems relating to how we get and
release incore inodes in the online scrub code.  The first patch fixes
how we handle DONTCACHE -- our reasons for setting (or clearing it)
depend entirely on the runtime environment at irele time.  Hence we can
refactor iget and irele to use our own wrappers that set that context
appropriately.

The second patch fixes a race between the iget call in the inode core
scrubber and other writer threads that are allocating or freeing inodes
in the same AG by changing the behavior of xchk_iget (and the inode core
scrub setup function) to return either an incore inode or the AGI buffer
so that we can be sure that the inode cannot disappear on us.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iget-fixes
---
 fs/xfs/scrub/common.c |  251 +++++++++++++++++++++++++++++++++++++++++--------
 fs/xfs/scrub/common.h |    8 ++
 fs/xfs/scrub/dir.c    |    2 
 fs/xfs/scrub/inode.c  |  163 ++++++++++++++++++++++++++++----
 fs/xfs/scrub/parent.c |    5 -
 fs/xfs/scrub/scrub.c  |    2 
 fs/xfs/xfs_icache.c   |    3 -
 fs/xfs/xfs_icache.h   |    1 
 8 files changed, 368 insertions(+), 67 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time
  2022-10-02 18:20 [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-15  3:13   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode Darrick J. Wong
  2 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Right now, there are statements scattered all over the online fsck
codebase about how we can't use XFS_IGET_DONTCACHE because of concerns
about scrub's unusual practice of releasing inodes with transactions
held.

However, iget is the wrong place to handle this -- the DONTCACHE state
doesn't matter at all until we try to *release* the inode, and here we
get things wrong in multiple ways:

First, if we /do/ have a transaction, we must NOT drop the inode,
because the inode could have dirty pages, dropping the inode will
trigger writeback, and writeback can trigger a nested transaction.

Second, if the inode already had an active reference and the DONTCACHE
flag set, the icache hit when scrub grabs another ref will not clear
DONTCACHE.  This is sort of by design, since DONTCACHE is now used to
initiate cache drops so that sysadmins can change a file's access mode
between pagecache and DAX.

Third, if we do actually have the last active reference to the inode, we
can set DONTCACHE to avoid polluting the cache.  This is the /one/ case
where we actually want that flag.

Create an xchk_irele helper to encode all that logic and switch the
online fsck code to use it.  Since this now means that nearly all
scrubbers use the same xfs_iget flags, we can wrap them too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.c |   52 +++++++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/scrub/common.h |    3 +++
 fs/xfs/scrub/dir.c    |    2 +-
 fs/xfs/scrub/parent.c |    5 ++---
 fs/xfs/scrub/scrub.c  |    2 +-
 5 files changed, 55 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 73ac38c1126e..42a25488bd25 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -710,6 +710,16 @@ xchk_checkpoint_log(
 	return 0;
 }
 
+/* Verify that an inode is allocated ondisk, then return its cached inode. */
+int
+xchk_iget(
+	struct xfs_scrub	*sc,
+	xfs_ino_t		inum,
+	struct xfs_inode	**ipp)
+{
+	return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
+}
+
 /*
  * Given an inode and the scrub control structure, grab either the
  * inode referenced in the control structure or the inode passed in.
@@ -734,8 +744,7 @@ xchk_get_inode(
 	/* Look up the inode, see if the generation number matches. */
 	if (xfs_internal_inum(mp, sc->sm->sm_ino))
 		return -ENOENT;
-	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
-			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
+	error = xchk_iget(sc, sc->sm->sm_ino, &ip);
 	switch (error) {
 	case -ENOENT:
 		/* Inode doesn't exist, just bail out. */
@@ -757,7 +766,7 @@ xchk_get_inode(
 		 * that it no longer exists.
 		 */
 		error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap,
-				XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE);
+				XFS_IGET_UNTRUSTED);
 		if (error)
 			return -ENOENT;
 		error = -EFSCORRUPTED;
@@ -770,7 +779,7 @@ xchk_get_inode(
 		return error;
 	}
 	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
-		xfs_irele(ip);
+		xchk_irele(sc, ip);
 		return -ENOENT;
 	}
 
@@ -778,6 +787,41 @@ xchk_get_inode(
 	return 0;
 }
 
+/* Release an inode, possibly dropping it in the process. */
+void
+xchk_irele(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	if (current->journal_info != NULL) {
+		ASSERT(current->journal_info == sc->tp);
+
+		/*
+		 * If we are in a transaction, we /cannot/ drop the inode
+		 * ourselves, because the VFS will trigger writeback, which
+		 * can require a transaction.  Clear DONTCACHE to force the
+		 * inode to the LRU, where someone else can take care of
+		 * dropping it.
+		 *
+		 * Note that when we grabbed our reference to the inode, it
+		 * could have had an active ref and DONTCACHE set if a sysadmin
+		 * is trying to coerce a change in file access mode.  icache
+		 * hits do not clear DONTCACHE, so we must do it here.
+		 */
+		spin_lock(&VFS_I(ip)->i_lock);
+		VFS_I(ip)->i_state &= ~I_DONTCACHE;
+		spin_unlock(&VFS_I(ip)->i_lock);
+	} else if (atomic_read(&VFS_I(ip)->i_count) == 1) {
+		/*
+		 * If this is the last reference to the inode and the caller
+		 * permits it, set DONTCACHE to avoid thrashing.
+		 */
+		d_mark_dontcache(VFS_I(ip));
+	}
+
+	xfs_irele(ip);
+}
+
 /* Set us up to scrub a file's contents. */
 int
 xchk_setup_inode_contents(
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 0efe6b947d88..7472c41d9cfe 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -137,6 +137,9 @@ int xchk_get_inode(struct xfs_scrub *sc);
 int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks);
 void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp);
 
+int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
+void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip);
+
 /*
  * Don't bother cross-referencing if we already found corruption or cross
  * referencing discrepancies.
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 58b9761db48d..43a1cbf2ac67 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -86,7 +86,7 @@ xchk_dir_check_ftype(
 			xfs_mode_to_ftype(VFS_I(ip)->i_mode));
 	if (ino_dtype != dtype)
 		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
-	xfs_irele(ip);
+	xchk_irele(sdc->sc, ip);
 out:
 	return error;
 }
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index ab182a5cd0c0..38ea04e66468 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -131,7 +131,6 @@ xchk_parent_validate(
 	xfs_ino_t		dnum,
 	bool			*try_again)
 {
-	struct xfs_mount	*mp = sc->mp;
 	struct xfs_inode	*dp = NULL;
 	xfs_nlink_t		expected_nlink;
 	xfs_nlink_t		nlink;
@@ -168,7 +167,7 @@ xchk_parent_validate(
 	 * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a
 	 *  cross referencing error.  Any other error is an operational error.
 	 */
-	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp);
+	error = xchk_iget(sc, dnum, &dp);
 	if (error == -EINVAL || error == -ENOENT) {
 		error = -EFSCORRUPTED;
 		xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error);
@@ -253,7 +252,7 @@ xchk_parent_validate(
 out_unlock:
 	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
 out_rele:
-	xfs_irele(dp);
+	xchk_irele(sc, dp);
 out:
 	return error;
 }
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 7a3557a69fe0..bc9638c7a379 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -181,7 +181,7 @@ xchk_teardown(
 			xfs_iunlock(sc->ip, sc->ilock_flags);
 		if (sc->ip != ip_in &&
 		    !xfs_internal_inum(sc->mp, sc->ip->i_ino))
-			xfs_irele(sc->ip);
+			xchk_irele(sc, sc->ip);
 		sc->ip = NULL;
 	}
 	if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode
  2022-10-02 18:20 [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-15  3:49   ` Dave Chinner
  2 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit d658e, we tried to improve the robustnes of xchk_get_inode in
the face of EINVAL returns from iget by calling xfs_imap to see if the
inobt itself thinks that the inode is allocated.  Unfortunately, that
commit didn't consider the possibility that the inode gets allocated
after iget but before imap.  In this case, the imap call will succeed,
but we turn that into a corruption error and tell userspace the inode is
corrupt.

Avoid this false corruption report by grabbing the AGI header and
retrying the iget before calling imap.  If the iget succeeds, we can
proceed with the usual scrub-by-handle code.  Fix all the incorrect
comments too, since unreadable/corrupt inodes no longer result in EINVAL
returns.

Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.c |  203 ++++++++++++++++++++++++++++++++++++++++---------
 fs/xfs/scrub/common.h |    4 +
 fs/xfs/xfs_icache.c   |    3 -
 fs/xfs/xfs_icache.h   |    1 
 4 files changed, 172 insertions(+), 39 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 42a25488bd25..9a811c5fa8f7 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -635,6 +635,14 @@ xchk_ag_init(
 
 /* Per-scrubber setup functions */
 
+void
+xchk_trans_cancel(
+	struct xfs_scrub	*sc)
+{
+	xfs_trans_cancel(sc->tp);
+	sc->tp = NULL;
+}
+
 /*
  * Grab an empty transaction so that we can re-grab locked buffers if
  * one of our btrees turns out to be cyclic.
@@ -720,6 +728,80 @@ xchk_iget(
 	return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
 }
 
+/*
+ * Try to grab an inode.  If we can't, return the AGI buffer so that the caller
+ * can single-step the loading process to see where things went wrong.
+ *
+ * If the iget succeeds, return 0, a NULL AGI, and the inode.
+ * If the iget fails, return the error, the AGI, and a NULL inode.  This can
+ * include -EINVAL and -ENOENT for invalid inode numbers or inodes that are no
+ * longer allocated; or any other corruption or runtime error.
+ * If the AGI read fails, return the error, a NULL AGI, and NULL inode.
+ * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode.
+ */
+int
+xchk_iget_agi(
+	struct xfs_scrub	*sc,
+	xfs_ino_t		inum,
+	struct xfs_buf		**agi_bpp,
+	struct xfs_inode	**ipp)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_trans	*tp = sc->tp;
+	struct xfs_perag	*pag;
+	int			error;
+
+again:
+	*agi_bpp = NULL;
+	*ipp = NULL;
+	error = 0;
+
+	if (xchk_should_terminate(sc, &error))
+		return error;
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
+	error = xfs_ialloc_read_agi(pag, tp, agi_bpp);
+	xfs_perag_put(pag);
+	if (error)
+		return error;
+
+	error = xfs_iget(mp, tp, inum,
+			XFS_IGET_NOWAIT | XFS_IGET_UNTRUSTED, 0, ipp);
+	if (error == -EAGAIN) {
+		/*
+		 * Cached inode awaiting inactivation.  Drop the AGI buffer in
+		 * case the inactivation worker is now waiting for it, and try
+		 * the iget again.
+		 */
+		xfs_trans_brelse(tp, *agi_bpp);
+		delay(1);
+		goto again;
+	}
+	if (error == 0) {
+		/* We got the inode, so we can release the AGI. */
+		ASSERT(*ipp != NULL);
+		xfs_trans_brelse(tp, *agi_bpp);
+		*agi_bpp = NULL;
+	}
+
+	return error;
+}
+
+/* Install an inode that we opened by handle for scrubbing. */
+static int
+xchk_install_handle_inode(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
+		xchk_irele(sc, ip);
+		return -ENOENT;
+	}
+
+	sc->ip = ip;
+	return 0;
+}
+
 /*
  * Given an inode and the scrub control structure, grab either the
  * inode referenced in the control structure or the inode passed in.
@@ -731,60 +813,105 @@ xchk_get_inode(
 {
 	struct xfs_imap		imap;
 	struct xfs_mount	*mp = sc->mp;
+	struct xfs_buf		*agi_bp;
 	struct xfs_inode	*ip_in = XFS_I(file_inode(sc->file));
 	struct xfs_inode	*ip = NULL;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, sc->sm->sm_ino);
 	int			error;
 
+	ASSERT(sc->tp == NULL);
+
 	/* We want to scan the inode we already had opened. */
 	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
 		sc->ip = ip_in;
 		return 0;
 	}
 
-	/* Look up the inode, see if the generation number matches. */
+	/* Reject internal metadata files and obviously bad inode numbers. */
 	if (xfs_internal_inum(mp, sc->sm->sm_ino))
 		return -ENOENT;
+	if (!xfs_verify_ino(sc->mp, sc->sm->sm_ino))
+		return -ENOENT;
+
+	/* Try a regular untrusted iget. */
 	error = xchk_iget(sc, sc->sm->sm_ino, &ip);
-	switch (error) {
-	case -ENOENT:
-		/* Inode doesn't exist, just bail out. */
+	if (!error)
+		return xchk_install_handle_inode(sc, ip);
+	if (error == -ENOENT)
 		return error;
-	case 0:
-		/* Got an inode, continue. */
-		break;
-	case -EINVAL:
-		/*
-		 * -EINVAL with IGET_UNTRUSTED could mean one of several
-		 * things: userspace gave us an inode number that doesn't
-		 * correspond to fs space, or doesn't have an inobt entry;
-		 * or it could simply mean that the inode buffer failed the
-		 * read verifiers.
-		 *
-		 * Try just the inode mapping lookup -- if it succeeds, then
-		 * the inode buffer verifier failed and something needs fixing.
-		 * Otherwise, we really couldn't find it so tell userspace
-		 * that it no longer exists.
-		 */
-		error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap,
-				XFS_IGET_UNTRUSTED);
-		if (error)
-			return -ENOENT;
+	if (error != -EINVAL)
+		goto out_error;
+
+	/*
+	 * EINVAL with IGET_UNTRUSTED probably means one of several things:
+	 * userspace gave us an inode number that doesn't correspond to fs
+	 * space; the inode btree lacks a record for this inode; or there is a
+	 * record, and it says this inode is free.
+	 *
+	 * We want to look up this inode in the inobt to distinguish two
+	 * scenarios: (1) the inobt says the inode is free, in which case
+	 * there's nothing to do; and (2) the inobt says the inode is
+	 * allocated, but loading it failed due to corruption.
+	 *
+	 * Allocate a transaction and grab the AGI to prevent inobt activity
+	 * in this AG.  Retry the iget in case someone allocated a new inode
+	 * after the first iget failed.
+	 */
+	error = xchk_trans_alloc(sc, 0);
+	if (error)
+		goto out_error;
+
+	error = xchk_iget_agi(sc, sc->sm->sm_ino, &agi_bp, &ip);
+	if (error == 0) {
+		/* Actually got the inode, so install it. */
+		xchk_trans_cancel(sc);
+		return xchk_install_handle_inode(sc, ip);
+	}
+	if (error == -ENOENT)
+		goto out_gone;
+	if (error != -EINVAL)
+		goto out_cancel;
+
+	/* Ensure that we have protected against inode allocation/freeing. */
+	if (agi_bp == NULL) {
+		ASSERT(agi_bp != NULL);
+		error = -ECANCELED;
+		goto out_cancel;
+	}
+
+	/*
+	 * Untrusted iget failed a second time.  Let's try an inobt lookup.
+	 * If the inobt thinks this the inode neither can exist inside the
+	 * filesystem nor is allocated, return ENOENT to signal that the check
+	 * can be skipped.
+	 *
+	 * If the lookup returns corruption, we'll mark this inode corrupt and
+	 * exit to userspace.  There's little chance of fixing anything until
+	 * the inobt is straightened out, but there's nothing we can do here.
+	 *
+	 * If the lookup encounters any other error, exit to userspace.
+	 *
+	 * If the lookup succeeds, something else must be very wrong in the fs
+	 * such that setting up the incore inode failed in some strange way.
+	 * Treat those as corruptions.
+	 */
+	error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap,
+			XFS_IGET_UNTRUSTED);
+	if (error == -EINVAL || error == -ENOENT)
+		goto out_gone;
+	if (!error)
 		error = -EFSCORRUPTED;
-		fallthrough;
-	default:
-		trace_xchk_op_error(sc,
-				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
-				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
-				error, __return_address);
-		return error;
-	}
-	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
-		xchk_irele(sc, ip);
-		return -ENOENT;
-	}
 
-	sc->ip = ip;
-	return 0;
+out_cancel:
+	xchk_trans_cancel(sc);
+out_error:
+	trace_xchk_op_error(sc, agno, XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
+			error, __return_address);
+	return error;
+out_gone:
+	/* The file is gone, so there's nothing to check. */
+	xchk_trans_cancel(sc);
+	return -ENOENT;
 }
 
 /* Release an inode, possibly dropping it in the process. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 7472c41d9cfe..6a7fe2596841 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -32,6 +32,8 @@ xchk_should_terminate(
 }
 
 int xchk_trans_alloc(struct xfs_scrub *sc, uint resblks);
+void xchk_trans_cancel(struct xfs_scrub *sc);
+
 bool xchk_process_error(struct xfs_scrub *sc, xfs_agnumber_t agno,
 		xfs_agblock_t bno, int *error);
 bool xchk_fblock_process_error(struct xfs_scrub *sc, int whichfork,
@@ -138,6 +140,8 @@ int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks);
 void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp);
 
 int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
+int xchk_iget_agi(struct xfs_scrub *sc, xfs_ino_t inum,
+		struct xfs_buf **agi_bpp, struct xfs_inode **ipp);
 void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip);
 
 /*
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f51a00ae2bea..fc831b3e1e3d 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -769,7 +769,8 @@ xfs_iget(
 	return 0;
 
 out_error_or_again:
-	if (!(flags & XFS_IGET_INCORE) && error == -EAGAIN) {
+	if (!(flags & (XFS_IGET_INCORE | XFS_IGET_NOWAIT)) &&
+	    error == -EAGAIN) {
 		delay(1);
 		goto again;
 	}
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 6cd180721659..305a14c30b73 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -38,6 +38,7 @@ struct xfs_icwalk {
 #define XFS_IGET_UNTRUSTED	0x2
 #define XFS_IGET_DONTCACHE	0x4
 #define XFS_IGET_INCORE		0x8	/* don't read from disk or reinit */
+#define XFS_IGET_NOWAIT		0x10	/* return EAGAIN instead of blocking */
 
 int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 	     uint flags, uint lock_flags, xfs_inode_t **ipp);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core
  2022-10-02 18:20 [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-15  4:08   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode Darrick J. Wong
  2 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

xchk_get_inode is not quite the right function to be calling from the
inode scrubber setup function.  The common get_inode function either
gets an inode and installs it in the scrub context, or it returns an
error code explaining what happened.  This is acceptable for most file
scrubbers because it is not in their scope to fix corruptions in the
inode core and fork areas that cause iget to fail.

Dealing with these problems is within the scope of the inode scrubber,
however.  If iget fails with EFSCORRUPTED, we need to xchk_inode to flag
that as corruption.  Since we can't get our hands on an incore inode, we
need to hold the AGI to prevent inode allocation activity so that
nothing changes in the inode metadata.

Looking ahead to the inode core repair patches, we will also need to
hold the AGI buffer into xrep_inode so that we can make modifications to
the xfs_dinode structure without any other thread swooping in to
allocate or free the inode.

Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off
use case where the error codes we check for are a little different, and
the return state is much different from the common function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.c |    2 -
 fs/xfs/scrub/common.h |    1 
 fs/xfs/scrub/inode.c  |  163 +++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 144 insertions(+), 22 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 9a811c5fa8f7..dd60c7ced780 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -788,7 +788,7 @@ xchk_iget_agi(
 }
 
 /* Install an inode that we opened by handle for scrubbing. */
-static int
+int
 xchk_install_handle_inode(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*ip)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 6a7fe2596841..4d621811eb89 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -143,6 +143,7 @@ int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
 int xchk_iget_agi(struct xfs_scrub *sc, xfs_ino_t inum,
 		struct xfs_buf **agi_bpp, struct xfs_inode **ipp);
 void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip);
+int xchk_install_handle_inode(struct xfs_scrub *sc, struct xfs_inode *ip);
 
 /*
  * Don't bother cross-referencing if we already found corruption or cross
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 3b272c86d0ad..c1136b62811b 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -11,8 +11,10 @@
 #include "xfs_mount.h"
 #include "xfs_btree.h"
 #include "xfs_log_format.h"
+#include "xfs_trans.h"
 #include "xfs_inode.h"
 #include "xfs_ialloc.h"
+#include "xfs_icache.h"
 #include "xfs_da_format.h"
 #include "xfs_reflink.h"
 #include "xfs_rmap.h"
@@ -20,6 +22,41 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/* Prepare the attached inode for scrubbing. */
+static inline int
+xchk_prepare_iscrub(
+	struct xfs_scrub	*sc)
+{
+	int			error;
+
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	error = xchk_trans_alloc(sc, 0);
+	if (error)
+		return error;
+
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+	return 0;
+}
+
+/* Install this scrub-by-handle inode and prepare it for scrubbing. */
+static inline int
+xchk_install_handle_iscrub(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	int			error;
+
+	error = xchk_install_handle_inode(sc, ip);
+	if (error)
+		return error;
+
+	return xchk_prepare_iscrub(sc);
+}
 
 /*
  * Grab total control of the inode metadata.  It doesn't matter here if
@@ -30,38 +67,122 @@ int
 xchk_setup_inode(
 	struct xfs_scrub	*sc)
 {
+	struct xfs_imap		imap;
+	struct xfs_inode	*ip;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_inode	*ip_in = XFS_I(file_inode(sc->file));
+	struct xfs_buf		*agi_bp;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, sc->sm->sm_ino);
 	int			error;
 
 	if (xchk_need_fshook_drain(sc))
 		xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN);
 
-	/*
-	 * Try to get the inode.  If the verifiers fail, we try again
-	 * in raw mode.
-	 */
-	error = xchk_get_inode(sc);
-	switch (error) {
-	case 0:
-		break;
-	case -EFSCORRUPTED:
-	case -EFSBADCRC:
-		return xchk_trans_alloc(sc, 0);
-	default:
+	/* We want to scan the opened inode, so lock it and exit. */
+	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
+		sc->ip = ip_in;
+		return xchk_prepare_iscrub(sc);
+	}
+
+	/* Reject internal metadata files and obviously bad inode numbers. */
+	if (xfs_internal_inum(mp, sc->sm->sm_ino))
+		return -ENOENT;
+	if (!xfs_verify_ino(sc->mp, sc->sm->sm_ino))
+		return -ENOENT;
+
+	/* Try a regular untrusted iget. */
+	error = xchk_iget(sc, sc->sm->sm_ino, &ip);
+	if (!error)
+		return xchk_install_handle_iscrub(sc, ip);
+	if (error == -ENOENT)
 		return error;
-	}
+	if (error != -EFSCORRUPTED && error != -EFSBADCRC && error != -EINVAL)
+		goto out_error;
 
-	/* Got the inode, lock it and we're ready to go. */
-	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
-	xfs_ilock(sc->ip, sc->ilock_flags);
+	/*
+	 * EINVAL with IGET_UNTRUSTED probably means one of several things:
+	 * userspace gave us an inode number that doesn't correspond to fs
+	 * space; the inode btree lacks a record for this inode; or there is
+	 * a record, and it says this inode is free.
+	 *
+	 * EFSCORRUPTED/EFSBADCRC could mean that the inode was mappable, but
+	 * some other metadata corruption (e.g. inode forks) prevented
+	 * instantiation of the incore inode.  Or it could mean the inobt is
+	 * corrupt.
+	 *
+	 * We want to look up this inode in the inobt directly to distinguish
+	 * three different scenarios: (1) the inobt says the inode is free,
+	 * in which case there's nothing to do; (2) the inobt is corrupt so we
+	 * should flag the corruption and exit to userspace to let it fix the
+	 * inobt; and (3) the inobt says the inode is allocated, but loading it
+	 * failed due to corruption.
+	 *
+	 * Allocate a transaction and grab the AGI to prevent inobt activity in
+	 * this AG.  Retry the iget in case someone allocated a new inode after
+	 * the first iget failed.
+	 */
 	error = xchk_trans_alloc(sc, 0);
 	if (error)
-		goto out;
-	sc->ilock_flags |= XFS_ILOCK_EXCL;
-	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+		goto out_error;
 
-out:
-	/* scrub teardown will unlock and release the inode for us */
+	error = xchk_iget_agi(sc, sc->sm->sm_ino, &agi_bp, &ip);
+	if (error == 0) {
+		/* Actually got the incore inode, so install it and proceed. */
+		xchk_trans_cancel(sc);
+		return xchk_install_handle_iscrub(sc, ip);
+	}
+	if (error == -ENOENT)
+		goto out_gone;
+	if (error != -EFSCORRUPTED && error != -EFSBADCRC && error != -EINVAL)
+		goto out_cancel;
+
+	/* Ensure that we have protected against inode allocation/freeing. */
+	if (agi_bp == NULL) {
+		ASSERT(agi_bp != NULL);
+		error = -ECANCELED;
+		goto out_cancel;
+	}
+
+	/*
+	 * Untrusted iget failed a second time.  Let's try an inobt lookup.
+	 * If the inobt doesn't think this is an allocated inode then we'll
+	 * return ENOENT to signal that the check can be skipped.
+	 *
+	 * If the lookup signals corruption, we'll mark this inode corrupt and
+	 * exit to userspace.  There's little chance of fixing anything until
+	 * the inobt is straightened out, but there's nothing we can do here.
+	 *
+	 * If the lookup encounters a runtime error, exit to userspace.
+	 */
+	error = xfs_imap(mp, sc->tp, sc->sm->sm_ino, &imap,
+			XFS_IGET_UNTRUSTED);
+	if (error == -EINVAL || error == -ENOENT)
+		goto out_gone;
+	if (error)
+		goto out_cancel;
+
+	/*
+	 * The lookup succeeded.  Chances are the ondisk inode is corrupt and
+	 * preventing iget from reading it.  Retain the scrub transaction and
+	 * the AGI buffer to prevent anyone from allocating or freeing inodes.
+	 * This ensures that we preserve the inconsistency between the inobt
+	 * saying the inode is allocated and the icache being unable to load
+	 * the inode until we can flag the corruption in xchk_inode.  The
+	 * scrub function has to note the corruption, since we're not really
+	 * supposed to do that from the setup function.
+	 */
+	return 0;
+
+out_cancel:
+	xchk_trans_cancel(sc);
+out_error:
+	trace_xchk_op_error(sc, agno, XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
+			error, __return_address);
 	return error;
+out_gone:
+	/* The file is gone, so there's nothing to check. */
+	xchk_trans_cancel(sc);
+	return -ENOENT;
 }
 
 /* Inode core */


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time
  2022-10-02 18:20 ` [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
@ 2022-11-15  3:13   ` Dave Chinner
  2022-11-15  3:34     ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-11-15  3:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Right now, there are statements scattered all over the online fsck
> codebase about how we can't use XFS_IGET_DONTCACHE because of concerns
> about scrub's unusual practice of releasing inodes with transactions
> held.
> 
> However, iget is the wrong place to handle this -- the DONTCACHE state
> doesn't matter at all until we try to *release* the inode, and here we
> get things wrong in multiple ways:
> 
> First, if we /do/ have a transaction, we must NOT drop the inode,
> because the inode could have dirty pages, dropping the inode will
> trigger writeback, and writeback can trigger a nested transaction.
> 
> Second, if the inode already had an active reference and the DONTCACHE
> flag set, the icache hit when scrub grabs another ref will not clear
> DONTCACHE.  This is sort of by design, since DONTCACHE is now used to
> initiate cache drops so that sysadmins can change a file's access mode
> between pagecache and DAX.
> 
> Third, if we do actually have the last active reference to the inode, we
> can set DONTCACHE to avoid polluting the cache.  This is the /one/ case
> where we actually want that flag.
> 
> Create an xchk_irele helper to encode all that logic and switch the
> online fsck code to use it.  Since this now means that nearly all
> scrubbers use the same xfs_iget flags, we can wrap them too.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Ok, I can see what needs to be done here. It seems a bit fragile,
but I don't see a better way at the moment.

That said...

> diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
> index ab182a5cd0c0..38ea04e66468 100644
> --- a/fs/xfs/scrub/parent.c
> +++ b/fs/xfs/scrub/parent.c
> @@ -131,7 +131,6 @@ xchk_parent_validate(
>  	xfs_ino_t		dnum,
>  	bool			*try_again)
>  {
> -	struct xfs_mount	*mp = sc->mp;
>  	struct xfs_inode	*dp = NULL;
>  	xfs_nlink_t		expected_nlink;
>  	xfs_nlink_t		nlink;
> @@ -168,7 +167,7 @@ xchk_parent_validate(
>  	 * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a
>  	 *  cross referencing error.  Any other error is an operational error.
>  	 */
> -	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp);
> +	error = xchk_iget(sc, dnum, &dp);
>  	if (error == -EINVAL || error == -ENOENT) {
>  		error = -EFSCORRUPTED;
>  		xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error);
> @@ -253,7 +252,7 @@ xchk_parent_validate(
>  out_unlock:
>  	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
>  out_rele:
> -	xfs_irele(dp);
> +	xchk_irele(sc, dp);
>  out:
>  	return error;
>  }

Didn't you miss a couple of cases here? THe current upstream code
looks like:

.......
237         /* Drat, parent changed.  Try again! */
238         if (dnum != dp->i_ino) {
239                 xfs_irele(dp);
240                 *try_again = true;
241                 return 0;
242         }
243         xfs_irele(dp);
244
245         /*
246          * '..' didn't change, so check that there was only one entry
247          * for us in the parent.
248          */
249         if (nlink != expected_nlink)
250                 xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
251         return error;
252
253 out_unlock:
254         xfs_iunlock(dp, XFS_IOLOCK_SHARED);
255 out_rele:
256         xfs_irele(dp);
257 out:
258         return error;
259 }

So it looks like you missed the conversion at lines 239 and 243. Of
course, these may have been removed in a prior patchset I've looked
at and forgotten about, but on the surface this looks like missed
conversions.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time
  2022-11-15  3:13   ` Dave Chinner
@ 2022-11-15  3:34     ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2022-11-15  3:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Nov 15, 2022 at 02:13:18PM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Right now, there are statements scattered all over the online fsck
> > codebase about how we can't use XFS_IGET_DONTCACHE because of concerns
> > about scrub's unusual practice of releasing inodes with transactions
> > held.
> > 
> > However, iget is the wrong place to handle this -- the DONTCACHE state
> > doesn't matter at all until we try to *release* the inode, and here we
> > get things wrong in multiple ways:
> > 
> > First, if we /do/ have a transaction, we must NOT drop the inode,
> > because the inode could have dirty pages, dropping the inode will
> > trigger writeback, and writeback can trigger a nested transaction.
> > 
> > Second, if the inode already had an active reference and the DONTCACHE
> > flag set, the icache hit when scrub grabs another ref will not clear
> > DONTCACHE.  This is sort of by design, since DONTCACHE is now used to
> > initiate cache drops so that sysadmins can change a file's access mode
> > between pagecache and DAX.
> > 
> > Third, if we do actually have the last active reference to the inode, we
> > can set DONTCACHE to avoid polluting the cache.  This is the /one/ case
> > where we actually want that flag.
> > 
> > Create an xchk_irele helper to encode all that logic and switch the
> > online fsck code to use it.  Since this now means that nearly all
> > scrubbers use the same xfs_iget flags, we can wrap them too.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Ok, I can see what needs to be done here. It seems a bit fragile,
> but I don't see a better way at the moment.
> 
> That said...
> 
> > diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
> > index ab182a5cd0c0..38ea04e66468 100644
> > --- a/fs/xfs/scrub/parent.c
> > +++ b/fs/xfs/scrub/parent.c
> > @@ -131,7 +131,6 @@ xchk_parent_validate(
> >  	xfs_ino_t		dnum,
> >  	bool			*try_again)
> >  {
> > -	struct xfs_mount	*mp = sc->mp;
> >  	struct xfs_inode	*dp = NULL;
> >  	xfs_nlink_t		expected_nlink;
> >  	xfs_nlink_t		nlink;
> > @@ -168,7 +167,7 @@ xchk_parent_validate(
> >  	 * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a
> >  	 *  cross referencing error.  Any other error is an operational error.
> >  	 */
> > -	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp);
> > +	error = xchk_iget(sc, dnum, &dp);
> >  	if (error == -EINVAL || error == -ENOENT) {
> >  		error = -EFSCORRUPTED;
> >  		xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error);
> > @@ -253,7 +252,7 @@ xchk_parent_validate(
> >  out_unlock:
> >  	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
> >  out_rele:
> > -	xfs_irele(dp);
> > +	xchk_irele(sc, dp);
> >  out:
> >  	return error;
> >  }
> 
> Didn't you miss a couple of cases here? THe current upstream code
> looks like:
> 
> .......
> 237         /* Drat, parent changed.  Try again! */
> 238         if (dnum != dp->i_ino) {
> 239                 xfs_irele(dp);
> 240                 *try_again = true;
> 241                 return 0;
> 242         }
> 243         xfs_irele(dp);
> 244
> 245         /*
> 246          * '..' didn't change, so check that there was only one entry
> 247          * for us in the parent.
> 248          */
> 249         if (nlink != expected_nlink)
> 250                 xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> 251         return error;
> 252
> 253 out_unlock:
> 254         xfs_iunlock(dp, XFS_IOLOCK_SHARED);
> 255 out_rele:
> 256         xfs_irele(dp);
> 257 out:
> 258         return error;
> 259 }
> 
> So it looks like you missed the conversion at lines 239 and 243. Of
> course, these may have been removed in a prior patchset I've looked
> at and forgotten about, but on the surface this looks like missed
> conversions.

Actually, I probably missed it because one of the follow-on fixpatches
in the v23.1 patchbomb removes it entirely:
https://lore.kernel.org/linux-xfs/166473483278.1084804.14032671424392139245.stgit@magnolia/

--D

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode
  2022-10-02 18:20 ` [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode Darrick J. Wong
@ 2022-11-15  3:49   ` Dave Chinner
  2022-11-16  0:53     ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-11-15  3:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> In commit d658e, we tried to improve the robustnes of xchk_get_inode in
> the face of EINVAL returns from iget by calling xfs_imap to see if the
> inobt itself thinks that the inode is allocated.  Unfortunately, that
> commit didn't consider the possibility that the inode gets allocated
> after iget but before imap.  In this case, the imap call will succeed,
> but we turn that into a corruption error and tell userspace the inode is
> corrupt.
> 
> Avoid this false corruption report by grabbing the AGI header and
> retrying the iget before calling imap.  If the iget succeeds, we can
> proceed with the usual scrub-by-handle code.  Fix all the incorrect
> comments too, since unreadable/corrupt inodes no longer result in EINVAL
> returns.
> 
> Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

OK.

> ---
>  fs/xfs/scrub/common.c |  203 ++++++++++++++++++++++++++++++++++++++++---------
>  fs/xfs/scrub/common.h |    4 +
>  fs/xfs/xfs_icache.c   |    3 -
>  fs/xfs/xfs_icache.h   |    1 
>  4 files changed, 172 insertions(+), 39 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 42a25488bd25..9a811c5fa8f7 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -635,6 +635,14 @@ xchk_ag_init(
>  
>  /* Per-scrubber setup functions */
>  
> +void
> +xchk_trans_cancel(
> +	struct xfs_scrub	*sc)
> +{
> +	xfs_trans_cancel(sc->tp);
> +	sc->tp = NULL;
> +}
> +
>  /*
>   * Grab an empty transaction so that we can re-grab locked buffers if
>   * one of our btrees turns out to be cyclic.
> @@ -720,6 +728,80 @@ xchk_iget(
>  	return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
>  }
>  
> +/*
> + * Try to grab an inode.  If we can't, return the AGI buffer so that the caller
> + * can single-step the loading process to see where things went wrong.

"Try to grab an inode in a manner that avoids races with physical inode
allocation. If we can't, return the locked AGI buffer so that the
caller can single-step the loading process to see where things went
wrong."

> + *
> + * If the iget succeeds, return 0, a NULL AGI, and the inode.
> + * If the iget fails, return the error, the AGI, and a NULL inode.  This can

"... the locked AGI, ...."

> + * include -EINVAL and -ENOENT for invalid inode numbers or inodes that are no
> + * longer allocated; or any other corruption or runtime error.
> + * If the AGI read fails, return the error, a NULL AGI, and NULL inode.
> + * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode.
> + */
> +int
> +xchk_iget_agi(
> +	struct xfs_scrub	*sc,
> +	xfs_ino_t		inum,
> +	struct xfs_buf		**agi_bpp,
> +	struct xfs_inode	**ipp)
> +{
> +	struct xfs_mount	*mp = sc->mp;
> +	struct xfs_trans	*tp = sc->tp;
> +	struct xfs_perag	*pag;
> +	int			error;
> +
> +again:
> +	*agi_bpp = NULL;
> +	*ipp = NULL;
> +	error = 0;
> +
> +	if (xchk_should_terminate(sc, &error))
> +		return error;
> +
> +	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
> +	error = xfs_ialloc_read_agi(pag, tp, agi_bpp);
> +	xfs_perag_put(pag);
> +	if (error)
> +		return error;
> +
> +	error = xfs_iget(mp, tp, inum,
> +			XFS_IGET_NOWAIT | XFS_IGET_UNTRUSTED, 0, ipp);

OK, IGET_NOWAIT is new....

> +#define XFS_IGET_NOWAIT		0x10	/* return EAGAIN instead of blocking */

But it doesn't prevent blocking. XFS_IGET_UNTRUSTED means we do a
inobt record lookup (btree buffer locking and IO that can block) as
well as reading the inode cluster from disk if it's not already in
cache. Hence this isn't what I'd call a "non-blocking" or "no wait"
operation. 

AFAICT, what is needed here is avoiding the -retry mechanism- of the
lookup, not "non blocking/no wait" semantics. i.e. this seems
reasonable to get an -EAGAIN error instead of delaying and trying
again if we are using XFS_IGET_NORETRY semantics....

> +	if (error == -EAGAIN) {
> +		/*
> +		 * Cached inode awaiting inactivation.  Drop the AGI buffer in
> +		 * case the inactivation worker is now waiting for it, and try
> +		 * the iget again.
> +		 */

That's not the only reason xfs_iget() could return -EAGAIN,
right? radix_tree_preload() failing can cause -EAGAIN to be returned
from xfs_iget_cache_miss(), as can an instantiation race inserting
the new inode into the radix tree. The cache hit path has a bunch
more cases, too. Perhaps:

		/*
		 * The inode may be in core but temporarily
		 * unavailable and may require the AGI buffer before
		 * it can be returned. Drop the AGI buffer and retry
		 * the lookup.
		 */

> +		xfs_trans_brelse(tp, *agi_bpp);
> +		delay(1);
> +		goto again;
> +	}
> +	if (error == 0) {
> +		/* We got the inode, so we can release the AGI. */
> +		ASSERT(*ipp != NULL);
> +		xfs_trans_brelse(tp, *agi_bpp);
> +		*agi_bpp = NULL;
> +	}
> +
> +	return error;

That's better written as:

	if (error)
		return error;

	/* We got the inode, so we can release the AGI. */
	ASSERT(*ipp != NULL);
	xfs_trans_brelse(tp, *agi_bpp);
	*agi_bpp = NULL;
	return 0;

Other than that, the code makes sense.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core
  2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
@ 2022-11-15  4:08   ` Dave Chinner
  2022-11-16  2:49     ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-11-15  4:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xchk_get_inode is not quite the right function to be calling from the
> inode scrubber setup function.  The common get_inode function either
> gets an inode and installs it in the scrub context, or it returns an
> error code explaining what happened.  This is acceptable for most file
> scrubbers because it is not in their scope to fix corruptions in the
> inode core and fork areas that cause iget to fail.
> 
> Dealing with these problems is within the scope of the inode scrubber,
> however.  If iget fails with EFSCORRUPTED, we need to xchk_inode to flag
> that as corruption.  Since we can't get our hands on an incore inode, we
> need to hold the AGI to prevent inode allocation activity so that
> nothing changes in the inode metadata.
> 
> Looking ahead to the inode core repair patches, we will also need to
> hold the AGI buffer into xrep_inode so that we can make modifications to
> the xfs_dinode structure without any other thread swooping in to
> allocate or free the inode.
> 
> Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off
> use case where the error codes we check for are a little different, and
> the return state is much different from the common function.

The code look fine, but...

... doesn't this mean that xchk_setup_inode() and xchk_get_inode()
now are almost identical apart from the xchk_prepare_iscrub() bits?
This kinda looks like a lot of duplicated but subtly different code
- does xchk_get_inode() still need all that complexity if we are now
doing it in xchk_setup_inode()? If it does, why does
xchk_setup_inode() need to duplicate the code?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode
  2022-11-15  3:49   ` Dave Chinner
@ 2022-11-16  0:53     ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2022-11-16  0:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Nov 15, 2022 at 02:49:54PM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > In commit d658e, we tried to improve the robustnes of xchk_get_inode in
> > the face of EINVAL returns from iget by calling xfs_imap to see if the
> > inobt itself thinks that the inode is allocated.  Unfortunately, that
> > commit didn't consider the possibility that the inode gets allocated
> > after iget but before imap.  In this case, the imap call will succeed,
> > but we turn that into a corruption error and tell userspace the inode is
> > corrupt.
> > 
> > Avoid this false corruption report by grabbing the AGI header and
> > retrying the iget before calling imap.  If the iget succeeds, we can
> > proceed with the usual scrub-by-handle code.  Fix all the incorrect
> > comments too, since unreadable/corrupt inodes no longer result in EINVAL
> > returns.
> > 
> > Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> OK.
> 
> > ---
> >  fs/xfs/scrub/common.c |  203 ++++++++++++++++++++++++++++++++++++++++---------
> >  fs/xfs/scrub/common.h |    4 +
> >  fs/xfs/xfs_icache.c   |    3 -
> >  fs/xfs/xfs_icache.h   |    1 
> >  4 files changed, 172 insertions(+), 39 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 42a25488bd25..9a811c5fa8f7 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -635,6 +635,14 @@ xchk_ag_init(
> >  
> >  /* Per-scrubber setup functions */
> >  
> > +void
> > +xchk_trans_cancel(
> > +	struct xfs_scrub	*sc)
> > +{
> > +	xfs_trans_cancel(sc->tp);
> > +	sc->tp = NULL;
> > +}
> > +
> >  /*
> >   * Grab an empty transaction so that we can re-grab locked buffers if
> >   * one of our btrees turns out to be cyclic.
> > @@ -720,6 +728,80 @@ xchk_iget(
> >  	return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
> >  }
> >  
> > +/*
> > + * Try to grab an inode.  If we can't, return the AGI buffer so that the caller
> > + * can single-step the loading process to see where things went wrong.
> 
> "Try to grab an inode in a manner that avoids races with physical inode
> allocation. If we can't, return the locked AGI buffer so that the
> caller can single-step the loading process to see where things went
> wrong."

Fixed.

> > + *
> > + * If the iget succeeds, return 0, a NULL AGI, and the inode.
> > + * If the iget fails, return the error, the AGI, and a NULL inode.  This can
> 
> "... the locked AGI, ...."
> 
> > + * include -EINVAL and -ENOENT for invalid inode numbers or inodes that are no
> > + * longer allocated; or any other corruption or runtime error.
> > + * If the AGI read fails, return the error, a NULL AGI, and NULL inode.
> > + * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode.
> > + */
> > +int
> > +xchk_iget_agi(
> > +	struct xfs_scrub	*sc,
> > +	xfs_ino_t		inum,
> > +	struct xfs_buf		**agi_bpp,
> > +	struct xfs_inode	**ipp)
> > +{
> > +	struct xfs_mount	*mp = sc->mp;
> > +	struct xfs_trans	*tp = sc->tp;
> > +	struct xfs_perag	*pag;
> > +	int			error;
> > +
> > +again:
> > +	*agi_bpp = NULL;
> > +	*ipp = NULL;
> > +	error = 0;
> > +
> > +	if (xchk_should_terminate(sc, &error))
> > +		return error;
> > +
> > +	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
> > +	error = xfs_ialloc_read_agi(pag, tp, agi_bpp);
> > +	xfs_perag_put(pag);
> > +	if (error)
> > +		return error;
> > +
> > +	error = xfs_iget(mp, tp, inum,
> > +			XFS_IGET_NOWAIT | XFS_IGET_UNTRUSTED, 0, ipp);
> 
> OK, IGET_NOWAIT is new....
> 
> > +#define XFS_IGET_NOWAIT		0x10	/* return EAGAIN instead of blocking */
> 
> But it doesn't prevent blocking. XFS_IGET_UNTRUSTED means we do a
> inobt record lookup (btree buffer locking and IO that can block) as
> well as reading the inode cluster from disk if it's not already in
> cache. Hence this isn't what I'd call a "non-blocking" or "no wait"
> operation. 
> 
> AFAICT, what is needed here is avoiding the -retry mechanism- of the
> lookup, not "non blocking/no wait" semantics. i.e. this seems
> reasonable to get an -EAGAIN error instead of delaying and trying
> again if we are using XFS_IGET_NORETRY semantics....

Aha, yes, thank you for the better name. :)

/* Return -EAGAIN immediately if the inode is unavailable. */
#define XFS_IGET_NORETRY		(1U << 5)

> > +	if (error == -EAGAIN) {
> > +		/*
> > +		 * Cached inode awaiting inactivation.  Drop the AGI buffer in
> > +		 * case the inactivation worker is now waiting for it, and try
> > +		 * the iget again.
> > +		 */
> 
> That's not the only reason xfs_iget() could return -EAGAIN,
> right? radix_tree_preload() failing can cause -EAGAIN to be returned
> from xfs_iget_cache_miss(), as can an instantiation race inserting
> the new inode into the radix tree. The cache hit path has a bunch
> more cases, too. Perhaps:

Yes, I suppose that's possible if the incore inode gets evicted and
someone else is reinstantiating it at the same time we're looking for
it...

> 		/*
> 		 * The inode may be in core but temporarily
> 		 * unavailable and may require the AGI buffer before
> 		 * it can be returned. Drop the AGI buffer and retry
> 		 * the lookup.
> 		 */

...so yes, this is a better explanation of what's going on.

> > +		xfs_trans_brelse(tp, *agi_bpp);
> > +		delay(1);
> > +		goto again;
> > +	}
> > +	if (error == 0) {
> > +		/* We got the inode, so we can release the AGI. */
> > +		ASSERT(*ipp != NULL);
> > +		xfs_trans_brelse(tp, *agi_bpp);
> > +		*agi_bpp = NULL;
> > +	}
> > +
> > +	return error;
> 
> That's better written as:
> 
> 	if (error)
> 		return error;
> 
> 	/* We got the inode, so we can release the AGI. */
> 	ASSERT(*ipp != NULL);
> 	xfs_trans_brelse(tp, *agi_bpp);
> 	*agi_bpp = NULL;
> 	return 0;
> 
> Other than that, the code makes sense.

Fixed.  Thanks for the review.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core
  2022-11-15  4:08   ` Dave Chinner
@ 2022-11-16  2:49     ` Darrick J. Wong
  2022-11-17  1:15       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-11-16  2:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Nov 15, 2022 at 03:08:16PM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > xchk_get_inode is not quite the right function to be calling from the
> > inode scrubber setup function.  The common get_inode function either
> > gets an inode and installs it in the scrub context, or it returns an
> > error code explaining what happened.  This is acceptable for most file
> > scrubbers because it is not in their scope to fix corruptions in the
> > inode core and fork areas that cause iget to fail.
> > 
> > Dealing with these problems is within the scope of the inode scrubber,
> > however.  If iget fails with EFSCORRUPTED, we need to xchk_inode to flag
> > that as corruption.  Since we can't get our hands on an incore inode, we
> > need to hold the AGI to prevent inode allocation activity so that
> > nothing changes in the inode metadata.
> > 
> > Looking ahead to the inode core repair patches, we will also need to
> > hold the AGI buffer into xrep_inode so that we can make modifications to
> > the xfs_dinode structure without any other thread swooping in to
> > allocate or free the inode.
> > 
> > Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off
> > use case where the error codes we check for are a little different, and
> > the return state is much different from the common function.
> 
> The code look fine, but...
> 
> ... doesn't this mean that xchk_setup_inode() and xchk_get_inode()
> now are almost identical apart from the xchk_prepare_iscrub() bits?

Yes, they're /nearly/ identical in the helper functions they call, but
they're not so similar in intent and how they handle @error values:

xchk_setup_inode prepares to check or repair an inode record, so it must
continue the scrub operation even if the inode/inobt verifiers cause
xfs_iget to return EFSCORRUPTED.  This is done by attaching the locked
AGI buffer to the scrub transaction and returning 0 to move on to the
actual scrub.  (Later, the online inode repair code will also want the
xfs_imap structure so that it can reset the ondisk xfs_dinode
structure.)

xchk_get_inode retrieves an inode on behalf of a scrubber that operates
on an incore inode -- data/attr/cow forks, directories, xattrs,
symlinks, parent pointers, etc.  If the inode/inobt verifiers fail and
xfs_iget returns EFSCORRUPTED, we want to exit to userspace (because the
caller should be fix the inode first) and drop everything we acquired
along the way.

A behavior common to both functions is that it's possible that xfs_scrub
asked for a scrub-by-handle concurrent with the inode being freed or the
passed-in inumber is invalid.  In this case, we call xfs_imap to see if
the inobt index thinks the inode is allocated, and return ENOENT
("nothing to check here") to userspace if this is not the case.  The
imap lookup is why both functions call xchk_iget_agi.

> This kinda looks like a lot of duplicated but subtly different code
> - does xchk_get_inode() still need all that complexity if we are now
> doing it in xchk_setup_inode()?  If it does, why does
> xchk_setup_inode() need to duplicate the code?

So yes, we do need the complexity of both functions because the
postconditions of the two functions are rather different.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core
  2022-11-16  2:49     ` Darrick J. Wong
@ 2022-11-17  1:15       ` Dave Chinner
  2022-11-17 20:20         ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-11-17  1:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Nov 15, 2022 at 06:49:14PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 15, 2022 at 03:08:16PM +1100, Dave Chinner wrote:
> > On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > xchk_get_inode is not quite the right function to be calling from the
> > > inode scrubber setup function.  The common get_inode function either
> > > gets an inode and installs it in the scrub context, or it returns an
> > > error code explaining what happened.  This is acceptable for most file
> > > scrubbers because it is not in their scope to fix corruptions in the
> > > inode core and fork areas that cause iget to fail.
> > > 
> > > Dealing with these problems is within the scope of the inode scrubber,
> > > however.  If iget fails with EFSCORRUPTED, we need to xchk_inode to flag
> > > that as corruption.  Since we can't get our hands on an incore inode, we
> > > need to hold the AGI to prevent inode allocation activity so that
> > > nothing changes in the inode metadata.
> > > 
> > > Looking ahead to the inode core repair patches, we will also need to
> > > hold the AGI buffer into xrep_inode so that we can make modifications to
> > > the xfs_dinode structure without any other thread swooping in to
> > > allocate or free the inode.
> > > 
> > > Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off
> > > use case where the error codes we check for are a little different, and
> > > the return state is much different from the common function.
> > 
> > The code look fine, but...
> > 
> > ... doesn't this mean that xchk_setup_inode() and xchk_get_inode()
> > now are almost identical apart from the xchk_prepare_iscrub() bits?
> 
> Yes, they're /nearly/ identical in the helper functions they call, but
> they're not so similar in intent and how they handle @error values:
> 
> xchk_setup_inode prepares to check or repair an inode record, so it must
> continue the scrub operation even if the inode/inobt verifiers cause
> xfs_iget to return EFSCORRUPTED.  This is done by attaching the locked
> AGI buffer to the scrub transaction and returning 0 to move on to the
> actual scrub.  (Later, the online inode repair code will also want the
> xfs_imap structure so that it can reset the ondisk xfs_dinode
> structure.)
> 
> xchk_get_inode retrieves an inode on behalf of a scrubber that operates
> on an incore inode -- data/attr/cow forks, directories, xattrs,
> symlinks, parent pointers, etc.  If the inode/inobt verifiers fail and
> xfs_iget returns EFSCORRUPTED, we want to exit to userspace (because the
> caller should be fix the inode first) and drop everything we acquired
> along the way.
> 
> A behavior common to both functions is that it's possible that xfs_scrub
> asked for a scrub-by-handle concurrent with the inode being freed or the
> passed-in inumber is invalid.  In this case, we call xfs_imap to see if
> the inobt index thinks the inode is allocated, and return ENOENT
> ("nothing to check here") to userspace if this is not the case.  The
> imap lookup is why both functions call xchk_iget_agi.

Ok, so given all this, all I really want then is better names for
the functions, as "setup" and "get" don't convey any of this. :)

Perhaps xchk_setup_inode() -> xchk_iget_for_record_check() and
xchk_get_inode() -> xchk_iget_for_scrubbing(). This gives an
indication taht they are being used for different purposes, and the
implementation is tailored to the requirements of those specific
operations....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core
  2022-11-17  1:15       ` Dave Chinner
@ 2022-11-17 20:20         ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2022-11-17 20:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Nov 17, 2022 at 12:15:48PM +1100, Dave Chinner wrote:
> On Tue, Nov 15, 2022 at 06:49:14PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 15, 2022 at 03:08:16PM +1100, Dave Chinner wrote:
> > > On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > xchk_get_inode is not quite the right function to be calling from the
> > > > inode scrubber setup function.  The common get_inode function either
> > > > gets an inode and installs it in the scrub context, or it returns an
> > > > error code explaining what happened.  This is acceptable for most file
> > > > scrubbers because it is not in their scope to fix corruptions in the
> > > > inode core and fork areas that cause iget to fail.
> > > > 
> > > > Dealing with these problems is within the scope of the inode scrubber,
> > > > however.  If iget fails with EFSCORRUPTED, we need to xchk_inode to flag
> > > > that as corruption.  Since we can't get our hands on an incore inode, we
> > > > need to hold the AGI to prevent inode allocation activity so that
> > > > nothing changes in the inode metadata.
> > > > 
> > > > Looking ahead to the inode core repair patches, we will also need to
> > > > hold the AGI buffer into xrep_inode so that we can make modifications to
> > > > the xfs_dinode structure without any other thread swooping in to
> > > > allocate or free the inode.
> > > > 
> > > > Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off
> > > > use case where the error codes we check for are a little different, and
> > > > the return state is much different from the common function.
> > > 
> > > The code look fine, but...
> > > 
> > > ... doesn't this mean that xchk_setup_inode() and xchk_get_inode()
> > > now are almost identical apart from the xchk_prepare_iscrub() bits?
> > 
> > Yes, they're /nearly/ identical in the helper functions they call, but
> > they're not so similar in intent and how they handle @error values:
> > 
> > xchk_setup_inode prepares to check or repair an inode record, so it must
> > continue the scrub operation even if the inode/inobt verifiers cause
> > xfs_iget to return EFSCORRUPTED.  This is done by attaching the locked
> > AGI buffer to the scrub transaction and returning 0 to move on to the
> > actual scrub.  (Later, the online inode repair code will also want the
> > xfs_imap structure so that it can reset the ondisk xfs_dinode
> > structure.)
> > 
> > xchk_get_inode retrieves an inode on behalf of a scrubber that operates
> > on an incore inode -- data/attr/cow forks, directories, xattrs,
> > symlinks, parent pointers, etc.  If the inode/inobt verifiers fail and
> > xfs_iget returns EFSCORRUPTED, we want to exit to userspace (because the
> > caller should be fix the inode first) and drop everything we acquired
> > along the way.
> > 
> > A behavior common to both functions is that it's possible that xfs_scrub
> > asked for a scrub-by-handle concurrent with the inode being freed or the
> > passed-in inumber is invalid.  In this case, we call xfs_imap to see if
> > the inobt index thinks the inode is allocated, and return ENOENT
> > ("nothing to check here") to userspace if this is not the case.  The
> > imap lookup is why both functions call xchk_iget_agi.
> 
> Ok, so given all this, all I really want then is better names for
> the functions, as "setup" and "get" don't convey any of this. :)
> 
> Perhaps xchk_setup_inode() -> xchk_iget_for_record_check() and

I'd rather make this function static to inode.c and export a const
global struct xchk_meta_ops pointing to this function.  There's really
no need for the external declaration aside from populating the
meta_scrub_ops table in scrub.c.  The reason why I haven't done that
already is that doing that cleanup will likely cause ~23 merge conflicts
all the way down the branch as I add online repair functions.  Perhaps
the next time I make a branchwide change.

Second, xchk_setup_inode doesn't necessarily return a cached inode,
which is what most iget functions do -- if the read fails, it'll lock
the AGI buffer to the scrub transaction.

I haven't any strong objections to renaming this
xchk_setup_inode_record, if that's what's needed to get this patchset
through review.

> xchk_get_inode() -> xchk_iget_for_scrubbing(). This gives an
> indication taht they are being used for different purposes, and the
> implementation is tailored to the requirements of those specific
> operations....

I'll make this change, however.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-11-17 20:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-02 18:20 [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck Darrick J. Wong
2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
2022-11-15  4:08   ` Dave Chinner
2022-11-16  2:49     ` Darrick J. Wong
2022-11-17  1:15       ` Dave Chinner
2022-11-17 20:20         ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
2022-11-15  3:13   ` Dave Chinner
2022-11-15  3:34     ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode Darrick J. Wong
2022-11-15  3:49   ` Dave Chinner
2022-11-16  0:53     ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).