linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Subject: [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists
Date: Tue, 14 Feb 2023 16:51:13 +1100	[thread overview]
Message-ID: <20230214055114.4141947-3-david@fromorbit.com> (raw)
In-Reply-To: <20230214055114.4141947-1-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

If we fail delayed allocation conversion because we cannot allocate
blocks, we end up in the situation where the inode extent list is
effectively corrupt and unresolvable. Whilst we have dirty data in
memory that we cannot allocate space for, we cannot write that data
back to disk. Unmounting a filesystem in this state results in data
loss.

In situations where we end up with a corrupt extent list in memory,
we can also be asked to convert a delayed region that does not have
a delalloc extent backing it. This should be considered a
corruption, too, not a "try again later" error.

These conversion failures result in the inode being sick and needing
repair, but we don't mark all the conditions that can lead to bmap
sickness being marked. Make sure that the error conditions that
indicate corruption are properly marked.

Further, if we trip over these corruptions conditions, we then have
to reclaim an inode that has unresolvable delayed allocation extents
attached to the inode. This generally happens at unmount and inode
inactivation will fire assert failures because we've left stray
delayed allocation extents behind on the inode. Hence we need to
ensure that we only trigger the stale delalloc extent checks if the
inode is fully healthy.

Even then, this may not be enough, because the inactivation code
assumes that there will be no stray delayed extents unless the
filesystem is shut down. If the inode is unhealthy, we need to
ensure we clean up delayed allocation extents within EOF because
the VFS has already tossed the data away. Hence there's no longer
any data over the delalloc extents to write back, so we need to also
toss the delayed allocation extents to ensure we release the space
reservation the delalloc extent holds. Failure to punch delalloc
extents in this case results in assert failures during unmount when
the delalloc block counter is torn down.

This all needs to be in place before the next patch which resolves a
bug in the iomap code that discards delalloc extents backing dirty
pages on writeback error without discarding the dirty data. Hence we
need to be able to handle delalloc extents in inode cleanup sanely,
rather than rely on incorrectly punching the delalloc extents on the
first writeback error that occurs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 13 ++++++++++---
 fs/xfs/xfs_icache.c      |  4 +++-
 fs/xfs/xfs_inode.c       | 10 ++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 958e4bb2e51e..fb718a5825d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4553,8 +4553,12 @@ xfs_bmapi_convert_delalloc(
 		 * should only happen for the COW fork, where another thread
 		 * might have moved the extent to the data fork in the meantime.
 		 */
-		WARN_ON_ONCE(whichfork != XFS_COW_FORK);
-		error = -EAGAIN;
+		if (whichfork != XFS_COW_FORK) {
+			xfs_bmap_mark_sick(ip, whichfork);
+			error = -EFSCORRUPTED;
+		} else {
+			error = -EAGAIN;
+		}
 		goto out_trans_cancel;
 	}
 
@@ -4598,8 +4602,11 @@ xfs_bmapi_convert_delalloc(
 		bma.prev.br_startoff = NULLFILEOFF;
 
 	error = xfs_bmapi_allocate(&bma);
-	if (error)
+	if (error) {
+		if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC))
+			xfs_bmap_mark_sick(ip, whichfork);
 		goto out_finish;
+	}
 
 	error = -ENOSPC;
 	if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK))
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ddeaccc04aec..4354b6639dec 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -24,6 +24,7 @@
 #include "xfs_ialloc.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_health.h"
 
 #include <linux/iversion.h>
 
@@ -1810,7 +1811,8 @@ xfs_inodegc_set_reclaimable(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
 
-	if (!xfs_is_shutdown(mp) && ip->i_delayed_blks) {
+	if (ip->i_delayed_blks && xfs_inode_is_healthy(ip) &&
+	    !xfs_is_shutdown(mp)) {
 		xfs_check_delalloc(ip, XFS_DATA_FORK);
 		xfs_check_delalloc(ip, XFS_COW_FORK);
 		ASSERT(0);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d354ea2b74f9..06f1229ef628 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -37,6 +37,7 @@
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_health.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -1738,6 +1739,15 @@ xfs_inactive(
 		if (xfs_can_free_eofblocks(ip, true))
 			xfs_free_eofblocks(ip);
 
+		/*
+		 * If the inode is sick, then it might have delalloc extents
+		 * within EOF that we were unable to convert. We have to punch
+		 * them out here to release the reservation as there is no
+		 * longer any data to write back into the delalloc range now.
+		 */
+		if (!xfs_inode_is_healthy(ip))
+			xfs_bmap_punch_delalloc_range(ip, 0,
+						i_size_read(VFS_I(ip)));
 		goto out;
 	}
 
-- 
2.39.0


  parent reply	other threads:[~2023-02-14  5:51 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-14  5:51 [PATCH 0/3] xfs, iomap: fix writeback failure handling Dave Chinner
2023-02-14  5:51 ` [PATCH 1/3] xfs: report block map corruption errors to the health tracking system Dave Chinner
2023-02-14  8:03   ` Christoph Hellwig
2023-02-14 22:21     ` Dave Chinner
2023-02-14  5:51 ` Dave Chinner [this message]
2023-02-14  8:13   ` [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists Christoph Hellwig
2023-02-14 22:26     ` Dave Chinner
2023-02-14  5:51 ` [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it Dave Chinner
2023-02-14  8:14   ` Christoph Hellwig
2023-02-14 18:10   ` Brian Foster
2023-02-14 22:20     ` Dave Chinner
2023-02-15  1:26       ` Dave Chinner
2023-02-15 15:25       ` Brian Foster
2023-02-15 23:03         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230214055114.4141947-3-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).