[PATCH] xfs: reset child dir '..' entry when unlinking child

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] xfs: reset child dir '..' entry when unlinking child
@ 2021-07-03  3:02 Darrick J. Wong
  2021-07-05  8:20 ` Gao Xiang
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Darrick J. Wong @ 2021-07-03  3:02 UTC (permalink / raw)
  To: xfs

From: Darrick J. Wong <djwong@kernel.org>

While running xfs/168, I noticed a second source of post-shrink
corruption errors causing shutdowns.

Let's say that directory B has a low inode number and is a child of
directory A, which has a high number.  If B is empty but open, and
unlinked from A, B's dotdot link continues to point to A.  If A is then
unlinked and the filesystem shrunk so that A is no longer a valid inode,
a subsequent AIL push of B will trip the inode verifiers because the
dotdot entry points outside of the filesystem.

To avoid this problem, reset B's dotdot entry to the root directory when
unlinking directories, since the root directory cannot be removed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 52be5c6d0b3b..03e25246e936 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2817,6 +2817,19 @@ xfs_remove(
 		error = xfs_droplink(tp, ip);
 		if (error)
 			goto out_trans_cancel;
+
+		/*
+		 * Point the unlinked child directory's ".." entry to the root
+		 * directory to eliminate back-references to inodes that may
+		 * get freed before the child directory is closed.  If the fs
+		 * gets shrunk, this can lead to dirent inode validation errors.
+		 */
+		if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) {
+			error = xfs_dir_replace(tp, ip, &xfs_name_dotdot,
+					tp->t_mountp->m_sb.sb_rootino, 0);
+			if (error)
+				return error;
+		}
 	} else {
 		/*
 		 * When removing a non-directory we need to log the parent

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-03  3:02 [PATCH] xfs: reset child dir '..' entry when unlinking child Darrick J. Wong
@ 2021-07-05  8:20 ` Gao Xiang
  2021-07-05  8:41 ` Christoph Hellwig
  2021-07-05 22:09 ` Dave Chinner
  2 siblings, 0 replies; 9+ messages in thread
From: Gao Xiang @ 2021-07-05  8:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While running xfs/168, I noticed a second source of post-shrink
> corruption errors causing shutdowns.
> 
> Let's say that directory B has a low inode number and is a child of
> directory A, which has a high number.  If B is empty but open, and
> unlinked from A, B's dotdot link continues to point to A.  If A is then
> unlinked and the filesystem shrunk so that A is no longer a valid inode,
> a subsequent AIL push of B will trip the inode verifiers because the
> dotdot entry points outside of the filesystem.
> 
> To avoid this problem, reset B's dotdot entry to the root directory when
> unlinking directories, since the root directory cannot be removed.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

I think it is reasonable to clean dotdot for all deaddir
cases (compared with skipping validating such dotdot dirent which
sounds a bit hacky though since dotdot was actually outdated...)

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/xfs/xfs_inode.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 52be5c6d0b3b..03e25246e936 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2817,6 +2817,19 @@ xfs_remove(
>  		error = xfs_droplink(tp, ip);
>  		if (error)
>  			goto out_trans_cancel;
> +
> +		/*
> +		 * Point the unlinked child directory's ".." entry to the root
> +		 * directory to eliminate back-references to inodes that may
> +		 * get freed before the child directory is closed.  If the fs
> +		 * gets shrunk, this can lead to dirent inode validation errors.
> +		 */
> +		if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) {
> +			error = xfs_dir_replace(tp, ip, &xfs_name_dotdot,
> +					tp->t_mountp->m_sb.sb_rootino, 0);
> +			if (error)
> +				return error;
> +		}
>  	} else {
>  		/*
>  		 * When removing a non-directory we need to log the parent

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-03  3:02 [PATCH] xfs: reset child dir '..' entry when unlinking child Darrick J. Wong
  2021-07-05  8:20 ` Gao Xiang
@ 2021-07-05  8:41 ` Christoph Hellwig
  2021-07-06 18:35   ` Darrick J. Wong
  2021-07-05 22:09 ` Dave Chinner
  2 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-07-05  8:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While running xfs/168, I noticed a second source of post-shrink
> corruption errors causing shutdowns.
> 
> Let's say that directory B has a low inode number and is a child of
> directory A, which has a high number.  If B is empty but open, and
> unlinked from A, B's dotdot link continues to point to A.  If A is then
> unlinked and the filesystem shrunk so that A is no longer a valid inode,
> a subsequent AIL push of B will trip the inode verifiers because the
> dotdot entry points outside of the filesystem.
> 
> To avoid this problem, reset B's dotdot entry to the root directory when
> unlinking directories, since the root directory cannot be removed.

Uggh.  This causes extra overhead for every remove.  Can't we make
the verifieds deal with this situation instead of creating extra
overhead?  If we can't please at least limit it to file systems that do
have parent pointers enabled.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-03  3:02 [PATCH] xfs: reset child dir '..' entry when unlinking child Darrick J. Wong
  2021-07-05  8:20 ` Gao Xiang
  2021-07-05  8:41 ` Christoph Hellwig
@ 2021-07-05 22:09 ` Dave Chinner
  2021-07-05 23:20   ` Gao Xiang
  2 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2021-07-05 22:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While running xfs/168, I noticed a second source of post-shrink
> corruption errors causing shutdowns.
> 
> Let's say that directory B has a low inode number and is a child of
> directory A, which has a high number.  If B is empty but open, and
> unlinked from A, B's dotdot link continues to point to A.  If A is then
> unlinked and the filesystem shrunk so that A is no longer a valid inode,
> a subsequent AIL push of B will trip the inode verifiers because the
> dotdot entry points outside of the filesystem.

So we have a directory inode that is empty and unlinked but held
open, with a back pointer to an invalid inode number? Which can
never be followed, because the directory has been unlinked.

Can't this be handled in the inode verifier? This seems to me to
be a pretty clear cut case where the ".." back pointer should
always be considered invalid (because the parent dir has no
existence guarantee once the child has been removed from it), not
just in the situation where the filesystem has been shrunk...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-05 22:09 ` Dave Chinner
@ 2021-07-05 23:20   ` Gao Xiang
  2021-07-06 21:49     ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Gao Xiang @ 2021-07-05 23:20 UTC (permalink / raw)
  To: Dave Chinner, Christoph Hellwig; +Cc: Darrick J. Wong, xfs

Hi Dave and Christoph,

On Tue, Jul 06, 2021 at 08:09:25AM +1000, Dave Chinner wrote:
> On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > While running xfs/168, I noticed a second source of post-shrink
> > corruption errors causing shutdowns.
> > 
> > Let's say that directory B has a low inode number and is a child of
> > directory A, which has a high number.  If B is empty but open, and
> > unlinked from A, B's dotdot link continues to point to A.  If A is then
> > unlinked and the filesystem shrunk so that A is no longer a valid inode,
> > a subsequent AIL push of B will trip the inode verifiers because the
> > dotdot entry points outside of the filesystem.
> 
> So we have a directory inode that is empty and unlinked but held
> open, with a back pointer to an invalid inode number? Which can
> never be followed, because the directory has been unlinked.
> 
> Can't this be handled in the inode verifier? This seems to me to
> be a pretty clear cut case where the ".." back pointer should
> always be considered invalid (because the parent dir has no
> existence guarantee once the child has been removed from it), not
> just in the situation where the filesystem has been shrunk...

Yes, I agree all of this, this field can be handled by the inode
verifier. The only concern I can think out might be fs freeze
with a directory inode that is empty and unlinked but held open,
and then recovery on a unpatched old kernels, not sure if such
case can be handled properly by old kernel verifier.

Otherwise, it's also ok I think.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-05  8:41 ` Christoph Hellwig
@ 2021-07-06 18:35   ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2021-07-06 18:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Jul 05, 2021 at 09:41:03AM +0100, Christoph Hellwig wrote:
> On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > While running xfs/168, I noticed a second source of post-shrink
> > corruption errors causing shutdowns.
> > 
> > Let's say that directory B has a low inode number and is a child of
> > directory A, which has a high number.  If B is empty but open, and
> > unlinked from A, B's dotdot link continues to point to A.  If A is then
> > unlinked and the filesystem shrunk so that A is no longer a valid inode,
> > a subsequent AIL push of B will trip the inode verifiers because the
> > dotdot entry points outside of the filesystem.
> > 
> > To avoid this problem, reset B's dotdot entry to the root directory when
> > unlinking directories, since the root directory cannot be removed.
> 
> Uggh.  This causes extra overhead for every remove.

Not that much overhead.  A child directory can only be unlinked if it's
empty; empty directories by definition contain only a dotdot entry,
which meanns they're in shortform format; and we already have to log the
inode to reflect the i_nlinks change.

> Can't we make
> the verifieds deal with this situation instead of creating extra
> overhead?

I'll address that in the other thread suggesting I "just fix the
verifiers".

> If we can't please at least limit it to file systems that do
> have parent pointers enabled.

Parent pointers haven't been merged yet; this is the '..' entry that
has been stored in every directory since the beginning.

--D

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-05 23:20   ` Gao Xiang
@ 2021-07-06 21:49     ` Darrick J. Wong
  2021-07-06 23:07       ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2021-07-06 21:49 UTC (permalink / raw)
  To: Dave Chinner, Christoph Hellwig, xfs

On Tue, Jul 06, 2021 at 07:20:21AM +0800, Gao Xiang wrote:
> Hi Dave and Christoph,
> 
> On Tue, Jul 06, 2021 at 08:09:25AM +1000, Dave Chinner wrote:
> > On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > While running xfs/168, I noticed a second source of post-shrink
> > > corruption errors causing shutdowns.
> > > 
> > > Let's say that directory B has a low inode number and is a child of
> > > directory A, which has a high number.  If B is empty but open, and
> > > unlinked from A, B's dotdot link continues to point to A.  If A is then
> > > unlinked and the filesystem shrunk so that A is no longer a valid inode,
> > > a subsequent AIL push of B will trip the inode verifiers because the
> > > dotdot entry points outside of the filesystem.
> > 
> > So we have a directory inode that is empty and unlinked but held
> > open, with a back pointer to an invalid inode number? Which can
> > never be followed, because the directory has been unlinked.
> > 
> > Can't this be handled in the inode verifier? This seems to me to
> > be a pretty clear cut case where the ".." back pointer should
> > always be considered invalid (because the parent dir has no
> > existence guarantee once the child has been removed from it), not
> > just in the situation where the filesystem has been shrunk...
> 
> Yes, I agree all of this, this field can be handled by the inode
> verifier. The only concern I can think out might be fs freeze
> with a directory inode that is empty and unlinked but held open,
> and then recovery on a unpatched old kernels, not sure if such
> case can be handled properly by old kernel verifier.

I decided against changing the shortform directory verifier to ignore
the dotdot pointer on nlink==0 directories because older kernels will
still have the current behavior and that will cause recovery failure for
the following sequence:

1. create A and B and delete A as per above, leave B open
2. shrink fs so that A is no longer a valid inode number
3. flush the shrink transaction to disk
4. futimens(B) (or anything to dirty the inode)
5. crash
6. boot old kernel
7. try to recover fs with old kernel, which will crash since B hadn't
   been inactivated

But I guess I'll have to go write an fstest to prove that I'm not
making this up...

--D

> 
> Otherwise, it's also ok I think.
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-06 21:49     ` Darrick J. Wong
@ 2021-07-06 23:07       ` Dave Chinner
  2021-07-07  1:36         ` Gao Xiang
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2021-07-06 23:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Tue, Jul 06, 2021 at 02:49:29PM -0700, Darrick J. Wong wrote:
> On Tue, Jul 06, 2021 at 07:20:21AM +0800, Gao Xiang wrote:
> > Hi Dave and Christoph,
> > 
> > On Tue, Jul 06, 2021 at 08:09:25AM +1000, Dave Chinner wrote:
> > > On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > While running xfs/168, I noticed a second source of post-shrink
> > > > corruption errors causing shutdowns.
> > > > 
> > > > Let's say that directory B has a low inode number and is a child of
> > > > directory A, which has a high number.  If B is empty but open, and
> > > > unlinked from A, B's dotdot link continues to point to A.  If A is then
> > > > unlinked and the filesystem shrunk so that A is no longer a valid inode,
> > > > a subsequent AIL push of B will trip the inode verifiers because the
> > > > dotdot entry points outside of the filesystem.
> > > 
> > > So we have a directory inode that is empty and unlinked but held
> > > open, with a back pointer to an invalid inode number? Which can
> > > never be followed, because the directory has been unlinked.
> > > 
> > > Can't this be handled in the inode verifier? This seems to me to
> > > be a pretty clear cut case where the ".." back pointer should
> > > always be considered invalid (because the parent dir has no
> > > existence guarantee once the child has been removed from it), not
> > > just in the situation where the filesystem has been shrunk...
> > 
> > Yes, I agree all of this, this field can be handled by the inode
> > verifier. The only concern I can think out might be fs freeze
> > with a directory inode that is empty and unlinked but held open,
> > and then recovery on a unpatched old kernels, not sure if such
> > case can be handled properly by old kernel verifier.
> 
> I decided against changing the shortform directory verifier to ignore
> the dotdot pointer on nlink==0 directories because older kernels will
> still have the current behavior and that will cause recovery failure for
> the following sequence:
> 
> 1. create A and B and delete A as per above, leave B open
> 2. shrink fs so that A is no longer a valid inode number
> 3. flush the shrink transaction to disk
> 4. futimens(B) (or anything to dirty the inode)
> 5. crash
> 6. boot old kernel
> 7. try to recover fs with old kernel, which will crash since B hadn't
>    been inactivated

Yup, I can see how that would happen, but patching the directory
code still smells like band-aid to me on multiple levels.

At first glance, this seems like the first step in a on-going game
of whack-a-mole. If shrink is changing how the on disk format needs
to be evaluated in ways that older kernels can't safely deal with,
then shrink needs to be setting an incompat feature bit to prevent
older kernels from crashing trying to parse the on-disk format.

On deeper consideration after looking at the code, why is it even
considered safe to be recovering pre- and post- shrink operations in
a single log recovery operation? i.e. I see nothing in the shrink
code that quiesces the log and flushes all the dirty metadata to
stable storage prior to logging the superblock geometry changes and
returning to userspace.

I can think of another situation where this might be problematic -
recovering buffers containing unlinked inode lists updates. If the
inode cluster buffers we recover from transactions before the shrink
point to inodes that are beyond EOFS after the shrink, we recover
them and write them to disk, only for them to be read again and
written to disk when recovering subsequent transactions prior
to shrink.

IOWs, we can have transient on-disk state during log recovery where
stuff before the shrink transaction points to objects beyond the
EOFS after the shrink, but are still considered to be valid by log
recovery. If the device has already been shrunk, then log recovery
has just created a transient corrupt on-disk state, so if log
recovery fails at this point (for whatever reason), we've got a mess
on our hands.

I suspect that this means we could also have problems in the AIL,
too, but I haven't thought that far through this right now. The
transient corrupt log recovery state can be fixed by forcing the AIL
to be written back before we execute the shrink transaction so the
shrink transaction ends up being written to the tail of the log.
Doing this would also fix any sort of transient AIL ordering issue
that might exist around a shrink operation....

So, yeah, I think this problem hints at bigger issues with shrink,
on disk format interpretation and transient log recovery states. I
think we need to finish off all the infrastructure shrink needs
before we go any further with it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reset child dir '..' entry when unlinking child
  2021-07-06 23:07       ` Dave Chinner
@ 2021-07-07  1:36         ` Gao Xiang
  0 siblings, 0 replies; 9+ messages in thread
From: Gao Xiang @ 2021-07-07  1:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Christoph Hellwig, xfs

On Wed, Jul 07, 2021 at 09:07:36AM +1000, Dave Chinner wrote:
> On Tue, Jul 06, 2021 at 02:49:29PM -0700, Darrick J. Wong wrote:
> > On Tue, Jul 06, 2021 at 07:20:21AM +0800, Gao Xiang wrote:
> > > Hi Dave and Christoph,
> > > 
> > > On Tue, Jul 06, 2021 at 08:09:25AM +1000, Dave Chinner wrote:
> > > > On Fri, Jul 02, 2021 at 08:02:33PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > While running xfs/168, I noticed a second source of post-shrink
> > > > > corruption errors causing shutdowns.
> > > > > 
> > > > > Let's say that directory B has a low inode number and is a child of
> > > > > directory A, which has a high number.  If B is empty but open, and
> > > > > unlinked from A, B's dotdot link continues to point to A.  If A is then
> > > > > unlinked and the filesystem shrunk so that A is no longer a valid inode,
> > > > > a subsequent AIL push of B will trip the inode verifiers because the
> > > > > dotdot entry points outside of the filesystem.
> > > > 
> > > > So we have a directory inode that is empty and unlinked but held
> > > > open, with a back pointer to an invalid inode number? Which can
> > > > never be followed, because the directory has been unlinked.
> > > > 
> > > > Can't this be handled in the inode verifier? This seems to me to
> > > > be a pretty clear cut case where the ".." back pointer should
> > > > always be considered invalid (because the parent dir has no
> > > > existence guarantee once the child has been removed from it), not
> > > > just in the situation where the filesystem has been shrunk...
> > > 
> > > Yes, I agree all of this, this field can be handled by the inode
> > > verifier. The only concern I can think out might be fs freeze
> > > with a directory inode that is empty and unlinked but held open,
> > > and then recovery on a unpatched old kernels, not sure if such
> > > case can be handled properly by old kernel verifier.
> > 
> > I decided against changing the shortform directory verifier to ignore
> > the dotdot pointer on nlink==0 directories because older kernels will
> > still have the current behavior and that will cause recovery failure for
> > the following sequence:
> > 
> > 1. create A and B and delete A as per above, leave B open
> > 2. shrink fs so that A is no longer a valid inode number
> > 3. flush the shrink transaction to disk
> > 4. futimens(B) (or anything to dirty the inode)
> > 5. crash
> > 6. boot old kernel
> > 7. try to recover fs with old kernel, which will crash since B hadn't
> >    been inactivated
> 
> Yup, I can see how that would happen, but patching the directory
> code still smells like band-aid to me on multiple levels.
> 
> At first glance, this seems like the first step in a on-going game
> of whack-a-mole. If shrink is changing how the on disk format needs
> to be evaluated in ways that older kernels can't safely deal with,
> then shrink needs to be setting an incompat feature bit to prevent
> older kernels from crashing trying to parse the on-disk format.

Not a quite excuse for myself, but it's not easy for me to realize
there could be some problem with the outdated dotdot pointer pointing
to already free blocks in advance since that's too detail without
looking through the specific code considering my XFS knowledge, so
I was always expecting for more reviews, and that's why this was
marked as an experimental risky feature.

> 
> On deeper consideration after looking at the code, why is it even
> considered safe to be recovering pre- and post- shrink operations in
> a single log recovery operation? i.e. I see nothing in the shrink
> code that quiesces the log and flushes all the dirty metadata to
> stable storage prior to logging the superblock geometry changes and
> returning to userspace.
> 
> I can think of another situation where this might be problematic -
> recovering buffers containing unlinked inode lists updates. If the
> inode cluster buffers we recover from transactions before the shrink
> point to inodes that are beyond EOFS after the shrink, we recover
> them and write them to disk, only for them to be read again and
> written to disk when recovering subsequent transactions prior
> to shrink.
> 
> IOWs, we can have transient on-disk state during log recovery where
> stuff before the shrink transaction points to objects beyond the
> EOFS after the shrink, but are still considered to be valid by log
> recovery. If the device has already been shrunk, then log recovery
> has just created a transient corrupt on-disk state, so if log
> recovery fails at this point (for whatever reason), we've got a mess
> on our hands.

Yes, for such cases AIL needs to be pushed out before returning to the
userspace to avoid device shrinking. Since before shrinking transaction
the tail space was considered as valid. The shrinking transaction is
already a sync transaction so it will trigger log force, and it needs
a AIL push all as well before returning to the user space.

> 
> I suspect that this means we could also have problems in the AIL,
> too, but I haven't thought that far through this right now. The
> transient corrupt log recovery state can be fixed by forcing the AIL
> to be written back before we execute the shrink transaction so the
> shrink transaction ends up being written to the tail of the log.
> Doing this would also fix any sort of transient AIL ordering issue
> that might exist around a shrink operation....
> 
> So, yeah, I think this problem hints at bigger issues with shrink,
> on disk format interpretation and transient log recovery states. I
> think we need to finish off all the infrastructure shrink needs
> before we go any further with it....

Hmm.. That will avoid this indeed if more on-disk fields point to
free blocks like this, honestly I didn't realize outdated dotdot
pointer could point to already free blocks then a validator checks
this in the beginning.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-07-07  1:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-03  3:02 [PATCH] xfs: reset child dir '..' entry when unlinking child Darrick J. Wong
2021-07-05  8:20 ` Gao Xiang
2021-07-05  8:41 ` Christoph Hellwig
2021-07-06 18:35   ` Darrick J. Wong
2021-07-05 22:09 ` Dave Chinner
2021-07-05 23:20   ` Gao Xiang
2021-07-06 21:49     ` Darrick J. Wong
2021-07-06 23:07       ` Dave Chinner
2021-07-07  1:36         ` Gao Xiang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.