All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: run blockgc on freeze to avoid iget stalls after reclaim
Date: Fri, 14 Jan 2022 09:38:10 +1100	[thread overview]
Message-ID: <20220113223810.GG3290465@dread.disaster.area> (raw)
In-Reply-To: <20220113133701.629593-3-bfoster@redhat.com>

On Thu, Jan 13, 2022 at 08:37:01AM -0500, Brian Foster wrote:
> We've had reports on distro (pre-deferred inactivation) kernels that
> inode reclaim (i.e. via drop_caches) can deadlock on the s_umount
> lock when invoked on a frozen XFS fs. This occurs because
> drop_caches acquires the lock and then blocks in xfs_inactive() on
> transaction alloc for an inode that requires an eofb trim. unfreeze
> then blocks on the same lock and the fs is deadlocked.
> 
> With deferred inactivation, the deadlock problem is no longer
> present because ->destroy_inode() no longer blocks whether the fs is
> frozen or not. There is still unfortunate behavior in that lookups
> of a pending inactive inode spin loop waiting for the pending
> inactive state to clear, which won't happen until the fs is
> unfrozen. This was always possible to some degree, but is
> potentially amplified by the fact that reclaim no longer blocks on
> the first inode that requires inactivation work. Instead, we
> populate the inactivation queues indefinitely. The side effect can
> be observed easily by invoking drop_caches on a frozen fs previously
> populated with eofb and/or cowblocks inodes and then running
> anything that relies on inode lookup (i.e., ls).
> 
> To mitigate this behavior, invoke internal blockgc reclaim during
> the freeze sequence to guarantee that inode eviction doesn't lead to
> this state due to eofb or cowblocks inodes. This is similar to
> current behavior on read-only remount. Since the deadlock issue was
> present for such a long time, also document the subtle
> ->destroy_inode() constraint to avoid unintentional reintroduction
> of the deadlock problem in the future.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_super.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index c7ac486ca5d3..1d0f87e47fa4 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -623,8 +623,13 @@ xfs_fs_alloc_inode(
>  }
>  
>  /*
> - * Now that the generic code is guaranteed not to be accessing
> - * the linux inode, we can inactivate and reclaim the inode.
> + * Now that the generic code is guaranteed not to be accessing the inode, we can
> + * inactivate and reclaim it.
> + *
> + * NOTE: ->destroy_inode() can be called (with ->s_umount held) while the
> + * filesystem is frozen. Therefore it is generally unsafe to attempt transaction
> + * allocation in this context. A transaction alloc that blocks on frozen state
> + * from a context with ->s_umount held will deadlock with unfreeze.
>   */
>  STATIC void
>  xfs_fs_destroy_inode(
> @@ -764,6 +769,16 @@ xfs_fs_sync_fs(
>  	 * when the state is either SB_FREEZE_FS or SB_FREEZE_COMPLETE.
>  	 */
>  	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> +		struct xfs_icwalk	icw = {0};
> +
> +		/*
> +		 * Clear out eofb and cowblocks inodes so eviction while frozen
> +		 * doesn't leave them sitting in the inactivation queue where
> +		 * they cannot be processed.
> +		 */
> +		icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> +		xfs_blockgc_free_space(mp, &icw);

Is a SYNC walk safe to run here? I know we run
xfs_blockgc_free_space() from XFS_IOC_FREE_EOFBLOCKS under
SB_FREEZE_WRITE protection, but here we have both frozen writes and
page faults we're running in a much more constrained freeze context
here.

i.e. the SYNC walk will keep busy looping if it can't get the
IOLOCK_EXCL on an inode that is in cache, so if we end up with an
inode locked and blocked on SB_FREEZE_WRITE or SB_FREEZE_PAGEFAULT
for whatever reason this will never return....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2022-01-13 22:38 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-13 13:36 [PATCH 0/2] xfs: a couple misc/small deferred inactivation tweaks Brian Foster
2022-01-13 13:37 ` [PATCH 1/2] xfs: flush inodegc workqueue tasks before cancel Brian Foster
2022-01-13 18:35   ` Darrick J. Wong
2022-01-13 22:19   ` Dave Chinner
2022-01-13 13:37 ` [PATCH 2/2] xfs: run blockgc on freeze to avoid iget stalls after reclaim Brian Foster
2022-01-13 17:13   ` Darrick J. Wong
2022-01-13 19:58     ` Brian Foster
2022-01-13 20:43       ` Darrick J. Wong
2022-01-13 21:01         ` Darrick J. Wong
2022-01-13 22:38   ` Dave Chinner [this message]
2022-01-14 17:35     ` Darrick J. Wong
2022-01-14 19:45       ` Brian Foster
2022-01-14 21:30         ` Darrick J. Wong
2022-01-15  4:09           ` Darrick J. Wong
2022-01-15 22:40           ` Dave Chinner
2022-01-17 13:37           ` Brian Foster
2022-01-18 18:56             ` Darrick J. Wong
2022-01-19 20:07               ` Brian Foster
2022-01-20  0:36                 ` Darrick J. Wong
2022-01-20  5:18                   ` Dave Chinner
2022-01-24 16:57                   ` Brian Foster
2022-02-02  2:22                     ` Dave Chinner
2022-02-10 19:03                       ` Brian Foster
2022-02-10 23:08                         ` Dave Chinner
2022-02-15  1:54                           ` Darrick J. Wong
2022-02-15  9:26                             ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220113223810.GG3290465@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.