linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 11/11] xfs: create a polled function to force inode inactivation
Date: Tue, 23 Mar 2021 20:34:28 -0700	[thread overview]
Message-ID: <20210324033428.GR22100@magnolia> (raw)
In-Reply-To: <20210323223158.GI63242@dread.disaster.area>

On Wed, Mar 24, 2021 at 09:31:58AM +1100, Dave Chinner wrote:
> On Wed, Mar 10, 2021 at 07:06:41PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Create a polled version of xfs_inactive_force so that we can force
> > inactivation while holding a lock (usually the umount lock) without
> > tripping over the softlockup timer.  This is for callers that hold vfs
> > locks while calling inactivation, which is currently unmount, iunlink
> > processing during mount, and rw->ro remount.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_icache.c |   38 +++++++++++++++++++++++++++++++++++++-
> >  fs/xfs/xfs_icache.h |    1 +
> >  fs/xfs/xfs_mount.c  |    2 +-
> >  fs/xfs/xfs_mount.h  |    5 +++++
> >  fs/xfs/xfs_super.c  |    3 ++-
> >  5 files changed, 46 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index d5f580b92e48..9db2beb4e732 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -25,6 +25,7 @@
> >  #include "xfs_ialloc.h"
> >  
> >  #include <linux/iversion.h>
> > +#include <linux/nmi.h>
> 
> This stuff goes in fs/xfs/xfs_linux.h, not here.
> 
> >  
> >  /*
> >   * Allocate and initialise an xfs_inode.
> > @@ -2067,8 +2068,12 @@ xfs_inodegc_free_space(
> >  	struct xfs_mount	*mp,
> >  	struct xfs_eofblocks	*eofb)
> >  {
> > -	return xfs_inode_walk(mp, XFS_INODE_WALK_INACTIVE,
> > +	int			error;
> > +
> > +	error = xfs_inode_walk(mp, XFS_INODE_WALK_INACTIVE,
> >  			xfs_inactive_inode, eofb, XFS_ICI_INACTIVE_TAG);
> > +	wake_up(&mp->m_inactive_wait);
> > +	return error;
> >  }
> >  
> >  /* Try to get inode inactivation moving. */
> > @@ -2138,6 +2143,37 @@ xfs_inodegc_force(
> >  	flush_workqueue(mp->m_gc_workqueue);
> >  }
> >  
> > +/*
> > + * Force all inode inactivation work to run immediately, and poll until the
> > + * work is complete.  Callers should only use this function if they must
> > + * inactivate inodes while holding VFS locks, and must be prepared to prevent
> > + * or to wait for inodes that are queued for inactivation while this runs.
> > + */
> > +void
> > +xfs_inodegc_force_poll(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_perag	*pag;
> > +	xfs_agnumber_t		agno;
> > +	bool			queued = false;
> > +
> > +	for_each_perag_tag(mp, agno, pag, XFS_ICI_INACTIVE_TAG)
> > +		queued |= xfs_inodegc_force_pag(pag);
> > +	if (!queued)
> > +		return;
> > +
> > +	/*
> > +	 * Touch the softlockup watchdog every 1/10th of a second while there
> > +	 * are still inactivation-tagged inodes in the filesystem.
> > +	 */
> > +	while (!wait_event_timeout(mp->m_inactive_wait,
> > +				   !radix_tree_tagged(&mp->m_perag_tree,
> > +						      XFS_ICI_INACTIVE_TAG),
> > +				   HZ / 10)) {
> > +		touch_softlockup_watchdog();
> > +	}
> > +}
> 
> This looks like a deadlock waiting to be tripped over. As long as
> there is something still able to queue inodes for inactivation,
> that radix tree tag check will always trigger and put us back to
> sleep.

Yes, I know this is a total livelock vector.  This ugly function exists
to avoid stall warnings when the VFS has called us with s_umount held
and there are a lot of inodes to process.

As the function comment points out, callers must prevent anyone else
from inactivating inodes or be prepared to deal with the consequences,
which the current callers are prepared to do.

I can't think of a better way to handle this, since we need to bail out
of the workqueue flush periodically to make the softlockup thing happy.
Alternately we could just let the stall warnings happen and deal with
the people who file a bug for every stack trace they see the kernel emit.

> Also, in terms of workqueues, this is a "sync flush", because we
> are waiting for it. e.g. the difference between cancel_work() and
> cancel_work_sync() is that the latter waits for all the work in
> progress to complete before returning and the former doesn't wait...

Yeah, I'll change all the xfs_inodegc_force_* -> xfs_inodegc_flush_*.
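For reference, the workqueue API itself encodes this async/sync distinction in its names; a rough kernel-style sketch of the convention (the helper body below is hypothetical, loosely modeled on xfs_inodegc_force from the patch above, not actual code):

```c
/*
 * Workqueue naming convention (illustrative sketch only):
 *   cancel_work(&w)       - request cancellation, return immediately
 *   cancel_work_sync(&w)  - also wait for a running instance to finish
 *
 * By the same convention, a helper that ends in flush_workqueue() is a
 * "flush": it does not return until all queued work has completed.
 */
static void xfs_inodegc_flush(struct xfs_mount *mp)
{
	/* kick the deferred-inactivation work so it runs now...
	 * (xfs_inodegc_queue is a hypothetical name here) */
	xfs_inodegc_queue(mp);
	/* ...and wait synchronously for the queue to drain */
	flush_workqueue(mp->m_gc_workqueue);
}
```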

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thread overview: 48+ messages
2021-03-11  3:05 [PATCHSET v3 00/11] xfs: deferred inode inactivation Darrick J. Wong
2021-03-11  3:05 ` [PATCH 01/11] xfs: prevent metadata files from being inactivated Darrick J. Wong
2021-03-11 13:05   ` Christoph Hellwig
2021-03-22 23:13   ` Dave Chinner
2021-03-11  3:05 ` [PATCH 02/11] xfs: refactor the predicate part of xfs_free_eofblocks Darrick J. Wong
2021-03-11 13:09   ` Christoph Hellwig
2021-03-15 18:46   ` Christoph Hellwig
2021-03-18  4:33     ` Darrick J. Wong
2021-03-19  1:48       ` Darrick J. Wong
2021-03-11  3:05 ` [PATCH 03/11] xfs: don't reclaim dquots with incore reservations Darrick J. Wong
2021-03-15 18:29   ` Christoph Hellwig
2021-03-22 23:31   ` Dave Chinner
2021-03-23  0:01     ` Darrick J. Wong
2021-03-23  1:48       ` Dave Chinner
2021-03-11  3:06 ` [PATCH 04/11] xfs: decide if inode needs inactivation Darrick J. Wong
2021-03-15 18:47   ` Christoph Hellwig
2021-03-15 19:06     ` Darrick J. Wong
2021-03-11  3:06 ` [PATCH 05/11] xfs: rename the blockgc workqueue Darrick J. Wong
2021-03-15 18:49   ` Christoph Hellwig
2021-03-11  3:06 ` [PATCH 06/11] xfs: deferred inode inactivation Darrick J. Wong
2021-03-16  7:27   ` Christoph Hellwig
2021-03-16 15:47     ` Darrick J. Wong
2021-03-17 15:21       ` Christoph Hellwig
2021-03-17 15:49         ` Darrick J. Wong
2021-03-22 23:46           ` Dave Chinner
2021-03-22 23:37       ` Dave Chinner
2021-03-23  0:24         ` Darrick J. Wong
2021-03-23  1:44   ` Dave Chinner
2021-03-23  4:00     ` Darrick J. Wong
2021-03-23  5:19       ` Dave Chinner
2021-03-24  2:04         ` Darrick J. Wong
2021-03-24  4:57           ` Dave Chinner
2021-03-25  4:20             ` Darrick J. Wong
2021-03-24 17:53       ` Christoph Hellwig
2021-03-25  4:26         ` Darrick J. Wong
2021-03-11  3:06 ` [PATCH 07/11] xfs: expose sysfs knob to control inode inactivation delay Darrick J. Wong
2021-03-11  3:06 ` [PATCH 08/11] xfs: force inode inactivation and retry fs writes when there isn't space Darrick J. Wong
2021-03-15 18:54   ` Christoph Hellwig
2021-03-15 19:06     ` Darrick J. Wong
2021-03-11  3:06 ` [PATCH 09/11] xfs: force inode garbage collection before fallocate when space is low Darrick J. Wong
2021-03-11  3:06 ` [PATCH 10/11] xfs: parallelize inode inactivation Darrick J. Wong
2021-03-15 18:55   ` Christoph Hellwig
2021-03-15 19:03     ` Darrick J. Wong
2021-03-23 22:21   ` Dave Chinner
2021-03-24  3:52     ` Darrick J. Wong
2021-03-11  3:06 ` [PATCH 11/11] xfs: create a polled function to force " Darrick J. Wong
2021-03-23 22:31   ` Dave Chinner
2021-03-24  3:34     ` Darrick J. Wong [this message]
