All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, Eryu Guan <eguan@redhat.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 2/5] xfs: allocate quotaoff transactions up front to avoid log deadlock
Date: Wed, 26 Apr 2017 14:23:33 -0700	[thread overview]
Message-ID: <20170426212333.GB23371@birch.djwong.org> (raw)
In-Reply-To: <1487173247-5965-3-git-send-email-bfoster@redhat.com>

On Wed, Feb 15, 2017 at 10:40:44AM -0500, Brian Foster wrote:
> The quotaoff operation commits two explicitly synchronous
> transactions to correctly handle log recovery of dquots being
> modified at the time the quotaoff occurs. The first quotaoff
> transaction pins the tail of the log with a qoff logitem in the AIL
> to ensure further logged dquots stay ahead of the quotaoff operation
> in the log. The second quotaoff_end transaction is committed after
> the quotaoff operation completes, releases the original log item and
> thus unpins the log.
> 
> Since the first transaction pins the tail of the log, this means a
> finite amount of space in the log is available between the time a
> quotaoff starts and completes before transaction reservations have
> to block on the log. While the quotaoff operation itself consumes no
> further log space, it is possible for other operations in the
> filesystem to consume the remaining log space before the quotaoff
> completes. If this occurs, the quotaoff_end transaction also blocks
> on the log which prevents the release of the log item and the
> filesystem deadlocks. This has been reproduced via repeated xfs/305
> iterations on a vm with fairly limited resources.
> 
> To avoid a deadlock due to a particularly slow quotaoff operation,
> allocate the quotaoff_end transaction immediately after the initial
> quotaoff transaction is committed. Carry a reference to the
> transaction through xfs_qm_scall_quotaoff() rather than the qoff log
> item and commit it once the quotaoff completes.

Hmm... so instead of holding a pinned log item for however long it takes
to finish quotaoff, we instead hold a transaction.  That does fix the
problem where we pin the log tail and the head crashes into it, but also
comes with its own problems, doesn't it?

In xfs_qm_scall_quotaoff, we call xfs_qm_dqrele_all_inodes, which
calls xfs_dqrele_inode to call xfs_qm_dqrele on all the inodes in the
system.  However, if any of those inodes are unlinked and inactive, the
iput -> xfs_fs_destroy_inode -> xfs_inactive path truncates the inode,
which itself requires a transaction, so now we're running with nested
transactions.

I'm not sure that's necessarily a problem because the outer transaction
contains only a single qoffend item.  AFAICT I haven't encountered any
other places in XFS where a single thread runs nested transactions, but
it still seems suspect to me.

Heh, lockdep just spat this at me:

[19767.048839] =============================================
[19767.049699] [ INFO: possible recursive locking detected ]
[19767.050511] 4.11.0-rc4-djw #3 Not tainted
[19767.051122] ---------------------------------------------
[19767.051907] xfs_quota/4152 is trying to acquire lock:
[19767.052617]  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
[19767.054151] 
[19767.054151] but task is already holding lock:
[19767.055246]  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
[19767.056897] 
[19767.056897] other info that might help us debug this:
[19767.057980]  Possible unsafe locking scenario:
[19767.057980] 
[19767.058967]        CPU0
[19767.059385]        ----
[19767.060339]   lock(sb_internal);
[19767.061187]   lock(sb_internal);
[19767.061771] 
[19767.061771]  *** DEADLOCK ***
[19767.061771] 
[19767.062818]  May be due to missing lock nesting notation
[19767.062818] 
[19767.063967] 3 locks held by xfs_quota/4152:
[19767.064521]  #0:  (&type->s_umount_key#34){++++++}, at: [<ffffffff812287f0>] __get_super.part.18+0xc0/0xe0
[19767.065805]  #1:  (&qinf->qi_quotaofflock){+.+.+.}, at: [<ffffffffa0182a53>] xfs_qm_scall_quotaoff+0x53/0x6c0 [xfs]
[19767.067226]  #2:  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
[19767.068543] 
[19767.068543] stack backtrace:
[19767.069191] CPU: 2 PID: 4152 Comm: xfs_quota Not tainted 4.11.0-rc4-djw #3
[19767.070124] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[19767.071405] Call Trace:
[19767.071798]  dump_stack+0x85/0xc2
[19767.072298]  __lock_acquire+0x164e/0x16a0
[19767.072866]  ? kvm_sched_clock_read+0x9/0x20
[19767.073460]  ? sched_clock+0x9/0x10
[19767.073979]  lock_acquire+0xbf/0x210
[19767.074532]  ? xfs_trans_alloc+0xe3/0x130 [xfs]
[19767.075166]  __sb_start_write+0xc0/0x220
[19767.075782]  ? xfs_trans_alloc+0xe3/0x130 [xfs]
[19767.076475]  xfs_trans_alloc+0xe3/0x130 [xfs]
[19767.077138]  xfs_inactive_truncate+0x31/0x130 [xfs]
[19767.077862]  ? xfs_qm_dqattach+0x16/0x50 [xfs]
[19767.078524]  xfs_inactive+0x17d/0x270 [xfs]
[19767.079141]  xfs_fs_destroy_inode+0xb3/0x350 [xfs]
[19767.079964]  destroy_inode+0x3b/0x60
[19767.080679]  evict+0x132/0x190
[19767.081301]  iput+0x243/0x320
[19767.081961]  xfs_inode_ag_walk+0x2a8/0x690 [xfs]
[19767.082757]  ? xfs_inode_ag_walk+0x92/0x690 [xfs]
[19767.083545]  ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
[19767.084279]  ? xfs_perag_get+0xa8/0x2b0 [xfs]
[19767.084753]  ? xfs_perag_get+0xa8/0x2b0 [xfs]
[19767.085234]  xfs_inode_ag_iterator_flags+0x69/0xa0 [xfs]
[19767.086251]  ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
[19767.087219]  xfs_qm_dqrele_all_inodes+0x36/0x60 [xfs]
[19767.088431]  xfs_qm_scall_quotaoff+0x100/0x6c0 [xfs]
[19767.089285]  xfs_quota_disable+0x3d/0x50 [xfs]
[19767.090138]  SyS_quotactl+0x3ff/0x820
[19767.090853]  ? SYSC_newstat+0x3b/0x50
[19767.091501]  ? trace_hardirqs_on_caller+0x129/0x1b0
[19767.092398]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[19767.093238]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[19767.094150] RIP: 0033:0x7f43530a70ca
[19767.094851] RSP: 002b:00007ffe7570fbe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
[19767.096078] RAX: ffffffffffffffda RBX: 00007ffe7570fde8 RCX: 00007f43530a70ca
[19767.097242] RDX: 0000000000000000 RSI: 00000000012d26f0 RDI: 0000000000580202
[19767.098402] RBP: 0000000000000005 R08: 00007ffe7570fbfc R09: 0000000000005802
[19767.099645] R10: 00007ffe7570fbfc R11: 0000000000000206 R12: 0000000000401e50
[19767.100841] R13: 00007ffe7570fde0 R14: 0000000000000000 R15: 0000000000000000

--D

> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_qm_syscalls.c | 42 ++++++++++++++++++++++++------------------
>  1 file changed, 24 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index 475a388..dbb6802 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -35,9 +35,11 @@
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
>  
> -STATIC int	xfs_qm_log_quotaoff(xfs_mount_t *, xfs_qoff_logitem_t **, uint);
> -STATIC int	xfs_qm_log_quotaoff_end(xfs_mount_t *, xfs_qoff_logitem_t *,
> -					uint);
> +STATIC int	xfs_qm_log_quotaoff(struct xfs_mount *, struct xfs_trans **,
> +				    uint);
> +STATIC int	xfs_qm_log_quotaoff_end(struct xfs_mount *,
> +					struct xfs_qoff_logitem *,
> +					struct xfs_trans **, uint);
>  
>  /*
>   * Turn off quota accounting and/or enforcement for all udquots and/or
> @@ -56,7 +58,7 @@ xfs_qm_scall_quotaoff(
>  	uint			dqtype;
>  	int			error;
>  	uint			inactivate_flags;
> -	xfs_qoff_logitem_t	*qoffstart;
> +	struct xfs_trans	*qend_tp;
>  
>  	/*
>  	 * No file system can have quotas enabled on disk but not in core.
> @@ -128,7 +130,7 @@ xfs_qm_scall_quotaoff(
>  	 * and synchronously. If we fail to write, we should abort the
>  	 * operation as it cannot be recovered safely if we crash.
>  	 */
> -	error = xfs_qm_log_quotaoff(mp, &qoffstart, flags);
> +	error = xfs_qm_log_quotaoff(mp, &qend_tp, flags);
>  	if (error)
>  		goto out_unlock;
>  
> @@ -181,7 +183,7 @@ xfs_qm_scall_quotaoff(
>  	 * So, we have QUOTAOFF start and end logitems; the start
>  	 * logitem won't get overwritten until the end logitem appears...
>  	 */
> -	error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags);
> +	error = xfs_trans_commit(qend_tp);
>  	if (error) {
>  		/* We're screwed now. Shutdown is the only option. */
>  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> @@ -556,13 +558,14 @@ xfs_qm_scall_setqlim(
>  
>  STATIC int
>  xfs_qm_log_quotaoff_end(
> -	xfs_mount_t		*mp,
> -	xfs_qoff_logitem_t	*startqoff,
> +	struct xfs_mount	*mp,
> +	struct xfs_qoff_logitem	*startqoff,
> +	struct xfs_trans	**tpp,
>  	uint			flags)
>  {
> -	xfs_trans_t		*tp;
> +	struct xfs_trans	*tp;
>  	int			error;
> -	xfs_qoff_logitem_t	*qoffi;
> +	struct xfs_qoff_logitem	*qoffi;
>  
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
>  	if (error)
> @@ -578,21 +581,22 @@ xfs_qm_log_quotaoff_end(
>  	 * We don't care about quotoff's performance.
>  	 */
>  	xfs_trans_set_sync(tp);
> -	return xfs_trans_commit(tp);
> +	*tpp = tp;
> +	return 0;
>  }
>  
>  
>  STATIC int
>  xfs_qm_log_quotaoff(
> -	xfs_mount_t	       *mp,
> -	xfs_qoff_logitem_t     **qoffstartp,
> -	uint		       flags)
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	**end_tp,
> +	uint			flags)
>  {
> -	xfs_trans_t	       *tp;
> +	struct xfs_trans	*tp;
>  	int			error;
> -	xfs_qoff_logitem_t     *qoffi;
> +	struct xfs_qoff_logitem	*qoffi;
>  
> -	*qoffstartp = NULL;
> +	*end_tp = NULL;
>  
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
>  	if (error)
> @@ -617,7 +621,9 @@ xfs_qm_log_quotaoff(
>  	if (error)
>  		goto out;
>  
> -	*qoffstartp = qoffi;
> +	error = xfs_qm_log_quotaoff_end(mp, qoffi, end_tp, flags);
> +	if (error)
> +		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>  out:
>  	return error;
>  }
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-04-26 21:23 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-15 15:40 [PATCH 0/5] xfs: quota deadlock fixes Brian Foster
2017-02-15 15:40 ` [PATCH 1/5] xfs: bypass dquot reclaim to avoid quotacheck deadlock Brian Foster
2017-02-16 22:37   ` Dave Chinner
2017-02-17 18:30     ` Brian Foster
2017-02-17 23:12       ` Dave Chinner
2017-02-18 12:55         ` Brian Foster
2017-02-15 15:40 ` [PATCH 2/5] xfs: allocate quotaoff transactions up front to avoid log deadlock Brian Foster
2017-04-26 21:23   ` Darrick J. Wong [this message]
2017-04-27 12:03     ` Brian Foster
2017-04-27 15:47       ` Darrick J. Wong
2017-02-15 15:40 ` [PATCH 3/5] xfs: support ability to wait on new inodes Brian Foster
2017-04-27 21:15   ` Darrick J. Wong
2017-02-15 15:40 ` [PATCH 4/5] xfs: update ag iterator to support " Brian Foster
2017-04-27 21:17   ` Darrick J. Wong
2017-02-15 15:40 ` [PATCH 5/5] xfs: wait on new inodes during quotaoff dquot release Brian Foster
2017-04-27 21:17   ` Darrick J. Wong
2017-02-16  7:42 ` [PATCH 0/5] xfs: quota deadlock fixes Eryu Guan
2017-02-16 12:01   ` Brian Foster
2017-02-17  6:53 ` Eryu Guan
2017-02-17 17:54   ` Brian Foster
2017-02-20  3:52     ` Eryu Guan
2017-02-20 13:25       ` Brian Foster
2017-02-22 15:35         ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170426212333.GB23371@birch.djwong.org \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=eguan@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.