All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v3 2/2] xfs: use sync buffer I/O for sync delwri queue submission
Date: Mon, 18 Jun 2018 22:21:22 -0700	[thread overview]
Message-ID: <20180619052122.GM8128@magnolia> (raw)
In-Reply-To: <20180615180536.17282-3-bfoster@redhat.com>

On Fri, Jun 15, 2018 at 02:05:36PM -0400, Brian Foster wrote:
> If a delwri queue occurs of a buffer that sits on a delwri queue
> wait list, the queue sets _XBF_DELWRI_Q without changing the state
> of ->b_list. This occurs, for example, if another thread beats the
> current delwri waiter thread to the buffer lock after I/O
> completion. Once the waiter acquires the lock, it removes the buffer
> from the wait list and leaves a buffer with _XBF_DELWRI_Q set but
> not populated on a list. This results in a lost buffer submission
> and in turn can result in assert failures due to _XBF_DELWRI_Q being
> set on buffer reclaim or filesystem lockups if the buffer happens to
> cover an item in the AIL.
> 
> This problem has been reproduced by repeated iterations of xfs/305
> on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty
> dquot reclaim races with an xfsaild push of a separate dquot backed
> by the same buffer such that the buffer sits on the reclaim wait
> list at the time xfsaild attempts to queue it. Since the latter
> dquot has been flush locked but the underlying buffer not submitted
> for I/O, the dquot pins the AIL and causes the filesystem to
> livelock.
> 
> This race is essentially made possible by the buffer lock cycle
> involved with waiting on a synchronous delwri queue submission.
> Close the race by using synchronous buffer I/O for respective delwri
> queue submission. This means the buffer remains locked across the
> I/O and so is inaccessible from other contexts while in the
> intermediate wait list state. The sync buffer I/O wait mechanism is
> factored into a helper such that sync delwri buffer submission and
> serialization are batched operations.
> 
> Designed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf.c | 78 +++++++++++++++++++++++++-----------------------
>  1 file changed, 41 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 7b0f7c79cd62..b5c0f6949b30 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1531,6 +1531,20 @@ xfs_buf_submit(
>  	xfs_buf_rele(bp);
>  }
>  
> +/*
> + * Wait for I/O completion of a sync buffer and return the I/O error code.
> + */
> +static int
> +xfs_buf_iowait(
> +	struct xfs_buf	*bp)
> +{
> +	trace_xfs_buf_iowait(bp, _RET_IP_);
> +	wait_for_completion(&bp->b_iowait);
> +	trace_xfs_buf_iowait_done(bp, _RET_IP_);
> +
> +	return bp->b_error;
> +}
> +
>  /*
>   * Synchronous buffer IO submission path, read or write.
>   */
> @@ -1553,12 +1567,7 @@ xfs_buf_submit_wait(
>  	error = __xfs_buf_submit(bp);
>  	if (error)
>  		goto out;
> -
> -	/* wait for completion before gathering the error from the buffer */
> -	trace_xfs_buf_iowait(bp, _RET_IP_);
> -	wait_for_completion(&bp->b_iowait);
> -	trace_xfs_buf_iowait_done(bp, _RET_IP_);
> -	error = bp->b_error;
> +	error = xfs_buf_iowait(bp);
>  
>  out:
>  	/*
> @@ -1961,16 +1970,11 @@ xfs_buf_cmp(
>  }
>  
>  /*
> - * submit buffers for write.
> - *
> - * When we have a large buffer list, we do not want to hold all the buffers
> - * locked while we block on the request queue waiting for IO dispatch. To avoid
> - * this problem, we lock and submit buffers in groups of 50, thereby minimising
> - * the lock hold times for lists which may contain thousands of objects.
> - *
> - * To do this, we sort the buffer list before we walk the list to lock and
> - * submit buffers, and we plug and unplug around each group of buffers we
> - * submit.
> + * Submit buffers for write. If wait_list is specified, the buffers are
> + * submitted using sync I/O and placed on the wait list such that the caller can
> + * iowait each buffer. Otherwise async I/O is used and the buffers are released
> + * at I/O completion time. In either case, buffers remain locked until I/O
> + * completes and the buffer is released from the queue.
>   */
>  static int
>  xfs_buf_delwri_submit_buffers(
> @@ -2012,21 +2016,22 @@ xfs_buf_delwri_submit_buffers(
>  		trace_xfs_buf_delwri_split(bp, _RET_IP_);
>  
>  		/*
> -		 * We do all IO submission async. This means if we need
> -		 * to wait for IO completion we need to take an extra
> -		 * reference so the buffer is still valid on the other
> -		 * side. We need to move the buffer onto the io_list
> -		 * at this point so the caller can still access it.
> +		 * If we have a wait list, each buffer (and associated delwri
> +		 * queue reference) transfers to it and is submitted
> +		 * synchronously. Otherwise, drop the buffer from the delwri
> +		 * queue and submit async.
>  		 */
>  		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
> -		bp->b_flags |= XBF_WRITE | XBF_ASYNC;
> +		bp->b_flags |= XBF_WRITE;
>  		if (wait_list) {
> -			xfs_buf_hold(bp);
> +			bp->b_flags &= ~XBF_ASYNC;
>  			list_move_tail(&bp->b_list, wait_list);
> -		} else
> +			__xfs_buf_submit(bp);
> +		} else {
> +			bp->b_flags |= XBF_ASYNC;
>  			list_del_init(&bp->b_list);
> -
> -		xfs_buf_submit(bp);
> +			xfs_buf_submit(bp);
> +		}
>  	}
>  	blk_finish_plug(&plug);
>  
> @@ -2073,8 +2078,11 @@ xfs_buf_delwri_submit(
>  
>  		list_del_init(&bp->b_list);
>  
> -		/* locking the buffer will wait for async IO completion. */
> -		xfs_buf_lock(bp);
> +		/*
> +		 * Wait on the locked buffer, check for errors and unlock and
> +		 * release the delwri queue reference.
> +		 */
> +		xfs_buf_iowait(bp);
>  		error2 = bp->b_error;

error2 = xfs_buf_iowait()?

>  		xfs_buf_relse(bp);
>  		if (!error)
> @@ -2121,22 +2129,18 @@ xfs_buf_delwri_pushbuf(
>  
>  	/*
>  	 * Delwri submission clears the DELWRI_Q buffer flag and returns with
> -	 * the buffer on the wait list with an associated reference. Rather than
> +	 * the buffer on the wait list with the original reference. Rather than
>  	 * bounce the buffer from a local wait list back to the original list
>  	 * after I/O completion, reuse the original list as the wait list.
>  	 */
>  	xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
>  
>  	/*
> -	 * The buffer is now under I/O and wait listed as during typical delwri
> -	 * submission. Lock the buffer to wait for I/O completion. Rather than
> -	 * remove the buffer from the wait list and release the reference, we
> -	 * want to return with the buffer queued to the original list. The
> -	 * buffer already sits on the original list with a wait list reference,
> -	 * however. If we let the queue inherit that wait list reference, all we
> -	 * need to do is reset the DELWRI_Q flag.
> +	 * The buffer is now locked, under I/O and wait listed on the original
> +	 * delwri queue. Wait for I/O completion, restore the DELWRI_Q flag and
> +	 * return with the buffer unlocked and on the original queue.
>  	 */
> -	xfs_buf_lock(bp);
> +	xfs_buf_iowait(bp);
>  	error = bp->b_error;

error = xfs_buf_iowait()?

Or is there a specific reason you collected the error separately?

Looks reasonable and I could follow along this time. :)

--D

>  	bp->b_flags |= _XBF_DELWRI_Q;
>  	xfs_buf_unlock(bp);
> -- 
> 2.17.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-06-19  5:21 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-15 18:05 [PATCH v3 0/2] xfs: fix buffer delwri queue state race Brian Foster
2018-06-15 18:05 ` [PATCH v3 1/2] xfs: refactor buffer submission into a common helper Brian Foster
2018-06-19  5:21   ` Darrick J. Wong
2018-06-15 18:05 ` [PATCH v3 2/2] xfs: use sync buffer I/O for sync delwri queue submission Brian Foster
2018-06-19  5:21   ` Darrick J. Wong [this message]
2018-06-19 11:41     ` Brian Foster
2018-06-19 11:42   ` [PATCH v4] " Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180619052122.GM8128@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.