All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] xfs: quotacheck vs. dquot reclaim deadlock
@ 2017-04-21 12:22 Brian Foster
  2017-04-21 12:22 ` [PATCH v3 1/2] xfs: fix up quotacheck buffer list error handling Brian Foster
  2017-04-21 12:22 ` [PATCH v3 2/2] xfs: push buffer of flush locked dquot to avoid quotacheck deadlock Brian Foster
  0 siblings, 2 replies; 4+ messages in thread
From: Brian Foster @ 2017-04-21 12:22 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This incorporates Dave's feedback from review of v2 with one exception:

- After taking another look at xfs_qm_quotacheck(), it appears to
  intentionally jump to the normal exit path of the function in various
  error situations. I therefore opted to not to create a new error label
  for post delwri submit errors, since we still do the cancellation in
  the case of success as well. If desired, we can append a broader
  refactoring patch for that function, separate from the patches that
  fix bugs.

Brian

v3:
- Dropped the rfc/experiment patch.
- Created a _delwri_cancel() helper function.
- Refactored _delwri_pushbuf() to expect an unlocked buffer.
- Updated _delwri_pushbuf() with more detailed comments around delwri
  queue logic and reference counting.
v2: http://www.spinics.net/lists/linux-xfs/msg04483.html
- Added quotacheck error handling fixup patch.
- Push buffers with flush locked dquots for deadlock avoidance rather
  than bypass dquot reclaim.
- Added RFC patch for quotacheck early buffer list submission.
v1: http://www.spinics.net/lists/linux-xfs/msg04304.html

Brian Foster (2):
  xfs: fix up quotacheck buffer list error handling
  xfs: push buffer of flush locked dquot to avoid quotacheck deadlock

 fs/xfs/xfs_buf.c   | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_buf.h   |  2 ++
 fs/xfs/xfs_qm.c    | 35 ++++++++++++++++++-----
 fs/xfs/xfs_trace.h |  1 +
 4 files changed, 115 insertions(+), 7 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v3 1/2] xfs: fix up quotacheck buffer list error handling
  2017-04-21 12:22 [PATCH v3 0/2] xfs: quotacheck vs. dquot reclaim deadlock Brian Foster
@ 2017-04-21 12:22 ` Brian Foster
  2017-04-21 21:28   ` Darrick J. Wong
  2017-04-21 12:22 ` [PATCH v3 2/2] xfs: push buffer of flush locked dquot to avoid quotacheck deadlock Brian Foster
  1 sibling, 1 reply; 4+ messages in thread
From: Brian Foster @ 2017-04-21 12:22 UTC (permalink / raw)
  To: linux-xfs

The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c | 24 ++++++++++++++++++++++++
 fs/xfs/xfs_buf.h |  1 +
 fs/xfs/xfs_qm.c  |  7 +------
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index b620872..ba036c1 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1079,6 +1079,8 @@ void
 xfs_buf_unlock(
 	struct xfs_buf		*bp)
 {
+	ASSERT(xfs_buf_islocked(bp));
+
 	XB_CLEAR_OWNER(bp);
 	up(&bp->b_sema);
 
@@ -1815,6 +1817,28 @@ xfs_alloc_buftarg(
 }
 
 /*
+ * Cancel a delayed write list.
+ *
+ * Remove each buffer from the list, clear the delwri queue flag and drop the
+ * associated buffer reference.
+ */
+void
+xfs_buf_delwri_cancel(
+	struct list_head	*list)
+{
+	struct xfs_buf		*bp;
+
+	while (!list_empty(list)) {
+		bp = list_first_entry(list, struct xfs_buf, b_list);
+
+		xfs_buf_lock(bp);
+		bp->b_flags &= ~_XBF_DELWRI_Q;
+		list_del_init(&bp->b_list);
+		xfs_buf_relse(bp);
+	}
+}
+
+/*
  * Add a buffer to the delayed write list.
  *
  * This queues a buffer for writeout if it hasn't already been.  Note that
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index e1bc1af..8d1d44f 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -329,6 +329,7 @@ extern void *xfs_buf_offset(struct xfs_buf *, size_t);
 extern void xfs_buf_stale(struct xfs_buf *bp);
 
 /* Delayed Write Buffer Routines */
+extern void xfs_buf_delwri_cancel(struct list_head *);
 extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
 extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b669b12..8b9a9f1 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1384,12 +1384,7 @@ xfs_qm_quotacheck(
 	mp->m_qflags |= flags;
 
  error_return:
-	while (!list_empty(&buffer_list)) {
-		struct xfs_buf *bp =
-			list_first_entry(&buffer_list, struct xfs_buf, b_list);
-		list_del_init(&bp->b_list);
-		xfs_buf_relse(bp);
-	}
+	xfs_buf_delwri_cancel(&buffer_list);
 
 	if (error) {
 		xfs_warn(mp,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v3 2/2] xfs: push buffer of flush locked dquot to avoid quotacheck deadlock
  2017-04-21 12:22 [PATCH v3 0/2] xfs: quotacheck vs. dquot reclaim deadlock Brian Foster
  2017-04-21 12:22 ` [PATCH v3 1/2] xfs: fix up quotacheck buffer list error handling Brian Foster
@ 2017-04-21 12:22 ` Brian Foster
  1 sibling, 0 replies; 4+ messages in thread
From: Brian Foster @ 2017-04-21 12:22 UTC (permalink / raw)
  To: linux-xfs

Reclaim during quotacheck can lead to deadlocks on the dquot flush
lock:

 - Quotacheck populates a local delwri queue with the physical dquot
   buffers.
 - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
   dirties all of the dquots.
 - Reclaim kicks in and attempts to flush a dquot whose buffer is
   already queud on the quotacheck queue. The flush succeeds but
   queueing to the reclaim delwri queue fails as the backing buffer is
   already queued. The flush unlock is now deferred to I/O completion
   of the buffer from the quotacheck queue.
 - The dqadjust bulkstat continues and dirties the recently flushed
   dquot once again.
 - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
   the flush lock to update the backing buffers with the in-core
   recalculated values. It deadlocks on the redirtied dquot as the
   flush lock was already acquired by reclaim, but the buffer resides
   on the local delwri queue which isn't submitted until the end of
   quotacheck.

This is reproduced by running quotacheck on a filesystem with a
couple million inodes in low memory (512MB-1GB) situations. This is
a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
buffer lists"), which removed a trylock and buffer I/O submission
from the quotacheck dquot flush sequence.

Quotacheck first resets and collects the physical dquot buffers in a
delwri queue. Then, it traverses the filesystem inodes via bulkstat,
updates the in-core dquots, flushes the corrected dquots to the
backing buffers and finally submits the delwri queue for I/O. Since
the backing buffers are queued across the entire quotacheck
operation, dquot reclaim cannot possibly complete a dquot flush
before quotacheck completes.

Therefore, quotacheck must submit the buffer for I/O in order to
cycle the flush lock and flush the dirty in-core dquot to the
buffer. Add a delwri queue buffer push mechanism to submit an
individual buffer for I/O without losing the delwri queue status and
use it from quotacheck to avoid the deadlock. This restores
quotacheck behavior to as before the regression was introduced.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c   | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_buf.h   |  1 +
 fs/xfs/xfs_qm.c    | 28 ++++++++++++++++++++++++-
 fs/xfs/xfs_trace.h |  1 +
 4 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ba036c1..667fc13 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2033,6 +2033,66 @@ xfs_buf_delwri_submit(
 	return error;
 }
 
+/*
+ * Push a single buffer on a delwri queue.
+ *
+ * The purpose of this function is to submit a single buffer of a delwri queue
+ * and return with the buffer still on the original queue. The waiting delwri
+ * buffer submission infrastructure guarantees transfer of the delwri queue
+ * buffer reference to a temporary wait list. We reuse this infrastructure to
+ * transfer the buffer back to the original queue.
+ *
+ * Note the buffer transitions from the queued state, to the submitted and wait
+ * listed state and back to the queued state during this call. The buffer
+ * locking and queue management logic between _delwri_pushbuf() and
+ * _delwri_queue() guarantee that the buffer cannot be queued to another list
+ * before returning.
+ */
+int
+xfs_buf_delwri_pushbuf(
+	struct xfs_buf		*bp,
+	struct list_head	*buffer_list)
+{
+	LIST_HEAD		(submit_list);
+	int			error;
+
+	ASSERT(bp->b_flags & _XBF_DELWRI_Q);
+
+	trace_xfs_buf_delwri_pushbuf(bp, _RET_IP_);
+
+	/*
+	 * Isolate the buffer to a new local list so we can submit it for I/O
+	 * independently from the rest of the original list.
+	 */
+	xfs_buf_lock(bp);
+	list_move(&bp->b_list, &submit_list);
+	xfs_buf_unlock(bp);
+
+	/*
+	 * Delwri submission clears the DELWRI_Q buffer flag and returns with
+	 * the buffer on the wait list with an associated reference. Rather than
+	 * bounce the buffer from a local wait list back to the original list
+	 * after I/O completion, reuse the original list as the wait list.
+	 */
+	xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
+
+	/*
+	 * The buffer is now under I/O and wait listed as during typical delwri
+	 * submission. Lock the buffer to wait for I/O completion. Rather than
+	 * remove the buffer from the wait list and release the reference, we
+	 * want to return with the buffer queued to the original list. The
+	 * buffer already sits on the original list with a wait list reference,
+	 * however. If we let the queue inherit that wait list reference, all we
+	 * need to do is reset the DELWRI_Q flag.
+	 */
+	xfs_buf_lock(bp);
+	error = bp->b_error;
+	bp->b_flags |= _XBF_DELWRI_Q;
+	xfs_buf_unlock(bp);
+
+	return error;
+}
+
 int __init
 xfs_buf_init(void)
 {
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 8d1d44f..ca06eb0 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -333,6 +333,7 @@ extern void xfs_buf_delwri_cancel(struct list_head *);
 extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
 extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
+extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);
 
 /* Buffer Daemon Setup Routines */
 extern int xfs_buf_init(void);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 8b9a9f1..8068867 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1247,6 +1247,7 @@ xfs_qm_flush_one(
 	struct xfs_dquot	*dqp,
 	void			*data)
 {
+	struct xfs_mount	*mp = dqp->q_mount;
 	struct list_head	*buffer_list = data;
 	struct xfs_buf		*bp = NULL;
 	int			error = 0;
@@ -1257,7 +1258,32 @@ xfs_qm_flush_one(
 	if (!XFS_DQ_IS_DIRTY(dqp))
 		goto out_unlock;
 
-	xfs_dqflock(dqp);
+	/*
+	 * The only way the dquot is already flush locked by the time quotacheck
+	 * gets here is if reclaim flushed it before the dqadjust walk dirtied
+	 * it for the final time. Quotacheck collects all dquot bufs in the
+	 * local delwri queue before dquots are dirtied, so reclaim can't have
+	 * possibly queued it for I/O. The only way out is to push the buffer to
+	 * cycle the flush lock.
+	 */
+	if (!xfs_dqflock_nowait(dqp)) {
+		/* buf is pinned in-core by delwri list */
+		DEFINE_SINGLE_BUF_MAP(map, dqp->q_blkno,
+				      mp->m_quotainfo->qi_dqchunklen);
+		bp = _xfs_buf_find(mp->m_ddev_targp, &map, 1, 0, NULL);
+		if (!bp) {
+			error = -EINVAL;
+			goto out_unlock;
+		}
+		xfs_buf_unlock(bp);
+
+		xfs_buf_delwri_pushbuf(bp, buffer_list);
+		xfs_buf_rele(bp);
+
+		error = -EAGAIN;
+		goto out_unlock;
+	}
+
 	error = xfs_qm_dqflush(dqp, &bp);
 	if (error)
 		goto out_unlock;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4f96dc9..f73fb33 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -367,6 +367,7 @@ DEFINE_BUF_EVENT(xfs_buf_iowait_done);
 DEFINE_BUF_EVENT(xfs_buf_delwri_queue);
 DEFINE_BUF_EVENT(xfs_buf_delwri_queued);
 DEFINE_BUF_EVENT(xfs_buf_delwri_split);
+DEFINE_BUF_EVENT(xfs_buf_delwri_pushbuf);
 DEFINE_BUF_EVENT(xfs_buf_get_uncached);
 DEFINE_BUF_EVENT(xfs_buf_item_relse);
 DEFINE_BUF_EVENT(xfs_buf_item_iodone_async);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 1/2] xfs: fix up quotacheck buffer list error handling
  2017-04-21 12:22 ` [PATCH v3 1/2] xfs: fix up quotacheck buffer list error handling Brian Foster
@ 2017-04-21 21:28   ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2017-04-21 21:28 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 21, 2017 at 08:22:12AM -0400, Brian Foster wrote:
> The quotacheck error handling of the delwri buffer list assumes the
> resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
> on the buffers that are dequeued. This can lead to assert failures
> on buffer release and possibly other locking problems.
> 
> Move this code to a delwri queue cancel helper function to
> encapsulate the logic required to properly release buffers from a
> delwri queue. Update the helper to clear the delwri queue flag and
> call it from quotacheck.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

I /think/ this one is ok -- it's useful to assert that the buffer is
actually locked before we try to unlock it, and the other parts refactor
existing code.

If we want to be super-pedantic, then yes, delwri_submit consumes the
entire list and we don't need to touch buffer_list if we hit the final
"goto error_return;" in quotacheck.  OTOH I feel like there's not much
harm in the extra list_empty check and should we ever decide that
delwri_submit shouldn't consume the whole list, then we're already
covered.

Regardless, if anyone /does/ want that change, it can be a separate
patch.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_buf.c | 24 ++++++++++++++++++++++++
>  fs/xfs/xfs_buf.h |  1 +
>  fs/xfs/xfs_qm.c  |  7 +------
>  3 files changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index b620872..ba036c1 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1079,6 +1079,8 @@ void
>  xfs_buf_unlock(
>  	struct xfs_buf		*bp)
>  {
> +	ASSERT(xfs_buf_islocked(bp));
> +
>  	XB_CLEAR_OWNER(bp);
>  	up(&bp->b_sema);
>  
> @@ -1815,6 +1817,28 @@ xfs_alloc_buftarg(
>  }
>  
>  /*
> + * Cancel a delayed write list.
> + *
> + * Remove each buffer from the list, clear the delwri queue flag and drop the
> + * associated buffer reference.
> + */
> +void
> +xfs_buf_delwri_cancel(
> +	struct list_head	*list)
> +{
> +	struct xfs_buf		*bp;
> +
> +	while (!list_empty(list)) {
> +		bp = list_first_entry(list, struct xfs_buf, b_list);
> +
> +		xfs_buf_lock(bp);
> +		bp->b_flags &= ~_XBF_DELWRI_Q;
> +		list_del_init(&bp->b_list);
> +		xfs_buf_relse(bp);
> +	}
> +}
> +
> +/*
>   * Add a buffer to the delayed write list.
>   *
>   * This queues a buffer for writeout if it hasn't already been.  Note that
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index e1bc1af..8d1d44f 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -329,6 +329,7 @@ extern void *xfs_buf_offset(struct xfs_buf *, size_t);
>  extern void xfs_buf_stale(struct xfs_buf *bp);
>  
>  /* Delayed Write Buffer Routines */
> +extern void xfs_buf_delwri_cancel(struct list_head *);
>  extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
>  extern int xfs_buf_delwri_submit(struct list_head *);
>  extern int xfs_buf_delwri_submit_nowait(struct list_head *);
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index b669b12..8b9a9f1 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -1384,12 +1384,7 @@ xfs_qm_quotacheck(
>  	mp->m_qflags |= flags;
>  
>   error_return:
> -	while (!list_empty(&buffer_list)) {
> -		struct xfs_buf *bp =
> -			list_first_entry(&buffer_list, struct xfs_buf, b_list);
> -		list_del_init(&bp->b_list);
> -		xfs_buf_relse(bp);
> -	}
> +	xfs_buf_delwri_cancel(&buffer_list);
>  
>  	if (error) {
>  		xfs_warn(mp,
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-04-21 21:28 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-21 12:22 [PATCH v3 0/2] xfs: quotacheck vs. dquot reclaim deadlock Brian Foster
2017-04-21 12:22 ` [PATCH v3 1/2] xfs: fix up quotacheck buffer list error handling Brian Foster
2017-04-21 21:28   ` Darrick J. Wong
2017-04-21 12:22 ` [PATCH v3 2/2] xfs: push buffer of flush locked dquot to avoid quotacheck deadlock Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.