All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: drain the buf delwri queue before xfsaild idles
@ 2020-07-15 12:38 Brian Foster
  2020-07-15 17:52 ` Christoph Hellwig
  2020-07-16 10:49 ` [PATCH v2] " Brian Foster
  0 siblings, 2 replies; 5+ messages in thread
From: Brian Foster @ 2020-07-15 12:38 UTC (permalink / raw)
  To: linux-xfs

xfsaild is racy with respect to transaction abort and shutdown in
that the task can idle or exit with an empty AIL but buffers still
on the delwri queue. This was partly addressed by cancelling the
delwri queue before the task exits to prevent memory leaks, but it's
also possible for xfsaild to empty and idle with buffers on the
delwri queue. For example, a transaction that pins a buffer that
also happens to sit on the AIL delwri queue will explicitly remove
the associated log item from the AIL if the transaction aborts. The
side effect of this is an unmount hang in xfs_wait_buftarg() as the
associated buffers remain held by the delwri queue indefinitely.
This is reproduced on repeated runs of generic/531 with an fs format
(-mrmapbt=1 -bsize=1k) that happens to also reproduce transaction
aborts.

Update xfsaild to not idle until both the AIL and associated delwri
queue are empty and update the push code to continue delwri queue
submission attempts even when the AIL is empty. This allows the AIL
to eventually release aborted buffers stranded on the delwri queue
when they are unlocked by the associated transaction. This should
have no significant effect on normal runtime behavior because the
xfsaild currently idles only when the AIL is empty and in practice
the AIL is rarely empty with a populated delwri queue. The items
must be AIL resident to land in the queue in the first place and
generally aren't removed until writeback completes.

Note that the pre-existing delwri queue cancel logic in the exit
path is retained because task stop is external, could technically
come at any point, and xfsaild is still responsible to release its
buffer references before it exits.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_trans_ail.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index c3be6e440134..6a6a79791fbb 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -542,11 +542,11 @@ xfsaild_push(
 	xfs_trans_ail_cursor_done(&cur);
 	spin_unlock(&ailp->ail_lock);
 
+out_done:
 	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
 		ailp->ail_log_flush++;
 
 	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
-out_done:
 		/*
 		 * We reached the target or the AIL is empty, so wait a bit
 		 * longer for I/O to complete and remove pushed items from the
@@ -638,7 +638,8 @@ xfsaild(
 		 */
 		smp_rmb();
 		if (!xfs_ail_min(ailp) &&
-		    ailp->ail_target == ailp->ail_target_prev) {
+		    ailp->ail_target == ailp->ail_target_prev &&
+		    list_empty(&ailp->ail_buf_list)) {
 			spin_unlock(&ailp->ail_lock);
 			freezable_schedule();
 			tout = 0;
-- 
2.21.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs: drain the buf delwri queue before xfsaild idles
  2020-07-15 12:38 [PATCH] xfs: drain the buf delwri queue before xfsaild idles Brian Foster
@ 2020-07-15 17:52 ` Christoph Hellwig
  2020-07-16 10:48   ` Brian Foster
  2020-07-16 10:49 ` [PATCH v2] " Brian Foster
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-07-15 17:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Jul 15, 2020 at 08:38:35AM -0400, Brian Foster wrote:
> index c3be6e440134..6a6a79791fbb 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -542,11 +542,11 @@ xfsaild_push(
>  	xfs_trans_ail_cursor_done(&cur);
>  	spin_unlock(&ailp->ail_lock);
>  
> +out_done:
>  	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
>  		ailp->ail_log_flush++;
>  
>  	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
> -out_done:

Nit:  if you move the out_done up a bit we can also de-duplicate the
xfs_trans_ail_cursor_done and spin_unlock calls.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs: drain the buf delwri queue before xfsaild idles
  2020-07-15 17:52 ` Christoph Hellwig
@ 2020-07-16 10:48   ` Brian Foster
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Foster @ 2020-07-16 10:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jul 15, 2020 at 06:52:07PM +0100, Christoph Hellwig wrote:
> On Wed, Jul 15, 2020 at 08:38:35AM -0400, Brian Foster wrote:
> > index c3be6e440134..6a6a79791fbb 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -542,11 +542,11 @@ xfsaild_push(
> >  	xfs_trans_ail_cursor_done(&cur);
> >  	spin_unlock(&ailp->ail_lock);
> >  
> > +out_done:
> >  	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
> >  		ailp->ail_log_flush++;
> >  
> >  	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
> > -out_done:
> 
> Nit:  if you move the out_done up a bit we can also de-duplicate the
> xfs_trans_ail_cursor_done and spin_unlock calls.
> 

Good point. v2 incoming..

Brian

> Otherwise this looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] xfs: drain the buf delwri queue before xfsaild idles
  2020-07-15 12:38 [PATCH] xfs: drain the buf delwri queue before xfsaild idles Brian Foster
  2020-07-15 17:52 ` Christoph Hellwig
@ 2020-07-16 10:49 ` Brian Foster
  2020-07-16 14:39   ` Darrick J. Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2020-07-16 10:49 UTC (permalink / raw)
  To: linux-xfs

xfsaild is racy with respect to transaction abort and shutdown in
that the task can idle or exit with an empty AIL but buffers still
on the delwri queue. This was partly addressed by cancelling the
delwri queue before the task exits to prevent memory leaks, but it's
also possible for xfsaild to empty and idle with buffers on the
delwri queue. For example, a transaction that pins a buffer that
also happens to sit on the AIL delwri queue will explicitly remove
the associated log item from the AIL if the transaction aborts. The
side effect of this is an unmount hang in xfs_wait_buftarg() as the
associated buffers remain held by the delwri queue indefinitely.
This is reproduced on repeated runs of generic/531 with an fs format
(-mrmapbt=1 -bsize=1k) that happens to also reproduce transaction
aborts.

Update xfsaild to not idle until both the AIL and associated delwri
queue are empty and update the push code to continue delwri queue
submission attempts even when the AIL is empty. This allows the AIL
to eventually release aborted buffers stranded on the delwri queue
when they are unlocked by the associated transaction. This should
have no significant effect on normal runtime behavior because the
xfsaild currently idles only when the AIL is empty and in practice
the AIL is rarely empty with a populated delwri queue. The items
must be AIL resident to land in the queue in the first place and
generally aren't removed until writeback completes.

Note that the pre-existing delwri queue cancel logic in the exit
path is retained because task stop is external, could technically
come at any point, and xfsaild is still responsible to release its
buffer references before it exits.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---

v2:
- Move the out_done label up a bit.
v1: https://lore.kernel.org/linux-xfs/20200715123835.8690-1-bfoster@redhat.com/

 fs/xfs/xfs_trans_ail.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index c3be6e440134..0c783d339675 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -448,16 +448,10 @@ xfsaild_push(
 	target = ailp->ail_target;
 	ailp->ail_target_prev = target;
 
+	/* we're done if the AIL is empty or our push has reached the end */
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
-	if (!lip) {
-		/*
-		 * If the AIL is empty or our push has reached the end we are
-		 * done now.
-		 */
-		xfs_trans_ail_cursor_done(&cur);
-		spin_unlock(&ailp->ail_lock);
+	if (!lip)
 		goto out_done;
-	}
 
 	XFS_STATS_INC(mp, xs_push_ail);
 
@@ -539,6 +533,8 @@ xfsaild_push(
 			break;
 		lsn = lip->li_lsn;
 	}
+
+out_done:
 	xfs_trans_ail_cursor_done(&cur);
 	spin_unlock(&ailp->ail_lock);
 
@@ -546,7 +542,6 @@ xfsaild_push(
 		ailp->ail_log_flush++;
 
 	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
-out_done:
 		/*
 		 * We reached the target or the AIL is empty, so wait a bit
 		 * longer for I/O to complete and remove pushed items from the
@@ -638,7 +633,8 @@ xfsaild(
 		 */
 		smp_rmb();
 		if (!xfs_ail_min(ailp) &&
-		    ailp->ail_target == ailp->ail_target_prev) {
+		    ailp->ail_target == ailp->ail_target_prev &&
+		    list_empty(&ailp->ail_buf_list)) {
 			spin_unlock(&ailp->ail_lock);
 			freezable_schedule();
 			tout = 0;
-- 
2.21.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] xfs: drain the buf delwri queue before xfsaild idles
  2020-07-16 10:49 ` [PATCH v2] " Brian Foster
@ 2020-07-16 14:39   ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2020-07-16 14:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Jul 16, 2020 at 06:49:41AM -0400, Brian Foster wrote:
> xfsaild is racy with respect to transaction abort and shutdown in
> that the task can idle or exit with an empty AIL but buffers still
> on the delwri queue. This was partly addressed by cancelling the
> delwri queue before the task exits to prevent memory leaks, but it's
> also possible for xfsaild to empty and idle with buffers on the
> delwri queue. For example, a transaction that pins a buffer that
> also happens to sit on the AIL delwri queue will explicitly remove
> the associated log item from the AIL if the transaction aborts. The
> side effect of this is an unmount hang in xfs_wait_buftarg() as the
> associated buffers remain held by the delwri queue indefinitely.
> This is reproduced on repeated runs of generic/531 with an fs format
> (-mrmapbt=1 -bsize=1k) that happens to also reproduce transaction
> aborts.
> 
> Update xfsaild to not idle until both the AIL and associated delwri
> queue are empty and update the push code to continue delwri queue
> submission attempts even when the AIL is empty. This allows the AIL
> to eventually release aborted buffers stranded on the delwri queue
> when they are unlocked by the associated transaction. This should
> have no significant effect on normal runtime behavior because the
> xfsaild currently idles only when the AIL is empty and in practice
> the AIL is rarely empty with a populated delwri queue. The items
> must be AIL resident to land in the queue in the first place and
> generally aren't removed until writeback completes.
> 
> Note that the pre-existing delwri queue cancel logic in the exit
> path is retained because task stop is external, could technically
> come at any point, and xfsaild is still responsible to release its
> buffer references before it exits.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks fine to me...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> 
> v2:
> - Move the out_done label up a bit.
> v1: https://lore.kernel.org/linux-xfs/20200715123835.8690-1-bfoster@redhat.com/
> 
>  fs/xfs/xfs_trans_ail.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index c3be6e440134..0c783d339675 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -448,16 +448,10 @@ xfsaild_push(
>  	target = ailp->ail_target;
>  	ailp->ail_target_prev = target;
>  
> +	/* we're done if the AIL is empty or our push has reached the end */
>  	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
> -	if (!lip) {
> -		/*
> -		 * If the AIL is empty or our push has reached the end we are
> -		 * done now.
> -		 */
> -		xfs_trans_ail_cursor_done(&cur);
> -		spin_unlock(&ailp->ail_lock);
> +	if (!lip)
>  		goto out_done;
> -	}
>  
>  	XFS_STATS_INC(mp, xs_push_ail);
>  
> @@ -539,6 +533,8 @@ xfsaild_push(
>  			break;
>  		lsn = lip->li_lsn;
>  	}
> +
> +out_done:
>  	xfs_trans_ail_cursor_done(&cur);
>  	spin_unlock(&ailp->ail_lock);
>  
> @@ -546,7 +542,6 @@ xfsaild_push(
>  		ailp->ail_log_flush++;
>  
>  	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
> -out_done:
>  		/*
>  		 * We reached the target or the AIL is empty, so wait a bit
>  		 * longer for I/O to complete and remove pushed items from the
> @@ -638,7 +633,8 @@ xfsaild(
>  		 */
>  		smp_rmb();
>  		if (!xfs_ail_min(ailp) &&
> -		    ailp->ail_target == ailp->ail_target_prev) {
> +		    ailp->ail_target == ailp->ail_target_prev &&
> +		    list_empty(&ailp->ail_buf_list)) {
>  			spin_unlock(&ailp->ail_lock);
>  			freezable_schedule();
>  			tout = 0;
> -- 
> 2.21.3
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-16 14:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 12:38 [PATCH] xfs: drain the buf delwri queue before xfsaild idles Brian Foster
2020-07-15 17:52 ` Christoph Hellwig
2020-07-16 10:48   ` Brian Foster
2020-07-16 10:49 ` [PATCH v2] " Brian Foster
2020-07-16 14:39   ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.