From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: allow delwri requeue of wait listed buffers
Date: Sat, 9 Jun 2018 08:49:09 +1000
Message-ID: <20180608224909.GC10363@dastard>
In-Reply-To: <20180608120709.GA23628@bfoster>

On Fri, Jun 08, 2018 at 08:07:09AM -0400, Brian Foster wrote:
> On Fri, Jun 08, 2018 at 09:27:13AM +1000, Dave Chinner wrote:
> > On Thu, Jun 07, 2018 at 08:41:25AM -0400, Brian Foster wrote:
> > > If a delwri queue occurs of a buffer that was previously submitted
> > > from a delwri queue but has not yet been removed from a wait list,
> > > the queue sets _XBF_DELWRI_Q without changing the state of ->b_list.
> > > This occurs, for example, if another thread beats the submitter
> > > thread to the buffer lock after I/O completion. Once the submitter
> > > acquires the lock, it removes the buffer from the wait list and
> > > leaves a buffer with _XBF_DELWRI_Q set but not on any list.
> > > This results in a lost buffer submission and in turn can result in
> > > assert failures due to _XBF_DELWRI_Q being set on buffer reclaim or
> > > filesystem lockups if the buffer happens to cover an item in the
> > > AIL.
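
For context, the queue-side logic in question looks roughly like
this (paraphrased sketch, not verbatim source). The problem is that
the _XBF_DELWRI_Q flag and the b_list membership are updated
independently:

bool
xfs_buf_delwri_queue(
	struct xfs_buf		*bp,
	struct list_head	*list)
{
	ASSERT(xfs_buf_islocked(bp));

	/* Already on a delwri queue - nothing more to do. */
	if (bp->b_flags & _XBF_DELWRI_Q)
		return false;

	bp->b_flags |= _XBF_DELWRI_Q;

	/*
	 * If the buffer is still sitting on a submitter's wait list,
	 * b_list is not empty, so the buffer is not re-added here.
	 * Once the submitter does list_del_init(&bp->b_list), we are
	 * left with _XBF_DELWRI_Q set but the buffer on no list at
	 * all - the lost submission described above.
	 */
	if (list_empty(&bp->b_list)) {
		atomic_inc(&bp->b_hold);
		list_add_tail(&bp->b_list, list);
	}
	return true;
}
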
> > 
> > I just so happened to have this ASSERT happen overnight on
> > generic/232 testing some code I wrote yesterday. It never ceases to
> > amaze me how bugs that have been around for ages always seem to be
> > hit at the same time by different people in completely different
> > contexts....
> > 
> 
> Interesting, out of curiosity was this in a memory limited environment?

No, it had GBs of free RAM, but it's on a fakenuma setup, so a node
could have been low on free memory and hence running the quota
shrinker....

....
> > This seems all a bit broken.
> 
> Yes, the premise of this was to do something that didn't break it
> further. ;) I figured using sync I/O would also address the problem, but
> would introduce terrible submission->completion serialization...

*nod*

> So essentially take apart sync buffer I/O so we can do batched
> submission/completion. That sounds like a nice idea to me.

*nod*
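
To make that concrete, the top-level sync path could look something
like this (hand-waving sketch only; xfs_buf_iowait() here stands in
for the split-out wait helper, along the lines of the
__xfs_buf_iowait() mentioned below): submit every buffer first, then
reap the completions, so the later waits overlap with I/O still in
flight:

int
xfs_buf_delwri_submit(
	struct list_head	*buffer_list)
{
	LIST_HEAD		(wait_list);
	struct xfs_buf		*bp;
	int			error = 0, error2;

	/* Submit all buffers without waiting on any of them. */
	xfs_buf_delwri_submit_buffers(buffer_list, &wait_list);

	/* Now reap the completions in submission order. */
	while (!list_empty(&wait_list)) {
		bp = list_first_entry(&wait_list, struct xfs_buf,
				      b_list);
		list_del_init(&bp->b_list);

		error2 = xfs_buf_iowait(bp);
		xfs_buf_relse(bp);
		if (!error)
			error = error2;
	}

	return error;
}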

> Feedback on the code to follow. That aside, are you planning to turn
> this into a real patch submission or would you like me to do it?

I've got plenty of other things to do - if you want to take it and
run with it, I'm fine with that :P

> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index a9678c2f3810..40f441e96dc5 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -1453,27 +1453,20 @@ _xfs_buf_ioapply(
> >  }
> >  
> >  /*
> > - * Asynchronous IO submission path. This transfers the buffer lock ownership and
> > - * the current reference to the IO. It is not safe to reference the buffer after
> > - * a call to this function unless the caller holds an additional reference
> > - * itself.
> > + * Internal I/O submission helpers for split submission and waiting actions.
> >   */
> > -void
> > -xfs_buf_submit(
> > +static int
> > +__xfs_buf_submit(
> 
> It looks like the buffer submission refactoring could be a separate
> patch from the delwri queue race fix.

Yup, it probably should be.

> > @@ -1483,12 +1476,8 @@ xfs_buf_submit(
> >  	bp->b_io_error = 0;
> >  
> >  	/*
> > -	 * The caller's reference is released during I/O completion.
> > -	 * This occurs some time after the last b_io_remaining reference is
> > -	 * released, so after we drop our Io reference we have to have some
> > -	 * other reference to ensure the buffer doesn't go away from underneath
> > -	 * us. Take a direct reference to ensure we have safe access to the
> > -	 * buffer until we are finished with it.
> > +	 * I/O needs a reference, because the caller may go away before we are
> > +	 * done with the IO. Completion will deal with it.
> >  	 */
> >  	xfs_buf_hold(bp);
> 
> I think this should be lifted into the callers. For one, it's confusing to
> follow.

Both paths take a reference here, both take it because the IO itself
needs a reference. The only difference is when it is dropped.

> Second, it looks like xfs_buf_submit() unconditionally drops a
> reference whereas __xfs_buf_submit() may not acquire one (i.e. when we
> return an error).
>
> ISTM that the buffer reference calls could be balanced in the top-level
> submit functions rather than split between the common submission path
> and unique sync completion path.

Yes, that may end up being cleaner, because the delwri path has to
take a reference across submit/complete anyway. I just quickly
chopped away at the code without thinking about this stuff too
deeply. I did say "untested", right? :P
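
E.g. something with this shape (sketch only, glossing over how
__xfs_buf_submit() interacts with ioend on submission errors): the
internal helper never touches the caller-visible reference, and each
top-level path brackets its own hold/rele pair:

/* Async path - the IO owns a reference, completion drops it. */
void
xfs_buf_submit(
	struct xfs_buf		*bp)
{
	xfs_buf_hold(bp);
	if (__xfs_buf_submit(bp))
		xfs_buf_rele(bp);	/* no IO started, drop it now */
}

/* Sync path - hold across submit and wait, drop it ourselves. */
int
xfs_buf_submit_wait(
	struct xfs_buf		*bp)
{
	int			error;

	xfs_buf_hold(bp);
	error = __xfs_buf_submit(bp);
	if (!error)
		error = xfs_buf_iowait(bp);
	xfs_buf_rele(bp);
	return error;
}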

> >  void *
> > @@ -2045,14 +2035,21 @@ xfs_buf_delwri_submit_buffers(
> >  		 * at this point so the caller can still access it.
> >  		 */
> >  		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
> > -		bp->b_flags |= XBF_WRITE | XBF_ASYNC;
> > +		bp->b_flags |= XBF_WRITE;
> 
> We set XBF_ASYNC below in the specific case, but this doesn't tell us
> anything about whether it might have already been set on the buffer. Is
> it not the responsibility of this function to set/clear XBF_ASYNC
> appropriately?

XBF_ASYNC should be clear (as the buffer is not under IO) but, yes,
I agree that it would be a good idea to clear it anyway.
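
Something like this, say (untested):

		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL | XBF_ASYNC);
		bp->b_flags |= XBF_WRITE;
		if (!wait_list)
			bp->b_flags |= XBF_ASYNC;

That sets the IO mode explicitly in both cases rather than trusting
whatever state the buffer came in with.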

> >  		if (wait_list) {
> > +			/*
> > +			 * Split synchronous IO - we wait later, so we need a
> > +			 * reference until we run completion processing and drop
> > +			 * the buffer lock ourselves
> > +			 */
> 
> Might as well merge this with the comment above, which needs fixing
> anyway since we no longer "do all IO submission async."
> 
> >  			xfs_buf_hold(bp);
> 
> I think the buffer reference counting is now broken here.

More than likely. :P

....
> >  			list_move_tail(&bp->b_list, wait_list);
> > -		} else
> > +			__xfs_buf_submit(bp);
> 
> I suspect we need to handle submission errors here, otherwise we wait on
> a buffer that was never submitted.

Yup.
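
Perhaps along these lines (untested sketch - the exact lock and
reference dance depends on what __xfs_buf_submit() does with the
buffer on failure): a failed submission never starts any IO, so
nobody will ever complete the buffer, and it has to come back off
the wait list instead of being waited on:

		if (wait_list) {
			xfs_buf_hold(bp);
			list_move_tail(&bp->b_list, wait_list);
			error = __xfs_buf_submit(bp);
			if (error) {
				/*
				 * No IO in flight and no completion
				 * coming - pull the buffer back off
				 * the wait list and drop the
				 * reference we took for the wait.
				 */
				list_del_init(&bp->b_list);
				xfs_buf_rele(bp);
			}
		}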

> One final thought.. ISTM that the nature of batched sync buffer
> submission means that once we wait on one or two of those buffers,
> there's a good chance many of the remaining buffer physical I/Os will
> have completed by the time we get to the associated iowait. That means
> that the current behavior of large (sync) delwri buffer completions
> running in the async completion workqueue most likely changes to running
> from __xfs_buf_iowait()->xfs_buf_ioend() context. I'm not sure that
> really matters, but just something to note.

Yup, that's something that crossed my mind. But we only wait for
buffers to complete in log recovery, quota check and the /quota
shrinker/ (now the generic/232 failure makes sense, doesn't it? :).

Hence I don't think there's a performance issue we need to worry
about - correctness is more important...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
