From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: xfs ioend batching log reservation deadlock
Date: Fri, 26 Mar 2021 14:13:21 -0400
Message-ID: <YF4kQWXqwCgAh4vW@bfoster>
In-Reply-To: <20210326173244.GY4090233@magnolia>

On Fri, Mar 26, 2021 at 10:32:44AM -0700, Darrick J. Wong wrote:
> On Fri, Mar 26, 2021 at 11:39:38AM -0400, Brian Foster wrote:
> > Hi all,
> > 
> > We have a report of a workload that deadlocks on log reservation via
> > iomap_ioend completion batching. To start, the fs format is somewhat
> > unique in that the log is on the smaller side (35MB) and the log stripe
> > unit is 256k, but this is actually a default mkfs for the underlying
> > storage. I don't have much more information wrt the workload or
> > anything else that contributes to the completion processing characteristics.
> > 
> > The overall scenario is that a workqueue task is executing in
> > xfs_end_io() and blocked on transaction reservation for an unwritten
> > extent conversion. Since this task began executing and pulled pending
> > items from ->i_ioend_list, the latter was repopulated with 90 ioends,
> > 67 of which have append transactions. Each of these append
> > transactions accounts for ~520k of log reservation due to the log
> > stripe unit, so together they consume roughly 34MB, i.e. nearly all of
> > the available 35MB of log space. That prevents allocation of the
> > aforementioned unwritten extent conversion transaction and thus leaves
> > the fs in a deadlocked state.
> > 
> > I can think of different ways we could probably optimize this problem
> > away. One example is to transfer the append transaction to the inode at
> > bio completion time such that we retain only one per pending batch of
> > ioends. The workqueue task would then pull this append transaction from
> > the inode along with the ioend list and transfer it back to the last
> > non-unwritten/shared ioend in the sorted list.
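
To illustrate what I mean there, roughly (completely untested sketch;
ip->i_append_trans is a made-up inode field, and error handling plus
cleanup of any surplus preallocated transactions is glossed over):

  /* bio completion: park the ioend's append transaction on the inode */
  STATIC void
  xfs_end_bio(
          struct bio              *bio)
  {
          struct iomap_ioend      *ioend = bio->bi_private;
          struct xfs_inode        *ip = XFS_I(ioend->io_inode);
          unsigned long           flags;

          spin_lock_irqsave(&ip->i_ioend_lock, flags);
          if (ioend->io_private && !ip->i_append_trans) {
                  /* keep a single append transaction per pending batch */
                  ip->i_append_trans = ioend->io_private;
                  ioend->io_private = NULL;
          }
          /* ... queue i_ioend_work if the list was empty, as today ... */
          list_add_tail(&ioend->io_list, &ip->i_ioend_list);
          spin_unlock_irqrestore(&ip->i_ioend_lock, flags);
  }

  /* workqueue side: pull the parked transaction along with the list */
          spin_lock_irqsave(&ip->i_ioend_lock, flags);
          list_replace_init(&ip->i_ioend_list, &tmp);
          tp = ip->i_append_trans;
          ip->i_append_trans = NULL;
          spin_unlock_irqrestore(&ip->i_ioend_lock, flags);

          iomap_sort_ioends(&tmp);
          /* ... hand tp back to the last !unwritten/!shared ioend ... */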
> > 
> > That said, I'm not totally convinced this addresses the fundamental
> > problem of acquiring transaction reservation from a context that
> > essentially already owns outstanding reservation vs. just making it hard
> > to reproduce. I'm wondering if/why we need the append transaction at
> > all. AFAICT it goes back to commit 281627df3eb5 ("xfs: log file size
> > updates at I/O completion time") in v3.4, which changed the
> > completion-time on-disk size update from an unlogged update to a
> > logged one. If we continue to
> > send these potential append ioends to the workqueue for completion
> > processing, is there any reason we can't let the workqueue allocate the
> > transaction as it already does for unwritten conversion?
> 
> Frankly I've never understood what benefit we get from preallocating a
> transaction and letting it twist in the wind consuming log space while
> writeback pushes data to the disk.  It's perfectly fine to delay ioend
> processing while we wait for unwritten conversions and cow remapping to
> take effect, so what's the harm in a slight delay for this?
> 
> I guess it's an optimization to reduce wait times?  It's a pity that
> nobody left a comment justifying why it was done in that particular way,
> what with the freeze protection lockdep weirdness too.
> 

I thought it might have been to facilitate ioend completion in interrupt
(i.e. bio completion) context, but I'm really not sure. That's why I'm
asking. :) I'm hoping Christoph can provide some context since it
appears to be his original implementation.

> > If that is reasonable, I'm thinking of a couple patches:
> > 
> > 1. Optimize current append transaction processing with an inode field as
> > noted above.
> > 
> > 2. Replace the submission-side append transaction entirely with a flag
> > (or some such) on the ioend so that the transaction is allocated at
> > completion time, but otherwise preserve the batching behavior
> > instituted in patch 1.
> 
> What happens if you replace the call to xfs_setfilesize_ioend in
> xfs_end_ioend with xfs_setfilesize, and skip the transaction
> preallocation altogether?
> 

That's pretty much what I'm referring to in step 2 above. I was just
thinking it might make sense to implement some sort of batching model
first to avoid scenarios where we have a bunch of discontiguous append
ioends in the completion list and really only need to update the file
size at the end. (Hence a flag or whatever to indicate the last ioend
must call xfs_setfilesize()).
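
E.g., something along these lines in xfs_end_io() (untested sketch; how
the "last append" marker is actually communicated to xfs_end_ioend(),
and how it interacts with ioend merging, is hand-waved):

          struct iomap_ioend      *ioend, *last_append = NULL;

          iomap_sort_ioends(&tmp);
          list_for_each_entry(ioend, &tmp, io_list) {
                  if (ioend->io_type != IOMAP_UNWRITTEN &&
                      !(ioend->io_flags & IOMAP_F_SHARED) &&
                      xfs_ioend_is_append(ioend))
                          last_append = ioend;
          }
          /*
           * ... then process the list as today, passing something like
           * (ioend == last_append) down to xfs_end_ioend() so that only
           * this one allocates a transaction and calls xfs_setfilesize().
           */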

That said, we do still have contiguous ioend merging happening already.
We could certainly just rely on that, update di_size as we process each
append ioend, and worry about further optimization later. That might be
simpler, less code, and a safer first step.
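
In that model the completion side would boil down to something like the
following (compile-untested sketch of xfs_end_ioend(); shutdown checks
and writeback error handling elided), with xfs_setfilesize() allocating
its own transaction here in process context instead of us carrying one
over from submission:

  STATIC void
  xfs_end_ioend(
          struct iomap_ioend      *ioend)
  {
          struct xfs_inode        *ip = XFS_I(ioend->io_inode);
          xfs_off_t               offset = ioend->io_offset;
          size_t                  size = ioend->io_size;
          int                     error = 0;

          /* ... shutdown and writeback error handling elided ... */

          if (ioend->io_flags & IOMAP_F_SHARED)
                  error = xfs_reflink_end_cow(ip, offset, size);
          else if (ioend->io_type == IOMAP_UNWRITTEN)
                  error = xfs_iomap_write_unwritten(ip, offset, size, false);

          /*
           * No preallocated append transaction: let xfs_setfilesize()
           * reserve log space here, where it's safe to block on it.
           */
          if (!error && xfs_ioend_is_append(ioend))
                  error = xfs_setfilesize(ip, offset, size);

          iomap_finish_ioends(ioend, error);
  }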

Brian

> --D
> 
> > Thoughts?
> > 
> > Brian
> > 
> 

