All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
Date: Fri, 9 Apr 2021 09:47:24 -0400	[thread overview]
Message-ID: <YHBa7Duyvjcu+3zF@bfoster> (raw)
In-Reply-To: <20210407153152.GI3957620@magnolia>

On Wed, Apr 07, 2021 at 08:31:52AM -0700, Darrick J. Wong wrote:
> On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> > Per-inode ioend completion batching has a log reservation deadlock
> > vector between preallocated append transactions and transactions
> > that are acquired at completion time for other purposes (i.e.,
> > unwritten extent conversion or COW fork remaps). For example, if the
> > ioend completion workqueue task executes on a batch of ioends that
> > are sorted such that an append ioend sits at the tail, it's possible
> > for the outstanding append transaction reservation to block
> > allocation of transactions required to process preceding ioends in
> > the list.
> > 
> > Append ioend completion is historically the common path for on-disk
> > inode size updates. While file extending writes may have completed
> > sometime earlier, the on-disk inode size is only updated after
> > successful writeback completion. These transactions are preallocated
> > serially from writeback context to mitigate concurrency and
> > associated log reservation pressure across completions processed by
> > multi-threaded workqueue tasks.
> > 
> > However, now that delalloc blocks unconditionally map to unwritten
> > extents at physical block allocation time, size updates via append
> > ioends are relatively rare. This means that inode size updates most
> > commonly occur as part of the preexisting completion time
> > transaction to convert unwritten extents. As a result, there is no
> > longer a strong need to preallocate size update transactions.
> > 
> > Remove the preallocation of inode size update transactions to avoid
> > the ioend completion processing log reservation deadlock. Instead,
> > continue to send all potential size extending ioends to workqueue
> > context for completion and allocate the transaction from that
> > context. This ensures that no outstanding log reservation is owned
> > by the ioend completion worker task when it begins to process
> > ioends.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
> >  1 file changed, 3 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 1cc7c36d98e9..c1951975bd6a 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >  		XFS_I(ioend->io_inode)->i_d.di_size;
> >  }
> >  
> > -STATIC int
> > -xfs_setfilesize_trans_alloc(
> > -	struct iomap_ioend	*ioend)
> > -{
> > -	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
> > -	struct xfs_trans	*tp;
> > -	int			error;
> > -
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> > -	if (error)
> > -		return error;
> > -
> > -	ioend->io_private = tp;
> > -
> > -	/*
> > -	 * We may pass freeze protection with a transaction.  So tell lockdep
> > -	 * we released it.
> > -	 */
> > -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> > -	/*
> > -	 * We hand off the transaction to the completion thread now, so
> > -	 * clear the flag here.
> > -	 */
> > -	xfs_trans_clear_context(tp);
> > -	return 0;
> > -}
> > -
> >  /*
> >   * Update on-disk file size now that data has been written to disk.
> >   */
> > @@ -182,12 +155,10 @@ xfs_end_ioend(
> >  		error = xfs_reflink_end_cow(ip, offset, size);
> 
> Seems reasonable to me.  xfs_reflink_end_cow_extent should probably
> learn how to extend the ondisk EOF as patch 6/4.
> 

After looking into this since multiple people asked for it, I'm not as
convinced it's the right approach as I was on first thought. The reason
being that the reflink completion code actually remaps in reverse order
(I suspect due to depending on the same behavior of the bmap unmap code,
which is used to pull extents from the COW fork..).

That means that while inter-ioend completion takes some care (even if
not necessarily guaranteed) to order on-disk size updates against
completion within the associated i_size, a size update along with the
COW fork remap transaction could result in the opposite in some cases
(i.e. if the cow fork is fragmented relative to the data fork). I
suspect this is a rare situation and not a critical problem, but at the
same time I find it a bit odd to implement intra-ioend ordering
inconsistent with the broader inter-ioend logic.

I could probably be convinced otherwise, but I think this warrants more
thought and is not necessarily a straightforward cleanup. I'm also not
sure how common disk size updates via COW fork I/O completion really are
in the first place, so I'm not sure it's worth the oddness to save an
extra transaction. In any event, I'll probably leave this off from this
series for now because it doesn't have much to do with fixing the
deadlock problem and can revisit it if anybody has thoughts on any of
the above..

Brian

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
> >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > -	else
> > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> >  
> >  done:
> > -	if (ioend->io_private)
> > -		error = xfs_setfilesize_ioend(ioend, error);
> > +	if (!error && xfs_ioend_is_append(ioend))
> > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> >  	iomap_finish_ioends(ioend, error);
> >  	memalloc_nofs_restore(nofs_flag);
> >  }
> > @@ -237,7 +208,7 @@ xfs_end_io(
> >  
> >  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> >  {
> > -	return ioend->io_private ||
> > +	return xfs_ioend_is_append(ioend) ||
> >  		ioend->io_type == IOMAP_UNWRITTEN ||
> >  		(ioend->io_flags & IOMAP_F_SHARED);
> >  }
> > @@ -250,8 +221,6 @@ xfs_end_bio(
> >  	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
> >  	unsigned long		flags;
> >  
> > -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> > -
> >  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
> >  	if (list_empty(&ip->i_ioend_list))
> >  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> > @@ -501,14 +470,6 @@ xfs_prepare_ioend(
> >  				ioend->io_offset, ioend->io_size);
> >  	}
> >  
> > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > -	if (!status &&
> > -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> > -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> > -	    xfs_ioend_is_append(ioend) &&
> > -	    !ioend->io_private)
> > -		status = xfs_setfilesize_trans_alloc(ioend);
> > -
> >  	memalloc_nofs_restore(nofs_flag);
> >  
> >  	if (xfs_ioend_needs_workqueue(ioend))
> > -- 
> > 2.26.3
> > 
> 


  reply	other threads:[~2021-04-09 13:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
2021-04-07  6:33   ` Christoph Hellwig
2021-04-07 11:23     ` Brian Foster
2021-04-07 15:31   ` Darrick J. Wong
2021-04-09 13:47     ` Brian Foster [this message]
2021-04-09 16:07       ` Darrick J. Wong
2021-04-05 14:59 ` [PATCH 2/4] xfs: open code ioend needs workqueue helper Brian Foster
2021-04-07  6:34   ` Christoph Hellwig
2021-04-07 11:24     ` Brian Foster
2021-04-07 15:23       ` Darrick J. Wong
2021-04-07 15:42   ` Darrick J. Wong
2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
2021-04-05 17:55   ` Christoph Hellwig
2021-04-05 18:08     ` Brian Foster
2021-04-07  6:36   ` Christoph Hellwig
2021-04-07 15:40   ` Darrick J. Wong
2021-04-05 14:59 ` [PATCH 4/4] xfs: drop unnecessary setfilesize helper Brian Foster
2021-04-07  6:37   ` Christoph Hellwig
2021-04-07 15:40   ` Darrick J. Wong
2021-04-06 10:27 ` [PATCH 5/4] iomap: remove unused private field from ioend Brian Foster
2021-04-07  6:40   ` Christoph Hellwig
2021-04-07 15:36   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHBa7Duyvjcu+3zF@bfoster \
    --to=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.