All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] xfs: ioend batching log reservation deadlock
@ 2021-04-05 14:58 Brian Foster
  2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-05 14:58 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This series addresses the ioend completion batching log res deadlock
vector by removing the preallocated transaction from append ioends.
Instead, we continue to process append ioend completions via the
workqueue, but let the wq task allocate the transaction similar to other
ioend types.

Patch 1 makes the functional change and the remaining patches are
followon cleanups. Technically this could all be squashed down to a
single patch, if desired. Thoughts, reviews, flames appreciated.

Brian

RFD: https://lore.kernel.org/linux-xfs/YF4AOto30pC%2F0FYW@bfoster/

Brian Foster (4):
  xfs: drop submit side trans alloc for append ioends
  xfs: open code ioend needs workqueue helper
  xfs: drop unused ioend private merge and setfilesize code
  xfs: drop unnecessary setfilesize helper

 fs/xfs/xfs_aops.c | 129 ++++++----------------------------------------
 1 file changed, 15 insertions(+), 114 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
  2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
@ 2021-04-05 14:59 ` Brian Foster
  2021-04-07  6:33   ` Christoph Hellwig
  2021-04-07 15:31   ` Darrick J. Wong
  2021-04-05 14:59 ` [PATCH 2/4] xfs: open code ioend needs workqueue helper Brian Foster
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-05 14:59 UTC (permalink / raw)
  To: linux-xfs

Per-inode ioend completion batching has a log reservation deadlock
vector between preallocated append transactions and transactions
that are acquired at completion time for other purposes (i.e.,
unwritten extent conversion or COW fork remaps). For example, if the
ioend completion workqueue task executes on a batch of ioends that
are sorted such that an append ioend sits at the tail, it's possible
for the outstanding append transaction reservation to block
allocation of transactions required to process preceding ioends in
the list.

Append ioend completion is historically the common path for on-disk
inode size updates. While file extending writes may have completed
sometime earlier, the on-disk inode size is only updated after
successful writeback completion. These transactions are preallocated
serially from writeback context to mitigate concurrency and
associated log reservation pressure across completions processed by
multi-threaded workqueue tasks.

However, now that delalloc blocks unconditionally map to unwritten
extents at physical block allocation time, size updates via append
ioends are relatively rare. This means that inode size updates most
commonly occur as part of the preexisting completion time
transaction to convert unwritten extents. As a result, there is no
longer a strong need to preallocate size update transactions.

Remove the preallocation of inode size update transactions to avoid
the ioend completion processing log reservation deadlock. Instead,
continue to send all potential size extending ioends to workqueue
context for completion and allocate the transaction from that
context. This ensures that no outstanding log reservation is owned
by the ioend completion worker task when it begins to process
ioends.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 45 +++------------------------------------------
 1 file changed, 3 insertions(+), 42 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1cc7c36d98e9..c1951975bd6a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
 		XFS_I(ioend->io_inode)->i_d.di_size;
 }
 
-STATIC int
-xfs_setfilesize_trans_alloc(
-	struct iomap_ioend	*ioend)
-{
-	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
-	struct xfs_trans	*tp;
-	int			error;
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
-	if (error)
-		return error;
-
-	ioend->io_private = tp;
-
-	/*
-	 * We may pass freeze protection with a transaction.  So tell lockdep
-	 * we released it.
-	 */
-	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
-	/*
-	 * We hand off the transaction to the completion thread now, so
-	 * clear the flag here.
-	 */
-	xfs_trans_clear_context(tp);
-	return 0;
-}
-
 /*
  * Update on-disk file size now that data has been written to disk.
  */
@@ -182,12 +155,10 @@ xfs_end_ioend(
 		error = xfs_reflink_end_cow(ip, offset, size);
 	else if (ioend->io_type == IOMAP_UNWRITTEN)
 		error = xfs_iomap_write_unwritten(ip, offset, size, false);
-	else
-		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
 
 done:
-	if (ioend->io_private)
-		error = xfs_setfilesize_ioend(ioend, error);
+	if (!error && xfs_ioend_is_append(ioend))
+		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
 	iomap_finish_ioends(ioend, error);
 	memalloc_nofs_restore(nofs_flag);
 }
@@ -237,7 +208,7 @@ xfs_end_io(
 
 static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
 {
-	return ioend->io_private ||
+	return xfs_ioend_is_append(ioend) ||
 		ioend->io_type == IOMAP_UNWRITTEN ||
 		(ioend->io_flags & IOMAP_F_SHARED);
 }
@@ -250,8 +221,6 @@ xfs_end_bio(
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
 	unsigned long		flags;
 
-	ASSERT(xfs_ioend_needs_workqueue(ioend));
-
 	spin_lock_irqsave(&ip->i_ioend_lock, flags);
 	if (list_empty(&ip->i_ioend_list))
 		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
@@ -501,14 +470,6 @@ xfs_prepare_ioend(
 				ioend->io_offset, ioend->io_size);
 	}
 
-	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!status &&
-	    ((ioend->io_flags & IOMAP_F_SHARED) ||
-	     ioend->io_type != IOMAP_UNWRITTEN) &&
-	    xfs_ioend_is_append(ioend) &&
-	    !ioend->io_private)
-		status = xfs_setfilesize_trans_alloc(ioend);
-
 	memalloc_nofs_restore(nofs_flag);
 
 	if (xfs_ioend_needs_workqueue(ioend))
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] xfs: open code ioend needs workqueue helper
  2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
  2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
@ 2021-04-05 14:59 ` Brian Foster
  2021-04-07  6:34   ` Christoph Hellwig
  2021-04-07 15:42   ` Darrick J. Wong
  2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-05 14:59 UTC (permalink / raw)
  To: linux-xfs

Open code xfs_ioend_needs_workqueue() into the only remaining
caller.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c1951975bd6a..63ecc04de64f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -206,13 +206,6 @@ xfs_end_io(
 	}
 }
 
-static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
-{
-	return xfs_ioend_is_append(ioend) ||
-		ioend->io_type == IOMAP_UNWRITTEN ||
-		(ioend->io_flags & IOMAP_F_SHARED);
-}
-
 STATIC void
 xfs_end_bio(
 	struct bio		*bio)
@@ -472,7 +465,9 @@ xfs_prepare_ioend(
 
 	memalloc_nofs_restore(nofs_flag);
 
-	if (xfs_ioend_needs_workqueue(ioend))
+	/* send ioends that might require a transaction to the completion wq */
+	if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN ||
+	    (ioend->io_flags & IOMAP_F_SHARED))
 		ioend->io_bio->bi_end_io = xfs_end_bio;
 	return status;
 }
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code
  2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
  2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
  2021-04-05 14:59 ` [PATCH 2/4] xfs: open code ioend needs workqueue helper Brian Foster
@ 2021-04-05 14:59 ` Brian Foster
  2021-04-05 17:55   ` Christoph Hellwig
                     ` (2 more replies)
  2021-04-05 14:59 ` [PATCH 4/4] xfs: drop unnecessary setfilesize helper Brian Foster
  2021-04-06 10:27 ` [PATCH 5/4] iomap: remove unused private field from ioend Brian Foster
  4 siblings, 3 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-05 14:59 UTC (permalink / raw)
  To: linux-xfs

XFS no longer attaches anthing to ioend->io_private. Remove the
unnecessary ->io_private merging code. This removes the only remaining
user of xfs_setfilesize_ioend() so remove that function as well.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 46 +---------------------------------------------
 1 file changed, 1 insertion(+), 45 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 63ecc04de64f..a7f91f4186bc 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -85,31 +85,6 @@ xfs_setfilesize(
 	return __xfs_setfilesize(ip, tp, offset, size);
 }
 
-STATIC int
-xfs_setfilesize_ioend(
-	struct iomap_ioend	*ioend,
-	int			error)
-{
-	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
-	struct xfs_trans	*tp = ioend->io_private;
-
-	/*
-	 * The transaction may have been allocated in the I/O submission thread,
-	 * thus we need to mark ourselves as being in a transaction manually.
-	 * Similarly for freeze protection.
-	 */
-	xfs_trans_set_context(tp);
-	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
-
-	/* we abort the update if there was an IO error */
-	if (error) {
-		xfs_trans_cancel(tp);
-		return error;
-	}
-
-	return __xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
-}
-
 /*
  * IO write completion.
  */
@@ -163,25 +138,6 @@ xfs_end_ioend(
 	memalloc_nofs_restore(nofs_flag);
 }
 
-/*
- * If the to be merged ioend has a preallocated transaction for file
- * size updates we need to ensure the ioend it is merged into also
- * has one.  If it already has one we can simply cancel the transaction
- * as it is guaranteed to be clean.
- */
-static void
-xfs_ioend_merge_private(
-	struct iomap_ioend	*ioend,
-	struct iomap_ioend	*next)
-{
-	if (!ioend->io_private) {
-		ioend->io_private = next->io_private;
-		next->io_private = NULL;
-	} else {
-		xfs_setfilesize_ioend(next, -ECANCELED);
-	}
-}
-
 /* Finish all pending io completions. */
 void
 xfs_end_io(
@@ -201,7 +157,7 @@ xfs_end_io(
 	while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend,
 			io_list))) {
 		list_del_init(&ioend->io_list);
-		iomap_ioend_try_merge(ioend, &tmp, xfs_ioend_merge_private);
+		iomap_ioend_try_merge(ioend, &tmp, NULL);
 		xfs_end_ioend(ioend);
 	}
 }
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] xfs: drop unnecessary setfilesize helper
  2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
                   ` (2 preceding siblings ...)
  2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
@ 2021-04-05 14:59 ` Brian Foster
  2021-04-07  6:37   ` Christoph Hellwig
  2021-04-07 15:40   ` Darrick J. Wong
  2021-04-06 10:27 ` [PATCH 5/4] iomap: remove unused private field from ioend Brian Foster
  4 siblings, 2 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-05 14:59 UTC (permalink / raw)
  To: linux-xfs

xfs_setfilesize() is the only remaining caller of the internal
__xfs_setfilesize() helper. Fold them into a single function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a7f91f4186bc..87c2912f147d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -42,14 +42,20 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
 /*
  * Update on-disk file size now that data has been written to disk.
  */
-STATIC int
-__xfs_setfilesize(
+int
+xfs_setfilesize(
 	struct xfs_inode	*ip,
-	struct xfs_trans	*tp,
 	xfs_off_t		offset,
 	size_t			size)
 {
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
 	xfs_fsize_t		isize;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
+	if (error)
+		return error;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	isize = xfs_new_eof(ip, offset + size);
@@ -68,23 +74,6 @@ __xfs_setfilesize(
 	return xfs_trans_commit(tp);
 }
 
-int
-xfs_setfilesize(
-	struct xfs_inode	*ip,
-	xfs_off_t		offset,
-	size_t			size)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_trans	*tp;
-	int			error;
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
-	if (error)
-		return error;
-
-	return __xfs_setfilesize(ip, tp, offset, size);
-}
-
 /*
  * IO write completion.
  */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code
  2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
@ 2021-04-05 17:55   ` Christoph Hellwig
  2021-04-05 18:08     ` Brian Foster
  2021-04-07  6:36   ` Christoph Hellwig
  2021-04-07 15:40   ` Darrick J. Wong
  2 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-04-05 17:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 10:59:02AM -0400, Brian Foster wrote:
>  			io_list))) {
>  		list_del_init(&ioend->io_list);
> -		iomap_ioend_try_merge(ioend, &tmp, xfs_ioend_merge_private);
> +		iomap_ioend_try_merge(ioend, &tmp, NULL);

The merge_private argument to iomap_ioend_try_merge and the io_private
field in struct ioend can go way now as well.

Otherwise the whole series looks good to me from a very cursory look.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code
  2021-04-05 17:55   ` Christoph Hellwig
@ 2021-04-05 18:08     ` Brian Foster
  0 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-05 18:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 06:55:33PM +0100, Christoph Hellwig wrote:
> On Mon, Apr 05, 2021 at 10:59:02AM -0400, Brian Foster wrote:
> >  			io_list))) {
> >  		list_del_init(&ioend->io_list);
> > -		iomap_ioend_try_merge(ioend, &tmp, xfs_ioend_merge_private);
> > +		iomap_ioend_try_merge(ioend, &tmp, NULL);
> 
> The merge_private argument to iomap_ioend_try_merge and the io_private
> field in struct ioend can go way now as well.
> 

Indeed. I'll tack on another patch to remove all of that.

Brian

> Otherwise the whole series looks good to me from a very cursory look.
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 5/4] iomap: remove unused private field from ioend
  2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
                   ` (3 preceding siblings ...)
  2021-04-05 14:59 ` [PATCH 4/4] xfs: drop unnecessary setfilesize helper Brian Foster
@ 2021-04-06 10:27 ` Brian Foster
  2021-04-07  6:40   ` Christoph Hellwig
  2021-04-07 15:36   ` Darrick J. Wong
  4 siblings, 2 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-06 10:27 UTC (permalink / raw)
  To: linux-xfs

The only remaining user of ->io_private is the generic ioend merging
infrastructure. The only user of that is XFS, which no longer sets
->io_private or passes an associated merge callback. Remove the
unused parameter and the ->io_private field.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 7 +------
 fs/xfs/xfs_aops.c      | 2 +-
 include/linux/iomap.h  | 5 +----
 3 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 414769a6ad11..b7753a7907e2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1134,9 +1134,7 @@ iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next)
 }
 
 void
-iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends,
-		void (*merge_private)(struct iomap_ioend *ioend,
-				struct iomap_ioend *next))
+iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends)
 {
 	struct iomap_ioend *next;
 
@@ -1148,8 +1146,6 @@ iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends,
 			break;
 		list_move_tail(&next->io_list, &ioend->io_list);
 		ioend->io_size += next->io_size;
-		if (next->io_private && merge_private)
-			merge_private(ioend, next);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_ioend_try_merge);
@@ -1235,7 +1231,6 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 	ioend->io_inode = inode;
 	ioend->io_size = 0;
 	ioend->io_offset = offset;
-	ioend->io_private = NULL;
 	ioend->io_bio = bio;
 	return ioend;
 }
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 87c2912f147d..e24e0a005b48 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -146,7 +146,7 @@ xfs_end_io(
 	while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend,
 			io_list))) {
 		list_del_init(&ioend->io_list);
-		iomap_ioend_try_merge(ioend, &tmp, NULL);
+		iomap_ioend_try_merge(ioend, &tmp);
 		xfs_end_ioend(ioend);
 	}
 }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index d202fd2d0f91..c87d0cb0de6d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -198,7 +198,6 @@ struct iomap_ioend {
 	struct inode		*io_inode;	/* file being written to */
 	size_t			io_size;	/* size of the extent */
 	loff_t			io_offset;	/* offset in the file */
-	void			*io_private;	/* file system private data */
 	struct bio		*io_bio;	/* bio being built */
 	struct bio		io_inline_bio;	/* MUST BE LAST! */
 };
@@ -234,9 +233,7 @@ struct iomap_writepage_ctx {
 
 void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
 void iomap_ioend_try_merge(struct iomap_ioend *ioend,
-		struct list_head *more_ioends,
-		void (*merge_private)(struct iomap_ioend *ioend,
-				struct iomap_ioend *next));
+		struct list_head *more_ioends);
 void iomap_sort_ioends(struct list_head *ioend_list);
 int iomap_writepage(struct page *page, struct writeback_control *wbc,
 		struct iomap_writepage_ctx *wpc,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
  2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
@ 2021-04-07  6:33   ` Christoph Hellwig
  2021-04-07 11:23     ` Brian Foster
  2021-04-07 15:31   ` Darrick J. Wong
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-04-07  6:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

> @@ -182,12 +155,10 @@ xfs_end_ioend(
>  		error = xfs_reflink_end_cow(ip, offset, size);
>  	else if (ioend->io_type == IOMAP_UNWRITTEN)
>  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> -	else
> -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);

I first though we'd now call xfs_setfilesize for unwritten extents
as well, but as those have already updated di_size we are fine here.

As a future enhancement it would be useful to let xfs_reflink_end_cow
update the file size similar to what we do for the unwritten case.

>  done:
> -	if (ioend->io_private)
> -		error = xfs_setfilesize_ioend(ioend, error);
> +	if (!error && xfs_ioend_is_append(ioend))
> +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
>  	iomap_finish_ioends(ioend, error);

The done label can move after the call to xfs_setfilesize now.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] xfs: open code ioend needs workqueue helper
  2021-04-05 14:59 ` [PATCH 2/4] xfs: open code ioend needs workqueue helper Brian Foster
@ 2021-04-07  6:34   ` Christoph Hellwig
  2021-04-07 11:24     ` Brian Foster
  2021-04-07 15:42   ` Darrick J. Wong
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-04-07  6:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 10:59:01AM -0400, Brian Foster wrote:
> Open code xfs_ioend_needs_workqueue() into the only remaining
> caller.

This description would all fit on a single line.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code
  2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
  2021-04-05 17:55   ` Christoph Hellwig
@ 2021-04-07  6:36   ` Christoph Hellwig
  2021-04-07 15:40   ` Darrick J. Wong
  2 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-04-07  6:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] xfs: drop unnecessary setfilesize helper
  2021-04-05 14:59 ` [PATCH 4/4] xfs: drop unnecessary setfilesize helper Brian Foster
@ 2021-04-07  6:37   ` Christoph Hellwig
  2021-04-07 15:40   ` Darrick J. Wong
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-04-07  6:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/4] iomap: remove unused private field from ioend
  2021-04-06 10:27 ` [PATCH 5/4] iomap: remove unused private field from ioend Brian Foster
@ 2021-04-07  6:40   ` Christoph Hellwig
  2021-04-07 15:36   ` Darrick J. Wong
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-04-07  6:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
  2021-04-07  6:33   ` Christoph Hellwig
@ 2021-04-07 11:23     ` Brian Foster
  0 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2021-04-07 11:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Apr 07, 2021 at 07:33:21AM +0100, Christoph Hellwig wrote:
> > @@ -182,12 +155,10 @@ xfs_end_ioend(
> >  		error = xfs_reflink_end_cow(ip, offset, size);
> >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > -	else
> > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> 
> I first though we'd now call xfs_setfilesize for unwritten extents
> as well, but as those have already updated di_size we are fine here.
> 
> As a future enhancement it would be useful to let xfs_reflink_end_cow
> update the file size similar to what we do for the unwritten case.
> 

Agreed. I noticed that when first passing through the code but didn't
want to get into further functional changes.

> >  done:
> > -	if (ioend->io_private)
> > -		error = xfs_setfilesize_ioend(ioend, error);
> > +	if (!error && xfs_ioend_is_append(ioend))
> > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> >  	iomap_finish_ioends(ioend, error);
> 
> The done label can move after the call to xfs_setfilesize now.
> 

Fixed.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

Thanks.

Brian


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] xfs: open code ioend needs workqueue helper
  2021-04-07  6:34   ` Christoph Hellwig
@ 2021-04-07 11:24     ` Brian Foster
  2021-04-07 15:23       ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2021-04-07 11:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Apr 07, 2021 at 07:34:40AM +0100, Christoph Hellwig wrote:
> On Mon, Apr 05, 2021 at 10:59:01AM -0400, Brian Foster wrote:
> > Open code xfs_ioend_needs_workqueue() into the only remaining
> > caller.
> 
> This description would all fit on a single line.
> 

I've used 68 character wide commit log descriptions for quite some time,
to which this seems to be wrapped accurately. This is the same as the
immediately previous patch for example, with the much longer
description. I don't care much about changing it, but is there a
canonical format defined somewhere? I've always just thought 68-72 was
acceptable.

Brian

> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] xfs: open code ioend needs workqueue helper
  2021-04-07 11:24     ` Brian Foster
@ 2021-04-07 15:23       ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-07 15:23 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Apr 07, 2021 at 07:24:13AM -0400, Brian Foster wrote:
> On Wed, Apr 07, 2021 at 07:34:40AM +0100, Christoph Hellwig wrote:
> > On Mon, Apr 05, 2021 at 10:59:01AM -0400, Brian Foster wrote:
> > > Open code xfs_ioend_needs_workqueue() into the only remaining
> > > caller.
> > 
> > This description would all fit on a single line.
> > 
> 
> I've used 68 character wide commit log descriptions for quite some time,
> to which this seems to be wrapped accurately. This is the same as the
> immediately previous patch for example, with the much longer
> description. I don't care much about changing it, but is there a
> canonical format defined somewhere? I've always just thought 68-72 was
> acceptable.

I set email to wrap at 72 and C code to wrap at 79 columns.

Though as I've said in the past, I don't enforce /any/ of those rules
with any specificity so long as they're not being abused.  (e.g.
wrapping at column 5 or 500)

--D

> Brian
> 
> > Looks good:
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
  2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
  2021-04-07  6:33   ` Christoph Hellwig
@ 2021-04-07 15:31   ` Darrick J. Wong
  2021-04-09 13:47     ` Brian Foster
  1 sibling, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-07 15:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> Per-inode ioend completion batching has a log reservation deadlock
> vector between preallocated append transactions and transactions
> that are acquired at completion time for other purposes (i.e.,
> unwritten extent conversion or COW fork remaps). For example, if the
> ioend completion workqueue task executes on a batch of ioends that
> are sorted such that an append ioend sits at the tail, it's possible
> for the outstanding append transaction reservation to block
> allocation of transactions required to process preceding ioends in
> the list.
> 
> Append ioend completion is historically the common path for on-disk
> inode size updates. While file extending writes may have completed
> sometime earlier, the on-disk inode size is only updated after
> successful writeback completion. These transactions are preallocated
> serially from writeback context to mitigate concurrency and
> associated log reservation pressure across completions processed by
> multi-threaded workqueue tasks.
> 
> However, now that delalloc blocks unconditionally map to unwritten
> extents at physical block allocation time, size updates via append
> ioends are relatively rare. This means that inode size updates most
> commonly occur as part of the preexisting completion time
> transaction to convert unwritten extents. As a result, there is no
> longer a strong need to preallocate size update transactions.
> 
> Remove the preallocation of inode size update transactions to avoid
> the ioend completion processing log reservation deadlock. Instead,
> continue to send all potential size extending ioends to workqueue
> context for completion and allocate the transaction from that
> context. This ensures that no outstanding log reservation is owned
> by the ioend completion worker task when it begins to process
> ioends.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
>  1 file changed, 3 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 1cc7c36d98e9..c1951975bd6a 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
>  		XFS_I(ioend->io_inode)->i_d.di_size;
>  }
>  
> -STATIC int
> -xfs_setfilesize_trans_alloc(
> -	struct iomap_ioend	*ioend)
> -{
> -	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
> -	struct xfs_trans	*tp;
> -	int			error;
> -
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> -	if (error)
> -		return error;
> -
> -	ioend->io_private = tp;
> -
> -	/*
> -	 * We may pass freeze protection with a transaction.  So tell lockdep
> -	 * we released it.
> -	 */
> -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> -	/*
> -	 * We hand off the transaction to the completion thread now, so
> -	 * clear the flag here.
> -	 */
> -	xfs_trans_clear_context(tp);
> -	return 0;
> -}
> -
>  /*
>   * Update on-disk file size now that data has been written to disk.
>   */
> @@ -182,12 +155,10 @@ xfs_end_ioend(
>  		error = xfs_reflink_end_cow(ip, offset, size);

Seems reasonable to me.  xfs_reflink_end_cow_extent should probably
learn how to extend the ondisk EOF as patch 6/4.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  	else if (ioend->io_type == IOMAP_UNWRITTEN)
>  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> -	else
> -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
>  
>  done:
> -	if (ioend->io_private)
> -		error = xfs_setfilesize_ioend(ioend, error);
> +	if (!error && xfs_ioend_is_append(ioend))
> +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
>  	iomap_finish_ioends(ioend, error);
>  	memalloc_nofs_restore(nofs_flag);
>  }
> @@ -237,7 +208,7 @@ xfs_end_io(
>  
>  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
>  {
> -	return ioend->io_private ||
> +	return xfs_ioend_is_append(ioend) ||
>  		ioend->io_type == IOMAP_UNWRITTEN ||
>  		(ioend->io_flags & IOMAP_F_SHARED);
>  }
> @@ -250,8 +221,6 @@ xfs_end_bio(
>  	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
>  	unsigned long		flags;
>  
> -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> -
>  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
>  	if (list_empty(&ip->i_ioend_list))
>  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> @@ -501,14 +470,6 @@ xfs_prepare_ioend(
>  				ioend->io_offset, ioend->io_size);
>  	}
>  
> -	/* Reserve log space if we might write beyond the on-disk inode size. */
> -	if (!status &&
> -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> -	    xfs_ioend_is_append(ioend) &&
> -	    !ioend->io_private)
> -		status = xfs_setfilesize_trans_alloc(ioend);
> -
>  	memalloc_nofs_restore(nofs_flag);
>  
>  	if (xfs_ioend_needs_workqueue(ioend))
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/4] iomap: remove unused private field from ioend
  2021-04-06 10:27 ` [PATCH 5/4] iomap: remove unused private field from ioend Brian Foster
  2021-04-07  6:40   ` Christoph Hellwig
@ 2021-04-07 15:36   ` Darrick J. Wong
  1 sibling, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-07 15:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 06, 2021 at 06:27:54AM -0400, Brian Foster wrote:
> The only remaining user of ->io_private is the generic ioend merging
> infrastructure. The only user of that is XFS, which no longer sets
> ->io_private or passes an associated merge callback. Remove the
> unused parameter and the ->io_private field.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 7 +------
>  fs/xfs/xfs_aops.c      | 2 +-
>  include/linux/iomap.h  | 5 +----

This iomap change needs to be cc'd to fsdevel just in case there's
anyone planning to use it.  To my knowledge gfs2 and zonefs don't use
io_private and don't have any plans to, but let's not yank the rug from
under them.

(Code change looks good to me, fwiw)

--D

>  3 files changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 414769a6ad11..b7753a7907e2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1134,9 +1134,7 @@ iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next)
>  }
>  
>  void
> -iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends,
> -		void (*merge_private)(struct iomap_ioend *ioend,
> -				struct iomap_ioend *next))
> +iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends)
>  {
>  	struct iomap_ioend *next;
>  
> @@ -1148,8 +1146,6 @@ iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends,
>  			break;
>  		list_move_tail(&next->io_list, &ioend->io_list);
>  		ioend->io_size += next->io_size;
> -		if (next->io_private && merge_private)
> -			merge_private(ioend, next);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(iomap_ioend_try_merge);
> @@ -1235,7 +1231,6 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
>  	ioend->io_inode = inode;
>  	ioend->io_size = 0;
>  	ioend->io_offset = offset;
> -	ioend->io_private = NULL;
>  	ioend->io_bio = bio;
>  	return ioend;
>  }
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 87c2912f147d..e24e0a005b48 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -146,7 +146,7 @@ xfs_end_io(
>  	while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend,
>  			io_list))) {
>  		list_del_init(&ioend->io_list);
> -		iomap_ioend_try_merge(ioend, &tmp, NULL);
> +		iomap_ioend_try_merge(ioend, &tmp);
>  		xfs_end_ioend(ioend);
>  	}
>  }
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index d202fd2d0f91..c87d0cb0de6d 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -198,7 +198,6 @@ struct iomap_ioend {
>  	struct inode		*io_inode;	/* file being written to */
>  	size_t			io_size;	/* size of the extent */
>  	loff_t			io_offset;	/* offset in the file */
> -	void			*io_private;	/* file system private data */
>  	struct bio		*io_bio;	/* bio being built */
>  	struct bio		io_inline_bio;	/* MUST BE LAST! */
>  };
> @@ -234,9 +233,7 @@ struct iomap_writepage_ctx {
>  
>  void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
>  void iomap_ioend_try_merge(struct iomap_ioend *ioend,
> -		struct list_head *more_ioends,
> -		void (*merge_private)(struct iomap_ioend *ioend,
> -				struct iomap_ioend *next));
> +		struct list_head *more_ioends);
>  void iomap_sort_ioends(struct list_head *ioend_list);
>  int iomap_writepage(struct page *page, struct writeback_control *wbc,
>  		struct iomap_writepage_ctx *wpc,
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] xfs: drop unnecessary setfilesize helper
  2021-04-05 14:59 ` [PATCH 4/4] xfs: drop unnecessary setfilesize helper Brian Foster
  2021-04-07  6:37   ` Christoph Hellwig
@ 2021-04-07 15:40   ` Darrick J. Wong
  1 sibling, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-07 15:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 10:59:03AM -0400, Brian Foster wrote:
> xfs_setfilesize() is the only remaining caller of the internal
> __xfs_setfilesize() helper. Fold them into a single function.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

LGTM
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_aops.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index a7f91f4186bc..87c2912f147d 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -42,14 +42,20 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
>  /*
>   * Update on-disk file size now that data has been written to disk.
>   */
> -STATIC int
> -__xfs_setfilesize(
> +int
> +xfs_setfilesize(
>  	struct xfs_inode	*ip,
> -	struct xfs_trans	*tp,
>  	xfs_off_t		offset,
>  	size_t			size)
>  {
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_trans	*tp;
>  	xfs_fsize_t		isize;
> +	int			error;
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> +	if (error)
> +		return error;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	isize = xfs_new_eof(ip, offset + size);
> @@ -68,23 +74,6 @@ __xfs_setfilesize(
>  	return xfs_trans_commit(tp);
>  }
>  
> -int
> -xfs_setfilesize(
> -	struct xfs_inode	*ip,
> -	xfs_off_t		offset,
> -	size_t			size)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_trans	*tp;
> -	int			error;
> -
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> -	if (error)
> -		return error;
> -
> -	return __xfs_setfilesize(ip, tp, offset, size);
> -}
> -
>  /*
>   * IO write completion.
>   */
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code
  2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
  2021-04-05 17:55   ` Christoph Hellwig
  2021-04-07  6:36   ` Christoph Hellwig
@ 2021-04-07 15:40   ` Darrick J. Wong
  2 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-07 15:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 10:59:02AM -0400, Brian Foster wrote:
> XFS no longer attaches anthing to ioend->io_private. Remove the
> unnecessary ->io_private merging code. This removes the only remaining
> user of xfs_setfilesize_ioend() so remove that function as well.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Hooray for de-warting,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_aops.c | 46 +---------------------------------------------
>  1 file changed, 1 insertion(+), 45 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 63ecc04de64f..a7f91f4186bc 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -85,31 +85,6 @@ xfs_setfilesize(
>  	return __xfs_setfilesize(ip, tp, offset, size);
>  }
>  
> -STATIC int
> -xfs_setfilesize_ioend(
> -	struct iomap_ioend	*ioend,
> -	int			error)
> -{
> -	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
> -	struct xfs_trans	*tp = ioend->io_private;
> -
> -	/*
> -	 * The transaction may have been allocated in the I/O submission thread,
> -	 * thus we need to mark ourselves as being in a transaction manually.
> -	 * Similarly for freeze protection.
> -	 */
> -	xfs_trans_set_context(tp);
> -	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
> -
> -	/* we abort the update if there was an IO error */
> -	if (error) {
> -		xfs_trans_cancel(tp);
> -		return error;
> -	}
> -
> -	return __xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
> -}
> -
>  /*
>   * IO write completion.
>   */
> @@ -163,25 +138,6 @@ xfs_end_ioend(
>  	memalloc_nofs_restore(nofs_flag);
>  }
>  
> -/*
> - * If the to be merged ioend has a preallocated transaction for file
> - * size updates we need to ensure the ioend it is merged into also
> - * has one.  If it already has one we can simply cancel the transaction
> - * as it is guaranteed to be clean.
> - */
> -static void
> -xfs_ioend_merge_private(
> -	struct iomap_ioend	*ioend,
> -	struct iomap_ioend	*next)
> -{
> -	if (!ioend->io_private) {
> -		ioend->io_private = next->io_private;
> -		next->io_private = NULL;
> -	} else {
> -		xfs_setfilesize_ioend(next, -ECANCELED);
> -	}
> -}
> -
>  /* Finish all pending io completions. */
>  void
>  xfs_end_io(
> @@ -201,7 +157,7 @@ xfs_end_io(
>  	while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend,
>  			io_list))) {
>  		list_del_init(&ioend->io_list);
> -		iomap_ioend_try_merge(ioend, &tmp, xfs_ioend_merge_private);
> +		iomap_ioend_try_merge(ioend, &tmp, NULL);
>  		xfs_end_ioend(ioend);
>  	}
>  }
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] xfs: open code ioend needs workqueue helper
  2021-04-05 14:59 ` [PATCH 2/4] xfs: open code ioend needs workqueue helper Brian Foster
  2021-04-07  6:34   ` Christoph Hellwig
@ 2021-04-07 15:42   ` Darrick J. Wong
  1 sibling, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-07 15:42 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 05, 2021 at 10:59:01AM -0400, Brian Foster wrote:
> Open code xfs_ioend_needs_workqueue() into the only remaining
> caller.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

I would have left it, but don't really care either way...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_aops.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index c1951975bd6a..63ecc04de64f 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -206,13 +206,6 @@ xfs_end_io(
>  	}
>  }
>  
> -static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> -{
> -	return xfs_ioend_is_append(ioend) ||
> -		ioend->io_type == IOMAP_UNWRITTEN ||
> -		(ioend->io_flags & IOMAP_F_SHARED);
> -}
> -
>  STATIC void
>  xfs_end_bio(
>  	struct bio		*bio)
> @@ -472,7 +465,9 @@ xfs_prepare_ioend(
>  
>  	memalloc_nofs_restore(nofs_flag);
>  
> -	if (xfs_ioend_needs_workqueue(ioend))
> +	/* send ioends that might require a transaction to the completion wq */
> +	if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN ||
> +	    (ioend->io_flags & IOMAP_F_SHARED))
>  		ioend->io_bio->bi_end_io = xfs_end_bio;
>  	return status;
>  }
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
  2021-04-07 15:31   ` Darrick J. Wong
@ 2021-04-09 13:47     ` Brian Foster
  2021-04-09 16:07       ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2021-04-09 13:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Apr 07, 2021 at 08:31:52AM -0700, Darrick J. Wong wrote:
> On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> > Per-inode ioend completion batching has a log reservation deadlock
> > vector between preallocated append transactions and transactions
> > that are acquired at completion time for other purposes (i.e.,
> > unwritten extent conversion or COW fork remaps). For example, if the
> > ioend completion workqueue task executes on a batch of ioends that
> > are sorted such that an append ioend sits at the tail, it's possible
> > for the outstanding append transaction reservation to block
> > allocation of transactions required to process preceding ioends in
> > the list.
> > 
> > Append ioend completion is historically the common path for on-disk
> > inode size updates. While file extending writes may have completed
> > sometime earlier, the on-disk inode size is only updated after
> > successful writeback completion. These transactions are preallocated
> > serially from writeback context to mitigate concurrency and
> > associated log reservation pressure across completions processed by
> > multi-threaded workqueue tasks.
> > 
> > However, now that delalloc blocks unconditionally map to unwritten
> > extents at physical block allocation time, size updates via append
> > ioends are relatively rare. This means that inode size updates most
> > commonly occur as part of the preexisting completion time
> > transaction to convert unwritten extents. As a result, there is no
> > longer a strong need to preallocate size update transactions.
> > 
> > Remove the preallocation of inode size update transactions to avoid
> > the ioend completion processing log reservation deadlock. Instead,
> > continue to send all potential size extending ioends to workqueue
> > context for completion and allocate the transaction from that
> > context. This ensures that no outstanding log reservation is owned
> > by the ioend completion worker task when it begins to process
> > ioends.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
> >  1 file changed, 3 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 1cc7c36d98e9..c1951975bd6a 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >  		XFS_I(ioend->io_inode)->i_d.di_size;
> >  }
> >  
> > -STATIC int
> > -xfs_setfilesize_trans_alloc(
> > -	struct iomap_ioend	*ioend)
> > -{
> > -	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
> > -	struct xfs_trans	*tp;
> > -	int			error;
> > -
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> > -	if (error)
> > -		return error;
> > -
> > -	ioend->io_private = tp;
> > -
> > -	/*
> > -	 * We may pass freeze protection with a transaction.  So tell lockdep
> > -	 * we released it.
> > -	 */
> > -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> > -	/*
> > -	 * We hand off the transaction to the completion thread now, so
> > -	 * clear the flag here.
> > -	 */
> > -	xfs_trans_clear_context(tp);
> > -	return 0;
> > -}
> > -
> >  /*
> >   * Update on-disk file size now that data has been written to disk.
> >   */
> > @@ -182,12 +155,10 @@ xfs_end_ioend(
> >  		error = xfs_reflink_end_cow(ip, offset, size);
> 
> Seems reasonable to me.  xfs_reflink_end_cow_extent should probably
> learn how to extend the ondisk EOF as patch 6/4.
> 

After looking into this since multiple people asked for it, I'm not as
convinced it's the right approach as I was on first thought. The reason
being that the reflink completion code actually remaps in reverse order
(I suspect due to depending on the same behavior of the bmap unmap code,
which is used to pull extents from the COW fork..).

That means that while inter-ioend completion takes some care (even if
not necessarily guaranteed) to order on-disk size updates against
completion within the associated i_size, a size update along with the
COW fork remap transaction could result in the opposite in some cases
(i.e. if the cow fork is fragmented relative to the data fork). I
suspect this is a rare situation and not a critical problem, but at the
same time I find it a bit odd to implement intra-ioend ordering
inconsistent with the broader inter-ioend logic.

I could probably be convinced otherwise, but I think this warrants more
thought and is not necessarily a straightforward cleanup. I'm also not
sure how common disk size updates via COW fork I/O completion really are
in the first place, so I'm not sure it's worth the oddness to save an
extra transaction. In any event, I'll probably leave this off from this
series for now because it doesn't have much to do with fixing the
deadlock problem and can revisit it if anybody has thoughts on any of
the above..

Brian

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
> >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > -	else
> > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> >  
> >  done:
> > -	if (ioend->io_private)
> > -		error = xfs_setfilesize_ioend(ioend, error);
> > +	if (!error && xfs_ioend_is_append(ioend))
> > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> >  	iomap_finish_ioends(ioend, error);
> >  	memalloc_nofs_restore(nofs_flag);
> >  }
> > @@ -237,7 +208,7 @@ xfs_end_io(
> >  
> >  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> >  {
> > -	return ioend->io_private ||
> > +	return xfs_ioend_is_append(ioend) ||
> >  		ioend->io_type == IOMAP_UNWRITTEN ||
> >  		(ioend->io_flags & IOMAP_F_SHARED);
> >  }
> > @@ -250,8 +221,6 @@ xfs_end_bio(
> >  	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
> >  	unsigned long		flags;
> >  
> > -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> > -
> >  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
> >  	if (list_empty(&ip->i_ioend_list))
> >  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> > @@ -501,14 +470,6 @@ xfs_prepare_ioend(
> >  				ioend->io_offset, ioend->io_size);
> >  	}
> >  
> > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > -	if (!status &&
> > -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> > -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> > -	    xfs_ioend_is_append(ioend) &&
> > -	    !ioend->io_private)
> > -		status = xfs_setfilesize_trans_alloc(ioend);
> > -
> >  	memalloc_nofs_restore(nofs_flag);
> >  
> >  	if (xfs_ioend_needs_workqueue(ioend))
> > -- 
> > 2.26.3
> > 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] xfs: drop submit side trans alloc for append ioends
  2021-04-09 13:47     ` Brian Foster
@ 2021-04-09 16:07       ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-04-09 16:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 09, 2021 at 09:47:24AM -0400, Brian Foster wrote:
> On Wed, Apr 07, 2021 at 08:31:52AM -0700, Darrick J. Wong wrote:
> > On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> > > Per-inode ioend completion batching has a log reservation deadlock
> > > vector between preallocated append transactions and transactions
> > > that are acquired at completion time for other purposes (i.e.,
> > > unwritten extent conversion or COW fork remaps). For example, if the
> > > ioend completion workqueue task executes on a batch of ioends that
> > > are sorted such that an append ioend sits at the tail, it's possible
> > > for the outstanding append transaction reservation to block
> > > allocation of transactions required to process preceding ioends in
> > > the list.
> > > 
> > > Append ioend completion is historically the common path for on-disk
> > > inode size updates. While file extending writes may have completed
> > > sometime earlier, the on-disk inode size is only updated after
> > > successful writeback completion. These transactions are preallocated
> > > serially from writeback context to mitigate concurrency and
> > > associated log reservation pressure across completions processed by
> > > multi-threaded workqueue tasks.
> > > 
> > > However, now that delalloc blocks unconditionally map to unwritten
> > > extents at physical block allocation time, size updates via append
> > > ioends are relatively rare. This means that inode size updates most
> > > commonly occur as part of the preexisting completion time
> > > transaction to convert unwritten extents. As a result, there is no
> > > longer a strong need to preallocate size update transactions.
> > > 
> > > Remove the preallocation of inode size update transactions to avoid
> > > the ioend completion processing log reservation deadlock. Instead,
> > > continue to send all potential size extending ioends to workqueue
> > > context for completion and allocate the transaction from that
> > > context. This ensures that no outstanding log reservation is owned
> > > by the ioend completion worker task when it begins to process
> > > ioends.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
> > >  1 file changed, 3 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index 1cc7c36d98e9..c1951975bd6a 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> > >  		XFS_I(ioend->io_inode)->i_d.di_size;
> > >  }
> > >  
> > > -STATIC int
> > > -xfs_setfilesize_trans_alloc(
> > > -	struct iomap_ioend	*ioend)
> > > -{
> > > -	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
> > > -	struct xfs_trans	*tp;
> > > -	int			error;
> > > -
> > > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> > > -	if (error)
> > > -		return error;
> > > -
> > > -	ioend->io_private = tp;
> > > -
> > > -	/*
> > > -	 * We may pass freeze protection with a transaction.  So tell lockdep
> > > -	 * we released it.
> > > -	 */
> > > -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> > > -	/*
> > > -	 * We hand off the transaction to the completion thread now, so
> > > -	 * clear the flag here.
> > > -	 */
> > > -	xfs_trans_clear_context(tp);
> > > -	return 0;
> > > -}
> > > -
> > >  /*
> > >   * Update on-disk file size now that data has been written to disk.
> > >   */
> > > @@ -182,12 +155,10 @@ xfs_end_ioend(
> > >  		error = xfs_reflink_end_cow(ip, offset, size);
> > 
> > Seems reasonable to me.  xfs_reflink_end_cow_extent should probably
> > learn how to extend the ondisk EOF as patch 6/4.
> > 
> 
> After looking into this since multiple people asked for it, I'm not as
> convinced it's the right approach as I was on first thought. The reason
> being that the reflink completion code actually remaps in reverse order
> (I suspect due to depending on the same behavior of the bmap unmap code,
> which is used to pull extents from the COW fork..).

Yes.  There's no algorithmic reason why _end_cow has to call bunmapi
directly (and thereby remap in reverse order) -- it could log a bunmap
intent item for the data fork before logging the refcount btree intent
and the bmap remap intent.  With such a rework in place, we would
effectively be doing the remap in order of increasing file offset,
instead of forwards-backwards like we do now.

If you want to chase that, you might also want to look at this patch[1]
that eliminates the need to requeue bunmap operations.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=reflink-speedups&id=323b0acb563ea50bfff79801083f5ba8544c7985

> That means that while inter-ioend completion takes some care (even if
> not necessarily guaranteed) to order on-disk size updates against
> completion within the associated i_size, a size update along with the
> COW fork remap transaction could result in the opposite in some cases
> (i.e. if the cow fork is fragmented relative to the data fork). I
> suspect this is a rare situation and not a critical problem, but at the
> same time I find it a bit odd to implement intra-ioend ordering
> inconsistent with the broader inter-ioend logic.
> 
> I could probably be convinced otherwise, but I think this warrants more
> thought and is not necessarily a straightforward cleanup. I'm also not
> sure how common disk size updates via COW fork I/O completion really are
> in the first place, so I'm not sure it's worth the oddness to save an
> extra transaction. In any event, I'll probably leave this off from this
> series for now because it doesn't have much to do with fixing the
> deadlock problem and can revisit it if anybody has thoughts on any of
> the above..

The only case I can think of is an extending write to a reflinked file
if (say) you snapshot /var on a live system.

--D

> 
> Brian
> 
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > --D
> > 
> > >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> > >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > > -	else
> > > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> > >  
> > >  done:
> > > -	if (ioend->io_private)
> > > -		error = xfs_setfilesize_ioend(ioend, error);
> > > +	if (!error && xfs_ioend_is_append(ioend))
> > > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> > >  	iomap_finish_ioends(ioend, error);
> > >  	memalloc_nofs_restore(nofs_flag);
> > >  }
> > > @@ -237,7 +208,7 @@ xfs_end_io(
> > >  
> > >  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> > >  {
> > > -	return ioend->io_private ||
> > > +	return xfs_ioend_is_append(ioend) ||
> > >  		ioend->io_type == IOMAP_UNWRITTEN ||
> > >  		(ioend->io_flags & IOMAP_F_SHARED);
> > >  }
> > > @@ -250,8 +221,6 @@ xfs_end_bio(
> > >  	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
> > >  	unsigned long		flags;
> > >  
> > > -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> > > -
> > >  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
> > >  	if (list_empty(&ip->i_ioend_list))
> > >  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> > > @@ -501,14 +470,6 @@ xfs_prepare_ioend(
> > >  				ioend->io_offset, ioend->io_size);
> > >  	}
> > >  
> > > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > -	if (!status &&
> > > -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> > > -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> > > -	    xfs_ioend_is_append(ioend) &&
> > > -	    !ioend->io_private)
> > > -		status = xfs_setfilesize_trans_alloc(ioend);
> > > -
> > >  	memalloc_nofs_restore(nofs_flag);
> > >  
> > >  	if (xfs_ioend_needs_workqueue(ioend))
> > > -- 
> > > 2.26.3
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-04-09 16:07 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-05 14:58 [PATCH 0/4] xfs: ioend batching log reservation deadlock Brian Foster
2021-04-05 14:59 ` [PATCH 1/4] xfs: drop submit side trans alloc for append ioends Brian Foster
2021-04-07  6:33   ` Christoph Hellwig
2021-04-07 11:23     ` Brian Foster
2021-04-07 15:31   ` Darrick J. Wong
2021-04-09 13:47     ` Brian Foster
2021-04-09 16:07       ` Darrick J. Wong
2021-04-05 14:59 ` [PATCH 2/4] xfs: open code ioend needs workqueue helper Brian Foster
2021-04-07  6:34   ` Christoph Hellwig
2021-04-07 11:24     ` Brian Foster
2021-04-07 15:23       ` Darrick J. Wong
2021-04-07 15:42   ` Darrick J. Wong
2021-04-05 14:59 ` [PATCH 3/4] xfs: drop unused ioend private merge and setfilesize code Brian Foster
2021-04-05 17:55   ` Christoph Hellwig
2021-04-05 18:08     ` Brian Foster
2021-04-07  6:36   ` Christoph Hellwig
2021-04-07 15:40   ` Darrick J. Wong
2021-04-05 14:59 ` [PATCH 4/4] xfs: drop unnecessary setfilesize helper Brian Foster
2021-04-07  6:37   ` Christoph Hellwig
2021-04-07 15:40   ` Darrick J. Wong
2021-04-06 10:27 ` [PATCH 5/4] iomap: remove unused private field from ioend Brian Foster
2021-04-07  6:40   ` Christoph Hellwig
2021-04-07 15:36   ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.