* [PATCH] xfs: fix an ABBA deadlock in xfs_rename
@ 2021-01-04 19:44 Darrick J. Wong
  2021-01-04 19:51 ` Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Darrick J. Wong @ 2021-01-04 19:44 UTC
  To: wenli xie; +Cc: xfs, chiluk, Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

When overlayfs is running on top of xfs and the user unlinks a file in
the overlay, overlayfs will create a whiteout inode and ask xfs to
"rename" the whiteout file atop the one being unlinked.  If the file
being unlinked loses its one nlink, we then have to put the inode on the
unlinked list.

This requires us to grab the AGI buffer of the whiteout inode to take it
off the unlinked list (which is where whiteouts are created) and to grab
the AGI buffer of the file being deleted.  If the whiteout was created
in a higher numbered AG than the file being deleted, we'll lock the AGIs
in the wrong order and deadlock.

Therefore, grab all the AGI locks we think we'll need ahead of time, and
in the correct order.

Reported-by: wenli xie <wlxie7296@gmail.com>
Tested-by: wenli xie <wlxie7296@gmail.com>
Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b7352bc4c815..dd419a1bc6ba 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
 	return 0;
 }
 
+/*
+ * For the general case of renaming files, lock all the AGI buffers we need to
+ * handle bumping the nlink of the whiteout inode off the unlinked list and to
+ * handle dropping the nlink of the target inode.  We have to do this in
+ * increasing AG order to avoid deadlocks.
+ */
+static int
+xfs_rename_lock_agis(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*wip,
+	struct xfs_inode	*target_ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_buf		*bp;
+	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
+	int			error;
+
+	if (wip)
+		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
+
+	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
+		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
+
+	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
+	    agi_locks[0] > agi_locks[1])
+		swap(agi_locks[0], agi_locks[1]);
+
+	if (agi_locks[0] != NULLAGNUMBER) {
+		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
+		if (error)
+			return error;
+	}
+
+	if (agi_locks[1] != NULLAGNUMBER) {
+		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /*
  * xfs_rename
  */
@@ -3130,6 +3172,10 @@ xfs_rename(
 		}
 	}
 
+	error = xfs_rename_lock_agis(tp, wip, target_ip);
+	if (error)
+		return error;
+
 	/*
 	 * Directory entry creation below may acquire the AGF. Remove
 	 * the whiteout from the unlinked list first to preserve correct
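
As an illustration of the ordering rule the patch applies, here is a minimal user-space sketch of the "sort two optional AG numbers, then take them lowest first" step. It is not the kernel code: agnumber_t, NULL_AGNUMBER and plan_agi_lock_order() are hypothetical stand-ins for xfs_agnumber_t, NULLAGNUMBER and the body of xfs_rename_lock_agis(), and the actual xfs_read_agi() calls are replaced by writing the planned order into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's xfs_agnumber_t and NULLAGNUMBER. */
typedef uint32_t agnumber_t;
#define NULL_AGNUMBER	((agnumber_t)-1)

/*
 * Fill order[] with the AG numbers whose AGI would be locked, lowest first,
 * skipping slots that are unset.  Returns the number of entries written
 * (0, 1 or 2).  Like the patch, this does not deduplicate equal AG numbers.
 */
static size_t
plan_agi_lock_order(
	agnumber_t	whiteout_ag,
	agnumber_t	target_ag,
	agnumber_t	order[2])
{
	agnumber_t	slots[2] = { whiteout_ag, target_ag };
	size_t		n = 0;

	assert(order != NULL);

	/* Only swap when both slots are in use and out of order. */
	if (slots[0] != NULL_AGNUMBER && slots[1] != NULL_AGNUMBER &&
	    slots[0] > slots[1]) {
		agnumber_t	tmp = slots[0];

		slots[0] = slots[1];
		slots[1] = tmp;
	}

	for (size_t i = 0; i < 2; i++) {
		if (slots[i] != NULL_AGNUMBER)
			order[n++] = slots[i];
	}
	return n;
}
```

Because every caller walks the AGs in the same ascending order, two renames that need the same pair of AGIs cannot each hold the one the other is waiting for.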


* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-04 19:44 [PATCH] xfs: fix an ABBA deadlock in xfs_rename Darrick J. Wong
@ 2021-01-04 19:51 ` Darrick J. Wong
  2021-01-04 20:27 ` Brian Foster
  2021-01-05 22:12 ` Dave Chinner
  2 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2021-01-04 19:51 UTC
  To: wenli xie; +Cc: xfs, chiluk, Brian Foster

On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When overlayfs is running on top of xfs and the user unlinks a file in
> the overlay, overlayfs will create a whiteout inode and ask xfs to
> "rename" the whiteout file atop the one being unlinked.  If the file
> being unlinked loses its one nlink, we then have to put the inode on the
> unlinked list.
> 
> This requires us to grab the AGI buffer of the whiteout inode to take it
> off the unlinked list (which is where whiteouts are created) and to grab
> the AGI buffer of the file being deleted.  If the whiteout was created
> in a higher numbered AG than the file being deleted, we'll lock the AGIs
> in the wrong order and deadlock.
> 
> Therefore, grab all the AGI locks we think we'll need ahead of time, and
> in the correct order.
> 
> Reported-by: wenli xie <wlxie7296@gmail.com>
> Tested-by: wenli xie <wlxie7296@gmail.com>
> Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index b7352bc4c815..dd419a1bc6ba 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
>  	return 0;
>  }
>  
> +/*
> + * For the general case of renaming files, lock all the AGI buffers we need to
> + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> + * handle dropping the nlink of the target inode.  We have to do this in
> + * increasing AG order to avoid deadlocks.

One thing that occurred to me 5 seconds after hitting Send is that we
can still screw up the locking order if we grab even one AGI and the
dirent operations require the allocation of a new block for the
directory.  I /think/ the solution to that is to set tp->t_firstblock to
prevent the allocation from happening in a lower AG, though it's too bad
we can't just carve up rename operations into multiple smaller
transactions...

--D

> + */
> +static int
> +xfs_rename_lock_agis(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*wip,
> +	struct xfs_inode	*target_ip)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_buf		*bp;
> +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> +	int			error;
> +
> +	if (wip)
> +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> +
> +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> +
> +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> +	    agi_locks[0] > agi_locks[1])
> +		swap(agi_locks[0], agi_locks[1]);
> +
> +	if (agi_locks[0] != NULLAGNUMBER) {
> +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	if (agi_locks[1] != NULLAGNUMBER) {
> +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * xfs_rename
>   */
> @@ -3130,6 +3172,10 @@ xfs_rename(
>  		}
>  	}
>  
> +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> +	if (error)
> +		return error;
> +
>  	/*
>  	 * Directory entry creation below may acquire the AGF. Remove
>  	 * the whiteout from the unlinked list first to preserve correct


* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-04 19:44 [PATCH] xfs: fix an ABBA deadlock in xfs_rename Darrick J. Wong
  2021-01-04 19:51 ` Darrick J. Wong
@ 2021-01-04 20:27 ` Brian Foster
  2021-01-05  1:14   ` Darrick J. Wong
  2021-01-05 22:12 ` Dave Chinner
  2 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2021-01-04 20:27 UTC
  To: Darrick J. Wong; +Cc: wenli xie, xfs, chiluk

On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When overlayfs is running on top of xfs and the user unlinks a file in
> the overlay, overlayfs will create a whiteout inode and ask xfs to
> "rename" the whiteout file atop the one being unlinked.  If the file
> being unlinked loses its one nlink, we then have to put the inode on the
> unlinked list.
> 
> This requires us to grab the AGI buffer of the whiteout inode to take it
> off the unlinked list (which is where whiteouts are created) and to grab
> the AGI buffer of the file being deleted.  If the whiteout was created
> in a higher numbered AG than the file being deleted, we'll lock the AGIs
> in the wrong order and deadlock.
> 
> Therefore, grab all the AGI locks we think we'll need ahead of time, and
> in the correct order.
> 
> Reported-by: wenli xie <wlxie7296@gmail.com>
> Tested-by: wenli xie <wlxie7296@gmail.com>
> Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index b7352bc4c815..dd419a1bc6ba 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
>  	return 0;
>  }
>  
> +/*
> + * For the general case of renaming files, lock all the AGI buffers we need to
> + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> + * handle dropping the nlink of the target inode.  We have to do this in
> + * increasing AG order to avoid deadlocks.
> + */
> +static int
> +xfs_rename_lock_agis(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*wip,
> +	struct xfs_inode	*target_ip)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_buf		*bp;
> +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> +	int			error;
> +
> +	if (wip)
> +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> +
> +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> +
> +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> +	    agi_locks[0] > agi_locks[1])
> +		swap(agi_locks[0], agi_locks[1]);
> +
> +	if (agi_locks[0] != NULLAGNUMBER) {
> +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	if (agi_locks[1] != NULLAGNUMBER) {
> +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	return 0;
> +}

This all looks reasonable to me, but I wonder if we can simplify
a bit by reusing the sorted inodes array we've already created earlier
in xfs_rename(). E.g., something like:

	for (i = 0; i < num_inodes; i++) {
		if (inodes[i] != wip && inodes[i] != target_ip)
			continue;
		error = xfs_read_agi(...);
		...
	}

IOW, similar to how xfs_lock_inodes() and xfs_qm_vop_rename_dqattach()
work.

Brian

> +
>  /*
>   * xfs_rename
>   */
> @@ -3130,6 +3172,10 @@ xfs_rename(
>  		}
>  	}
>  
> +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> +	if (error)
> +		return error;
> +
>  	/*
>  	 * Directory entry creation below may acquire the AGF. Remove
>  	 * the whiteout from the unlinked list first to preserve correct
> 



* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-04 20:27 ` Brian Foster
@ 2021-01-05  1:14   ` Darrick J. Wong
  2021-01-05  9:01     ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-01-05  1:14 UTC
  To: Brian Foster; +Cc: wenli xie, xfs, chiluk

On Mon, Jan 04, 2021 at 03:27:14PM -0500, Brian Foster wrote:
> On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When overlayfs is running on top of xfs and the user unlinks a file in
> > the overlay, overlayfs will create a whiteout inode and ask xfs to
> > "rename" the whiteout file atop the one being unlinked.  If the file
> > being unlinked loses its one nlink, we then have to put the inode on the
> > unlinked list.
> > 
> > This requires us to grab the AGI buffer of the whiteout inode to take it
> > off the unlinked list (which is where whiteouts are created) and to grab
> > the AGI buffer of the file being deleted.  If the whiteout was created
> > in a higher numbered AG than the file being deleted, we'll lock the AGIs
> > in the wrong order and deadlock.
> > 
> > Therefore, grab all the AGI locks we think we'll need ahead of time, and
> > in the correct order.
> > 
> > Reported-by: wenli xie <wlxie7296@gmail.com>
> > Tested-by: wenli xie <wlxie7296@gmail.com>
> > Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 46 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index b7352bc4c815..dd419a1bc6ba 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
> >  	return 0;
> >  }
> >  
> > +/*
> > + * For the general case of renaming files, lock all the AGI buffers we need to
> > + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> > + * handle dropping the nlink of the target inode.  We have to do this in
> > + * increasing AG order to avoid deadlocks.
> > + */
> > +static int
> > +xfs_rename_lock_agis(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_inode	*wip,
> > +	struct xfs_inode	*target_ip)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_buf		*bp;
> > +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> > +	int			error;
> > +
> > +	if (wip)
> > +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> > +
> > +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> > +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> > +
> > +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> > +	    agi_locks[0] > agi_locks[1])
> > +		swap(agi_locks[0], agi_locks[1]);
> > +
> > +	if (agi_locks[0] != NULLAGNUMBER) {
> > +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	if (agi_locks[1] != NULLAGNUMBER) {
> > +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> This all looks reasonable to me, but I wonder if we can simplify
> a bit by reusing the sorted inodes array we've already created earlier
> in xfs_rename(). E.g., something like:
> 
> 	for (i = 0; i < num_inodes; i++) {
> 		if (inodes[i] != wip && inodes[i] != target_ip)
> 			continue;
> 		error = xfs_read_agi(...);
> 		...
> 	}
> 
> IOW, similar to how xfs_lock_inodes() and xfs_qm_vop_rename_dqattach()
> work.

I think it would be difficult to do that because we only need to grab
target_ip's AGI if we're going to droplink it, and we haven't yet taken
target_ip's ILOCK when we invoke the sorting hat so the link count isn't
stable.

--D

> Brian
> 
> > +
> >  /*
> >   * xfs_rename
> >   */
> > @@ -3130,6 +3172,10 @@ xfs_rename(
> >  		}
> >  	}
> >  
> > +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> > +	if (error)
> > +		return error;
> > +
> >  	/*
> >  	 * Directory entry creation below may acquire the AGF. Remove
> >  	 * the whiteout from the unlinked list first to preserve correct
> > 
> 


* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-05  1:14   ` Darrick J. Wong
@ 2021-01-05  9:01     ` Brian Foster
  2021-01-05 17:31       ` Darrick J. Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2021-01-05  9:01 UTC
  To: Darrick J. Wong; +Cc: wenli xie, xfs, chiluk

On Mon, Jan 04, 2021 at 05:14:32PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 04, 2021 at 03:27:14PM -0500, Brian Foster wrote:
> > On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > When overlayfs is running on top of xfs and the user unlinks a file in
> > > the overlay, overlayfs will create a whiteout inode and ask xfs to
> > > "rename" the whiteout file atop the one being unlinked.  If the file
> > > being unlinked loses its one nlink, we then have to put the inode on the
> > > unlinked list.
> > > 
> > > This requires us to grab the AGI buffer of the whiteout inode to take it
> > > off the unlinked list (which is where whiteouts are created) and to grab
> > > the AGI buffer of the file being deleted.  If the whiteout was created
> > > in a higher numbered AG than the file being deleted, we'll lock the AGIs
> > > in the wrong order and deadlock.
> > > 
> > > Therefore, grab all the AGI locks we think we'll need ahead of time, and
> > > in the correct order.
> > > 
> > > Reported-by: wenli xie <wlxie7296@gmail.com>
> > > Tested-by: wenli xie <wlxie7296@gmail.com>
> > > Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 46 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index b7352bc4c815..dd419a1bc6ba 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
> > >  	return 0;
> > >  }
> > >  
> > > +/*
> > > + * For the general case of renaming files, lock all the AGI buffers we need to
> > > + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> > > + * handle dropping the nlink of the target inode.  We have to do this in
> > > + * increasing AG order to avoid deadlocks.
> > > + */
> > > +static int
> > > +xfs_rename_lock_agis(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_inode	*wip,
> > > +	struct xfs_inode	*target_ip)
> > > +{
> > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > +	struct xfs_buf		*bp;
> > > +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> > > +	int			error;
> > > +
> > > +	if (wip)
> > > +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> > > +
> > > +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> > > +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> > > +
> > > +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> > > +	    agi_locks[0] > agi_locks[1])
> > > +		swap(agi_locks[0], agi_locks[1]);
> > > +
> > > +	if (agi_locks[0] != NULLAGNUMBER) {
> > > +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	if (agi_locks[1] != NULLAGNUMBER) {
> > > +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > 
> > This all looks reasonable to me, but I wonder if we can simplify
> > a bit by reusing the sorted inodes array we've already created earlier
> > in xfs_rename(). E.g., something like:
> > 
> > 	for (i = 0; i < num_inodes; i++) {
> > 		if (inodes[i] != wip && inodes[i] != target_ip)
> > 			continue;
> > 		error = xfs_read_agi(...);
> > 		...
> > 	}
> > 
> > IOW, similar to how xfs_lock_inodes() and xfs_qm_vop_rename_dqattach()
> > work.
> 
> I think it would be difficult to do that because we only need to grab
> target_ip's AGI if we're going to droplink it, and we haven't yet taken
> target_ip's ILOCK when we invoke the sorting hat so the link count isn't
> stable.
> 

I'm not following how using the inodes array affects this.
xfs_sort_for_rename() simply puts the inodes in inode number order. That
sorted array is reused for various purposes that require that ordering
information (such as acquiring inode locks in the first place). This
patch duplicates a subset of that sorting logic for the agnos of wip and
target_ip to ensure the AGIs are read (if necessary) in order.

The suggestion above would just refer to the already sorted array to
establish order of the associated AGI reads rather than checking and
sorting the agnos explicitly. This would still occur in
xfs_rename_lock_agis() where inode locks have already been acquired, and
so ISTM that the logic could be enhanced to also consider ->i_nlink just
as the original patch does. Hm?

Brian
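
To make the suggestion concrete, here is a rough user-space model of walking the already-sorted inodes array and picking out only the AGIs that need locking. It is a sketch under assumptions, not the eventual kernel change: struct model_inode and collect_agi_locks() are hypothetical names, the AG number is carried as a plain field instead of being derived with XFS_INO_TO_AGNO(), and the xfs_read_agi() call is replaced by recording the AG number. It also assumes that inode-number order implies non-decreasing AG order, which holds in XFS because the AG number occupies the high bits of the inode number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_inode. */
struct model_inode {
	uint64_t	ino;	/* inode number; the array is sorted by this */
	uint32_t	agno;	/* AG holding the inode */
	uint32_t	nlink;	/* link count, stable once the inode is locked */
};

/*
 * Walk the sorted inode array and record, in order, the AG number of each
 * inode whose AGI the rename would need: the whiteout (always) and the
 * target (only if this rename drops its last link).  Returns the number
 * of entries written to agnos[].
 */
static size_t
collect_agi_locks(
	struct model_inode		**inodes,
	size_t				num_inodes,
	const struct model_inode	*wip,
	const struct model_inode	*target_ip,
	uint32_t			*agnos)
{
	size_t				n = 0;

	assert(inodes != NULL && agnos != NULL);

	for (size_t i = 0; i < num_inodes; i++) {
		const struct model_inode	*ip = inodes[i];

		if (ip == wip)
			;	/* whiteout comes off the unlinked list */
		else if (ip == target_ip && ip->nlink == 1)
			;	/* target goes onto the unlinked list */
		else
			continue;

		/* The kernel version would call xfs_read_agi() here. */
		agnos[n++] = ip->agno;
	}
	return n;
}
```

The link-count test sits inside the loop, so it runs at lock time rather than at sort time, which is the point made above about the inode locks already being held.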

> --D
> 
> > Brian
> > 
> > > +
> > >  /*
> > >   * xfs_rename
> > >   */
> > > @@ -3130,6 +3172,10 @@ xfs_rename(
> > >  		}
> > >  	}
> > >  
> > > +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> > > +	if (error)
> > > +		return error;
> > > +
> > >  	/*
> > >  	 * Directory entry creation below may acquire the AGF. Remove
> > >  	 * the whiteout from the unlinked list first to preserve correct
> > > 
> > 
> 



* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-05  9:01     ` Brian Foster
@ 2021-01-05 17:31       ` Darrick J. Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2021-01-05 17:31 UTC
  To: Brian Foster; +Cc: wenli xie, xfs, chiluk

On Tue, Jan 05, 2021 at 04:01:19AM -0500, Brian Foster wrote:
> On Mon, Jan 04, 2021 at 05:14:32PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 04, 2021 at 03:27:14PM -0500, Brian Foster wrote:
> > > On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > When overlayfs is running on top of xfs and the user unlinks a file in
> > > > the overlay, overlayfs will create a whiteout inode and ask xfs to
> > > > "rename" the whiteout file atop the one being unlinked.  If the file
> > > > being unlinked loses its one nlink, we then have to put the inode on the
> > > > unlinked list.
> > > > 
> > > > This requires us to grab the AGI buffer of the whiteout inode to take it
> > > > off the unlinked list (which is where whiteouts are created) and to grab
> > > > the AGI buffer of the file being deleted.  If the whiteout was created
> > > > in a higher numbered AG than the file being deleted, we'll lock the AGIs
> > > > in the wrong order and deadlock.
> > > > 
> > > > Therefore, grab all the AGI locks we think we'll need ahead of time, and
> > > > in the correct order.
> > > > 
> > > > Reported-by: wenli xie <wlxie7296@gmail.com>
> > > > Tested-by: wenli xie <wlxie7296@gmail.com>
> > > > Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 46 insertions(+)
> > > > 
> > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > index b7352bc4c815..dd419a1bc6ba 100644
> > > > --- a/fs/xfs/xfs_inode.c
> > > > +++ b/fs/xfs/xfs_inode.c
> > > > @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +/*
> > > > + * For the general case of renaming files, lock all the AGI buffers we need to
> > > > + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> > > > + * handle dropping the nlink of the target inode.  We have to do this in
> > > > + * increasing AG order to avoid deadlocks.
> > > > + */
> > > > +static int
> > > > +xfs_rename_lock_agis(
> > > > +	struct xfs_trans	*tp,
> > > > +	struct xfs_inode	*wip,
> > > > +	struct xfs_inode	*target_ip)
> > > > +{
> > > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > > +	struct xfs_buf		*bp;
> > > > +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> > > > +	int			error;
> > > > +
> > > > +	if (wip)
> > > > +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> > > > +
> > > > +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> > > > +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> > > > +
> > > > +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> > > > +	    agi_locks[0] > agi_locks[1])
> > > > +		swap(agi_locks[0], agi_locks[1]);
> > > > +
> > > > +	if (agi_locks[0] != NULLAGNUMBER) {
> > > > +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> > > > +		if (error)
> > > > +			return error;
> > > > +	}
> > > > +
> > > > +	if (agi_locks[1] != NULLAGNUMBER) {
> > > > +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> > > > +		if (error)
> > > > +			return error;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > 
> > > This all looks reasonable to me, but I wonder if we can simplify
> > > a bit by reusing the sorted inodes array we've already created earlier
> > > in xfs_rename(). E.g., something like:
> > > 
> > > 	for (i = 0; i < num_inodes; i++) {
> > > 		if (inodes[i] != wip && inodes[i] != target_ip)
> > > 			continue;
> > > 		error = xfs_read_agi(...);
> > > 		...
> > > 	}
> > > 
> > > IOW, similar to how xfs_lock_inodes() and xfs_qm_vop_rename_dqattach()
> > > work.
> > 
> > I think it would be difficult to do that because we only need to grab
> > target_ip's AGI if we're going to droplink it, and we haven't yet taken
> > target_ip's ILOCK when we invoke the sorting hat so the link count isn't
> > stable.
> > 
> 
> I'm not following how using the inodes array affects this.
> xfs_sort_for_rename() simply puts the inodes in inode number order. That
> sorted array is reused for various purposes that require that ordering
> information (such as acquiring inode locks in the first place). This
> patch duplicates a subset of that sorting logic for the agnos of wip and
> target_ip to ensure the AGIs are read (if necessary) in order.
> 
> The suggestion above would just refer to the already sorted array to
> establish order of the associated AGI reads rather than checking and
> sorting the agnos explicitly. This would still occur in
> xfs_rename_lock_agis() where inode locks have already been acquired, and
> so ISTM that the logic could be enhanced to also consider ->i_nlink just
> as the original patch does. Hm?

*OH* you were asking if I could pass the inodes[] array to lock_agis,
not if I could lock AGIs in the sorting function!

Yes, that would cut out a fair amount of code, thanks for the
suggestion!

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > +
> > > >  /*
> > > >   * xfs_rename
> > > >   */
> > > > @@ -3130,6 +3172,10 @@ xfs_rename(
> > > >  		}
> > > >  	}
> > > >  
> > > > +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > >  	/*
> > > >  	 * Directory entry creation below may acquire the AGF. Remove
> > > >  	 * the whiteout from the unlinked list first to preserve correct
> > > > 
> > > 
> > 
> 


* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-04 19:44 [PATCH] xfs: fix an ABBA deadlock in xfs_rename Darrick J. Wong
  2021-01-04 19:51 ` Darrick J. Wong
  2021-01-04 20:27 ` Brian Foster
@ 2021-01-05 22:12 ` Dave Chinner
  2021-01-06  0:26   ` Darrick J. Wong
  2 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2021-01-05 22:12 UTC
  To: Darrick J. Wong; +Cc: wenli xie, xfs, chiluk, Brian Foster

On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When overlayfs is running on top of xfs and the user unlinks a file in
> the overlay, overlayfs will create a whiteout inode and ask xfs to
> "rename" the whiteout file atop the one being unlinked.  If the file
> being unlinked loses its one nlink, we then have to put the inode on the
> unlinked list.
> 
> This requires us to grab the AGI buffer of the whiteout inode to take it
> off the unlinked list (which is where whiteouts are created) and to grab
> the AGI buffer of the file being deleted.  If the whiteout was created
> in a higher numbered AG than the file being deleted, we'll lock the AGIs
> in the wrong order and deadlock.
> 
> Therefore, grab all the AGI locks we think we'll need ahead of time, and
> in the correct order.
> 
> Reported-by: wenli xie <wlxie7296@gmail.com>
> Tested-by: wenli xie <wlxie7296@gmail.com>
> Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)

Hmmm.

> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index b7352bc4c815..dd419a1bc6ba 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
>  	return 0;
>  }
>  
> +/*
> + * For the general case of renaming files, lock all the AGI buffers we need to
> + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> + * handle dropping the nlink of the target inode.  We have to do this in
> + * increasing AG order to avoid deadlocks.
> + */
> +static int
> +xfs_rename_lock_agis(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*wip,
> +	struct xfs_inode	*target_ip)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_buf		*bp;
> +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> +	int			error;
> +
> +	if (wip)
> +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> +
> +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> +
> +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> +	    agi_locks[0] > agi_locks[1])
> +		swap(agi_locks[0], agi_locks[1]);
> +
> +	if (agi_locks[0] != NULLAGNUMBER) {
> +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	if (agi_locks[1] != NULLAGNUMBER) {
> +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * xfs_rename
>   */
> @@ -3130,6 +3172,10 @@ xfs_rename(
>  		}
>  	}
>  
> +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> +	if (error)
> +		return error;
> +
>  	/*
>  	 * Directory entry creation below may acquire the AGF. Remove
>  	 * the whiteout from the unlinked list first to preserve correct
> 

So the comment below this new hunk is all about AGI vs AGF ordering
and how we do the unlink first to grab the AGI before the AGF. But
now we are adding explicit AGI locking for the case where unlink
list locking is required, thereby largely invalidating the need
for special casing the unlink list removal right up front.

Next question: The target_ip != NULL case below this (the
xfs_dir_replace() case) does this:

	/*
	 * Check whether the replace operation will need to allocate
	 * blocks.  This happens when the shortform directory lacks
	 * space and we have to convert it to a block format directory.
	 * When more blocks are necessary, we must lock the AGI first
	 * to preserve locking order (AGI -> AGF).
	 */
	if (xfs_dir2_sf_replace_needblock(target_dp, src_ip->i_ino)) {
		error = xfs_read_agi(mp, tp,
				XFS_INO_TO_AGNO(mp, target_ip->i_ino),
				&agibp);
		if (error)
			goto out_trans_cancel;
	}

IOWs, if we are actually locking AGIs up front, this can go away,
yes?

Seems to me that we should actually do a proper job of formalising
the locking in the rename code, not hack another special case into
it and keep all the other special case hacks that could go away if
the whole AGI/AGF locking order thing were done up front....

And with it formalised, we can then think about how to get rid of
those lock order dependencies altogether....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: fix an ABBA deadlock in xfs_rename
  2021-01-05 22:12 ` Dave Chinner
@ 2021-01-06  0:26   ` Darrick J. Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2021-01-06  0:26 UTC
  To: Dave Chinner; +Cc: wenli xie, xfs, chiluk, Brian Foster

On Wed, Jan 06, 2021 at 09:12:47AM +1100, Dave Chinner wrote:
> On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When overlayfs is running on top of xfs and the user unlinks a file in
> > the overlay, overlayfs will create a whiteout inode and ask xfs to
> > "rename" the whiteout file atop the one being unlinked.  If the file
> > being unlinked loses its one nlink, we then have to put the inode on the
> > unlinked list.
> > 
> > This requires us to grab the AGI buffer of the whiteout inode to take it
> > off the unlinked list (which is where whiteouts are created) and to grab
> > the AGI buffer of the file being deleted.  If the whiteout was created
> > in a higher numbered AG than the file being deleted, we'll lock the AGIs
> > in the wrong order and deadlock.
> > 
> > Therefore, grab all the AGI locks we think we'll need ahead of time, and
> > in the correct order.
> > 
> > Reported-by: wenli xie <wlxie7296@gmail.com>
> > Tested-by: wenli xie <wlxie7296@gmail.com>
> > Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 46 insertions(+)
> 
> Hmmm.
> 
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index b7352bc4c815..dd419a1bc6ba 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
> >  	return 0;
> >  }
> >  
> > +/*
> > + * For the general case of renaming files, lock all the AGI buffers we need to
> > + * handle bumping the nlink of the whiteout inode off the unlinked list and to
> > + * handle dropping the nlink of the target inode.  We have to do this in
> > + * increasing AG order to avoid deadlocks.
> > + */
> > +static int
> > +xfs_rename_lock_agis(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_inode	*wip,
> > +	struct xfs_inode	*target_ip)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_buf		*bp;
> > +	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
> > +	int			error;
> > +
> > +	if (wip)
> > +		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
> > +
> > +	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
> > +		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
> > +
> > +	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
> > +	    agi_locks[0] > agi_locks[1])
> > +		swap(agi_locks[0], agi_locks[1]);
> > +
> > +	if (agi_locks[0] != NULLAGNUMBER) {
> > +		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	if (agi_locks[1] != NULLAGNUMBER) {
> > +		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * xfs_rename
> >   */
> > @@ -3130,6 +3172,10 @@ xfs_rename(
> >  		}
> >  	}
> >  
> > +	error = xfs_rename_lock_agis(tp, wip, target_ip);
> > +	if (error)
> > +		return error;
> > +
> >  	/*
> >  	 * Directory entry creation below may acquire the AGF. Remove
> >  	 * the whiteout from the unlinked list first to preserve correct
> > 
> 
> So the comment below this new hunk is all about AGI vs AGF ordering
> and how we do the unlink first to grab the AGI before the AGF. But
> now we are adding explicit AGI locking for the case where unlink
> list locking is required, thereby largely invalidating the need
> for special casing the unlink list removal right up front.

Yeah.  If I had my way I'd refactor the bumplink/droplink operations
into deferred log items so that we wouldn't have to think so hard about
locking order.  That's a /lot/ of extra code though.
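
To make the ordering rule in xfs_rename_lock_agis() concrete, here is a
minimal user-space sketch, not kernel code.  The names
model_agi_lock_order() and MODEL_NULLAGNUMBER are hypothetical stand-ins,
and unlike the patch the model collapses the case where both inodes live
in the same AG into a single entry; that is an assumption made purely for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NULLAGNUMBER ("no AG needed"). */
#define MODEL_NULLAGNUMBER	UINT32_MAX

/*
 * Given the AG of the whiteout inode and the AG of the target inode (either
 * may be MODEL_NULLAGNUMBER), fill order[] with the AGs whose AGI would be
 * locked, lowest AG first, and return how many entries were written.
 */
static size_t
model_agi_lock_order(
	uint32_t	wip_agno,
	uint32_t	target_agno,
	uint32_t	order[2])
{
	size_t		n = 0;

	if (wip_agno != MODEL_NULLAGNUMBER)
		order[n++] = wip_agno;

	/* Assumption of this model: the same AG is only listed once. */
	if (target_agno != MODEL_NULLAGNUMBER &&
	    (n == 0 || order[0] != target_agno))
		order[n++] = target_agno;

	/* Always take the lower numbered AG first to avoid ABBA ordering. */
	if (n == 2 && order[0] > order[1]) {
		uint32_t	tmp = order[0];

		order[0] = order[1];
		order[1] = tmp;
	}

	return n;
}
```

With the whiteout in AG 5 and the target in AG 2, the model yields the
order 2, 5 regardless of which inode was passed first.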

> Next question: The target_ip != NULL case below this (the
> xfs_dir_replace() case) does this:
> 
> 	/*
> 	 * Check whether the replace operation will need to allocate
> 	 * blocks.  This happens when the shortform directory lacks
> 	 * space and we have to convert it to a block format directory.
> 	 * When more blocks are necessary, we must lock the AGI first
> 	 * to preserve locking order (AGI -> AGF).
> 	 */
> 	if (xfs_dir2_sf_replace_needblock(target_dp, src_ip->i_ino)) {
> 		error = xfs_read_agi(mp, tp,
> 				XFS_INO_TO_AGNO(mp, target_ip->i_ino),
> 				&agibp);
> 		if (error)
> 			goto out_trans_cancel;
> 	}
> 
> IOWs, if we are actually locking AGIs up front, this can go away,
> yes?

Right.

> Seems to me that we should actually do a proper job of formalising
> the locking in the rename code, not hack another special case into
> it and keep all the other special case hacks that could go away if
> the whole AGI/AGF locking order thing were done up front....

Hm.  I don't know how you'd do explicit AGF locking up front because the
AG is selected by the block allocator.  I think we can set t_firstblock
to trick the allocator into skipping the AGs before max(wip, target_ip),
but I don't see how we could get any closer than that?

I guess the downside is that locking the AGIs ahead of time means that
our allocation choices are severely constrained if either inode is in
the last AG.  We could try to reduce the likelihood of that by making
xfs_ialloc_ag_select start in AG 0 for whiteout creations since
RENAME_WHITEOUT is the only creation path, I think.  But that would
still leave us vulnerable to ENOSPC shutdowns if the last AGs are
totally full.
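
The constraint can be put into numbers with a small user-space sketch.
This is not kernel code: model_first_usable_ag() and
model_usable_ag_count() are hypothetical helpers, and the model assumes
that once an AGI is held the allocator may only use AGs at or above the
highest such AG, as described above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NULLAGNUMBER ("no AGI locked"). */
#define MODEL_NULLAGNUMBER	UINT32_MAX

/*
 * Lowest AG the allocator could still use in this model: the highest AG
 * whose AGI is already locked, or AG 0 if no AGI is locked at all.
 */
static uint32_t
model_first_usable_ag(
	uint32_t	wip_agno,
	uint32_t	target_agno)
{
	uint32_t	first = 0;

	if (wip_agno != MODEL_NULLAGNUMBER && wip_agno > first)
		first = wip_agno;
	if (target_agno != MODEL_NULLAGNUMBER && target_agno > first)
		first = target_agno;

	return first;
}

/*
 * Number of AGs left for block allocation.  Assumes every AG number that is
 * passed in is either MODEL_NULLAGNUMBER or smaller than agcount.
 */
static uint32_t
model_usable_ag_count(
	uint32_t	agcount,
	uint32_t	wip_agno,
	uint32_t	target_agno)
{
	return agcount - model_first_usable_ag(wip_agno, target_agno);
}
```

On a four-AG filesystem with the whiteout in AG 1 and the target in AG 3,
the model leaves a single AG (AG 3) for allocation, which is the "last AG"
case that worries me.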

--D

> And with it formalised, we can then think about how to get rid of
> those lock order dependencies altogether....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thread overview: 8+ messages
2021-01-04 19:44 [PATCH] xfs: fix an ABBA deadlock in xfs_rename Darrick J. Wong
2021-01-04 19:51 ` Darrick J. Wong
2021-01-04 20:27 ` Brian Foster
2021-01-05  1:14   ` Darrick J. Wong
2021-01-05  9:01     ` Brian Foster
2021-01-05 17:31       ` Darrick J. Wong
2021-01-05 22:12 ` Dave Chinner
2021-01-06  0:26   ` Darrick J. Wong
