* [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
@ 2019-11-05  9:52 kaixuxia
  2019-11-06  4:56 ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: kaixuxia @ 2019-11-05  9:52 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong, bfoster, newtongao, jasperwang

When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
need to take the AGF lock to allocate more blocks. The subsequent
xfs_droplink() call then takes the AGI lock to put target_ip on the
unlinked list, so the locks are acquired in the order AGF->AGI. This
violates the ordering constraint on AGI and AGF locking: inode
allocation locks the AGI first and may then allocate a new extent for
new inodes, locking the AGF after the AGI.

This patch first checks whether the replace operation needs more
blocks. If so, it acquires the AGI lock first to preserve the
AGI->AGF locking order. The ordering problem only occurs when the
AGI and AGF being locked belong to the same AG. For multiple AGs,
the AGI lock is released after the transaction commits.
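
Editorial sketch (not part of the patch): the ordering problem described
above can be modelled in plain user-space C. The enum, the function names,
and the rule that re-requesting a lock the transaction already holds is
harmless are all assumptions made for illustration; none of this is
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two per-AG header locks discussed in the patch. */
enum ag_lock { AG_LOCK_AGI, AG_LOCK_AGF };

/*
 * Return true if the sequence never takes the AGI for the first time
 * while the AGF is already held. Asking again for a lock that is
 * already held is treated as harmless (an assumption of this model).
 */
bool lock_order_ok(const enum ag_lock *seq, size_t n)
{
	bool agi_held = false;
	bool agf_held = false;

	for (size_t i = 0; i < n; i++) {
		if (seq[i] == AG_LOCK_AGI) {
			if (agf_held && !agi_held)
				return false;	/* AGF -> AGI inversion */
			agi_held = true;
		} else {
			agf_held = true;
		}
	}
	return true;
}

/*
 * Build the lock sequence of a rename over an existing target whose
 * AGI and AGF live in the same AG.
 *   need_block: the replace step has to allocate (takes the AGF)
 *   agi_first:  take the AGI up front, as the patch proposes
 * Returns the number of entries written to out (at most 3).
 */
size_t rename_lock_sequence(bool need_block, bool agi_first,
			    enum ag_lock out[3])
{
	size_t n = 0;

	assert(out != NULL);
	if (need_block && agi_first)
		out[n++] = AG_LOCK_AGI;	/* taken before the replace */
	if (need_block)
		out[n++] = AG_LOCK_AGF;	/* replace allocates a block */
	out[n++] = AG_LOCK_AGI;		/* droplink: unlinked list */
	return n;
}
```

In this model the unpatched path with an allocation yields AGF, AGI and is
rejected, while the patched path yields AGI, AGF, AGI and is accepted
because the final AGI request finds the lock already held.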

Signed-off-by: kaixuxia <kaixuxia@tencent.com>
---
Changes in v2:
 - Add xfs_dir2_sf_replace_needblock() helper in
   xfs_dir2_sf.c.

 fs/xfs/libxfs/xfs_dir2.c      | 23 +++++++++++++++++++++++
 fs/xfs/libxfs/xfs_dir2.h      |  2 ++
 fs/xfs/libxfs/xfs_dir2_priv.h |  2 ++
 fs/xfs/libxfs/xfs_dir2_sf.c   | 24 ++++++++++++++++++++++++
 fs/xfs/xfs_inode.c            | 14 ++++++++++++++
 5 files changed, 65 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 867c5de..1917990 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -463,6 +463,29 @@
 }
 
 /*
+ * Check whether the replace operation need more blocks. Ignore
+ * the parameters check since the real replace() call below will
+ * do that.
+ */
+bool
+xfs_dir_replace_needblock(
+	struct xfs_inode	*dp,
+	xfs_ino_t		inum)
+{
+	int			rval;
+
+	rval = xfs_dir_ino_validate(dp->i_mount, inum);
+	if (rval)
+		return false;
+
+	/*
+	 * Only convert the shortform directory to block form maybe
+	 * need more blocks.
+	 */
+	return xfs_dir2_sf_replace_needblock(dp, inum);
+}
+
+/*
  * Replace the inode number of a directory entry.
  */
 int
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index f542447..e436c14 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
 				xfs_extlen_t tot);
+extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
+				xfs_ino_t inum);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t inum,
 				xfs_extlen_t tot);
diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
index 59f9fb2..002103f 100644
--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -116,6 +116,8 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
 extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
 extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
 extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
+extern bool xfs_dir2_sf_replace_needblock(struct xfs_inode *dp,
+		xfs_ino_t inum);
 extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
 extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip);
 
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 85f14fc..0906f91 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -945,6 +945,30 @@ static int xfs_dir2_sf_addname_pick(xfs_da_args_t *args, int objchange,
 }
 
 /*
+ * Check whether the replace operation need more blocks.
+ */
+bool
+xfs_dir2_sf_replace_needblock(
+	struct xfs_inode	*dp,
+	xfs_ino_t		inum)
+{
+	int			newsize;
+	xfs_dir2_sf_hdr_t	*sfp;
+
+	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+		return false;
+
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
+
+	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
+	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
+		return true;
+	else
+		return false;
+}
+
+/*
  * Replace the inode number of an entry in a shortform directory.
  */
 int						/* error */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 18f4b26..c239070 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3196,6 +3196,7 @@ struct xfs_iunlink {
 	struct xfs_trans	*tp;
 	struct xfs_inode	*wip = NULL;		/* whiteout inode */
 	struct xfs_inode	*inodes[__XFS_SORT_INODES];
+	struct xfs_buf		*agibp;
 	int			num_inodes = __XFS_SORT_INODES;
 	bool			new_parent = (src_dp != target_dp);
 	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
@@ -3361,6 +3362,19 @@ struct xfs_iunlink {
 		 * In case there is already an entry with the same
 		 * name at the destination directory, remove it first.
 		 */
+
+		/*
+		 * Check whether the replace operation need more blocks.
+		 * If so, acquire the agi lock firstly to preserve locking
+		 * order(AGI/AGF).
+		 */
+		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
+			error = xfs_read_agi(mp, tp,
+					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
+			if (error)
+				goto out_trans_cancel;
+		}
+
 		error = xfs_dir_replace(tp, target_dp, target_name,
 					src_ip->i_ino, spaceres);
 		if (error)
-- 
1.8.3.1



* Re: [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-05  9:52 [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename() kaixuxia
@ 2019-11-06  4:56 ` Darrick J. Wong
  2019-11-06 12:49   ` Brian Foster
  2019-11-07  5:15   ` kaixuxia
  0 siblings, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2019-11-06  4:56 UTC (permalink / raw)
  To: kaixuxia; +Cc: linux-xfs, bfoster, newtongao, jasperwang

On Tue, Nov 05, 2019 at 05:52:12PM +0800, kaixuxia wrote:
> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
> need to hold the AGF lock to allocate more blocks, and then invoking
> the xfs_droplink() call to hold AGI lock to drop target_ip onto the
> unlinked list, so we get the lock order AGF->AGI. This would break the
> ordering constraint on AGI and AGF locking - inode allocation locks
> the AGI, then can allocate a new extent for new inodes, locking the
> AGF after the AGI.
> 
> In this patch we check whether the replace operation need more
> blocks firstly. If so, acquire the agi lock firstly to preserve
> locking order(AGI/AGF). Actually, the locking order problem only
> occurs when we are locking the AGI/AGF of the same AG. For multiple
> AGs the AGI lock will be released after the transaction committed.
> 
> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> ---
> Changes in v2:
>  - Add xfs_dir2_sf_replace_needblock() helper in
>    xfs_dir2_sf.c.
> 
>  fs/xfs/libxfs/xfs_dir2.c      | 23 +++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_dir2.h      |  2 ++
>  fs/xfs/libxfs/xfs_dir2_priv.h |  2 ++
>  fs/xfs/libxfs/xfs_dir2_sf.c   | 24 ++++++++++++++++++++++++
>  fs/xfs/xfs_inode.c            | 14 ++++++++++++++
>  5 files changed, 65 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 867c5de..1917990 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -463,6 +463,29 @@
>  }
>  
>  /*
> + * Check whether the replace operation need more blocks. Ignore
> + * the parameters check since the real replace() call below will
> + * do that.
> + */
> +bool
> +xfs_dir_replace_needblock(

xfs_dir2, to be consistent.

> +	struct xfs_inode	*dp,
> +	xfs_ino_t		inum)

If you passed the inode pointer (instead of ip->i_ino) here then you
don't need to revalidate the inode number.

> +{
> +	int			rval;
> +
> +	rval = xfs_dir_ino_validate(dp->i_mount, inum);
> +	if (rval)
> +		return false;
> +
> +	/*
> +	 * Only convert the shortform directory to block form maybe
> +	 * need more blocks.
> +	 */
> +	return xfs_dir2_sf_replace_needblock(dp, inum);

	if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
		return xfs_dir2_sf_replace_needblock(...);

Also, do other directory formats need extra blocks allocated?

> +}
> +
> +/*
>   * Replace the inode number of a directory entry.
>   */
>  int
> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> index f542447..e436c14 100644
> --- a/fs/xfs/libxfs/xfs_dir2.h
> +++ b/fs/xfs/libxfs/xfs_dir2.h
> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
>  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
>  				struct xfs_name *name, xfs_ino_t ino,
>  				xfs_extlen_t tot);
> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
> +				xfs_ino_t inum);
>  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
>  				struct xfs_name *name, xfs_ino_t inum,
>  				xfs_extlen_t tot);
> diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
> index 59f9fb2..002103f 100644
> --- a/fs/xfs/libxfs/xfs_dir2_priv.h
> +++ b/fs/xfs/libxfs/xfs_dir2_priv.h
> @@ -116,6 +116,8 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
>  extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
>  extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
>  extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
> +extern bool xfs_dir2_sf_replace_needblock(struct xfs_inode *dp,
> +		xfs_ino_t inum);
>  extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
>  extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip);
>  
> diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
> index 85f14fc..0906f91 100644
> --- a/fs/xfs/libxfs/xfs_dir2_sf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
> @@ -945,6 +945,30 @@ static int xfs_dir2_sf_addname_pick(xfs_da_args_t *args, int objchange,
>  }
>  
>  /*
> + * Check whether the replace operation need more blocks.
> + */
> +bool
> +xfs_dir2_sf_replace_needblock(

Urgggh.  This is a predicate that we only ever call from xfs_rename(),
right?  And it addresses a particular quirk of the locking when the
caller wants us to rename on top of an existing entry and drop the link
count of the old inode, right?  So why can't this just be a predicate in
xfs_inode.c ?  Nobody else needs to know this particular piece of
information, AFAICT.

(Apologies, for Brian and I clearly aren't on the same page about
that...)

> +	struct xfs_inode	*dp,
> +	xfs_ino_t		inum)
> +{
> +	int			newsize;
> +	xfs_dir2_sf_hdr_t	*sfp;
> +
> +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> +		return false;

This check should be used up in xfs_dir2_replace_needblock() to decide
if we're calling xfs_dir2_sf_replace_needblock(), or just returning
false.

> +
> +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
> +
> +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
> +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
> +		return true;
> +	else
> +		return false;

return inum > XFS_DIR2_MAX_SHORT_INUM && (all the rest of that);

> +}
> +
> +/*
>   * Replace the inode number of an entry in a shortform directory.
>   */
>  int						/* error */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 18f4b26..c239070 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
>  	struct xfs_trans	*tp;
>  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
>  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> +	struct xfs_buf		*agibp;
>  	int			num_inodes = __XFS_SORT_INODES;
>  	bool			new_parent = (src_dp != target_dp);
>  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
>  		 * In case there is already an entry with the same
>  		 * name at the destination directory, remove it first.
>  		 */
> +
> +		/*
> +		 * Check whether the replace operation need more blocks.
> +		 * If so, acquire the agi lock firstly to preserve locking

                                               "first"

> +		 * order(AGI/AGF).

Nit: space between "order" and "(AGI/AGF)".
> +		 */
> +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
> +			error = xfs_read_agi(mp, tp,
> +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);

Overly long line here.

--D

> +			if (error)
> +				goto out_trans_cancel;
> +		}
> +
>  		error = xfs_dir_replace(tp, target_dp, target_name,
>  					src_ip->i_ino, spaceres);
>  		if (error)
> -- 
> 1.8.3.1
> 


* Re: [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-06  4:56 ` Darrick J. Wong
@ 2019-11-06 12:49   ` Brian Foster
  2019-11-06 15:46     ` Darrick J. Wong
  2019-11-07  5:15   ` kaixuxia
  1 sibling, 1 reply; 7+ messages in thread
From: Brian Foster @ 2019-11-06 12:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: kaixuxia, linux-xfs, newtongao, jasperwang

On Tue, Nov 05, 2019 at 08:56:30PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 05, 2019 at 05:52:12PM +0800, kaixuxia wrote:
> > When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
> > need to hold the AGF lock to allocate more blocks, and then invoking
> > the xfs_droplink() call to hold AGI lock to drop target_ip onto the
> > unlinked list, so we get the lock order AGF->AGI. This would break the
> > ordering constraint on AGI and AGF locking - inode allocation locks
> > the AGI, then can allocate a new extent for new inodes, locking the
> > AGF after the AGI.
> > 
> > In this patch we check whether the replace operation need more
> > blocks firstly. If so, acquire the agi lock firstly to preserve
> > locking order(AGI/AGF). Actually, the locking order problem only
> > occurs when we are locking the AGI/AGF of the same AG. For multiple
> > AGs the AGI lock will be released after the transaction committed.
> > 
> > Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> > ---
> > Changes in v2:
> >  - Add xfs_dir2_sf_replace_needblock() helper in
> >    xfs_dir2_sf.c.
> > 
> >  fs/xfs/libxfs/xfs_dir2.c      | 23 +++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_dir2.h      |  2 ++
> >  fs/xfs/libxfs/xfs_dir2_priv.h |  2 ++
> >  fs/xfs/libxfs/xfs_dir2_sf.c   | 24 ++++++++++++++++++++++++
> >  fs/xfs/xfs_inode.c            | 14 ++++++++++++++
> >  5 files changed, 65 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> > index 867c5de..1917990 100644
> > --- a/fs/xfs/libxfs/xfs_dir2.c
> > +++ b/fs/xfs/libxfs/xfs_dir2.c
> > @@ -463,6 +463,29 @@
> >  }
> >  
> >  /*
> > + * Check whether the replace operation need more blocks. Ignore
> > + * the parameters check since the real replace() call below will
> > + * do that.
> > + */
> > +bool
> > +xfs_dir_replace_needblock(
> 
> xfs_dir2, to be consistent.
> 
> > +	struct xfs_inode	*dp,
> > +	xfs_ino_t		inum)
> 
> If you passed the inode pointer (instead of ip->i_ino) here then you
> don't need to revalidate the inode number.
> 
> > +{
> > +	int			rval;
> > +
> > +	rval = xfs_dir_ino_validate(dp->i_mount, inum);
> > +	if (rval)
> > +		return false;
> > +
> > +	/*
> > +	 * Only convert the shortform directory to block form maybe
> > +	 * need more blocks.
> > +	 */
> > +	return xfs_dir2_sf_replace_needblock(dp, inum);
> 
> 	if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
> 		return xfs_dir2_sf_replace_needblock(...);
> 
> Also, do other directory formats need extra blocks allocated?
> 
> > +}
> > +
> > +/*
> >   * Replace the inode number of a directory entry.
> >   */
> >  int
> > diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> > index f542447..e436c14 100644
> > --- a/fs/xfs/libxfs/xfs_dir2.h
> > +++ b/fs/xfs/libxfs/xfs_dir2.h
> > @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
> >  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
> >  				struct xfs_name *name, xfs_ino_t ino,
> >  				xfs_extlen_t tot);
> > +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
> > +				xfs_ino_t inum);
> >  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
> >  				struct xfs_name *name, xfs_ino_t inum,
> >  				xfs_extlen_t tot);
> > diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
> > index 59f9fb2..002103f 100644
> > --- a/fs/xfs/libxfs/xfs_dir2_priv.h
> > +++ b/fs/xfs/libxfs/xfs_dir2_priv.h
> > @@ -116,6 +116,8 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
> >  extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
> >  extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
> >  extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
> > +extern bool xfs_dir2_sf_replace_needblock(struct xfs_inode *dp,
> > +		xfs_ino_t inum);
> >  extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
> >  extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip);
> >  
> > diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
> > index 85f14fc..0906f91 100644
> > --- a/fs/xfs/libxfs/xfs_dir2_sf.c
> > +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
> > @@ -945,6 +945,30 @@ static int xfs_dir2_sf_addname_pick(xfs_da_args_t *args, int objchange,
> >  }
> >  
> >  /*
> > + * Check whether the replace operation need more blocks.
> > + */
> > +bool
> > +xfs_dir2_sf_replace_needblock(
> 
> Urgggh.  This is a predicate that we only ever call from xfs_rename(),
> right?  And it addresses a particular quirk of the locking when the
> caller wants us to rename on top of an existing entry and drop the link
> count of the old inode, right?  So why can't this just be a predicate in
> xfs_inode.c ?  Nobody else needs to know this particular piece of
> information, AFAICT.
> 
> (Apologies, for Brian and I clearly aren't on the same page about
> that...)
> 

Hmm.. the crux of my feedback on the previous version was simply this:
if we want to take the approach of pulling lower level dir logic up into
the higher level rename code, then factor out the existing checks down
in the dir replace code that currently trigger a format conversion
and use that new helper in both places. That doesn't appear to be what
this patch does, and I'm not sure why there are now two new helpers that
each have only one caller instead of one new helper with two callers...

Brian
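
Editorial sketch (not part of the thread): the "one new helper with two
callers" structure described above, reduced to a user-space model. All
names and the two-field directory struct are hypothetical; the point is
only that the rename path and the replace path consult the same
predicate, so they cannot disagree about whether an allocation will
happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory state, just enough to drive the predicate. */
struct dir_model {
	bool shortform;		/* stored inline in the inode */
	bool fits_inline;	/* would still fit inline after the replace */
};

/* The single shared predicate: does the replace need a new block? */
bool replace_needblock(const struct dir_model *d)
{
	assert(d != NULL);
	return d->shortform && !d->fits_inline;
}

/*
 * Caller 1: the directory code. It converts to block form only when
 * the predicate says so. Returns true if a block was allocated.
 */
bool model_dir_replace(struct dir_model *d)
{
	if (replace_needblock(d)) {
		d->shortform = false;	/* format conversion, allocates */
		return true;
	}
	return false;			/* replaced in place */
}

struct rename_result {
	bool agi_taken_first;	/* AGI locked before the replace */
	bool allocated;		/* replace went on to allocate */
};

/*
 * Caller 2: the rename code. It asks the same predicate whether to take
 * the AGI before calling into the directory code.
 */
struct rename_result model_rename(struct dir_model *d)
{
	struct rename_result r;

	r.agi_taken_first = replace_needblock(d);
	r.allocated = model_dir_replace(d);
	return r;
}
```

Because both callers share replace_needblock(), agi_taken_first and
allocated always agree in this model, which is the property the rename
fix depends on.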

> > +	struct xfs_inode	*dp,
> > +	xfs_ino_t		inum)
> > +{
> > +	int			newsize;
> > +	xfs_dir2_sf_hdr_t	*sfp;
> > +
> > +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> > +		return false;
> 
> This check should be used up in xfs_dir2_replace_needblock() to decide
> if we're calling xfs_dir2_sf_replace_needblock(), or just returning
> false.
> 
> > +
> > +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> > +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
> > +
> > +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
> > +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
> > +		return true;
> > +	else
> > +		return false;
> 
> return inum > XFS_DIR2_MAX_SHORT_INUM && (all the rest of that);
> 
> > +}
> > +
> > +/*
> >   * Replace the inode number of an entry in a shortform directory.
> >   */
> >  int						/* error */
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 18f4b26..c239070 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
> >  	struct xfs_trans	*tp;
> >  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
> >  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> > +	struct xfs_buf		*agibp;
> >  	int			num_inodes = __XFS_SORT_INODES;
> >  	bool			new_parent = (src_dp != target_dp);
> >  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> > @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
> >  		 * In case there is already an entry with the same
> >  		 * name at the destination directory, remove it first.
> >  		 */
> > +
> > +		/*
> > +		 * Check whether the replace operation need more blocks.
> > +		 * If so, acquire the agi lock firstly to preserve locking
> 
>                                                "first"
> 
> > +		 * order(AGI/AGF).
> 
> Nit: space between "order" and "(AGI/AGF)".
> > +		 */
> > +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
> > +			error = xfs_read_agi(mp, tp,
> > +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
> 
> Overly long line here.
> 
> --D
> 
> > +			if (error)
> > +				goto out_trans_cancel;
> > +		}
> > +
> >  		error = xfs_dir_replace(tp, target_dp, target_name,
> >  					src_ip->i_ino, spaceres);
> >  		if (error)
> > -- 
> > 1.8.3.1
> > 



* Re: [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-06 12:49   ` Brian Foster
@ 2019-11-06 15:46     ` Darrick J. Wong
  2019-11-07  3:46       ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2019-11-06 15:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: kaixuxia, linux-xfs, newtongao, jasperwang

On Wed, Nov 06, 2019 at 07:49:32AM -0500, Brian Foster wrote:
> On Tue, Nov 05, 2019 at 08:56:30PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 05, 2019 at 05:52:12PM +0800, kaixuxia wrote:
> > > When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
> > > need to hold the AGF lock to allocate more blocks, and then invoking
> > > the xfs_droplink() call to hold AGI lock to drop target_ip onto the
> > > unlinked list, so we get the lock order AGF->AGI. This would break the
> > > ordering constraint on AGI and AGF locking - inode allocation locks
> > > the AGI, then can allocate a new extent for new inodes, locking the
> > > AGF after the AGI.
> > > 
> > > In this patch we check whether the replace operation need more
> > > blocks firstly. If so, acquire the agi lock firstly to preserve
> > > locking order(AGI/AGF). Actually, the locking order problem only
> > > occurs when we are locking the AGI/AGF of the same AG. For multiple
> > > AGs the AGI lock will be released after the transaction committed.
> > > 
> > > Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> > > ---
> > > Changes in v2:
> > >  - Add xfs_dir2_sf_replace_needblock() helper in
> > >    xfs_dir2_sf.c.
> > > 
> > >  fs/xfs/libxfs/xfs_dir2.c      | 23 +++++++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_dir2.h      |  2 ++
> > >  fs/xfs/libxfs/xfs_dir2_priv.h |  2 ++
> > >  fs/xfs/libxfs/xfs_dir2_sf.c   | 24 ++++++++++++++++++++++++
> > >  fs/xfs/xfs_inode.c            | 14 ++++++++++++++
> > >  5 files changed, 65 insertions(+)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> > > index 867c5de..1917990 100644
> > > --- a/fs/xfs/libxfs/xfs_dir2.c
> > > +++ b/fs/xfs/libxfs/xfs_dir2.c
> > > @@ -463,6 +463,29 @@
> > >  }
> > >  
> > >  /*
> > > + * Check whether the replace operation need more blocks. Ignore
> > > + * the parameters check since the real replace() call below will
> > > + * do that.
> > > + */
> > > +bool
> > > +xfs_dir_replace_needblock(
> > 
> > xfs_dir2, to be consistent.
> > 
> > > +	struct xfs_inode	*dp,
> > > +	xfs_ino_t		inum)
> > 
> > If you passed the inode pointer (instead of ip->i_ino) here then you
> > don't need to revalidate the inode number.
> > 
> > > +{
> > > +	int			rval;
> > > +
> > > +	rval = xfs_dir_ino_validate(dp->i_mount, inum);
> > > +	if (rval)
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * Only convert the shortform directory to block form maybe
> > > +	 * need more blocks.
> > > +	 */
> > > +	return xfs_dir2_sf_replace_needblock(dp, inum);
> > 
> > 	if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
> > 		return xfs_dir2_sf_replace_needblock(...);
> > 
> > Also, do other directory formats need extra blocks allocated?
> > 
> > > +}
> > > +
> > > +/*
> > >   * Replace the inode number of a directory entry.
> > >   */
> > >  int
> > > diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> > > index f542447..e436c14 100644
> > > --- a/fs/xfs/libxfs/xfs_dir2.h
> > > +++ b/fs/xfs/libxfs/xfs_dir2.h
> > > @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
> > >  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
> > >  				struct xfs_name *name, xfs_ino_t ino,
> > >  				xfs_extlen_t tot);
> > > +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
> > > +				xfs_ino_t inum);
> > >  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
> > >  				struct xfs_name *name, xfs_ino_t inum,
> > >  				xfs_extlen_t tot);
> > > diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
> > > index 59f9fb2..002103f 100644
> > > --- a/fs/xfs/libxfs/xfs_dir2_priv.h
> > > +++ b/fs/xfs/libxfs/xfs_dir2_priv.h
> > > @@ -116,6 +116,8 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
> > >  extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
> > >  extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
> > >  extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
> > > +extern bool xfs_dir2_sf_replace_needblock(struct xfs_inode *dp,
> > > +		xfs_ino_t inum);
> > >  extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
> > >  extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip);
> > >  
> > > diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
> > > index 85f14fc..0906f91 100644
> > > --- a/fs/xfs/libxfs/xfs_dir2_sf.c
> > > +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
> > > @@ -945,6 +945,30 @@ static int xfs_dir2_sf_addname_pick(xfs_da_args_t *args, int objchange,
> > >  }
> > >  
> > >  /*
> > > + * Check whether the replace operation need more blocks.
> > > + */
> > > +bool
> > > +xfs_dir2_sf_replace_needblock(
> > 
> > Urgggh.  This is a predicate that we only ever call from xfs_rename(),
> > right?  And it addresses a particular quirk of the locking when the
> > caller wants us to rename on top of an existing entry and drop the link
> > count of the old inode, right?  So why can't this just be a predicate in
> > xfs_inode.c ?  Nobody else needs to know this particular piece of
> > information, AFAICT.
> > 
> > (Apologies, for Brian and I clearly aren't on the same page about
> > that...)
> > 
> 
> Hmm.. the crux of my feedback on the previous version was simply that if
> we wanted to take this approach of pulling up lower level dir logic into
> the higher level rename code, to simply factor out the existing checks
> down in the dir replace code that currently trigger a format conversion,
> and use that new helper in both places. That doesn't appear to be what
> this patch does, and I'm not sure why there are now two new helpers that
> each only have one caller instead of one new helper with two callers...

Aha, got it.  I'd wondered if that had been your intent. :)

--D

> Brian
> 
> > > +	struct xfs_inode	*dp,
> > > +	xfs_ino_t		inum)
> > > +{
> > > +	int			newsize;
> > > +	xfs_dir2_sf_hdr_t	*sfp;
> > > +
> > > +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> > > +		return false;
> > 
> > This check should be used up in xfs_dir2_replace_needblock() to decide
> > if we're calling xfs_dir2_sf_replace_needblock(), or just returning
> > false.
> > 
> > > +
> > > +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> > > +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
> > > +
> > > +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
> > > +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
> > > +		return true;
> > > +	else
> > > +		return false;
> > 
> > return inum > XFS_DIR2_MAX_SHORT_INUM && (all the rest of that);
> > 
> > > +}
> > > +
> > > +/*
> > >   * Replace the inode number of an entry in a shortform directory.
> > >   */
> > >  int						/* error */
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 18f4b26..c239070 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
> > >  	struct xfs_trans	*tp;
> > >  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
> > >  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> > > +	struct xfs_buf		*agibp;
> > >  	int			num_inodes = __XFS_SORT_INODES;
> > >  	bool			new_parent = (src_dp != target_dp);
> > >  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> > > @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
> > >  		 * In case there is already an entry with the same
> > >  		 * name at the destination directory, remove it first.
> > >  		 */
> > > +
> > > +		/*
> > > +		 * Check whether the replace operation need more blocks.
> > > +		 * If so, acquire the agi lock firstly to preserve locking
> > 
> >                                                "first"
> > 
> > > +		 * order(AGI/AGF).
> > 
> > Nit: space between "order" and "(AGI/AGF)".
> > > +		 */
> > > +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
> > > +			error = xfs_read_agi(mp, tp,
> > > +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
> > 
> > Overly long line here.
> > 
> > --D
> > 
> > > +			if (error)
> > > +				goto out_trans_cancel;
> > > +		}
> > > +
> > >  		error = xfs_dir_replace(tp, target_dp, target_name,
> > >  					src_ip->i_ino, spaceres);
> > >  		if (error)
> > > -- 
> > > 1.8.3.1
> > > 
> 


* Re: [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-06 15:46     ` Darrick J. Wong
@ 2019-11-07  3:46       ` Dave Chinner
  2019-11-08 11:48         ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2019-11-07  3:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, kaixuxia, linux-xfs, newtongao, jasperwang

On Wed, Nov 06, 2019 at 07:46:12AM -0800, Darrick J. Wong wrote:
> On Wed, Nov 06, 2019 at 07:49:32AM -0500, Brian Foster wrote:
> > > >  /*
> > > > + * Check whether the replace operation need more blocks.
> > > > + */
> > > > +bool
> > > > +xfs_dir2_sf_replace_needblock(
> > > 
> > > Urgggh.  This is a predicate that we only ever call from xfs_rename(),
> > > right?  And it addresses a particular quirk of the locking when the
> > > caller wants us to rename on top of an existing entry and drop the link
> > > count of the old inode, right?  So why can't this just be a predicate in
> > > xfs_inode.c ?  Nobody else needs to know this particular piece of
> > > information, AFAICT.
> > > 
> > > (Apologies, for Brian and I clearly aren't on the same page about
> > > that...)
> > > 
> > 
> > Hmm.. the crux of my feedback on the previous version was simply that if
> > we wanted to take this approach of pulling up lower level dir logic into
> > the higher level rename code, to simply factor out the existing checks
> > down in the dir replace code that currently trigger a format conversion,
> > and use that new helper in both places. That doesn't appear to be what
> > this patch does, and I'm not sure why there are now two new helpers that
> > each only have one caller instead of one new helper with two callers...
> 
> Aha, got it.  I'd wondered if that had been your intent. :)

So as a structural question: should this be folded into
xfs_dir_canenter(), which is the function used to check if the
directory modification can go ahead without allocating blocks?

This seems very much like it is a "do we need to allocate blocks
during the directory modification?" sort of question being asked
here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-06  4:56 ` Darrick J. Wong
  2019-11-06 12:49   ` Brian Foster
@ 2019-11-07  5:15   ` kaixuxia
  1 sibling, 0 replies; 7+ messages in thread
From: kaixuxia @ 2019-11-07  5:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster, newtongao, jasperwang

On 2019/11/6 12:56, Darrick J. Wong wrote:
> On Tue, Nov 05, 2019 at 05:52:12PM +0800, kaixuxia wrote:
>> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
>> need to hold the AGF lock to allocate more blocks, and then invoking
>> the xfs_droplink() call to hold AGI lock to drop target_ip onto the
>> unlinked list, so we get the lock order AGF->AGI. This would break the
>> ordering constraint on AGI and AGF locking - inode allocation locks
>> the AGI, then can allocate a new extent for new inodes, locking the
>> AGF after the AGI.
>>
>> In this patch we check whether the replace operation need more
>> blocks firstly. If so, acquire the agi lock firstly to preserve
>> locking order(AGI/AGF). Actually, the locking order problem only
>> occurs when we are locking the AGI/AGF of the same AG. For multiple
>> AGs the AGI lock will be released after the transaction committed.
>>
>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
>> ---
>> Changes in v2:
>>  - Add xfs_dir2_sf_replace_needblock() helper in
>>    xfs_dir2_sf.c.
>>
>>  fs/xfs/libxfs/xfs_dir2.c      | 23 +++++++++++++++++++++++
>>  fs/xfs/libxfs/xfs_dir2.h      |  2 ++
>>  fs/xfs/libxfs/xfs_dir2_priv.h |  2 ++
>>  fs/xfs/libxfs/xfs_dir2_sf.c   | 24 ++++++++++++++++++++++++
>>  fs/xfs/xfs_inode.c            | 14 ++++++++++++++
>>  5 files changed, 65 insertions(+)
>>
>> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
>> index 867c5de..1917990 100644
>> --- a/fs/xfs/libxfs/xfs_dir2.c
>> +++ b/fs/xfs/libxfs/xfs_dir2.c
>> @@ -463,6 +463,29 @@
>>  }
>>  
>>  /*
>> + * Check whether the replace operation need more blocks. Ignore
>> + * the parameters check since the real replace() call below will
>> + * do that.
>> + */
>> +bool
>> +xfs_dir_replace_needblock(
> 
> xfs_dir2, to be consistent.
> 
>> +	struct xfs_inode	*dp,
>> +	xfs_ino_t		inum)
> 
> If you passed the inode pointer (instead of ip->i_ino) here then you
> don't need to revalidate the inode number.
> 
>> +{
>> +	int			rval;
>> +
>> +	rval = xfs_dir_ino_validate(dp->i_mount, inum);
>> +	if (rval)
>> +		return false;
>> +
>> +	/*
>> +	 * Only convert the shortform directory to block form maybe
>> +	 * need more blocks.
>> +	 */
>> +	return xfs_dir2_sf_replace_needblock(dp, inum);
> 
> 	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> 		return xfs_dir2_sf_replace_needblock(...);
> 
> Also, do other directories formats need extra blocks allocated?

No, I don't think so. The other directory formats only need to change
the inode number to the new value, so extra blocks are not necessary
for them.
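
For illustration, the size check from the quoted xfs_dir2_sf_replace_needblock()
hunk can be modeled outside the kernel. This is only a sketch: the struct, the
enum and the MODEL_* constants are hypothetical stand-ins (their values are
assumptions that mirror XFS_DIR2_MAX_SHORT_INUM and XFS_INO64_DIFF), and the
"+ 1" presumably covers the parent inode number kept in the shortform header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; the values are assumptions. */
#define MODEL_MAX_SHORT_INUM	0xffffffffULL	/* largest inum that fits in 4 bytes */
#define MODEL_INO64_DIFF	4		/* extra bytes per 8-byte inode number */

enum model_fmt { MODEL_FMT_LOCAL, MODEL_FMT_EXTENTS };

/* Simplified picture of a directory inode's data fork (not the kernel struct). */
struct model_dir {
	enum model_fmt	format;		/* local means shortform, stored in the inode */
	int		fork_bytes;	/* bytes currently used by the shortform dir */
	int		fork_size;	/* bytes available in the inode data fork */
	int		count;		/* number of shortform entries */
	int		i8count;	/* entries already using 8-byte inode numbers */
};

/*
 * True only if replacing an entry with inum would widen every inode number
 * to 8 bytes and the widened directory no longer fits in the inode.
 */
static bool
model_sf_replace_needblock(const struct model_dir *dp, uint64_t inum)
{
	int	newsize;

	/* Non-shortform directories just overwrite the inode number in place. */
	if (dp->format != MODEL_FMT_LOCAL)
		return false;

	newsize = dp->fork_bytes + (dp->count + 1) * MODEL_INO64_DIFF;
	return inum > MODEL_MAX_SHORT_INUM &&
	       dp->i8count == 0 && newsize > dp->fork_size;
}
```

With fork_bytes = 100 and count = 5 the widened size is 100 + 6 * 4 = 124, so
the model reports true for a 120-byte fork and false for a 124-byte one.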
> 
>> +}
>> +
>> +/*
>>   * Replace the inode number of a directory entry.
>>   */
>>  int
>> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
>> index f542447..e436c14 100644
>> --- a/fs/xfs/libxfs/xfs_dir2.h
>> +++ b/fs/xfs/libxfs/xfs_dir2.h
>> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
>>  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
>>  				struct xfs_name *name, xfs_ino_t ino,
>>  				xfs_extlen_t tot);
>> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
>> +				xfs_ino_t inum);
>>  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
>>  				struct xfs_name *name, xfs_ino_t inum,
>>  				xfs_extlen_t tot);
>> diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
>> index 59f9fb2..002103f 100644
>> --- a/fs/xfs/libxfs/xfs_dir2_priv.h
>> +++ b/fs/xfs/libxfs/xfs_dir2_priv.h
>> @@ -116,6 +116,8 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
>>  extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
>>  extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
>>  extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
>> +extern bool xfs_dir2_sf_replace_needblock(struct xfs_inode *dp,
>> +		xfs_ino_t inum);
>>  extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
>>  extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip);
>>  
>> diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
>> index 85f14fc..0906f91 100644
>> --- a/fs/xfs/libxfs/xfs_dir2_sf.c
>> +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
>> @@ -945,6 +945,30 @@ static int xfs_dir2_sf_addname_pick(xfs_da_args_t *args, int objchange,
>>  }
>>  
>>  /*
>> + * Check whether the replace operation need more blocks.
>> + */
>> +bool
>> +xfs_dir2_sf_replace_needblock(
> 
> Urgggh.  This is a predicate that we only ever call from xfs_rename(),
> right?  And it addresses a particular quirk of the locking when the
> caller wants us to rename on top of an existing entry and drop the link
> count of the old inode, right?  So why can't this just be a predicate in
> xfs_inode.c ?  Nobody else needs to know this particular piece of
> information, AFAICT.
> 
> (Apologies, for Brian and I clearly aren't on the same page about
> that...)

Hmm... sorry, I had misunderstood what Brian meant. Right, maybe we only
need the xfs_dir2_sf_replace_needblock() helper, and then invoke it
in both places.
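
A rough sketch of that "one helper, two callers" shape, written as a
standalone model rather than kernel code. Every name below is hypothetical,
and the size check is collapsed into a single flag; the only point is that
the rename path and the replace path ask the same predicate, so they cannot
disagree about whether a block allocation is coming.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified directory state. */
struct model_dir {
	bool	shortform;		/* directory is stored inside the inode */
	bool	widen_overflows;	/* 8-byte inums would not fit in the inode */
	int	blocks_allocated;	/* blocks allocated by replace operations */
};

/* The single shared predicate. */
static bool
model_sf_replace_needblock(const struct model_dir *dp)
{
	return dp->shortform && dp->widen_overflows;
}

/* Caller 1: the replace path converts to block form when the helper says so. */
static void
model_sf_replace(struct model_dir *dp)
{
	if (model_sf_replace_needblock(dp)) {
		dp->shortform = false;		/* shortform to block conversion */
		dp->blocks_allocated++;		/* in the kernel this would need the AGF */
	}
	/* Otherwise the inode number is overwritten in place. */
}

/* Caller 2: the rename path decides whether to take the AGI up front. */
static bool
model_rename_must_prelock_agi(const struct model_dir *dp)
{
	return model_sf_replace_needblock(dp);
}
```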

Thanks for your comments, will address them soon and send v3.

Kaixu
> 
>> +	struct xfs_inode	*dp,
>> +	xfs_ino_t		inum)
>> +{
>> +	int			newsize;
>> +	xfs_dir2_sf_hdr_t	*sfp;
>> +
>> +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
>> +		return false;
> 
> This check should be used up in xfs_dir2_replace_needblock() to decide
> if we're calling xfs_dir2_sf_replace_needblock(), or just returning
> false.
> 
>> +
>> +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>> +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
>> +
>> +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
>> +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
>> +		return true;
>> +	else
>> +		return false;
> 
> return inum > XFS_DIR2_MAX_SHORT_INUM && (all the rest of that);
> 
>> +}
>> +
>> +/*
>>   * Replace the inode number of an entry in a shortform directory.
>>   */
>>  int						/* error */
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 18f4b26..c239070 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
>>  	struct xfs_trans	*tp;
>>  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
>>  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
>> +	struct xfs_buf		*agibp;
>>  	int			num_inodes = __XFS_SORT_INODES;
>>  	bool			new_parent = (src_dp != target_dp);
>>  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
>> @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
>>  		 * In case there is already an entry with the same
>>  		 * name at the destination directory, remove it first.
>>  		 */
>> +
>> +		/*
>> +		 * Check whether the replace operation need more blocks.
>> +		 * If so, acquire the agi lock firstly to preserve locking
> 
>                                                "first"
> 
>> +		 * order(AGI/AGF).
> 
> Nit: space between "order" and "(AGI/AGF)".
>> +		 */
>> +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
>> +			error = xfs_read_agi(mp, tp,
>> +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
> 
> Overly long line here.
> 
> --D
> 
>> +			if (error)
>> +				goto out_trans_cancel;
>> +		}
>> +
>>  		error = xfs_dir_replace(tp, target_dp, target_name,
>>  					src_ip->i_ino, spaceres);
>>  		if (error)
>> -- 
>> 1.8.3.1
>>
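
The ordering argument behind the quoted xfs_inode.c hunk can be modeled in a
few lines of standalone C. This is a sketch under stated assumptions, not
kernel code: the types and functions are hypothetical, it covers only the
same-AG case described in the commit message, and "locking" is reduced to a
check that the AGI is never first taken while the AGF is already held.

```c
#include <stdbool.h>

/* Hypothetical lock classes for one allocation group. */
enum model_lock { MODEL_AGI = 0, MODEL_AGF = 1 };

/* Which AG header locks the modeled transaction already holds. */
struct model_trans {
	bool	held[2];
};

/* Returns false if taking the lock would invert the AGI -> AGF order. */
static bool
model_lock_ag(struct model_trans *tp, enum model_lock which)
{
	if (which == MODEL_AGI && tp->held[MODEL_AGF] && !tp->held[MODEL_AGI])
		return false;
	tp->held[which] = true;
	return true;
}

/*
 * Rename over an existing target, modeled as three steps:
 *   1. optionally take the AGI first (the proposed fix),
 *   2. replace the directory entry, which takes the AGF if it needs a block,
 *   3. drop the target's link, which takes the AGI for the unlinked list.
 * Returns true if every lock was taken in a valid order.
 */
static bool
model_rename_replace(struct model_trans *tp, bool needblock, bool prelock_agi)
{
	if (needblock && prelock_agi && !model_lock_ag(tp, MODEL_AGI))
		return false;
	if (needblock && !model_lock_ag(tp, MODEL_AGF))
		return false;
	return model_lock_ag(tp, MODEL_AGI);
}
```

In this model, only the combination "needs a block, no AGI taken up front"
reports an ordering violation; when no block is needed the AGF is never
touched and the extra AGI read is unnecessary.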

-- 
kaixuxia

* Re: [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-07  3:46       ` Dave Chinner
@ 2019-11-08 11:48         ` Brian Foster
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Foster @ 2019-11-08 11:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, kaixuxia, linux-xfs, newtongao, jasperwang

On Thu, Nov 07, 2019 at 02:46:21PM +1100, Dave Chinner wrote:
> On Wed, Nov 06, 2019 at 07:46:12AM -0800, Darrick J. Wong wrote:
> > On Wed, Nov 06, 2019 at 07:49:32AM -0500, Brian Foster wrote:
> > > > >  /*
> > > > > + * Check whether the replace operation need more blocks.
> > > > > + */
> > > > > +bool
> > > > > +xfs_dir2_sf_replace_needblock(
> > > > 
> > > > Urgggh.  This is a predicate that we only ever call from xfs_rename(),
> > > > right?  And it addresses a particular quirk of the locking when the
> > > > caller wants us to rename on top of an existing entry and drop the link
> > > > count of the old inode, right?  So why can't this just be a predicate in
> > > > xfs_inode.c ?  Nobody else needs to know this particular piece of
> > > > information, AFAICT.
> > > > 
> > > > (Apologies, for Brian and I clearly aren't on the same page about
> > > > that...)
> > > > 
> > > 
> > > Hmm.. the crux of my feedback on the previous version was simply that if
> > > we wanted to take this approach of pulling up lower level dir logic into
> > > the higher level rename code, to simply factor out the existing checks
> > > down in the dir replace code that currently trigger a format conversion,
> > > and use that new helper in both places. That doesn't appear to be what
> > > this patch does, and I'm not sure why there are now two new helpers that
> > > each only have one caller instead of one new helper with two callers...
> > 
> > Aha, got it.  I'd wondered if that had been your intent. :)
> 
> So as a structural question: should this be folded into
> xfs_dir_canenter(), which is the function used to check if the
> directory modification can go ahead without allocating blocks....
> 
> This seems very much like it is a "do we need to allocate blocks
> during the directory modification?" sort of question being asked
> here...
> 

I _think_ Kaixu brought this up briefly in looking at the previous
version of this patch. From a code standpoint, I agree that this path
seems like the most logical fit, but my understanding was that the
canenter thing is kind of an inconsistent and unreliable mechanism at
this point. IIRC, we've explicitly removed its use from the create path
to work around things like block reservation overruns leading to fs
shutdowns as opposed to digging into the mechanism and fixing whatever
accounting was broken. See commit f59cf5c299 ("xfs: remove
"no-allocation" reservations for file creations"), for example. I
believe the discussion around that patch basically concluded that the
complexity of maintaining/debugging the canenter path wasn't worth the
benefit of squeezing every last block out of the fs, but that was a
while ago now.

That aside, I don't have a strong opinion on the best way to fix this
particular deadlock problem. The only other thing (outside of
reliability) I might question with the canenter approach is whether it
has ever been used in situations outside of -ENOSPC, and if not, whether
there's any potential performance impact of invoking it more frequently.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 7+ messages
2019-11-05  9:52 [PATCH v2] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename() kaixuxia
2019-11-06  4:56 ` Darrick J. Wong
2019-11-06 12:49   ` Brian Foster
2019-11-06 15:46     ` Darrick J. Wong
2019-11-07  3:46       ` Dave Chinner
2019-11-08 11:48         ` Brian Foster
2019-11-07  5:15   ` kaixuxia
