linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
@ 2019-10-30  9:49 kaixuxia
  2019-10-31 12:27 ` Brian Foster
  0 siblings, 1 reply; 5+ messages in thread
From: kaixuxia @ 2019-10-30  9:49 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong, newtongao, jasperwang

When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
need to hold the AGF lock to allocate more blocks, and then invoking
the xfs_droplink() call to hold AGI lock to drop target_ip onto the
unlinked list, so we get the lock order AGF->AGI. This would break the
ordering constraint on AGI and AGF locking - inode allocation locks
the AGI, then can allocate a new extent for new inodes, locking the
AGF after the AGI.

In this patch we check whether the replace operation need more
blocks firstly. If so, acquire the agi lock firstly to preserve
locking order(AGI/AGF). Actually, the locking order problem only
occurs when we are locking the AGI/AGF of the same AG. For multiple
AGs the AGI lock will be released after the transaction committed.

Signed-off-by: kaixuxia <kaixuxia@tencent.com>
---
 fs/xfs/libxfs/xfs_dir2.c | 30 ++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_dir2.h |  2 ++
 fs/xfs/xfs_inode.c       | 14 ++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 867c5de..9d9ae16 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -463,6 +463,36 @@
 }
 
 /*
+ * Check whether the replace operation need more blocks. Ignore
+ * the parameters check since the real replace() call below will
+ * do that.
+ */
+bool
+xfs_dir_replace_needblock(
+	struct xfs_inode	*dp,
+	xfs_ino_t		inum)
+{
+	int			newsize;
+	xfs_dir2_sf_hdr_t	*sfp;
+
+	/*
+	 * Only convert the shortform directory to block form maybe need
+	 * more blocks.
+	 */
+	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+		return false;
+
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
+
+	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
+	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
+		return true;
+	else
+		return false;
+}
+
+/*
  * Replace the inode number of a directory entry.
  */
 int
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index f542447..e436c14 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
 				xfs_extlen_t tot);
+extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
+				xfs_ino_t inum);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t inum,
 				xfs_extlen_t tot);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 18f4b26..c239070 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3196,6 +3196,7 @@ struct xfs_iunlink {
 	struct xfs_trans	*tp;
 	struct xfs_inode	*wip = NULL;		/* whiteout inode */
 	struct xfs_inode	*inodes[__XFS_SORT_INODES];
+	struct xfs_buf		*agibp;
 	int			num_inodes = __XFS_SORT_INODES;
 	bool			new_parent = (src_dp != target_dp);
 	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
@@ -3361,6 +3362,19 @@ struct xfs_iunlink {
 		 * In case there is already an entry with the same
 		 * name at the destination directory, remove it first.
 		 */
+
+		/*
+		 * Check whether the replace operation need more blocks.
+		 * If so, acquire the agi lock firstly to preserve locking
+		 * order(AGI/AGF).
+		 */
+		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
+			error = xfs_read_agi(mp, tp,
+					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
+			if (error)
+				goto out_trans_cancel;
+		}
+
 		error = xfs_dir_replace(tp, target_dp, target_name,
 					src_ip->i_ino, spaceres);
 		if (error)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-10-30  9:49 [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename() kaixuxia
@ 2019-10-31 12:27 ` Brian Foster
  2019-11-01  7:04   ` kaixuxia
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Foster @ 2019-10-31 12:27 UTC (permalink / raw)
  To: kaixuxia; +Cc: linux-xfs, darrick.wong, newtongao, jasperwang

On Wed, Oct 30, 2019 at 05:49:34PM +0800, kaixuxia wrote:
> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
> need to hold the AGF lock to allocate more blocks, and then invoking
> the xfs_droplink() call to hold AGI lock to drop target_ip onto the
> unlinked list, so we get the lock order AGF->AGI. This would break the
> ordering constraint on AGI and AGF locking - inode allocation locks
> the AGI, then can allocate a new extent for new inodes, locking the
> AGF after the AGI.
> 
> In this patch we check whether the replace operation need more
> blocks firstly. If so, acquire the agi lock firstly to preserve
> locking order(AGI/AGF). Actually, the locking order problem only
> occurs when we are locking the AGI/AGF of the same AG. For multiple
> AGs the AGI lock will be released after the transaction committed.
> 
> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> ---
>  fs/xfs/libxfs/xfs_dir2.c | 30 ++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_dir2.h |  2 ++
>  fs/xfs/xfs_inode.c       | 14 ++++++++++++++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 867c5de..9d9ae16 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -463,6 +463,36 @@
>  }
>  
>  /*
> + * Check whether the replace operation need more blocks. Ignore
> + * the parameters check since the real replace() call below will
> + * do that.
> + */
> +bool
> +xfs_dir_replace_needblock(
> +	struct xfs_inode	*dp,
> +	xfs_ino_t		inum)
> +{
> +	int			newsize;
> +	xfs_dir2_sf_hdr_t	*sfp;
> +
> +	/*
> +	 * Only convert the shortform directory to block form maybe need
> +	 * more blocks.
> +	 */
> +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> +		return false;
> +
> +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
> +
> +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
> +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
> +		return true;
> +	else
> +		return false;
> +}
> +

It's slightly unfortunate we need to do these kind of double checks, but
it seems reasonable enough as an isolated fix. From a factoring
standpoint, it might be a little cleaner to move this down in
xfs_dir2_sf.c as an xfs_dir2_sf_replace_needblock() helper, actually use
it in the xfs_dir2_sf_replace() function where these checks are
currently open coded and then export it so we can call it in the higher
level function as well for the locking fix.

Brian

> +/*
>   * Replace the inode number of a directory entry.
>   */
>  int
> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> index f542447..e436c14 100644
> --- a/fs/xfs/libxfs/xfs_dir2.h
> +++ b/fs/xfs/libxfs/xfs_dir2.h
> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
>  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
>  				struct xfs_name *name, xfs_ino_t ino,
>  				xfs_extlen_t tot);
> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
> +				xfs_ino_t inum);
>  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
>  				struct xfs_name *name, xfs_ino_t inum,
>  				xfs_extlen_t tot);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 18f4b26..c239070 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
>  	struct xfs_trans	*tp;
>  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
>  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> +	struct xfs_buf		*agibp;
>  	int			num_inodes = __XFS_SORT_INODES;
>  	bool			new_parent = (src_dp != target_dp);
>  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
>  		 * In case there is already an entry with the same
>  		 * name at the destination directory, remove it first.
>  		 */
> +
> +		/*
> +		 * Check whether the replace operation need more blocks.
> +		 * If so, acquire the agi lock firstly to preserve locking
> +		 * order(AGI/AGF).
> +		 */
> +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
> +			error = xfs_read_agi(mp, tp,
> +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
> +			if (error)
> +				goto out_trans_cancel;
> +		}
> +
>  		error = xfs_dir_replace(tp, target_dp, target_name,
>  					src_ip->i_ino, spaceres);
>  		if (error)
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-10-31 12:27 ` Brian Foster
@ 2019-11-01  7:04   ` kaixuxia
  2019-11-01 10:30     ` Brian Foster
  0 siblings, 1 reply; 5+ messages in thread
From: kaixuxia @ 2019-11-01  7:04 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, darrick.wong, newtongao, jasperwang



On 2019/10/31 20:27, Brian Foster wrote:
> On Wed, Oct 30, 2019 at 05:49:34PM +0800, kaixuxia wrote:
>> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
>> need to hold the AGF lock to allocate more blocks, and then invoking
>> the xfs_droplink() call to hold AGI lock to drop target_ip onto the
>> unlinked list, so we get the lock order AGF->AGI. This would break the
>> ordering constraint on AGI and AGF locking - inode allocation locks
>> the AGI, then can allocate a new extent for new inodes, locking the
>> AGF after the AGI.
>>
>> In this patch we check whether the replace operation need more
>> blocks firstly. If so, acquire the agi lock firstly to preserve
>> locking order(AGI/AGF). Actually, the locking order problem only
>> occurs when we are locking the AGI/AGF of the same AG. For multiple
>> AGs the AGI lock will be released after the transaction committed.
>>
>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
>> ---
>>  fs/xfs/libxfs/xfs_dir2.c | 30 ++++++++++++++++++++++++++++++
>>  fs/xfs/libxfs/xfs_dir2.h |  2 ++
>>  fs/xfs/xfs_inode.c       | 14 ++++++++++++++
>>  3 files changed, 46 insertions(+)
>>
>> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
>> index 867c5de..9d9ae16 100644
>> --- a/fs/xfs/libxfs/xfs_dir2.c
>> +++ b/fs/xfs/libxfs/xfs_dir2.c
>> @@ -463,6 +463,36 @@
>>  }
>>  
>>  /*
>> + * Check whether the replace operation need more blocks. Ignore
>> + * the parameters check since the real replace() call below will
>> + * do that.
>> + */
>> +bool
>> +xfs_dir_replace_needblock(
>> +	struct xfs_inode	*dp,
>> +	xfs_ino_t		inum)
>> +{
>> +	int			newsize;
>> +	xfs_dir2_sf_hdr_t	*sfp;
>> +
>> +	/*
>> +	 * Only convert the shortform directory to block form maybe need
>> +	 * more blocks.
>> +	 */
>> +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
>> +		return false;
>> +
>> +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>> +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
>> +
>> +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
>> +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
>> +		return true;
>> +	else
>> +		return false;
>> +}
>> +
> 
> It's slightly unfortunate we need to do these kind of double checks, but
> it seems reasonable enough as an isolated fix. From a factoring
> standpoint, it might be a little cleaner to move this down in
> xfs_dir2_sf.c as an xfs_dir2_sf_replace_needblock() helper, actually use
> it in the xfs_dir2_sf_replace() function where these checks are
> currently open coded and then export it so we can call it in the higher
> level function as well for the locking fix.
> 
Yeah, makes more sense. Also maybe we could add a function helper like
the xfs_dir_canenter() call, it just check whether the replace operation
need more blocks,

 int xfs_dir_replace_needblock(...)
 {
 	xfs_dir_replace(tp, dp, name, 0, 0);
 }

I'm not sure if this approach is reasonable...

Actually, there are some different solutions for the locking fix. One solution
is checking whether the replace operation need more blocks and acquiring AGI
lock before AGF lock. Another one is moving xfs_droplink() call to before the
xfs_dir_replace() call, but this solution may not be suitable. The third one
is expanding the directory in one transaction, but I'm not sure about this
solution and have no idea how to do it... 
Comments about these solutions, which one is more reasonable?

kaixu    

> Brian
> 
>> +/*
>>   * Replace the inode number of a directory entry.
>>   */
>>  int
>> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
>> index f542447..e436c14 100644
>> --- a/fs/xfs/libxfs/xfs_dir2.h
>> +++ b/fs/xfs/libxfs/xfs_dir2.h
>> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
>>  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
>>  				struct xfs_name *name, xfs_ino_t ino,
>>  				xfs_extlen_t tot);
>> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
>> +				xfs_ino_t inum);
>>  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
>>  				struct xfs_name *name, xfs_ino_t inum,
>>  				xfs_extlen_t tot);
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 18f4b26..c239070 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
>>  	struct xfs_trans	*tp;
>>  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
>>  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
>> +	struct xfs_buf		*agibp;
>>  	int			num_inodes = __XFS_SORT_INODES;
>>  	bool			new_parent = (src_dp != target_dp);
>>  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
>> @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
>>  		 * In case there is already an entry with the same
>>  		 * name at the destination directory, remove it first.
>>  		 */
>> +
>> +		/*
>> +		 * Check whether the replace operation need more blocks.
>> +		 * If so, acquire the agi lock firstly to preserve locking
>> +		 * order(AGI/AGF).
>> +		 */
>> +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
>> +			error = xfs_read_agi(mp, tp,
>> +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
>> +			if (error)
>> +				goto out_trans_cancel;
>> +		}
>> +
>>  		error = xfs_dir_replace(tp, target_dp, target_name,
>>  					src_ip->i_ino, spaceres);
>>  		if (error)
>> -- 
>> 1.8.3.1
>>
> 

-- 
kaixuxia

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-01  7:04   ` kaixuxia
@ 2019-11-01 10:30     ` Brian Foster
  2019-11-04  8:56       ` kaixuxia
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Foster @ 2019-11-01 10:30 UTC (permalink / raw)
  To: kaixuxia; +Cc: linux-xfs, darrick.wong, newtongao, jasperwang

On Fri, Nov 01, 2019 at 03:04:11PM +0800, kaixuxia wrote:
> 
> 
> On 2019/10/31 20:27, Brian Foster wrote:
> > On Wed, Oct 30, 2019 at 05:49:34PM +0800, kaixuxia wrote:
> >> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
> >> need to hold the AGF lock to allocate more blocks, and then invoking
> >> the xfs_droplink() call to hold AGI lock to drop target_ip onto the
> >> unlinked list, so we get the lock order AGF->AGI. This would break the
> >> ordering constraint on AGI and AGF locking - inode allocation locks
> >> the AGI, then can allocate a new extent for new inodes, locking the
> >> AGF after the AGI.
> >>
> >> In this patch we check whether the replace operation need more
> >> blocks firstly. If so, acquire the agi lock firstly to preserve
> >> locking order(AGI/AGF). Actually, the locking order problem only
> >> occurs when we are locking the AGI/AGF of the same AG. For multiple
> >> AGs the AGI lock will be released after the transaction committed.
> >>
> >> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> >> ---
> >>  fs/xfs/libxfs/xfs_dir2.c | 30 ++++++++++++++++++++++++++++++
> >>  fs/xfs/libxfs/xfs_dir2.h |  2 ++
> >>  fs/xfs/xfs_inode.c       | 14 ++++++++++++++
> >>  3 files changed, 46 insertions(+)
> >>
> >> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> >> index 867c5de..9d9ae16 100644
> >> --- a/fs/xfs/libxfs/xfs_dir2.c
> >> +++ b/fs/xfs/libxfs/xfs_dir2.c
> >> @@ -463,6 +463,36 @@
> >>  }
> >>  
> >>  /*
> >> + * Check whether the replace operation need more blocks. Ignore
> >> + * the parameters check since the real replace() call below will
> >> + * do that.
> >> + */
> >> +bool
> >> +xfs_dir_replace_needblock(
> >> +	struct xfs_inode	*dp,
> >> +	xfs_ino_t		inum)
> >> +{
> >> +	int			newsize;
> >> +	xfs_dir2_sf_hdr_t	*sfp;
> >> +
> >> +	/*
> >> +	 * Only convert the shortform directory to block form maybe need
> >> +	 * more blocks.
> >> +	 */
> >> +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> >> +		return false;
> >> +
> >> +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> >> +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
> >> +
> >> +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
> >> +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
> >> +		return true;
> >> +	else
> >> +		return false;
> >> +}
> >> +
> > 
> > It's slightly unfortunate we need to do these kind of double checks, but
> > it seems reasonable enough as an isolated fix. From a factoring
> > standpoint, it might be a little cleaner to move this down in
> > xfs_dir2_sf.c as an xfs_dir2_sf_replace_needblock() helper, actually use
> > it in the xfs_dir2_sf_replace() function where these checks are
> > currently open coded and then export it so we can call it in the higher
> > level function as well for the locking fix.
> > 
> Yeah, makes more sense. Also maybe we could add a function helper like
> the xfs_dir_canenter() call, it just check whether the replace operation
> need more blocks,
> 
>  int xfs_dir_replace_needblock(...)
>  {
>  	xfs_dir_replace(tp, dp, name, 0, 0);
>  }
> 
> I'm not sure if this approach is reasonable...
> 

I thought we were attempting to get rid of those calls, but I could be
mistaken.

> Actually, there are some different solutions for the locking fix. One solution
> is checking whether the replace operation need more blocks and acquiring AGI
> lock before AGF lock. Another one is moving xfs_droplink() call to before the
> xfs_dir_replace() call, but this solution may not be suitable. The third one
> is expanding the directory in one transaction, but I'm not sure about this
> solution and have no idea how to do it... 
> Comments about these solutions, which one is more reasonable?
> 

I'm not sure we want to split things up into multiple transactions (if
that's what you mean by the third option) because then we could be at
risk of creating an inconsistent state in the event of a crash.
Reordering the calls is cleaner in some respect because it doesn't
require any new code, but it would probably require a closer look to
make sure we don't create a problematic state for the current code in
any way (i.e. processing a directory entry of an already unlinked
inode). This patch requires some extra code, but it's the most simple
solution from a logical standpoint. If you want my .02, I think either
of the first two options are reasonable (provided they are correct).
Perhaps others have stronger opinions or other ideas...

Brian

> kaixu    
> 
> > Brian
> > 
> >> +/*
> >>   * Replace the inode number of a directory entry.
> >>   */
> >>  int
> >> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> >> index f542447..e436c14 100644
> >> --- a/fs/xfs/libxfs/xfs_dir2.h
> >> +++ b/fs/xfs/libxfs/xfs_dir2.h
> >> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
> >>  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
> >>  				struct xfs_name *name, xfs_ino_t ino,
> >>  				xfs_extlen_t tot);
> >> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
> >> +				xfs_ino_t inum);
> >>  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
> >>  				struct xfs_name *name, xfs_ino_t inum,
> >>  				xfs_extlen_t tot);
> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> index 18f4b26..c239070 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
> >>  	struct xfs_trans	*tp;
> >>  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
> >>  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> >> +	struct xfs_buf		*agibp;
> >>  	int			num_inodes = __XFS_SORT_INODES;
> >>  	bool			new_parent = (src_dp != target_dp);
> >>  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> >> @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
> >>  		 * In case there is already an entry with the same
> >>  		 * name at the destination directory, remove it first.
> >>  		 */
> >> +
> >> +		/*
> >> +		 * Check whether the replace operation need more blocks.
> >> +		 * If so, acquire the agi lock firstly to preserve locking
> >> +		 * order(AGI/AGF).
> >> +		 */
> >> +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
> >> +			error = xfs_read_agi(mp, tp,
> >> +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
> >> +			if (error)
> >> +				goto out_trans_cancel;
> >> +		}
> >> +
> >>  		error = xfs_dir_replace(tp, target_dp, target_name,
> >>  					src_ip->i_ino, spaceres);
> >>  		if (error)
> >> -- 
> >> 1.8.3.1
> >>
> > 
> 
> -- 
> kaixuxia


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
  2019-11-01 10:30     ` Brian Foster
@ 2019-11-04  8:56       ` kaixuxia
  0 siblings, 0 replies; 5+ messages in thread
From: kaixuxia @ 2019-11-04  8:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, darrick.wong, newtongao, jasperwang



On 2019/11/1 18:30, Brian Foster wrote:
> On Fri, Nov 01, 2019 at 03:04:11PM +0800, kaixuxia wrote:
>>
>>
>> On 2019/10/31 20:27, Brian Foster wrote:
>>> On Wed, Oct 30, 2019 at 05:49:34PM +0800, kaixuxia wrote:
>>>> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
>>>> need to hold the AGF lock to allocate more blocks, and then invoking
>>>> the xfs_droplink() call to hold AGI lock to drop target_ip onto the
>>>> unlinked list, so we get the lock order AGF->AGI. This would break the
>>>> ordering constraint on AGI and AGF locking - inode allocation locks
>>>> the AGI, then can allocate a new extent for new inodes, locking the
>>>> AGF after the AGI.
>>>>
>>>> In this patch we check whether the replace operation need more
>>>> blocks firstly. If so, acquire the agi lock firstly to preserve
>>>> locking order(AGI/AGF). Actually, the locking order problem only
>>>> occurs when we are locking the AGI/AGF of the same AG. For multiple
>>>> AGs the AGI lock will be released after the transaction committed.
>>>>
>>>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
>>>> ---
>>>>  fs/xfs/libxfs/xfs_dir2.c | 30 ++++++++++++++++++++++++++++++
>>>>  fs/xfs/libxfs/xfs_dir2.h |  2 ++
>>>>  fs/xfs/xfs_inode.c       | 14 ++++++++++++++
>>>>  3 files changed, 46 insertions(+)
>>>>
>>>> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
>>>> index 867c5de..9d9ae16 100644
>>>> --- a/fs/xfs/libxfs/xfs_dir2.c
>>>> +++ b/fs/xfs/libxfs/xfs_dir2.c
>>>> @@ -463,6 +463,36 @@
>>>>  }
>>>>  
>>>>  /*
>>>> + * Check whether the replace operation need more blocks. Ignore
>>>> + * the parameters check since the real replace() call below will
>>>> + * do that.
>>>> + */
>>>> +bool
>>>> +xfs_dir_replace_needblock(
>>>> +	struct xfs_inode	*dp,
>>>> +	xfs_ino_t		inum)
>>>> +{
>>>> +	int			newsize;
>>>> +	xfs_dir2_sf_hdr_t	*sfp;
>>>> +
>>>> +	/*
>>>> +	 * Only convert the shortform directory to block form maybe need
>>>> +	 * more blocks.
>>>> +	 */
>>>> +	if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL)
>>>> +		return false;
>>>> +
>>>> +	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>>>> +	newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
>>>> +
>>>> +	if (inum > XFS_DIR2_MAX_SHORT_INUM &&
>>>> +	    sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp))
>>>> +		return true;
>>>> +	else
>>>> +		return false;
>>>> +}
>>>> +
>>>
>>> It's slightly unfortunate we need to do these kind of double checks, but
>>> it seems reasonable enough as an isolated fix. From a factoring
>>> standpoint, it might be a little cleaner to move this down in
>>> xfs_dir2_sf.c as an xfs_dir2_sf_replace_needblock() helper, actually use
>>> it in the xfs_dir2_sf_replace() function where these checks are
>>> currently open coded and then export it so we can call it in the higher
>>> level function as well for the locking fix.
>>>
>> Yeah, makes more sense. Also maybe we could add a function helper like
>> the xfs_dir_canenter() call, it just check whether the replace operation
>> need more blocks,
>>
>>  int xfs_dir_replace_needblock(...)
>>  {
>>  	xfs_dir_replace(tp, dp, name, 0, 0);
>>  }
>>
>> I'm not sure if this approach is reasonable...
>>
> 
> I thought we were attempting to get rid of those calls, but I could be
> mistaken.
> 
Yeah, just like what you said, we can call it in the higher level
function.

>> Actually, there are some different solutions for the locking fix. One solution
>> is checking whether the replace operation need more blocks and acquiring AGI
>> lock before AGF lock. Another one is moving xfs_droplink() call to before the
>> xfs_dir_replace() call, but this solution may not be suitable. The third one
>> is expanding the directory in one transaction, but I'm not sure about this
>> solution and have no idea how to do it... 
>> Comments about these solutions, which one is more reasonable?
>>
> 
> I'm not sure we want to split things up into multiple transactions (if
> that's what you mean by the third option) because then we could be at
> risk of creating an inconsistent state in the event of a crash.
> Reordering the calls is cleaner in some respect because it doesn't
> require any new code, but it would probably require a closer look to
> make sure we don't create a problematic state for the current code in
> any way (i.e. processing a directory entry of an already unlinked
> inode). This patch requires some extra code, but it's the most simple
> solution from a logical standpoint. If you want my .02, I think either
> of the first two options are reasonable (provided they are correct).
> Perhaps others have stronger opinions or other ideas...
> 
Thanks for your comments. I will continue to use the approach in this
patch, and sent the V2 patch to address the comments.

Kaixu

> Brian
> 
>> kaixu    
>>
>>> Brian
>>>
>>>> +/*
>>>>   * Replace the inode number of a directory entry.
>>>>   */
>>>>  int
>>>> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
>>>> index f542447..e436c14 100644
>>>> --- a/fs/xfs/libxfs/xfs_dir2.h
>>>> +++ b/fs/xfs/libxfs/xfs_dir2.h
>>>> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
>>>>  extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
>>>>  				struct xfs_name *name, xfs_ino_t ino,
>>>>  				xfs_extlen_t tot);
>>>> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp,
>>>> +				xfs_ino_t inum);
>>>>  extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
>>>>  				struct xfs_name *name, xfs_ino_t inum,
>>>>  				xfs_extlen_t tot);
>>>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>>>> index 18f4b26..c239070 100644
>>>> --- a/fs/xfs/xfs_inode.c
>>>> +++ b/fs/xfs/xfs_inode.c
>>>> @@ -3196,6 +3196,7 @@ struct xfs_iunlink {
>>>>  	struct xfs_trans	*tp;
>>>>  	struct xfs_inode	*wip = NULL;		/* whiteout inode */
>>>>  	struct xfs_inode	*inodes[__XFS_SORT_INODES];
>>>> +	struct xfs_buf		*agibp;
>>>>  	int			num_inodes = __XFS_SORT_INODES;
>>>>  	bool			new_parent = (src_dp != target_dp);
>>>>  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
>>>> @@ -3361,6 +3362,19 @@ struct xfs_iunlink {
>>>>  		 * In case there is already an entry with the same
>>>>  		 * name at the destination directory, remove it first.
>>>>  		 */
>>>> +
>>>> +		/*
>>>> +		 * Check whether the replace operation need more blocks.
>>>> +		 * If so, acquire the agi lock firstly to preserve locking
>>>> +		 * order(AGI/AGF).
>>>> +		 */
>>>> +		if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) {
>>>> +			error = xfs_read_agi(mp, tp,
>>>> +					XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp);
>>>> +			if (error)
>>>> +				goto out_trans_cancel;
>>>> +		}
>>>> +
>>>>  		error = xfs_dir_replace(tp, target_dp, target_name,
>>>>  					src_ip->i_ino, spaceres);
>>>>  		if (error)
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>
>>
>> -- 
>> kaixuxia
> 

-- 
kaixuxia

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-11-04  8:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-30  9:49 [PATCH RFC] xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename() kaixuxia
2019-10-31 12:27 ` Brian Foster
2019-11-01  7:04   ` kaixuxia
2019-11-01 10:30     ` Brian Foster
2019-11-04  8:56       ` kaixuxia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).