* [PATCH] xfs: introduce protection for drop nlink
@ 2023-08-24  7:43 cheng.lin130
  2023-08-24 16:12 ` Darrick J. Wong
  2023-08-24 23:02 ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: cheng.lin130 @ 2023-08-24  7:43 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

From: Cheng Lin <cheng.lin130@zte.com.cn>
A directory's nlink count can underflow from 0 to 0xffffffff, causing the
directory to become unusable until the next xfs_repair run.

Introduce protection for drop nlink to reduce the impact of this,
and produce a warning when a directory nlink error is found during remove.

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
---
 fs/xfs/xfs_inode.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9e62cc5..536dbe4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
 	xfs_trans_t *tp,
 	xfs_inode_t *ip)
 {
+	xfs_mount_t     *mp;
+
+	if (VFS_I(ip)->i_nlink == 0) {
+		mp = ip->i_mount;
+		xfs_warn(mp, "%s: Deleting inode %llu with no links.",
+			 __func__, ip->i_ino);
+		return 0;
+	}
+
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);

 	drop_nlink(VFS_I(ip));
@@ -2442,7 +2451,12 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
 	 */
 	if (is_dir) {
 		ASSERT(VFS_I(ip)->i_nlink >= 2);
-		if (VFS_I(ip)->i_nlink != 2) {
+		if (VFS_I(ip)->i_nlink < 2) {
+			xfs_warn(ip->i_mount,
+			"%s: Remove dir (inode %llu) with invalid links.",
+				 __func__, ip->i_ino);
+		}
+		if (VFS_I(ip)->i_nlink > 2) {
 			error = -ENOTEMPTY;
 			goto out_trans_cancel;
 		}
-- 
1.8.3.1


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-24  7:43 [PATCH] xfs: introduce protection for drop nlink cheng.lin130
@ 2023-08-24 16:12 ` Darrick J. Wong
  2023-08-25  8:32   ` cheng.lin130
  2023-08-24 23:02 ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2023-08-24 16:12 UTC (permalink / raw)
  To: cheng.lin130
  Cc: linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> From: Cheng Lin <cheng.lin130@zte.com.cn>
> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> directory to become unusable until the next xfs_repair run.
> 
> Introduce protection for drop nlink to reduce the impact of this,
> and produce a warning when a directory nlink error is found during remove.
> 
> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> ---
>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 9e62cc5..536dbe4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
>  	xfs_trans_t *tp,
>  	xfs_inode_t *ip)
>  {
> +	xfs_mount_t     *mp;
> +
> +	if (VFS_I(ip)->i_nlink == 0) {
> +		mp = ip->i_mount;
> +		xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> +			 __func__, ip->i_ino);
> +		return 0;
> +	}
> +
>  	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> 
>  	drop_nlink(VFS_I(ip));

I'm not sure how nlink would ever get to 0xFFFFFFFF since the VFS won't
let a link count exceed s_max_links, and XFS sets that to 0x7FFFFFFF.
Unless, of course, you did that outside of Linux.

That said, why wouldn't you /pin/ the link count at -1U instead of
allowing it to overflow to zero?

Could you please take a look at this patch that's waiting in my
submission queue?

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=inode-repair-improvements&id=05f5a82efa6395c92038e18e008aaf7154238f27
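
For illustration, the pinning that patch does amounts to roughly the
following (XFS_NLINK_PINNED is ~0U there; this is a paraphrased sketch of
its xfs_bumplink() change, not the queued commit itself):

    #define XFS_NLINK_PINNED    (~0U)

    /*
     * Sketch: never let the link count wrap. Once it reaches the
     * sentinel it stays pinned, and repair works out the real count.
     */
    static void
    xfs_bumplink(
        struct xfs_trans    *tp,
        struct xfs_inode    *ip)
    {
        xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
        if (VFS_I(ip)->i_nlink != XFS_NLINK_PINNED)
            inc_nlink(VFS_I(ip));
        xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
    }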

--D

> @@ -2442,7 +2451,12 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
>  	 */
>  	if (is_dir) {
>  		ASSERT(VFS_I(ip)->i_nlink >= 2);
> -		if (VFS_I(ip)->i_nlink != 2) {
> +		if (VFS_I(ip)->i_nlink < 2) {
> +			xfs_warn(ip->i_mount,
> +			"%s: Remove dir (inode %llu) with invalid links.",
> +				 __func__, ip->i_ino);
> +		}
> +		if (VFS_I(ip)->i_nlink > 2) {
>  			error = -ENOTEMPTY;
>  			goto out_trans_cancel;
>  		}
> -- 
> 1.8.3.1


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-24  7:43 [PATCH] xfs: introduce protection for drop nlink cheng.lin130
  2023-08-24 16:12 ` Darrick J. Wong
@ 2023-08-24 23:02 ` Dave Chinner
  2023-08-25  9:09   ` cheng.lin130
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2023-08-24 23:02 UTC (permalink / raw)
  To: cheng.lin130
  Cc: djwong, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> From: Cheng Lin <cheng.lin130@zte.com.cn>
> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> directory to become unusable until the next xfs_repair run.

Hmmm.  How does this ever happen?

IMO, if it does happen, we need to fix whatever bug that causes it
to happen, not issue a warning and do nothing about the fact we
just hit a corrupt inode state...

> Introduce protection for drop nlink to reduce the impact of this,
> and produce a warning when a directory nlink error is found during remove.
> 
> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> ---
>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 9e62cc5..536dbe4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
>  	xfs_trans_t *tp,
>  	xfs_inode_t *ip)
>  {
> +	xfs_mount_t     *mp;
> +
> +	if (VFS_I(ip)->i_nlink == 0) {
> +		mp = ip->i_mount;
> +		xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> +			 __func__, ip->i_ino);
> +		return 0;
> +	}

This is obviously incorrect - whiteout inodes (RENAME_WHITEOUT) have an
i_nlink of zero when they are removed from the unlinked list. As do
O_TMPFILE inodes - when they are linked into the filesystem, we
explicitly check for i_nlink being zero before calling
xfs_iunlink_remove().
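
That legitimate zero-nlink path looks roughly like this in xfs_link()
(simplified sketch, not verbatim kernel code):

    /* A tmpfile being linked into the namespace has i_nlink == 0
     * and must first be taken off the AGI unlinked list. */
    if (VFS_I(sip)->i_nlink == 0) {
        struct xfs_perag    *pag;

        pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, sip->i_ino));
        error = xfs_iunlink_remove(tp, pag, sip);
        xfs_perag_put(pag);
        if (error)
            goto error_return;
    }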

> +
>  	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> 
>  	drop_nlink(VFS_I(ip));

Wait a second - this code doesn't match an upstream kernel. What
kernel did you make this patch against?

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-24 16:12 ` Darrick J. Wong
@ 2023-08-25  8:32   ` cheng.lin130
  2023-08-25 18:02     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: cheng.lin130 @ 2023-08-25  8:32 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

> On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
>> From: Cheng Lin <cheng.lin130@zte.com.cn>
>> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
>> directory to become unusable until the next xfs_repair run.
>>
>> Introduce protection for drop nlink to reduce the impact of this,
>> and produce a warning when a directory nlink error is found during remove.
>>
>> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
>> ---
>>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 9e62cc5..536dbe4 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
>>      xfs_trans_t *tp,
>>      xfs_inode_t *ip)
>>  {
>> +    xfs_mount_t     *mp;
>> +
>> +    if (VFS_I(ip)->i_nlink == 0) {
>> +        mp = ip->i_mount;
>> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
>> +             __func__, ip->i_ino);
>> +        return 0;
>> +    }
>> +
>>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
>>
>>      drop_nlink(VFS_I(ip));
> I'm not sure how nlink would ever get to 0xFFFFFFFF since the VFS won't
> let a link count exceed s_max_links, and XFS sets that to 0x7FFFFFFF.
> Unless, of course, you did that outside of Linux.
In the VFS, drop_nlink() only produces a warning when (inode->i_nlink == 0);
it does not prevent the decrement (inode->__i_nlink--), so the count underflows
from 0 to 0xffffffff. This situation was encountered on an old kernel version,
but I don't know how it happened. The directory was already in an error state:
"Too many links".

 kernel: WARNING: CPU: 12 PID: 12928 at fs/inode.c:286 drop_nlink+0x3e/0x50
 kernel: CPU: 12 PID: 12928 Comm: gbased Tainted: G        W  OE  ------------ T 3.10.0-693.21.1.el7.x86_64 #1
 kernel: Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 01/23/2021
 kernel: Call Trace:
 kernel: [<ffffffff816c5fce>] dump_stack+0x19/0x1b
 kernel: [<ffffffff8108dfa8>] __warn+0xd8/0x100
 kernel: [<ffffffff8108e0ed>] warn_slowpath_null+0x1d/0x20
 kernel: [<ffffffff8122cdfe>] drop_nlink+0x3e/0x50
 kernel: [<ffffffffc03cdc78>] xfs_droplink+0x28/0x60 [xfs]
 kernel: [<ffffffffc03cf87a>] xfs_remove+0x2aa/0x320 [xfs]
 kernel: [<ffffffffc03c9f7a>] xfs_vn_unlink+0x5a/0xa0 [xfs]
 kernel: [<ffffffff8121f19c>] vfs_rmdir+0xdc/0x150
 kernel: [<ffffffff81221e41>] do_rmdir+0x1f1/0x220
 kernel: [<ffffffff81223046>] SyS_rmdir+0x16/0x20
 kernel: [<ffffffff816d86d5>] system_call_fastpath+0x1c/0x21
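
For reference, the VFS helper in that trace does roughly this (fs/inode.c,
simplified):

    void drop_nlink(struct inode *inode)
    {
        WARN_ON(inode->i_nlink == 0);    /* warns, but still decrements */
        inode->__i_nlink--;              /* 0 wraps to 0xffffffff here */
        if (!inode->i_nlink)
            atomic_long_inc(&inode->i_sb->s_remove_count);
    }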
> That said, why wouldn't you /pin/ the link count at -1U instead of
> allowing it to overflow to zero?
> Could you please take a look at this patch that's waiting in my
> submission queue?
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=inode-repair-improvements&id=05f5a82efa6395c92038e18e008aaf7154238f27
I think XFS_NLINK_PINNED (~0U) can be used to prevent overflow in inc_nlink().
Is it better to compare i_nlink with 0U in drop_nlink() to prevent underflow?
(Like this patch does: do not let i_nlink underflow from 0 to 0xffffffff.)

Thanks.
> --D
>> @@ -2442,7 +2451,12 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
>>       */
>>      if (is_dir) {
>>          ASSERT(VFS_I(ip)->i_nlink >= 2);
>> -        if (VFS_I(ip)->i_nlink != 2) {
>> +        if (VFS_I(ip)->i_nlink < 2) {
>> +            xfs_warn(ip->i_mount,
>> +            "%s: Remove dir (inode %llu) with invalid links.",
>> +                 __func__, ip->i_ino);
>> +        }
>> +        if (VFS_I(ip)->i_nlink > 2) {
>>              error = -ENOTEMPTY;
>>              goto out_trans_cancel;
>>          }
>> --
>> 1.8.3.1


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-24 23:02 ` Dave Chinner
@ 2023-08-25  9:09   ` cheng.lin130
  2023-08-25 17:56     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: cheng.lin130 @ 2023-08-25  9:09 UTC (permalink / raw)
  To: david
  Cc: djwong, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

> On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
>> From: Cheng Lin <cheng.lin130@zte.com.cn>
>> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
>> directory to become unusable until the next xfs_repair run.
> Hmmm.  How does this ever happen?
> IMO, if it does happen, we need to fix whatever bug that causes it
> to happen, not issue a warning and do nothing about the fact we
> just hit a corrupt inode state...
Yes, I very much agree with your opinion. But I don't know how it happened
or how to reproduce it.
>> Introduce protection for drop nlink to reduce the impact of this,
>> and produce a warning when a directory nlink error is found during remove.
>>
>> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
>> ---
>>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 9e62cc5..536dbe4 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
>>      xfs_trans_t *tp,
>>      xfs_inode_t *ip)
>>  {
>> +    xfs_mount_t     *mp;
>> +
>> +    if (VFS_I(ip)->i_nlink == 0) {
>> +        mp = ip->i_mount;
>> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
>> +             __func__, ip->i_ino);
>> +        return 0;
>> +    }
> This is obviously incorrect - whiteout inodes (RENAME_WHITEOUT) have an
> i_nlink of zero when they are removed from the unlinked list. As do
> O_TMPFILE inodes - when they are linked into the filesystem, we
> explicitly check for i_nlink being zero before calling
> xfs_iunlink_remove().
I am not familiar with the above process. You mean there is a scenario
where, even with (i_nlink == 0), drop_nlink() still needs to run in
xfs_droplink()? But this will cause i_nlink to underflow to 0xffffffff.
>> +
>>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
>>
>>      drop_nlink(VFS_I(ip));
> Wait a second - this code doesn't match an upstream kernel. What
> kernel did you make this patch against?
It's kernel mainline linux-6.5-rc7

Thanks.
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-25  9:09   ` cheng.lin130
@ 2023-08-25 17:56     ` Darrick J. Wong
  2023-08-26  3:08       ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2023-08-25 17:56 UTC (permalink / raw)
  To: cheng.lin130
  Cc: david, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Fri, Aug 25, 2023 at 05:09:20PM +0800, cheng.lin130@zte.com.cn wrote:
> > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> >> directory to become unusable until the next xfs_repair run.
> > Hmmm.  How does this ever happen?
> > IMO, if it does happen, we need to fix whatever bug that causes it
> > to happen, not issue a warning and do nothing about the fact we
> > just hit a corrupt inode state...
> Yes, I very much agree with your opinion. But I don't know how it happened
> or how to reproduce it.

Wait, is this the result of a customer problem?  Or static analysis?

> >> Introduce protection for drop nlink to reduce the impact of this,
> >> and produce a warning when a directory nlink error is found during remove.
> >>
> >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> >> ---
> >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> >>  1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> index 9e62cc5..536dbe4 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,

I'm not sure why your diff program thinks this hunk is from
xfs_iunlink_remove, seeing as the line numbers of the chunk point to
xfs_droplink.  Maybe that's what's going on in this part of the thread?

> >>      xfs_trans_t *tp,
> >>      xfs_inode_t *ip)
> >>  {
> >> +    xfs_mount_t     *mp;
> >> +
> >> +    if (VFS_I(ip)->i_nlink == 0) {
> >> +        mp = ip->i_mount;
> >> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> >> +             __func__, ip->i_ino);
> >> +        return 0;
> >> +    }
> > This is obviously incorrect - whiteout inodes (RENAME_WHITEOUT) have an
> > i_nlink of zero when they are removed from the unlinked list. As do
> > O_TMPFILE inodes - when they are linked into the filesystem, we
> > explicitly check for i_nlink being zero before calling
> > xfs_iunlink_remove().
> I am not familiar with the above process. You mean there is a scenario
> where, even with (i_nlink == 0), drop_nlink() still needs to run in
> xfs_droplink()? But this will cause i_nlink to underflow to 0xffffffff.

xfs_iunlink_remove doesn't touch the link count.  I think Dave is
confused because of the misleading --show-c-function output.

> >> +
> >>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> >>
> >>      drop_nlink(VFS_I(ip));
> > Wait a second - this code doesn't match an upstream kernel. What
> > kernel did you make this patch against?
> It's kernel mainline linux-6.5-rc7

...and what did you use to generate the patch?  git diff?

--D

> 
> Thanks.
> > -Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-25  8:32   ` cheng.lin130
@ 2023-08-25 18:02     ` Darrick J. Wong
  2023-08-26 14:54       ` cheng.lin130
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2023-08-25 18:02 UTC (permalink / raw)
  To: cheng.lin130
  Cc: linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Fri, Aug 25, 2023 at 04:32:22PM +0800, cheng.lin130@zte.com.cn wrote:
> > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> >> directory to become unusable until the next xfs_repair run.
> >>
> >> Introduce protection for drop nlink to reduce the impact of this,
> >> and produce a warning when a directory nlink error is found during remove.
> >>
> >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> >> ---
> >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> >>  1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> index 9e62cc5..536dbe4 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> >>      xfs_trans_t *tp,
> >>      xfs_inode_t *ip)
> >>  {
> >> +    xfs_mount_t     *mp;
> >> +
> >> +    if (VFS_I(ip)->i_nlink == 0) {
> >> +        mp = ip->i_mount;
> >> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> >> +             __func__, ip->i_ino);
> >> +        return 0;
> >> +    }
> >> +
> >>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> >>
> >>      drop_nlink(VFS_I(ip));
> > I'm not sure how nlink would ever get to 0xFFFFFFFF since the VFS won't
> > let a link count exceed s_max_links, and XFS sets that to 0x7FFFFFFF.
> > Unless, of course, you did that outside of Linux.
> In the VFS, drop_nlink() only produces a warning when (inode->i_nlink == 0);
> it does not prevent the decrement (inode->__i_nlink--), so the count underflows
> from 0 to 0xffffffff.

It is interesting that vfs_unlink doesn't check the link counts of
either the parent or the child.  Maybe it should, since the VFS
link/mkdir/rename functions check.

I wonder if this is a historical leftover from the days when the VFS
did no checking at all?
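
(For comparison, vfs_link() does check, roughly like this - paraphrasing
fs/namei.c:)

    /* from vfs_link(), simplified */
    if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        error = -ENOENT;    /* no hardlinks to unlinked files */
    else if (max_links && inode->i_nlink >= max_links)
        error = -EMLINK;    /* would exceed sb->s_max_links */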

> This situation was encountered on an old kernel version,
> but I don't know how it happened. The directory was already in an
> error state: "Too many links".
> 
>  kernel: WARNING: CPU: 12 PID: 12928 at fs/inode.c:286 drop_nlink+0x3e/0x50
>  kernel: CPU: 12 PID: 12928 Comm: gbased Tainted: G        W  OE  ------------ T 3.10.0-693.21.1.el7.x86_64 #1
>  kernel: Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 01/23/2021
>  kernel: Call Trace:
>  kernel: [<ffffffff816c5fce>] dump_stack+0x19/0x1b
>  kernel: [<ffffffff8108dfa8>] __warn+0xd8/0x100
>  kernel: [<ffffffff8108e0ed>] warn_slowpath_null+0x1d/0x20
>  kernel: [<ffffffff8122cdfe>] drop_nlink+0x3e/0x50
>  kernel: [<ffffffffc03cdc78>] xfs_droplink+0x28/0x60 [xfs]
>  kernel: [<ffffffffc03cf87a>] xfs_remove+0x2aa/0x320 [xfs]
>  kernel: [<ffffffffc03c9f7a>] xfs_vn_unlink+0x5a/0xa0 [xfs]
>  kernel: [<ffffffff8121f19c>] vfs_rmdir+0xdc/0x150
>  kernel: [<ffffffff81221e41>] do_rmdir+0x1f1/0x220
>  kernel: [<ffffffff81223046>] SyS_rmdir+0x16/0x20
>  kernel: [<ffffffff816d86d5>] system_call_fastpath+0x1c/0x21
> > That said, why wouldn't you /pin/ the link count at -1U instead of
> > allowing it to overflow to zero?
> > Could you please take a look at this patch that's waiting in my
> > submission queue?
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=inode-repair-improvements&id=05f5a82efa6395c92038e18e008aaf7154238f27
> I think XFS_NLINK_PINNED (~0U) can be used to prevent overflow in inc_nlink().
> Is it better to compare i_nlink with 0U in drop_nlink() to prevent underflow?
> (Like this patch does: do not let i_nlink underflow from 0 to 0xffffffff.)

Is it a problem if a directory i_nlink underflows to XFS_NLINK_PINNED?
At that point the directory will never be freed, and xfs_repair/scrub
get to figure out the correct link count.

--D

> 
> Thanks.
> > --D
> >> @@ -2442,7 +2451,12 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> >>       */
> >>      if (is_dir) {
> >>          ASSERT(VFS_I(ip)->i_nlink >= 2);
> >> -        if (VFS_I(ip)->i_nlink != 2) {
> >> +        if (VFS_I(ip)->i_nlink < 2) {
> >> +            xfs_warn(ip->i_mount,
> >> +            "%s: Remove dir (inode %llu) with invalid links.",
> >> +                 __func__, ip->i_ino);
> >> +        }
> >> +        if (VFS_I(ip)->i_nlink > 2) {
> >>              error = -ENOTEMPTY;
> >>              goto out_trans_cancel;
> >>          }
> >> --
> >> 1.8.3.1


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-25 17:56     ` Darrick J. Wong
@ 2023-08-26  3:08       ` Dave Chinner
  2023-08-26 15:08         ` cheng.lin130
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2023-08-26  3:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: cheng.lin130, linux-xfs, linux-kernel, jiang.yong5, wang.liang82,
	liu.dong3

On Fri, Aug 25, 2023 at 10:56:27AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 25, 2023 at 05:09:20PM +0800, cheng.lin130@zte.com.cn wrote:
> > > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> > >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> > >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> > >> directory to become unusable until the next xfs_repair run.
> > > Hmmm.  How does this ever happen?
> > > IMO, if it does happen, we need to fix whatever bug that causes it
> > > to happen, not issue a warning and do nothing about the fact we
> > > just hit a corrupt inode state...
> > Yes, I very much agree with your opinion. But I don't know how it happened
> > or how to reproduce it.
> 
> Wait, is this the result of a customer problem?  Or static analysis?
> 
> > >> Introduce protection for drop nlink to reduce the impact of this,
> > >> and produce a warning when a directory nlink error is found during remove.
> > >>
> > >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> > >> ---
> > >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> > >>  1 file changed, 15 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > >> index 9e62cc5..536dbe4 100644
> > >> --- a/fs/xfs/xfs_inode.c
> > >> +++ b/fs/xfs/xfs_inode.c
> > >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> 
> I'm not sure why your diff program thinks this hunk is from
> xfs_iunlink_remove, seeing as the line numbers of the chunk point to
> xfs_droplink.  Maybe that's what's going on in this part of the thread?

Yes.

I don't expect patches to be mangled like this - I generally
take the hunk prefix to indicate what code is being modified when
reading patches, not expecting that the hunk is modifying code over
a thousand lines prior to the function in the prefix...

So, yeah, something went very wrong with the generation of this
patch...

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-25 18:02     ` Darrick J. Wong
@ 2023-08-26 14:54       ` cheng.lin130
  2023-08-26 21:28         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: cheng.lin130 @ 2023-08-26 14:54 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

> On Fri, Aug 25, 2023 at 04:32:22PM +0800, cheng.lin130@zte.com.cn wrote:
> > > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> > >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> > >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> > >> directory to become unusable until the next xfs_repair run.
> > >>
> > >> Introduce protection for drop nlink to reduce the impact of this,
> > >> and produce a warning when a directory nlink error is found during remove.
> > >>
> > >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> > >> ---
> > >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> > >>  1 file changed, 15 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > >> index 9e62cc5..536dbe4 100644
> > >> --- a/fs/xfs/xfs_inode.c
> > >> +++ b/fs/xfs/xfs_inode.c
> > >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> > >>      xfs_trans_t *tp,
> > >>      xfs_inode_t *ip)
> > >>  {
> > >> +    xfs_mount_t     *mp;
> > >> +
> > >> +    if (VFS_I(ip)->i_nlink == 0) {
> > >> +        mp = ip->i_mount;
> > >> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> > >> +             __func__, ip->i_ino);
> > >> +        return 0;
> > >> +    }
> > >> +
> > >>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> > >>
> > >>      drop_nlink(VFS_I(ip));
> > > I'm not sure how nlink would ever get to 0xFFFFFFFF since the VFS won't
> > > let a link count exceed s_max_links, and XFS sets that to 0x7FFFFFFF.
> > > Unless, of course, you did that outside of Linux.
> > In the VFS, drop_nlink() only produces a warning when (inode->i_nlink == 0);
> > it does not prevent the decrement (inode->__i_nlink--), so the count underflows
> > from 0 to 0xffffffff.
> It is interesting that vfs_unlink doesn't check the link counts of
> either the parent or the child.  Maybe it should, since the VFS
> link/mkdir/rename functions check.
> I wonder if this is a historical leftover from the days when the VFS
> did no checking at all?
The VFS producing a warning means it has discovered an abnormal situation.
I don't know why it only produces a warning. But in other filesystems like
fuse/nfs/overlayfs/ext4, there is further protection for this situation.
> > This situation was encountered on an old kernel version,
> > but I don't know how it happened. The directory was already in an
> > error state: "Too many links".
> >
> >  kernel: WARNING: CPU: 12 PID: 12928 at fs/inode.c:286 drop_nlink+0x3e/0x50
> >  kernel: CPU: 12 PID: 12928 Comm: gbased Tainted: G        W  OE  ------------ T 3.10.0-693.21.1.el7.x86_64 #1
> >  kernel: Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 01/23/2021
> >  kernel: Call Trace:
> >  kernel: [<ffffffff816c5fce>] dump_stack+0x19/0x1b
> >  kernel: [<ffffffff8108dfa8>] __warn+0xd8/0x100
> >  kernel: [<ffffffff8108e0ed>] warn_slowpath_null+0x1d/0x20
> >  kernel: [<ffffffff8122cdfe>] drop_nlink+0x3e/0x50
> >  kernel: [<ffffffffc03cdc78>] xfs_droplink+0x28/0x60 [xfs]
> >  kernel: [<ffffffffc03cf87a>] xfs_remove+0x2aa/0x320 [xfs]
> >  kernel: [<ffffffffc03c9f7a>] xfs_vn_unlink+0x5a/0xa0 [xfs]
> >  kernel: [<ffffffff8121f19c>] vfs_rmdir+0xdc/0x150
> >  kernel: [<ffffffff81221e41>] do_rmdir+0x1f1/0x220
> >  kernel: [<ffffffff81223046>] SyS_rmdir+0x16/0x20
> >  kernel: [<ffffffff816d86d5>] system_call_fastpath+0x1c/0x21
> > > That said, why wouldn't you /pin/ the link count at -1U instead of
> > > allowing it to overflow to zero?
> > > Could you please take a look at this patch that's waiting in my
> > > submission queue?
> > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=inode-repair-improvements&id=05f5a82efa6395c92038e18e008aaf7154238f27
> > I think XFS_NLINK_PINNED (~0U) can be used to prevent overflow in inc_nlink().
> > Is it better to compare i_nlink with 0U in drop_nlink() to prevent underflow?
> > (Like this patch does: do not let i_nlink underflow from 0 to 0xffffffff.)
> Is it a problem if a directory i_nlink underflows to XFS_NLINK_PINNED?
> At that point the directory will never be freed, and xfs_repair/scrub
> get to figure out the correct link count.
> --D
Yes, if i_nlink underflows to XFS_NLINK_PINNED, the directory will become
unavailable until it is repaired, and the service running on this directory will
fail. If i_nlink is protected from underflow, use of the directory can continue,
and the continuity of services is guaranteed. The incorrect count will also be
fixed at the next repair.
> >
> > Thanks.
> > > --D
> > >> @@ -2442,7 +2451,12 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> > >>       */
> > >>      if (is_dir) {
> > >>          ASSERT(VFS_I(ip)->i_nlink >= 2);
> > >> -        if (VFS_I(ip)->i_nlink != 2) {
> > >> +        if (VFS_I(ip)->i_nlink < 2) {
> > >> +            xfs_warn(ip->i_mount,
> > >> +            "%s: Remove dir (inode %llu) with invalid links.",
> > >> +                 __func__, ip->i_ino);
> > >> +        }
> > >> +        if (VFS_I(ip)->i_nlink > 2) {
> > >>              error = -ENOTEMPTY;
> > >>              goto out_trans_cancel;
> > >>          }
> > >> --
> > >> 1.8.3.1


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-26  3:08       ` Dave Chinner
@ 2023-08-26 15:08         ` cheng.lin130
  0 siblings, 0 replies; 14+ messages in thread
From: cheng.lin130 @ 2023-08-26 15:08 UTC (permalink / raw)
  To: david, djwong
  Cc: linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

> On Fri, Aug 25, 2023 at 10:56:27AM -0700, Darrick J. Wong wrote:
> > On Fri, Aug 25, 2023 at 05:09:20PM +0800, cheng.lin130@zte.com.cn wrote:
> > > > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> > > >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> > > >> directory to become unusable until the next xfs_repair run.
> > > > Hmmm.  How does this ever happen?
> > > > IMO, if it does happen, we need to fix whatever bug that causes it
> > > > to happen, not issue a warning and do nothing about the fact we
> > > > just hit a corrupt inode state...
> > > Yes, I very much agree with your opinion. But I don't know how it happened
> > > or how to reproduce it.
> >
> > Wait, is this the result of a customer problem?  Or static analysis?
It's a customer problem.

> >
> > > >> Introduce protection for drop nlink to reduce the impact of this,
> > > >> and produce a warning when a directory nlink error is found during remove.
> > > >>
> > > >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> > > >> ---
> > > >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> > > >>  1 file changed, 15 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > >> index 9e62cc5..536dbe4 100644
> > > >> --- a/fs/xfs/xfs_inode.c
> > > >> +++ b/fs/xfs/xfs_inode.c
> > > >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> >
> > I'm not sure why your diff program thinks this hunk is from
> > xfs_iunlink_remove, seeing as the line numbers of the chunk point to
> > xfs_droplink.  Maybe that's what's going on in this part of the thread?
> Yes.
> I don't expect patches to be mangled like this - I generally
> take the hunk prefix to indicate what code is being modified when
> reading patches, not expecting that the hunk is modifying code over
> a thousand lines prior to the function in the prefix...
> So, yeah, something went very wrong with the generation of this
> patch...
> -Dave.
It may be a problem with the git version. After using 2.18.1 instead of 1.8.3.1,
the patch looks normal.

> --
> Dave Chinner
> david@fromorbit.com

> > > Wait a second - this code doesn't match an upstream kernel. What
> > > kernel did you make this patch against?
> > It's kernel mainline linux-6.5-rc7

> ....and what did you use to generate the patch?  git diff?
>
> --D
It's git format-patch, with git version 1.8.3.1.


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-26 14:54       ` cheng.lin130
@ 2023-08-26 21:28         ` Dave Chinner
  2023-08-28  3:29           ` cheng.lin130
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2023-08-26 21:28 UTC (permalink / raw)
  To: cheng.lin130
  Cc: djwong, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Sat, Aug 26, 2023 at 10:54:11PM +0800, cheng.lin130@zte.com.cn wrote:
> > On Fri, Aug 25, 2023 at 04:32:22PM +0800, cheng.lin130@zte.com.cn wrote:
> > > > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> > > >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> > > >> directory to become unusable until the next xfs_repair run.
> > > >>
> > > >> Introduce protection for drop nlink to reduce the impact of this,
> > > >> and produce a warning when a directory nlink error is found during remove.
> > > >>
> > > >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> > > >> ---
> > > >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> > > >>  1 file changed, 15 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > >> index 9e62cc5..536dbe4 100644
> > > >> --- a/fs/xfs/xfs_inode.c
> > > >> +++ b/fs/xfs/xfs_inode.c
> > > >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> > > >>      xfs_trans_t *tp,
> > > >>      xfs_inode_t *ip)
> > > >>  {
> > > >> +    xfs_mount_t     *mp;
> > > >> +
> > > >> +    if (VFS_I(ip)->i_nlink == 0) {
> > > >> +        mp = ip->i_mount;
> > > >> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> > > >> +             __func__, ip->i_ino);
> > > >> +        return 0;
> > > >> +    }
> > > >> +
> > > >>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> > > >>
> > > >>      drop_nlink(VFS_I(ip));
> > > > I'm not sure how nlink would ever get to 0xFFFFFFFF since the VFS won't
> > > > let a link count exceed s_max_links, and XFS sets that to 0x7FFFFFFF.
> > > > Unless, of course, you did that outside of Linux.
> > > In the VFS, drop_nlink() only produces a warning when (inode->i_nlink == 0);
> > > it does not prevent the decrement (inode->__i_nlink--), so the count underflows
> > > from 0 to 0xffffffff.
> > It is interesting that vfs_unlink doesn't check the link counts of
> > either the parent or the child.  Maybe it should, since the VFS
> > link/mkdir/rename functions check.
> > I wonder if this is a historical leftover from the days when the VFS
> > did no checking at all?
> VFS produce a warning means it has discovered an abnormal situation.
> I don't know why it just produce a warning. But, in other fs like
> fuse/nfs/overlayfs/ext4, there is further protection for this situation.

Well, the question is how the link count got corrupted in the first
place....

> > > This situation was encountered on an old kernel version,
> > > but I don't know how it happened. The directory was already in an
> > > error state: "Too many links".

How do you overflow the directory link count in XFS? You can't fit
2^31 unique names in the directory data segment - the directory will
ENOSPC at 32GB of name data, and that typically occurs with at most
300-500 million dirents (depending on name lengths) in the
directory.

IOWs, normal operation shouldn't be able to overflow the directory link
count at all, and so underruns shouldn't occur, either.
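
Back-of-the-envelope version of those numbers (illustrative only):

    /*
     * 32GB of directory data / ~64 bytes per small dirent:
     *
     *    (32ULL << 30) / 64 = 536,870,912 entries
     *
     * which is well short of the 2^31 (2,147,483,648) increments
     * needed to overflow the link count.
     */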

> > >  kernel: WARNING: CPU: 12 PID: 12928 at fs/inode.c:286 drop_nlink+0x3e/0x50
> > >  kernel: CPU: 12 PID: 12928 Comm: gbased Tainted: G        W  OE  ------------ T 3.10.0-693.21.1.el7.x86_64 #1

So this is a RHEL 7 kernel, and it is tainted with unknown out of
tree (3rd party) kernel modules. Hence it could be memory corruption
from whatever drivers are loaded.  It's also old enough that it is
possible this is a v4 filesystem, and if so, it could be that storage
media corruption occurred and it wasn't detected.

> > >  kernel: Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 01/23/2021
> > >  kernel: Call Trace:
> > >  kernel: [<ffffffff816c5fce>] dump_stack+0x19/0x1b
> > >  kernel: [<ffffffff8108dfa8>] __warn+0xd8/0x100
> > >  kernel: [<ffffffff8108e0ed>] warn_slowpath_null+0x1d/0x20
> > >  kernel: [<ffffffff8122cdfe>] drop_nlink+0x3e/0x50
> > >  kernel: [<ffffffffc03cdc78>] xfs_droplink+0x28/0x60 [xfs]
> > >  kernel: [<ffffffffc03cf87a>] xfs_remove+0x2aa/0x320 [xfs]
> > >  kernel: [<ffffffffc03c9f7a>] xfs_vn_unlink+0x5a/0xa0 [xfs]
> > >  kernel: [<ffffffff8121f19c>] vfs_rmdir+0xdc/0x150
> > >  kernel: [<ffffffff81221e41>] do_rmdir+0x1f1/0x220
> > >  kernel: [<ffffffff81223046>] SyS_rmdir+0x16/0x20
> > >  kernel: [<ffffffff816d86d5>] system_call_fastpath+0x1c/0x21

Without actually knowing the root cause, and directory link count
overflows not being possible in normal operation, I'm not sure that
we should be jumping to conclusions that the directory link count
code in the upstream kernel is actually broken and needs fixing.

> > > > That said, why wouldn't you /pin/ the link count at -1U instead of
> > > > allowing it to overflow to zero?
> > > > Could you please take a look at this patch that's waiting in my
> > > > submission queue?
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=inode-repair-improvements&id=05f5a82efa6395c92038e18e008aaf7154238f27

This is protecting against regular files with too many hardlinks;
a directory will never have the link count set to XFS_NLINK_PINNED
because it just can't have that many entries in it....

> > > I think XFS_NLINK_PINNED (~0U) can be used to prevent overflow in inc_nlink().
> > > Is it better to compare i_nlink with 0U in drop_nlink() to prevent underflow?
> > > (Like this patch does: do not let i_nlink underflow from 0 to 0xffffffff.)
> > Is it a problem if a directory i_nlink underflows to XFS_NLINK_PINNED?
> > At that point the directory will never be freed, and xfs_repair/scrub
> > get to figure out the correct link count.

I think that's wrong. The directory inode gets unlinked when the
link count goes to zero, before the underflow occurs and can be
detected. i.e. the warning occurs when the link count goes from 0 to
-1, and this is after it has been unlinked on the transition from
1 to 0. If there are more entries removed from the directory at this
point, the NLINK_PINNED detection then kicks in and we don't drop
the nlink of the directory any further.

But at this point, the damage has already been done - the directory
is on the unlinked list at this point, and now it has a non-zero
nlink count which means the VFS will not drop the inode and it
remains cached. i.e. we have a corrupt runtime state where an inode
is on the unlinked list yet isn't scheduled for inactivation/freeing
when the last reference to it goes away. Indeed, inactivation won't
trigger unlinked list removal, either, because the link count is not
zero.
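
The VFS side of that: on final iput the inode is only dropped when the
link count is zero (paraphrasing generic_drop_inode() in fs/inode.c):

    static int generic_drop_inode(struct inode *inode)
    {
        return !inode->i_nlink || inode_unhashed(inode);
    }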

I suspect at this point we have multiple on-disk corruptions in the
filesystem. The parent directory points to an inode on the unlinked
list, and if we crash at this point we have an inode on the unlinked
that will be skipped by the recovery code because i_nlink is not
zero (same iput_final/drop_inode problem). This ends up with a
corrupt in-memory unlinked list on the next mount (i.e. inode on the
list that isn't in memory...), and nothing good happens from that...

> > --D
> Yes, if i_nlink underflows to XFS_NLINK_PINNED, the directory will become
> unavailable until it is repaired, and the service running on this directory will
> fail. If i_nlink is protected from underflow, use of the directory can continue,
> and the continuity of services is guaranteed. The incorrect count will also be
> fixed at the next repair.

I think that given what an underflow represents - some kind of
corruption - and the fact that it can propagate in nasty ways if we
allow operation to continue, we should be shutting down the
filesystem if we detect an underflow on a directory inode. This will
force repair of the filesystem as soon as possible.

IOWs, we're already in a bad situation when this warning is issued for
directories, and so not dropping the nlink count after it has
already underflowed doesn't matter one bit - we should be shutting
down the filesystem, not trying to continue onwards as if nothing
happened...
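
A sketch of what that policy would look like in xfs_droplink() (hypothetical
illustration built from existing helpers, not a queued patch):

    static int
    xfs_droplink(
        struct xfs_trans    *tp,
        struct xfs_inode    *ip)
    {
        if (VFS_I(ip)->i_nlink == 0) {
            /* underflow imminent: fail fast and force repair */
            xfs_force_shutdown(ip->i_mount, SHUTDOWN_CORRUPT_INCORE);
            return -EFSCORRUPTED;
        }

        xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
        drop_nlink(VFS_I(ip));
        xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

        if (VFS_I(ip)->i_nlink)
            return 0;
        /* count hit zero: put the inode on the unlinked list */
        return xfs_iunlink(tp, ip);
    }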

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-26 21:28         ` Dave Chinner
@ 2023-08-28  3:29           ` cheng.lin130
  2023-08-28  5:21             ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: cheng.lin130 @ 2023-08-28  3:29 UTC (permalink / raw)
  To: david
  Cc: djwong, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

> On Sat, Aug 26, 2023 at 10:54:11PM +0800, cheng.lin130@zte.com.cn wrote:
> > > On Fri, Aug 25, 2023 at 04:32:22PM +0800, cheng.lin130@zte.com.cn wrote:
> > > > > On Thu, Aug 24, 2023 at 03:43:52PM +0800, cheng.lin130@zte.com.cn wrote:
> > > > >> From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > > >> A directory's nlink count can underflow from 0 to 0xffffffff, causing the
> > > > >> directory to become unusable until the next xfs_repair run.
> > > > >>
> > > > >> Introduce protection for drop nlink to reduce the impact of this,
> > > > >> and produce a warning when a directory nlink error is found during remove.
> > > > >>
> > > > >> Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
> > > > >> ---
> > > > >>  fs/xfs/xfs_inode.c | 16 +++++++++++++++-
> > > > >>  1 file changed, 15 insertions(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > >> index 9e62cc5..536dbe4 100644
> > > > >> --- a/fs/xfs/xfs_inode.c
> > > > >> +++ b/fs/xfs/xfs_inode.c
> > > > >> @@ -919,6 +919,15 @@ STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
> > > > >>      xfs_trans_t *tp,
> > > > >>      xfs_inode_t *ip)
> > > > >>  {
> > > > >> +    xfs_mount_t     *mp;
> > > > >> +
> > > > >> +    if (VFS_I(ip)->i_nlink == 0) {
> > > > >> +        mp = ip->i_mount;
> > > > >> +        xfs_warn(mp, "%s: Deleting inode %llu with no links.",
> > > > >> +             __func__, ip->i_ino);
> > > > >> +        return 0;
> > > > >> +    }
> > > > >> +
> > > > >>      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> > > > >>
> > > > >>      drop_nlink(VFS_I(ip));
> > > > > I'm not sure how nlink would ever get to 0xFFFFFFFF since the VFS won't
> > > > > let a link count exceed s_max_links, and XFS sets that to 0x7FFFFFFF.
> > > > > Unless, of course, you did that outside of Linux.
> > > > In the VFS, drop_nlink() only produces a warning when (inode->i_nlink == 0);
> > > > it does not prevent the decrement (inode->__i_nlink--), so the count underflows
> > > > from 0 to 0xffffffff.
> > > It is interesting that vfs_unlink doesn't check the link counts of
> > > either the parent or the child.  Maybe it should, since the VFS
> > > link/mkdir/rename functions check.
> > > I wonder if this is a historical leftover from the days when the VFS
> > > did no checking at all?
> > The VFS producing a warning means it has discovered an abnormal situation.
> > I don't know why it only produces a warning. But in other filesystems like
> > fuse/nfs/overlayfs/ext4, there is further protection for this situation.
> Well, the question is how the link count got corrupted in the first
> place....
> > > > This situation was encountered on an old kernel version,
> > > > but I don't know how it happened. The directory was already in an
> > > > error state: "Too many links".
> How do you overflow the directory link count in XFS? You can't fit
> 2^31 unique names in the directory data segment - the directory will
> ENOSPC at 32GB of name data, and that typically occurs with at most
> 300-500 million dirents (depending on name lengths) in the
> directory.
> > IOWs, normal operation shouldn't be able to overflow the directory link
> count at all, and so underruns shouldn't occur, either.
Customer's explanation: in the directory with the incorrect nlink, not many
subdirectories are created, and normally there are only 2 regular files.
Only this one directory was found with an incorrect nlink by xfs_repair.
  systemd-fsck[5635]: Phase 2 - using internal log
  systemd-fsck[5635]: - zero log...
  systemd-fsck[5635]: - scan filesystem freespace and inode maps...
  systemd-fsck[5635]: agi unlinked bucket 9 is 73 in ag 22 (inode=23622320201)
  systemd-fsck[5635]: - 21:46:00: scanning filesystem freespace - 32 of 32 allocation groups done
  systemd-fsck[5635]: - found root inode chunk
  ...
  systemd-fsck[5635]: Phase 7 - verify and correct link counts...
  systemd-fsck[5635]: resetting inode 23622320201 nlinks from 4294967284 to 2
  systemd-fsck[5635]: - 22:06:34: verify and correct link counts - 32 of 32 allocation groups done
  systemd-fsck[5635]: done
> > > >  kernel: WARNING: CPU: 12 PID: 12928 at fs/inode.c:286 drop_nlink+0x3e/0x50
> > > >  kernel: CPU: 12 PID: 12928 Comm: gbased Tainted: G        W  OE  ------------ T 3.10.0-693.21.1.el7.x86_64 #1
> So this is a RHEL 7 kernel, and it is tainted with unknown out of
> tree (3rd party) kernel modules. Hence it could be memory corruption
> from whatever drivers are loaded.  It's also old enough that it is
> possible this is a v4 filesystem, and if so, it could be that storage
> media corruption occurred and it wasn't detected.
> > > >  kernel: Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 01/23/2021
> > > >  kernel: Call Trace:
> > > >  kernel: [<ffffffff816c5fce>] dump_stack+0x19/0x1b
> > > >  kernel: [<ffffffff8108dfa8>] __warn+0xd8/0x100
> > > >  kernel: [<ffffffff8108e0ed>] warn_slowpath_null+0x1d/0x20
> > > >  kernel: [<ffffffff8122cdfe>] drop_nlink+0x3e/0x50
> > > >  kernel: [<ffffffffc03cdc78>] xfs_droplink+0x28/0x60 [xfs]
> > > >  kernel: [<ffffffffc03cf87a>] xfs_remove+0x2aa/0x320 [xfs]
> > > >  kernel: [<ffffffffc03c9f7a>] xfs_vn_unlink+0x5a/0xa0 [xfs]
> > > >  kernel: [<ffffffff8121f19c>] vfs_rmdir+0xdc/0x150
> > > >  kernel: [<ffffffff81221e41>] do_rmdir+0x1f1/0x220
> > > >  kernel: [<ffffffff81223046>] SyS_rmdir+0x16/0x20
> > > >  kernel: [<ffffffff816d86d5>] system_call_fastpath+0x1c/0x21
> Without actually knowing the root cause, and directory link count
> overflows not being possible in normal operation, I'm not sure that
> we should be jumping to conclusions that the directory link count
> code in the upstream kernel is actually broken and needs fixing.
> > > > > That said, why wouldn't you /pin/ the link count at -1U instead of
> > > > > allowing it to overflow to zero?
> > > > > Could you please take a look at this patch that's waiting in my
> > > > > submission queue?
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=inode-repair-improvements&id=05f5a82efa6395c92038e18e008aaf7154238f27
> This is protecting against regular files with too many hardlinks;
> a directory will never have the link count set to XFS_NLINK_PINNED
> because it just can't have that many entries in it....
> > > > I think XFS_NLINK_PINNED (~0U) can be used to prevent overflow in inc_nlink().
> > > > Is it better to compare i_nlink with 0U in drop_nlink() to prevent underflow?
> > > > (Like this patch does: do not let i_nlink underflow from 0 to 0xffffffff.)
> > > Is it a problem if a directory i_nlink underflows to XFS_NLINK_PINNED?
> > > At that point the directory will never be freed, and xfs_repair/scrub
> > > get to figure out the correct link count.
> I think that's wrong. The directory inode gets unlinked when the
> link count goes to zero, before the underflow occurs and can be
> detected. i.e. the warning occurs when the link count goes from 0 to
> -1, and this is after it has been unlinked on the transition from
> 1 to 0. If there are more entries removed from the directory at this
> point, the NLINK_PINNED detection then kicks in and we don't drop
> the nlink of the directory any further.
> But at this point, the damage has already been done - the directory
> is on the unlinked list at this point, and now it has a non-zero
> nlink count which means the VFS will not drop the inode and it
> remains cached. i.e. we have a corrupt runtime state where an inode
> is on the unlinked list yet isn't scheduled for inactivation/freeing
> when the last reference to it goes away. Indeed, inactivation won't
> trigger unlinked list removal, either, because the link count is not
> zero.
> I suspect at this point we have multiple on-disk corruptions in the
> filesystem. The parent directory points to an inode on the unlinked
> list, and if we crash at this point we have an inode on the unlinked list
> that will be skipped by the recovery code because i_nlink is not
> zero (same iput_final/drop_inode problem). This ends up with a
> corrupt in-memory unlinked list on the next mount (i.e. inode on the
> list that isn't in memory...), and nothing good happens from that...
> > > --D
> > Yes, if i_nlink underflows to XFS_NLINK_PINNED, the directory will become
> > unavailable until it is repaired, and the service running on this directory will
> > fail. If i_nlink is protected from underflow, use of the directory can continue,
> > and the continuity of services is guaranteed. The incorrect count will also be
> > fixed at the next repair.
> I think that given what an underflow represents - some kind of
> corruption - and the fact that it can propagate in nasty ways if we
> allow operation to continue, we should be shutting down the
> filesystem if we detect an underflow on a directory inode. This will
> force repair of the filesystem as soon as possible.
> IOWs, we're already in a bad situation when this warning is issued for
> directories, and so not dropping the nlink count after it has
> already underflowed doesn't matter one bit - we should be shutting
> down the filesystem, not trying to continue onwards as if nothing
> happened...
> Cheers,
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
If it's just an incorrect count on one directory, then after ignoring it the fs
can keep working normally (with the error present). Is it worth stopping the
entire fs immediately for this condition?


* Re: [PATCH] xfs: introduce protection for drop nlink
  2023-08-28  3:29           ` cheng.lin130
@ 2023-08-28  5:21             ` Dave Chinner
       [not found]               ` <202309041042177773780@zte.com.cn>
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2023-08-28  5:21 UTC (permalink / raw)
  To: cheng.lin130
  Cc: djwong, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Mon, Aug 28, 2023 at 11:29:51AM +0800, cheng.lin130@zte.com.cn wrote:
> > On Sat, Aug 26, 2023 at 10:54:11PM +0800, cheng.lin130@zte.com.cn wrote:
> > > > > This situation was encountered on an old kernel version,
> > > > > but I don't know how it happened. The directory was already in an
> > > > > error state: "Too many links".
> > How do you overflow the directory link count in XFS? You can't fit
> > 2^31 unique names in the directory data segment - the directory will
> > ENOSPC at 32GB of name data, and that typically occurs with at most
> > 300-500 million dirents (depending on name lengths) in the
> > directory.
> > > > IOWs, normal operation shouldn't be able to overflow the directory link
> > count at all, and so underruns shouldn't occur, either.
> Customer's explanation: in the directory with the incorrect nlink, not many
> subdirectories are created, and normally there are only 2 regular files.
> Only this one directory was found with an incorrect nlink by xfs_repair.
>   systemd-fsck[5635]: Phase 2 - using internal log
>   systemd-fsck[5635]: - zero log...
>   systemd-fsck[5635]: - scan filesystem freespace and inode maps...
>   systemd-fsck[5635]: agi unlinked bucket 9 is 73 in ag 22 (inode=23622320201)

So the directory inode is on the unlinked list, as I suggested it
would be.

>   systemd-fsck[5635]: - 21:46:00: scanning filesystem freespace - 32 of 32 allocation groups done
>   systemd-fsck[5635]: - found root inode chunk
>   ...

How many other inodes were repaired or trashed or moved to
lost+found?

>   systemd-fsck[5635]: Phase 7 - verify and correct link counts...
>   systemd-fsck[5635]: resetting inode 23622320201 nlinks from 4294967284 to 2

The link count of the directory inode on the unlinked list was
actually -12, so this isn't an "off by one" error. It's still just 2
adjacent bits being cleared when they shouldn't have been, though.

What is the xfs_info (or mkfs) output for the filesystem that this
occurred on?

.....

> If it's just an incorrect count on one directory, then after ignoring it the fs
> can keep working normally (with the error present). Is it worth stopping the
> entire fs immediately for this condition?

The inode is on the unlinked list with a non-zero link count. That
means it cannot be removed from the unlinked list (because the inode
will not be freed during inactivation) and so the unlinked list is
effectively corrupt. Anything that removes an inode or creates an
O_TMPFILE or uses RENAME_WHITEOUT can trip over this corrupt
unlinked list and have things go from bad to worse. Hence the
corruption is not limited to the directory inode or operations
involving that directory inode. We generally shut down the
filesystem when this sort of corruption occurs - it needs to be
repaired ASAP, otherwise other stuff will randomly fail and you'll
still end up with a shut down filesystem. Better to fail fast in
corruption cases than try to ignore it and vainly hope that
everything will work out for the best....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: introduce protection for drop nlink
       [not found]               ` <202309041042177773780@zte.com.cn>
@ 2023-09-04 22:49                 ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2023-09-04 22:49 UTC (permalink / raw)
  To: cheng.lin130
  Cc: djwong, linux-xfs, linux-kernel, jiang.yong5, wang.liang82, liu.dong3

On Mon, Sep 04, 2023 at 10:42:17AM +0800, cheng.lin130@zte.com.cn wrote:
> > On Mon, Aug 28, 2023 at 11:29:51AM +0800, cheng.lin130@zte.com.cn wrote:
> > > > On Sat, Aug 26, 2023 at 10:54:11PM +0800, cheng.lin130@zte.com.cn wrote:
> > > > > > > This situation was encountered on an old kernel version,
> > > > > > > but I don't know how it happened. The directory was already in an
> > > > > > > error state: "Too many links".
> > > > How do you overflow the directory link count in XFS? You can't fit
> > > > 2^31 unique names in the directory data segment - the directory will
> > > > ENOSPC at 32GB of name data, and that typically occurs with at most
> > > > 300-500 million dirents (depending on name lengths) in the
> > > > directory.
> > > > IOWs, normal operation shouldn't be able to overflow the directory link
> > > > count at all, and so underruns shouldn't occur, either.
> > > Customer's explanation: in the directory with the incorrect nlink, not many
> > > subdirectories are created, and normally there are only 2 regular files.
> > > Only this one directory was found with an incorrect nlink by xfs_repair.
> > > systemd-fsck[5635]: Phase 2 - using internal log
> > > systemd-fsck[5635]: - zero log...
> > > systemd-fsck[5635]: - scan filesystem freespace and inode maps...
> > > systemd-fsck[5635]: agi unlinked bucket 9 is 73 in ag 22 (inode=23622320201)
> > So the directory inode is on the unlinked list, as I suggested it
> > would be.
> Yes.
> > > systemd-fsck[5635]: - 21:46:00: scanning filesystem freespace - 32 of 32 allocation groups done
> > > systemd-fsck[5635]: - found root inode chunk
> > > ...
> > How many other inodes were repaired or trashed or moved to
> > lost+found?
Just (inode=23622320201) was repaired.

So only one inode on the unlinked list, and it's the inode with
the bad link count.

> > > systemd-fsck[5635]: Phase 7 - verify and correct link counts...
> > > systemd-fsck[5635]: resetting inode 23622320201 nlinks from 4294967284 to 2
> > The link count of the directory inode on the unlinked list was
> > actually -12, so this isn't an "off by one" error. It's still just 2
> > adjacent bits being cleared when they shouldn't have been, though.
> > What is the xfs_info (or mkfs) output for the filesystem that this
> > occurred on?
> meta-data=/dev/mapper/vg_gbaseserver-lv_gbaseserver isize=512 agcount=32, agsize=78643168 blks
>          = sectsz=512 attr=2, projid32bit=1
>          = crc=1 finobt=0 spinodes=0

Ok, crcs are enabled, so it's likely not storage level corruption.

> data     = bsize=4096 blocks=2516581376, imaxpct=5
>          = sunit=0 swidth=0 blks
> naming   =version 2 bsize=4096 ascii-ci=0 ftype=1
> log      =internal bsize=4096 blocks=521728, version=2
>          = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> 
> > ......
> > > If it's just an incorrect count on one directory, then after ignoring it the fs
> > > can keep working normally (with the error present). Is it worth stopping the
> > > entire fs immediately for this condition?
> > The inode is on the unlinked list with a non-zero link count. That
> > means it cannot be removed from the unlinked list (because the inode
> > will not be freed during inactivation) and so the unlinked list is
> > effectively corrupt. Anything that removes an inode or creates an
> > O_TMPFILE or uses RENAME_WHITEOUT can trip over this corrupt
> > unlinked list and have things go from bad to worse. Hence the
> If the nlink is protected from underflow (minimum value of nlink is 0),
> does that mean corruption of the unlinked list can be avoided?

The VFS already warns when an underflow occurs - stuff has already
gone wrong at this point, and if we are going to do anything then
we should be shutting the filesystem down at this point because
something is corrupt either in memory or on disk, and continuing
after underflow propagates the corruption and makes things worse...

The fact that your customer's system didn't log warnings about the
link count going from 0 to -1 when the link count was -12 on disk
(like it should if this happens via xfs_droplink() -> drop_nlink())
really brings into question how this situation silently
occurred....

Until we actually understand the root cause of the bad value and why
it occurred silently in a decade old kernel, trying to fix it in a
current upstream kernel is premature.

> > corruption is not limited to the directory inode or operations
> > involving that directory inode. We generally shut down the
> > filesystem when this sort of corruption occurs - it needs to be
> > repaired ASAP, otherwise other stuff will randomly fail and
> > you'll still end up with a shut down filesystem. Better to fail
> > fast in corruption cases than try to ignore it and vainly hope
> > that everything will work out for the best....  Cheers, Dave.
> > --
> Directly shutting down the filesystem is indeed a relatively safe approach.
> But for the customer it is sudden and unprepared-for. If the fs can be kept
> available as far as possible (if that can be achieved) and delayed repair is
> allowed, then the customer can make more preparations before doing it.
> Do you still prefer shutting down the filesystem directly?

Yes, if we've detected corruption in a modification situation (such
as an unlink) we need to shut down the filesystem. Propagating a
corruption from in-memory to on-disk is the worst thing we can do.
As such, XFS has had a policy since the mid 1990s that we shut down
the filesystem immediately rather than risk propagating a corruption
into on-disk structures.

This will change in the future as we start to leverage online repair
in response to corruption detections like this. But that's not a
magic bullet, and that does not help the situation with problems on
RHEL-7 era kernels....

-Dave.

-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2023-09-04 22:49 UTC | newest]

Thread overview: 14+ messages
2023-08-24  7:43 [PATCH] xfs: introduce protection for drop nlink cheng.lin130
2023-08-24 16:12 ` Darrick J. Wong
2023-08-25  8:32   ` cheng.lin130
2023-08-25 18:02     ` Darrick J. Wong
2023-08-26 14:54       ` cheng.lin130
2023-08-26 21:28         ` Dave Chinner
2023-08-28  3:29           ` cheng.lin130
2023-08-28  5:21             ` Dave Chinner
     [not found]               ` <202309041042177773780@zte.com.cn>
2023-09-04 22:49                 ` Dave Chinner
2023-08-24 23:02 ` Dave Chinner
2023-08-25  9:09   ` cheng.lin130
2023-08-25 17:56     ` Darrick J. Wong
2023-08-26  3:08       ` Dave Chinner
2023-08-26 15:08         ` cheng.lin130
