All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
@ 2022-06-10  8:27 bugzilla-daemon
  2022-06-10  8:30 ` [Bug 216110] " bugzilla-daemon
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-06-10  8:27 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

            Bug ID: 216110
           Summary: rmdir sub directory cause i_nlink of parent directory
                    down from 0 to 0xffffffff
           Product: File System
           Version: 2.5
    Kernel Version: linux-3.10.0-957.el7
          Hardware: Other
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: XFS
          Assignee: filesystem_xfs@kernel-bugs.kernel.org
          Reporter: hexiaole1994@126.com
        Regression: No

1. synptom
when user executed mkdir command under parent directory, mkdir command prompted
"Too many links".


2. basic analysis
(1)use "getconf LINK_MAX ." under parent directory, the max i_nlink of the
xfs(the filesystem that parent directory belongs) is 2147483647, but the
i_nlink of the parent directory now is 4294967109, because the mkdir command
will check if the i_nlink of the parent directory is lower than the LINK_MAX,
in our environment this check failed, so mkdir command prompt "Too many links".
(2)we "cd" into the parent directory, and execute "ls|wc" to accounting the
total files of the parent directory, the result is 308875
(3)the i_nlink by definition is "the number of links to the inode from
directories", a newly created directory has i_nlink of 2, and the i_nlink of
this newly created directory will plus 1 once there has a sub directory created
under it(the sub directory's ".." points to parent directory cause the i_nlink
of the parent directory plus 1), so the i_nlink of the parent directory can
also reflect the number of the sub directories(the number of sub directory =
i_nlink of the parent - 2). the i_nlink of the parent directory now is
4294967109, if this i_nlink is valid, the number of the sub directoryes might
be 4294967109, but like the (2) shows, the total files(include directories)
under the parent directory is 308875. so we can assert the i_nlink metadata of
the parent direcotry was corrupted.
(4)in the dmesg file of the sos_report, we saw an call trace that related to
this corrupted i_nlink of parent directory:
...
[26038585.616782] ------------[ cut here ]------------
[26038585.616794] WARNING: CPU: 22 PID: 21088 at fs/inode.c:284
drop_nlink+0x3e/0x50
[26038585.616796] Modules linked in: binfmt_misc tcp_diag inet_diag 8021q garp
mrp stp llc bonding vfat fat ipmi_ssif amd64_edac_mod edac_mce_amd kvm joydev
irqbypass ses enclosure pcspkr scsi_transport_sas sg ipmi_si ipmi_devintf
ipmi_msghandler i2c_piix4 acpi_cpufreq ip_tables xfs libcrc32c sd_mod
crc_t10dif crct10dif_generic crct10dif_common ast crc32c_intel drm_kms_helper
syscopyarea sysfillrect igb ixgbe sysimgblt fb_sys_fops ttm i2c_algo_bit mdio
ptp drm pps_core megaraid_sas dca drm_panel_orientation_quirks ahci libahci
libata nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[26038585.616850] CPU: 22 PID: 21088 Comm: gbased Not tainted
3.10.0-957.el7.hg.3.x86_64 #1
[26038585.616851] Hardware name: Sugon H620-G30/65N32-US, BIOS 0QL1001207
03/03/2021
[26038585.616853] Call Trace:
[26038585.616861]  [<ffffffff86161de9>] dump_stack+0x19/0x1b
[26038585.616866]  [<ffffffff85a976c8>] __warn+0xd8/0x100
[26038585.616868]  [<ffffffff85a9780d>] warn_slowpath_null+0x1d/0x20
[26038585.616870]  [<ffffffff85c5df5e>] drop_nlink+0x3e/0x50
[26038585.616904]  [<ffffffffc03f5d08>] xfs_droplink+0x28/0x60 [xfs]
[26038585.616927]  [<ffffffffc03f922f>] xfs_remove+0x29f/0x310 [xfs]
[26038585.616930]  [<ffffffff85c595a0>] ? take_dentry_name_snapshot+0xf0/0xf0
[26038585.616951]  [<ffffffffc03f3bb7>] xfs_vn_unlink+0x57/0xa0 [xfs]
[26038585.616953]  [<ffffffff85c4dcac>] vfs_rmdir+0xdc/0x150
[26038585.616956]  [<ffffffff85c53151>] do_rmdir+0x1f1/0x220
[26038585.616959]  [<ffffffff85c436be>] ? ____fput+0xe/0x10
[26038585.616964]  [<ffffffff85abe820>] ? task_work_run+0xc0/0xe0
[26038585.616966]  [<ffffffff85c54386>] SyS_rmdir+0x16/0x20
[26038585.616970]  [<ffffffff86174ddb>] system_call_fastpath+0x22/0x27
[26038585.616972] ---[ end trace 23639deaf902c67e ]---
...
(5)the call trace is from the "WARN_ON" function below:
void drop_nlink(struct inode *inode)
{
        WARN_ON(inode->i_nlink == 0);
        inode->__i_nlink--;
        if (!inode->i_nlink)
                atomic_long_inc(&inode->i_sb->s_remove_count);
}
(6)the call trace above shows at some time earlier, the i_nlink of the parent
direcotry substracted from 0 by 1, because the i_nlink is 32-bit unsigned int,
it became 0xffffffff, and from then, the parent direcory can only decreasing
the i_nlink rather than increasing due to the LINK_MAX.


3. the root cause of corrupted i_nlink of parent directory
(1)we saw another call trace in dmesg file of the same process that cause the
call trace of "SyS_rmdir" above:
...
[18317578.683304] gbased invoked oom-killer: gfp_mask=0x200da, order=0,
oom_score_adj=0
[18317578.683311] gbased cpuset=/ mems_allowed=0-7
[18317578.683315] CPU: 11 PID: 17701 Comm: gbased Not tainted
3.10.0-957.el7.hg.3.x86_64 #1
[18317578.683318] Hardware name: Sugon H620-G30/65N32-US, BIOS 0QL1001207
03/03/2021
[18317578.683320] Call Trace:
[18317578.683330]  [<ffffffff86161de9>] dump_stack+0x19/0x1b
[18317578.683334]  [<ffffffff8615c812>] dump_header+0x90/0x229
[18317578.683339]  [<ffffffff85bba2f4>] oom_kill_process+0x254/0x3d0
[18317578.683342]  [<ffffffff85bb9d63>] ? oom_unkillable_task+0x93/0x120
[18317578.683345]  [<ffffffff85bb9e46>] ? find_lock_task_mm+0x56/0xc0
[18317578.683347]  [<ffffffff85bbab36>] out_of_memory+0x4b6/0x4f0
[18317578.683350]  [<ffffffff8615d316>] __alloc_pages_slowpath+0x5d6/0x724
[18317578.683353]  [<ffffffff85bc0f15>] __alloc_pages_nodemask+0x405/0x420
[18317578.683357]  [<ffffffff85c11185>] alloc_pages_vma+0xb5/0x200
[18317578.683361]  [<ffffffff85bce3d0>] shmem_alloc_page+0x70/0xc0
[18317578.683366]  [<ffffffff85ac2dab>] ? autoremove_wake_function+0x2b/0x40
[18317578.683369]  [<ffffffff85acbb1b>] ? __wake_up_common+0x5b/0x90
[18317578.683374]  [<ffffffff85d7c6c4>] ? __radix_tree_lookup+0x84/0xf0
[18317578.683377]  [<ffffffff85da00ea>] ? __percpu_counter_compare+0x2a/0x90
[18317578.683379]  [<ffffffff85bd12e1>] shmem_getpage_gfp+0x451/0x840
[18317578.683382]  [<ffffffff85bd19a4>] shmem_write_begin+0x54/0x80
[18317578.683384]  [<ffffffff85bb5d94>] generic_file_buffered_write+0x124/0x2c0
[18317578.683386]  [<ffffffff85bb86d2>] __generic_file_aio_write+0x1e2/0x400
[18317578.683389]  [<ffffffff85bb8949>] generic_file_aio_write+0x59/0xa0
[18317578.683392]  [<ffffffff85c40633>] do_sync_write+0x93/0xe0
[18317578.683395]  [<ffffffff85c41120>] vfs_write+0xc0/0x1f0
[18317578.683397]  [<ffffffff85c41f3f>] SyS_write+0x7f/0xf0
[18317578.683401]  [<ffffffff86174ddb>] system_call_fastpath+0x22/0x27
[18317578.683402] Mem-Info:
[18317578.683486] active_anon:59939847 inactive_anon:3882578 isolated_anon:0
...
(2)the call trace shows this process was killed due to the "oom", we suspect if
at the time this process being kill, its other threads(other than the
"SyS_write" thread that the call trace shows) was doing concurrent rmdir or
mkdir under the parent direcotry, the kill will cause the corrupted i_nlink of
the parent directory, and we simulate this "oom" situation where multithread do
concurrent mkdir and rmdir under parent directory, but the problem can not
reproduce at all.
(3)the dmesg file also shows an error related to "power saving mode":
...
[23647870.874579] Uhhuh. NMI received for unknown reason 3d on CPU 56.
[23647870.874624] Do you have a strange power saving mode enabled?
[23647870.874650] Dazed and confused, but trying to continue
...
(4)we are simulating this "power saving mode" error to determine if this can
cause the corrupted i_nlink problem, this is in progressing.
(5)the problematic environment now repaired by hand throught the xfs_db tool,
we manually modify the corrupted i_nlink of the parent directory to the correct
value.
(6)in short, by now we still confusing why the corrupted i_nlink of the parent
can happen.


4. attachment descriptions
(1)the screenshot of the problematic environment that shows the corrupted
i_nlink of the parent directory.
(2)the dmesg file.


5. other informations
(1)the similar problem that caused on ext4 filesystem:
https://lkml.kernel.org/lkml/4febf11b-31ea-82a1-bf08-b6bebe08bc75@huawei.com/T/

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 216110] rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
@ 2022-06-10  8:30 ` bugzilla-daemon
  2022-06-10  8:31 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-06-10  8:30 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

--- Comment #1 from hexiaole (hexiaole1994@126.com) ---
Created attachment 301146
  --> https://bugzilla.kernel.org/attachment.cgi?id=301146&action=edit
screenshot of the corrupted i_nlink of the parent directory

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 216110] rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
  2022-06-10  8:30 ` [Bug 216110] " bugzilla-daemon
@ 2022-06-10  8:31 ` bugzilla-daemon
  2022-06-10 14:44 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-06-10  8:31 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

--- Comment #2 from hexiaole (hexiaole1994@126.com) ---
Created attachment 301147
  --> https://bugzilla.kernel.org/attachment.cgi?id=301147&action=edit
dmesg file

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 216110] rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
  2022-06-10  8:30 ` [Bug 216110] " bugzilla-daemon
  2022-06-10  8:31 ` bugzilla-daemon
@ 2022-06-10 14:44 ` bugzilla-daemon
  2022-06-22 23:08 ` [Bug 216110] New: " Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-06-10 14:44 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

--- Comment #3 from hexiaole (hexiaole1994@126.com) ---
in section 2., and subsection (2), the description "the number of the sub
directoryes might be 4294967109" has some mistakes, the corrected is "the
number of the sub directoryes should be 4294967107".

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
                   ` (2 preceding siblings ...)
  2022-06-10 14:44 ` bugzilla-daemon
@ 2022-06-22 23:08 ` Darrick J. Wong
  2022-06-22 23:08 ` [Bug 216110] " bugzilla-daemon
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2022-06-22 23:08 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-xfs

On Fri, Jun 10, 2022 at 08:27:38AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216110
> 
>             Bug ID: 216110
>            Summary: rmdir sub directory cause i_nlink of parent directory
>                     down from 0 to 0xffffffff
>            Product: File System
>            Version: 2.5
>     Kernel Version: linux-3.10.0-957.el7

Please contact your RHEL7   ^^^^^^^^^^^^^^ account representative for
assistance in triaging this bug.

--D

>           Hardware: Other
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: hexiaole1994@126.com
>         Regression: No
> 
> 1. synptom
> when user executed mkdir command under parent directory, mkdir command prompted
> "Too many links".
> 
> 
> 2. basic analysis
> (1)use "getconf LINK_MAX ." under parent directory, the max i_nlink of the
> xfs(the filesystem that parent directory belongs) is 2147483647, but the
> i_nlink of the parent directory now is 4294967109, because the mkdir command
> will check if the i_nlink of the parent directory is lower than the LINK_MAX,
> in our environment this check failed, so mkdir command prompt "Too many links".
> (2)we "cd" into the parent directory, and execute "ls|wc" to accounting the
> total files of the parent directory, the result is 308875
> (3)the i_nlink by definition is "the number of links to the inode from
> directories", a newly created directory has i_nlink of 2, and the i_nlink of
> this newly created directory will plus 1 once there has a sub directory created
> under it(the sub directory's ".." points to parent directory cause the i_nlink
> of the parent directory plus 1), so the i_nlink of the parent directory can
> also reflect the number of the sub directories(the number of sub directory =
> i_nlink of the parent - 2). the i_nlink of the parent directory now is
> 4294967109, if this i_nlink is valid, the number of the sub directoryes might
> be 4294967109, but like the (2) shows, the total files(include directories)
> under the parent directory is 308875. so we can assert the i_nlink metadata of
> the parent direcotry was corrupted.
> (4)in the dmesg file of the sos_report, we saw an call trace that related to
> this corrupted i_nlink of parent directory:
> ...
> [26038585.616782] ------------[ cut here ]------------
> [26038585.616794] WARNING: CPU: 22 PID: 21088 at fs/inode.c:284
> drop_nlink+0x3e/0x50
> [26038585.616796] Modules linked in: binfmt_misc tcp_diag inet_diag 8021q garp
> mrp stp llc bonding vfat fat ipmi_ssif amd64_edac_mod edac_mce_amd kvm joydev
> irqbypass ses enclosure pcspkr scsi_transport_sas sg ipmi_si ipmi_devintf
> ipmi_msghandler i2c_piix4 acpi_cpufreq ip_tables xfs libcrc32c sd_mod
> crc_t10dif crct10dif_generic crct10dif_common ast crc32c_intel drm_kms_helper
> syscopyarea sysfillrect igb ixgbe sysimgblt fb_sys_fops ttm i2c_algo_bit mdio
> ptp drm pps_core megaraid_sas dca drm_panel_orientation_quirks ahci libahci
> libata nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
> [26038585.616850] CPU: 22 PID: 21088 Comm: gbased Not tainted
> 3.10.0-957.el7.hg.3.x86_64 #1
> [26038585.616851] Hardware name: Sugon H620-G30/65N32-US, BIOS 0QL1001207
> 03/03/2021
> [26038585.616853] Call Trace:
> [26038585.616861]  [<ffffffff86161de9>] dump_stack+0x19/0x1b
> [26038585.616866]  [<ffffffff85a976c8>] __warn+0xd8/0x100
> [26038585.616868]  [<ffffffff85a9780d>] warn_slowpath_null+0x1d/0x20
> [26038585.616870]  [<ffffffff85c5df5e>] drop_nlink+0x3e/0x50
> [26038585.616904]  [<ffffffffc03f5d08>] xfs_droplink+0x28/0x60 [xfs]
> [26038585.616927]  [<ffffffffc03f922f>] xfs_remove+0x29f/0x310 [xfs]
> [26038585.616930]  [<ffffffff85c595a0>] ? take_dentry_name_snapshot+0xf0/0xf0
> [26038585.616951]  [<ffffffffc03f3bb7>] xfs_vn_unlink+0x57/0xa0 [xfs]
> [26038585.616953]  [<ffffffff85c4dcac>] vfs_rmdir+0xdc/0x150
> [26038585.616956]  [<ffffffff85c53151>] do_rmdir+0x1f1/0x220
> [26038585.616959]  [<ffffffff85c436be>] ? ____fput+0xe/0x10
> [26038585.616964]  [<ffffffff85abe820>] ? task_work_run+0xc0/0xe0
> [26038585.616966]  [<ffffffff85c54386>] SyS_rmdir+0x16/0x20
> [26038585.616970]  [<ffffffff86174ddb>] system_call_fastpath+0x22/0x27
> [26038585.616972] ---[ end trace 23639deaf902c67e ]---
> ...
> (5)the call trace is from the "WARN_ON" function below:
> void drop_nlink(struct inode *inode)
> {
>         WARN_ON(inode->i_nlink == 0);
>         inode->__i_nlink--;
>         if (!inode->i_nlink)
>                 atomic_long_inc(&inode->i_sb->s_remove_count);
> }
> (6)the call trace above shows at some time earlier, the i_nlink of the parent
> direcotry substracted from 0 by 1, because the i_nlink is 32-bit unsigned int,
> it became 0xffffffff, and from then, the parent direcory can only decreasing
> the i_nlink rather than increasing due to the LINK_MAX.
> 
> 
> 3. the root cause of corrupted i_nlink of parent directory
> (1)we saw another call trace in dmesg file of the same process that cause the
> call trace of "SyS_rmdir" above:
> ...
> [18317578.683304] gbased invoked oom-killer: gfp_mask=0x200da, order=0,
> oom_score_adj=0
> [18317578.683311] gbased cpuset=/ mems_allowed=0-7
> [18317578.683315] CPU: 11 PID: 17701 Comm: gbased Not tainted
> 3.10.0-957.el7.hg.3.x86_64 #1
> [18317578.683318] Hardware name: Sugon H620-G30/65N32-US, BIOS 0QL1001207
> 03/03/2021
> [18317578.683320] Call Trace:
> [18317578.683330]  [<ffffffff86161de9>] dump_stack+0x19/0x1b
> [18317578.683334]  [<ffffffff8615c812>] dump_header+0x90/0x229
> [18317578.683339]  [<ffffffff85bba2f4>] oom_kill_process+0x254/0x3d0
> [18317578.683342]  [<ffffffff85bb9d63>] ? oom_unkillable_task+0x93/0x120
> [18317578.683345]  [<ffffffff85bb9e46>] ? find_lock_task_mm+0x56/0xc0
> [18317578.683347]  [<ffffffff85bbab36>] out_of_memory+0x4b6/0x4f0
> [18317578.683350]  [<ffffffff8615d316>] __alloc_pages_slowpath+0x5d6/0x724
> [18317578.683353]  [<ffffffff85bc0f15>] __alloc_pages_nodemask+0x405/0x420
> [18317578.683357]  [<ffffffff85c11185>] alloc_pages_vma+0xb5/0x200
> [18317578.683361]  [<ffffffff85bce3d0>] shmem_alloc_page+0x70/0xc0
> [18317578.683366]  [<ffffffff85ac2dab>] ? autoremove_wake_function+0x2b/0x40
> [18317578.683369]  [<ffffffff85acbb1b>] ? __wake_up_common+0x5b/0x90
> [18317578.683374]  [<ffffffff85d7c6c4>] ? __radix_tree_lookup+0x84/0xf0
> [18317578.683377]  [<ffffffff85da00ea>] ? __percpu_counter_compare+0x2a/0x90
> [18317578.683379]  [<ffffffff85bd12e1>] shmem_getpage_gfp+0x451/0x840
> [18317578.683382]  [<ffffffff85bd19a4>] shmem_write_begin+0x54/0x80
> [18317578.683384]  [<ffffffff85bb5d94>] generic_file_buffered_write+0x124/0x2c0
> [18317578.683386]  [<ffffffff85bb86d2>] __generic_file_aio_write+0x1e2/0x400
> [18317578.683389]  [<ffffffff85bb8949>] generic_file_aio_write+0x59/0xa0
> [18317578.683392]  [<ffffffff85c40633>] do_sync_write+0x93/0xe0
> [18317578.683395]  [<ffffffff85c41120>] vfs_write+0xc0/0x1f0
> [18317578.683397]  [<ffffffff85c41f3f>] SyS_write+0x7f/0xf0
> [18317578.683401]  [<ffffffff86174ddb>] system_call_fastpath+0x22/0x27
> [18317578.683402] Mem-Info:
> [18317578.683486] active_anon:59939847 inactive_anon:3882578 isolated_anon:0
> ...
> (2)the call trace shows this process was killed due to the "oom", we suspect if
> at the time this process being kill, its other threads(other than the
> "SyS_write" thread that the call trace shows) was doing concurrent rmdir or
> mkdir under the parent direcotry, the kill will cause the corrupted i_nlink of
> the parent directory, and we simulate this "oom" situation where multithread do
> concurrent mkdir and rmdir under parent directory, but the problem can not
> reproduce at all.
> (3)the dmesg file also shows an error related to "power saving mode":
> ...
> [23647870.874579] Uhhuh. NMI received for unknown reason 3d on CPU 56.
> [23647870.874624] Do you have a strange power saving mode enabled?
> [23647870.874650] Dazed and confused, but trying to continue
> ...
> (4)we are simulating this "power saving mode" error to determine if this can
> cause the corrupted i_nlink problem, this is in progressing.
> (5)the problematic environment now repaired by hand throught the xfs_db tool,
> we manually modify the corrupted i_nlink of the parent directory to the correct
> value.
> (6)in short, by now we still confusing why the corrupted i_nlink of the parent
> can happen.
> 
> 
> 4. attachment descriptions
> (1)the screenshot of the problematic environment that shows the corrupted
> i_nlink of the parent directory.
> (2)the dmesg file.
> 
> 
> 5. other informations
> (1)the similar problem that caused on ext4 filesystem:
> https://lkml.kernel.org/lkml/4febf11b-31ea-82a1-bf08-b6bebe08bc75@huawei.com/T/
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 216110] rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-06-22 23:08 ` [Bug 216110] New: " Darrick J. Wong
@ 2022-06-22 23:08 ` bugzilla-daemon
  2022-06-22 23:39 ` bugzilla-daemon
  2023-06-15  3:35 ` bugzilla-daemon
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-06-22 23:08 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

--- Comment #4 from Darrick J. Wong (djwong@kernel.org) ---
On Fri, Jun 10, 2022 at 08:27:38AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216110
> 
>             Bug ID: 216110
>            Summary: rmdir sub directory cause i_nlink of parent directory
>                     down from 0 to 0xffffffff
>            Product: File System
>            Version: 2.5
>     Kernel Version: linux-3.10.0-957.el7

Please contact your RHEL7   ^^^^^^^^^^^^^^ account representative for
assistance in triaging this bug.

--D

>           Hardware: Other
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: hexiaole1994@126.com
>         Regression: No
> 
> 1. synptom
> when user executed mkdir command under parent directory, mkdir command
> prompted
> "Too many links".
> 
> 
> 2. basic analysis
> (1)use "getconf LINK_MAX ." under parent directory, the max i_nlink of the
> xfs(the filesystem that parent directory belongs) is 2147483647, but the
> i_nlink of the parent directory now is 4294967109, because the mkdir command
> will check if the i_nlink of the parent directory is lower than the LINK_MAX,
> in our environment this check failed, so mkdir command prompt "Too many
> links".
> (2)we "cd" into the parent directory, and execute "ls|wc" to accounting the
> total files of the parent directory, the result is 308875
> (3)the i_nlink by definition is "the number of links to the inode from
> directories", a newly created directory has i_nlink of 2, and the i_nlink of
> this newly created directory will plus 1 once there has a sub directory
> created
> under it(the sub directory's ".." points to parent directory cause the
> i_nlink
> of the parent directory plus 1), so the i_nlink of the parent directory can
> also reflect the number of the sub directories(the number of sub directory =
> i_nlink of the parent - 2). the i_nlink of the parent directory now is
> 4294967109, if this i_nlink is valid, the number of the sub directoryes might
> be 4294967109, but like the (2) shows, the total files(include directories)
> under the parent directory is 308875. so we can assert the i_nlink metadata
> of
> the parent direcotry was corrupted.
> (4)in the dmesg file of the sos_report, we saw an call trace that related to
> this corrupted i_nlink of parent directory:
> ...
> [26038585.616782] ------------[ cut here ]------------
> [26038585.616794] WARNING: CPU: 22 PID: 21088 at fs/inode.c:284
> drop_nlink+0x3e/0x50
> [26038585.616796] Modules linked in: binfmt_misc tcp_diag inet_diag 8021q
> garp
> mrp stp llc bonding vfat fat ipmi_ssif amd64_edac_mod edac_mce_amd kvm joydev
> irqbypass ses enclosure pcspkr scsi_transport_sas sg ipmi_si ipmi_devintf
> ipmi_msghandler i2c_piix4 acpi_cpufreq ip_tables xfs libcrc32c sd_mod
> crc_t10dif crct10dif_generic crct10dif_common ast crc32c_intel drm_kms_helper
> syscopyarea sysfillrect igb ixgbe sysimgblt fb_sys_fops ttm i2c_algo_bit mdio
> ptp drm pps_core megaraid_sas dca drm_panel_orientation_quirks ahci libahci
> libata nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
> [26038585.616850] CPU: 22 PID: 21088 Comm: gbased Not tainted
> 3.10.0-957.el7.hg.3.x86_64 #1
> [26038585.616851] Hardware name: Sugon H620-G30/65N32-US, BIOS 0QL1001207
> 03/03/2021
> [26038585.616853] Call Trace:
> [26038585.616861]  [<ffffffff86161de9>] dump_stack+0x19/0x1b
> [26038585.616866]  [<ffffffff85a976c8>] __warn+0xd8/0x100
> [26038585.616868]  [<ffffffff85a9780d>] warn_slowpath_null+0x1d/0x20
> [26038585.616870]  [<ffffffff85c5df5e>] drop_nlink+0x3e/0x50
> [26038585.616904]  [<ffffffffc03f5d08>] xfs_droplink+0x28/0x60 [xfs]
> [26038585.616927]  [<ffffffffc03f922f>] xfs_remove+0x29f/0x310 [xfs]
> [26038585.616930]  [<ffffffff85c595a0>] ? take_dentry_name_snapshot+0xf0/0xf0
> [26038585.616951]  [<ffffffffc03f3bb7>] xfs_vn_unlink+0x57/0xa0 [xfs]
> [26038585.616953]  [<ffffffff85c4dcac>] vfs_rmdir+0xdc/0x150
> [26038585.616956]  [<ffffffff85c53151>] do_rmdir+0x1f1/0x220
> [26038585.616959]  [<ffffffff85c436be>] ? ____fput+0xe/0x10
> [26038585.616964]  [<ffffffff85abe820>] ? task_work_run+0xc0/0xe0
> [26038585.616966]  [<ffffffff85c54386>] SyS_rmdir+0x16/0x20
> [26038585.616970]  [<ffffffff86174ddb>] system_call_fastpath+0x22/0x27
> [26038585.616972] ---[ end trace 23639deaf902c67e ]---
> ...
> (5)the call trace is from the "WARN_ON" function below:
> void drop_nlink(struct inode *inode)
> {
>         WARN_ON(inode->i_nlink == 0);
>         inode->__i_nlink--;
>         if (!inode->i_nlink)
>                 atomic_long_inc(&inode->i_sb->s_remove_count);
> }
> (6)the call trace above shows at some time earlier, the i_nlink of the parent
> direcotry substracted from 0 by 1, because the i_nlink is 32-bit unsigned
> int,
> it became 0xffffffff, and from then, the parent direcory can only decreasing
> the i_nlink rather than increasing due to the LINK_MAX.
> 
> 
> 3. the root cause of corrupted i_nlink of parent directory
> (1)we saw another call trace in dmesg file of the same process that cause the
> call trace of "SyS_rmdir" above:
> ...
> [18317578.683304] gbased invoked oom-killer: gfp_mask=0x200da, order=0,
> oom_score_adj=0
> [18317578.683311] gbased cpuset=/ mems_allowed=0-7
> [18317578.683315] CPU: 11 PID: 17701 Comm: gbased Not tainted
> 3.10.0-957.el7.hg.3.x86_64 #1
> [18317578.683318] Hardware name: Sugon H620-G30/65N32-US, BIOS 0QL1001207
> 03/03/2021
> [18317578.683320] Call Trace:
> [18317578.683330]  [<ffffffff86161de9>] dump_stack+0x19/0x1b
> [18317578.683334]  [<ffffffff8615c812>] dump_header+0x90/0x229
> [18317578.683339]  [<ffffffff85bba2f4>] oom_kill_process+0x254/0x3d0
> [18317578.683342]  [<ffffffff85bb9d63>] ? oom_unkillable_task+0x93/0x120
> [18317578.683345]  [<ffffffff85bb9e46>] ? find_lock_task_mm+0x56/0xc0
> [18317578.683347]  [<ffffffff85bbab36>] out_of_memory+0x4b6/0x4f0
> [18317578.683350]  [<ffffffff8615d316>] __alloc_pages_slowpath+0x5d6/0x724
> [18317578.683353]  [<ffffffff85bc0f15>] __alloc_pages_nodemask+0x405/0x420
> [18317578.683357]  [<ffffffff85c11185>] alloc_pages_vma+0xb5/0x200
> [18317578.683361]  [<ffffffff85bce3d0>] shmem_alloc_page+0x70/0xc0
> [18317578.683366]  [<ffffffff85ac2dab>] ? autoremove_wake_function+0x2b/0x40
> [18317578.683369]  [<ffffffff85acbb1b>] ? __wake_up_common+0x5b/0x90
> [18317578.683374]  [<ffffffff85d7c6c4>] ? __radix_tree_lookup+0x84/0xf0
> [18317578.683377]  [<ffffffff85da00ea>] ? __percpu_counter_compare+0x2a/0x90
> [18317578.683379]  [<ffffffff85bd12e1>] shmem_getpage_gfp+0x451/0x840
> [18317578.683382]  [<ffffffff85bd19a4>] shmem_write_begin+0x54/0x80
> [18317578.683384]  [<ffffffff85bb5d94>]
> generic_file_buffered_write+0x124/0x2c0
> [18317578.683386]  [<ffffffff85bb86d2>] __generic_file_aio_write+0x1e2/0x400
> [18317578.683389]  [<ffffffff85bb8949>] generic_file_aio_write+0x59/0xa0
> [18317578.683392]  [<ffffffff85c40633>] do_sync_write+0x93/0xe0
> [18317578.683395]  [<ffffffff85c41120>] vfs_write+0xc0/0x1f0
> [18317578.683397]  [<ffffffff85c41f3f>] SyS_write+0x7f/0xf0
> [18317578.683401]  [<ffffffff86174ddb>] system_call_fastpath+0x22/0x27
> [18317578.683402] Mem-Info:
> [18317578.683486] active_anon:59939847 inactive_anon:3882578 isolated_anon:0
> ...
> (2)the call trace shows this process was killed due to the "oom", we suspect
> if
> at the time this process being kill, its other threads(other than the
> "SyS_write" thread that the call trace shows) was doing concurrent rmdir or
> mkdir under the parent direcotry, the kill will cause the corrupted i_nlink
> of
> the parent directory, and we simulate this "oom" situation where multithread
> do
> concurrent mkdir and rmdir under parent directory, but the problem can not
> reproduce at all.
> (3)the dmesg file also shows an error related to "power saving mode":
> ...
> [23647870.874579] Uhhuh. NMI received for unknown reason 3d on CPU 56.
> [23647870.874624] Do you have a strange power saving mode enabled?
> [23647870.874650] Dazed and confused, but trying to continue
> ...
> (4)we are simulating this "power saving mode" error to determine if this can
> cause the corrupted i_nlink problem, this is in progressing.
> (5)the problematic environment now repaired by hand throught the xfs_db tool,
> we manually modify the corrupted i_nlink of the parent directory to the
> correct
> value.
> (6)in short, by now we still confusing why the corrupted i_nlink of the
> parent
> can happen.
> 
> 
> 4. attachment descriptions
> (1)the screenshot of the problematic environment that shows the corrupted
> i_nlink of the parent directory.
> (2)the dmesg file.
> 
> 
> 5. other informations
> (1)the similar problem that caused on ext4 filesystem:
>
> https://lkml.kernel.org/lkml/4febf11b-31ea-82a1-bf08-b6bebe08bc75@huawei.com/T/
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 216110] rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-06-22 23:08 ` [Bug 216110] " bugzilla-daemon
@ 2022-06-22 23:39 ` bugzilla-daemon
  2023-06-15  3:35 ` bugzilla-daemon
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-06-22 23:39 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

Eric Sandeen (sandeen@sandeen.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |sandeen@sandeen.net
         Resolution|---                         |INVALID

--- Comment #5 from Eric Sandeen (sandeen@sandeen.net) ---
I agree with Darrick here, this is the upstream bug tracker, and upstream can't
help users with distribution kernels, especially on older distributions.

-Eric

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 216110] rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff
  2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
                   ` (5 preceding siblings ...)
  2022-06-22 23:39 ` bugzilla-daemon
@ 2023-06-15  3:35 ` bugzilla-daemon
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2023-06-15  3:35 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=216110

swindlerproduct (saxophonebritish@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |saxophonebritish@gmail.com

--- Comment #6 from swindlerproduct (saxophonebritish@gmail.com) ---
As Darrick pointed out, this is the upstream bug tracker, and upstream cannot
assist users with distribution kernels, especially on older distributions.
https://blobopera.io

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-15  3:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10  8:27 [Bug 216110] New: rmdir sub directory cause i_nlink of parent directory down from 0 to 0xffffffff bugzilla-daemon
2022-06-10  8:30 ` [Bug 216110] " bugzilla-daemon
2022-06-10  8:31 ` bugzilla-daemon
2022-06-10 14:44 ` bugzilla-daemon
2022-06-22 23:08 ` [Bug 216110] New: " Darrick J. Wong
2022-06-22 23:08 ` [Bug 216110] " bugzilla-daemon
2022-06-22 23:39 ` bugzilla-daemon
2023-06-15  3:35 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.