Advice needed with file system corruption

* Advice needed with file system corruption
@ 2016-07-14 12:27 Steve Brooks
  2016-07-14 13:05 ` Carlos Maiolino
  2016-08-08 14:11 ` Emmanuel Florac
  0 siblings, 2 replies; 13+ messages in thread
From: Steve Brooks @ 2016-07-14 12:27 UTC (permalink / raw)
  To: xfs

Hi All,

We have a RAID system with file system issues as follows,

50 TB in RAID 6 hosted on an Adaptec 71605 controller using WD4000FYYZ 
drives.

Centos 6.7  2.6.32-642.el6.x86_64   :   xfsprogs-3.1.1-16.el6

While rebuilding a replaced disk, with the file system online and in 
use, the system logs showed multiple entries of;

XFS (sde): Corruption detected. Unmount and run xfs_repair.

[See also at the end of post for a section of XFS related errors in the log]

I unmounted the filesystem and waited for the controller to finish 
rebuilding the array. I then moved the most important data to another 
RAID array on a different server. The data is generated from HPC 
simulations and is not backed up but can be regenerated in needed.

The default el6 "xfs_repair" is in "xfsprogs-3.1.1-16.el6". I notice 
that the "elrepo_testing" repository has a much later version of 
"xfsprogs" namely

  xfsprogs.x86_64 4.3.0-1.el6.elrepo

As far as I understand the user based tools are backwards compatible so 
would it be better to use the "4.3" release of "xfsprogs"instead of the 
default "3.1.1" included in the installation of el6?

I ran an "xfs_repair -nv /dev/sde" for both "3.1.1" and "4.3" and both 
completed successfully showing the repairs that would have taken place. 
I can post these if requested.

The "3.1.1"  version of "xfs_repair -n" ran in 1 minute, 32 seconds

The "4.3"     version of "xfs_repair -n" ran in 50 seconds

So my questions are

[1] Which version of "xfs_repair" should I use to make the repair?

[2] Is there anything I should have done differently?

Many thanks for any advice given it is much appreciated.

Thanks,  Steve

Many blocks (about 20) of code similar to this were repeated in the logs.

Jul  8 18:40:17 sraid1v kernel: ffff880dca95b000: 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00  ................
Jul  8 18:40:17 sraid1v kernel: XFS (sde): Internal error 
xfs_da_do_buf(2) at line 2136 of file fs/xfs/xfs_da_btree.c. Caller 
0xffffffffa0e6e81a
Jul  8 18:40:17 sraid1v kernel:
Jul  8 18:40:17 sraid1v kernel: Pid: 8844, comm: idl Tainted: 
P           -- ------------    2.6.32-642.el6.x86_64 #1
Jul  8 18:40:17 sraid1v kernel: Call Trace:
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e7b68f>] ? 
xfs_error_report+0x3f/0x50 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e6e81a>] ? 
xfs_da_read_buf+0x2a/0x30 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e7b6fe>] ? 
xfs_corruption_error+0x5e/0x90 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e6e6fc>] ? 
xfs_da_do_buf+0x6cc/0x770 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e6e81a>] ? 
xfs_da_read_buf+0x2a/0x30 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffff810154e3>] ? 
native_sched_clock+0x13/0x80
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e6e81a>] ? 
xfs_da_read_buf+0x2a/0x30 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e74a21>] ? 
xfs_dir2_leaf_lookup_int+0x61/0x2c0 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e74a21>] ? 
xfs_dir2_leaf_lookup_int+0x61/0x2c0 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e74e05>] ? 
xfs_dir2_leaf_lookup+0x35/0xf0 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e71306>] ? 
xfs_dir2_isleaf+0x26/0x60 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e71ce4>] ? 
xfs_dir_lookup+0x174/0x190 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e9ea47>] ? 
xfs_lookup+0x87/0x110 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0eabd74>] ? 
xfs_vn_lookup+0x54/0xa0 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffff811a9ca5>] ? do_lookup+0x1a5/0x230
Jul  8 18:40:17 sraid1v kernel: [<ffffffff811aa823>] ? 
__link_path_walk+0x763/0x1060
Jul  8 18:40:17 sraid1v kernel: [<ffffffff811ab3da>] ? path_walk+0x6a/0xe0
Jul  8 18:40:17 sraid1v kernel: [<ffffffff811ab5eb>] ? 
filename_lookup+0x6b/0xc0
Jul  8 18:40:17 sraid1v kernel: [<ffffffff8123ac46>] ? 
security_file_alloc+0x16/0x20
Jul  8 18:40:17 sraid1v kernel: [<ffffffff811acac4>] ? 
do_filp_open+0x104/0xd20
Jul  8 18:40:17 sraid1v kernel: [<ffffffffa0e9a4fc>] ? 
_xfs_trans_commit+0x25c/0x310 [xfs]
Jul  8 18:40:17 sraid1v kernel: [<ffffffff812a749a>] ? 
strncpy_from_user+0x4a/0x90
Jul  8 18:40:17 sraid1v kernel: [<ffffffff811ba252>] ? alloc_fd+0x92/0x160
Jul  8 18:40:17 sraid1v kernel: [<ffffffff81196bd7>] ? 
do_sys_open+0x67/0x130
Jul  8 18:40:17 sraid1v kernel: [<ffffffff81196ce0>] ? sys_open+0x20/0x30
Jul  8 18:40:17 sraid1v kernel: [<ffffffff8100b0d2>] ? 
system_call_fastpath+0x16/0x1b
Jul  8 18:40:17 sraid1v kernel: XFS (sde): Corruption detected. Unmount 
and run xfs_repair
Jul  8 18:40:17 sraid1v kernel: ffff880dca95b000: 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00  ................
Jul  8 18:40:17 sraid1v kernel: XFS (sde): Internal error 
xfs_da_do_buf(2) at line 2136 of file fs/xfs/xfs_da_btree.c. Caller 
0xffffffffa0e6e81a
Jul  8 18:40:17 sraid1v kernel:
Jul  8 18:40:17 sraid1v kernel: Pid: 8844, comm: idl Tainted: 
P           -- ------------    2.6.32-642.el6.x86_64 #1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread