Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.

* Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
@ 2014-07-02  9:57 Carlos E. R.
  2014-07-02 12:04 ` Brian Foster
  2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
  0 siblings, 2 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-02  9:57 UTC (permalink / raw)
  To: XFS mail list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I got this error:

<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] 
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P           O 3.11.10-11-desktop #1
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390]  0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391]  ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403]  [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408]  [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411]  [<ffffffff815a0252>] dump_stack+0x50/0x89
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771]  [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)

Brief description:

  * It happens only on restore from hibernation.
  * It happens randomly, spaced a month or two.
  * It happens always on the same partition, the one that holds /home
    (I have 10 XFS partitions spread on 4 internal hard disks, and a few
    more external). It is a new disk, 2 TB, traditional MBR partitions.
  * Disk has no defects, or at least so says smartctl long test.
  * When it happens, recovery is impossible: xfs_repair does not seem to
    find anything, or maybe it does, silently; but on system reuse,
    it crashes again, fast.
  * Thus recovery procedure is to use "xfsdump" to get a backup copy,
    reformat the partition, and recover the files with xfsrestore.

The worst issue for me is that "xfs_repair" fails to repair it.

I do not have more info than what appears on the logs, but four times 
(two different kernels):

cer@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
cer@Telcontar:~>

The first time that this happened I used a rescue usb stick (openSUSE 
13.1 xfce). xfs_repair said to mount the partition to force re-play the 
log. When I did, mount hung. It was unkillable. Reboot of system hung. I 
then used "xfs_repair -L" on that disk, which succeeded with no 
error report. On reuse, the system crashed soon: you can see above two 
entries on the same day.

This last time, I simply rebooted to runlevel 3, logon as root, perform 
the backup, format, restore. No testing, I was in a real hurry, and even 
so took hours.

I suppose that to diagnose this further you will want data extracted from 
the filesystem: you have to tell me what operations to perform to obtain 
that data the next time it happens, without me having to ask here for your 
help. It may happen tomorrow, or in two months time, so I have to be 
prepared for it. And as usual, it may happen at the worst time, when I 
have work to be done in a hurry, as this last time (or I would have asked 
you).

The only data I have is the system logs.

I don't suppose that the "xfs_dump" archive contains anything of interest?

- From what I have googled, one suspect is something wrong in that 
partition. It was created using gparted, as the rest of the disk. This 
last time I used "YaST" to reformat it, not mkfs.xfs.

Wait! I have a "dd" copy of the entire partition (500 GB), made on March 
16th, 5 AM, so hard data could be obtained from there. I had 
forgotten. I'll get something for you now:

Telcontar:/data/storage_d/old_backup # xfs_info xfs_copy_home
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 
blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=489366272, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=238948, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #

I could do a "xfs_metadump" on it - just tell me what options to use, and 
where can the result be uploaded to, if big.

Current versions:

Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux

xfs_repair version 3.1.11

CPU:  Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz

System:  openSUSE Linux 13.1, 64 bit.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlOz14UACgkQtTMYHG2NR9XWLgCfRXInLwE/FrToinuYjpgWQyu6
dA4AnjAP0DdUvOnsdZfLVaI7wm+c7U0N
=vxuS
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 56+ messages in thread