* Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
@ 2014-07-02  9:57 Carlos E. R.
  2014-07-02 12:04 ` Brian Foster
  2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
  0 siblings, 2 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-02  9:57 UTC (permalink / raw)
  To: XFS mail list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Hi,

I got this error:


<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] 
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P           O 3.11.10-11-desktop #1
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390]  0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391]  ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403]  [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408]  [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411]  [<ffffffff815a0252>] dump_stack+0x50/0x89
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771]  [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)


Brief description:


  * It happens only on restore from hibernation.
  * It happens randomly, spaced a month or two.
  * It happens always on the same partition, the one that holds /home
    (I have 10 XFS partitions spread on 4 internal hard disks, and a few
    more external). It is a new disk, 2 TB, traditional MBR partitions.
  * Disk has no defects, or at least so says smartctl long test.
  * When it happens, recovery is impossible: xfs_repair does not seem to
    find anything, or maybe it does, silently; but on system reuse,
    it crashes again, fast.
  * Thus recovery procedure is to use "xfsdump" to get a backup copy,
    reformat the partition, and recover the files with xfsrestore.


The worst issue for me is that "xfs_repair" fails to repair it.

I do not have more info than what appears in the logs, but it has happened 
four times (on two different kernels):

cer@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
cer@Telcontar:~>
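
For comparing occurrences, the zgrep output above can be reduced to one 
record per event. This is only a rough sketch: the regular expression is 
guessed from the four lines shown here, and the name 
parse_corruption_events is made up for illustration.

```python
import os
import re

# Pattern inferred from the log lines above; other syslog layouts may need changes.
EVENT_RE = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"XFS_WANT_CORRUPTED_GOTO at line (?P<line>\d+) of file (?P<path>\S+?)\.\s+"
    r"Caller (?P<caller>0x[0-9a-fA-F]+)"
)


def parse_corruption_events(lines):
    """Return (timestamp, source file, source line, caller) for each matching log line."""
    events = []
    for text in lines:
        match = EVENT_RE.search(text)
        if match:
            events.append((
                match["when"],
                os.path.basename(match["path"]),
                int(match["line"]),
                match["caller"],
            ))
    return events
```

Fed with the four lines above, it would report xfs_alloc.c line 1629 once 
and line 1602 three times.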


The first time that this happened I used a rescue usb stick (openSUSE 
13.1 xfce). xfs_repair said to mount the partition to force re-play the 
log. When I did, mount hung. It was unkillable. Reboot of system hung. I 
then used "xfs_repair -L" on that disk, which succeeded with no 
error report. On reuse, the system crashed soon: you can see above two 
entries on the same day.

This last time, I simply rebooted to runlevel 3, logon as root, perform 
the backup, format, restore. No testing, I was in a real hurry, and even 
so took hours.


I suppose that to diagnose this further you will want data extracted from 
the filesystem: you have to tell me what operations to perform to obtain 
that data the next time it happens, without me having to ask here for your 
help. It may happen tomorrow, or in two months time, so I have to be 
prepared for it. And as usual, it may happen at the worst time, when I 
have work to be done in a hurry, as this last time (or I would have asked 
you).

The only data I have is the system logs.

I don't suppose that the "xfsdump" archive contains anything of interest?

- From what I have googled, one suspect is something wrong in that 
partition. It was created using gparted, as the rest of the disk. This 
last time I used "YaST" to reformat it, not mkfs.xfs.



Wait! I have a "dd" copy of the entire partition (500 GB), made on March 
16th, 5 AM, so hard data could be obtained from there. I had 
forgotten. I'll get something for you now:


Telcontar:/data/storage_d/old_backup # xfs_info xfs_copy_home
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=489366272, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=238948, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #


I could do a "xfs_metadump" on it - just tell me what options to use, and 
where can the result be uploaded to, if big.



Current versions:

Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux

xfs_repair version 3.1.11

CPU:  Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz

System:  openSUSE Linux 13.1, 64 bit.


- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlOz14UACgkQtTMYHG2NR9XWLgCfRXInLwE/FrToinuYjpgWQyu6
dA4AnjAP0DdUvOnsdZfLVaI7wm+c7U0N
=vxuS
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-02  9:57 Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue Carlos E. R.
@ 2014-07-02 12:04 ` Brian Foster
  2014-07-02 13:07   ` Mark Tinguely
  2014-07-03  3:00   ` Carlos E. R.
  2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
  1 sibling, 2 replies; 56+ messages in thread
From: Brian Foster @ 2014-07-02 12:04 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mail list

On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> Hi,
> 
> I got this error:
> 
> 
> <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
> <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] 
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P           O 3.11.10-11-desktop #1
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390]  0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391]  ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403]  [<ffffffff81004a28>] dump_trace+0x88/0x310
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408]  [<ffffffff810061bc>] show_stack+0x1c/0x50
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411]  [<ffffffff815a0252>] dump_stack+0x50/0x89
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771]  [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
> <0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
> 

This is the background eofblocks scanner attempting to free preallocated
space on a file. The scanner looks for files that have been recently
grown and since been flushed to disk (i.e., no longer concurrently being
written to) and trims the post-eof preallocation that comes along with
growing files.

The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
extent we are attempting to free is already accounted for in the
by-block allocation btree. IOW, this is attempting to free an extent
that the allocation metadata thinks is already free.
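
To illustrate the kind of check involved, here is a toy model in Python.
It is not the kernel code: the class and method names are invented, and
a sorted list stands in for the by-block btree. Freeing an extent looks
at the free extents on either side and refuses to proceed if the range
being freed overlaps one of them.

```python
from bisect import bisect_right


class FreeSpaceByBlock:
    """Toy stand-in for one AG's by-block free space index (hypothetical)."""

    def __init__(self):
        self.extents = []  # sorted, non-overlapping (start, length) free extents

    def free_extent(self, start, length):
        end = start + length
        # Index of the first free extent that begins after `start`.
        i = bisect_right(self.extents, (start, float("inf")))
        lo, hi = i, i
        if i > 0:
            lstart, llen = self.extents[i - 1]
            if lstart + llen > start:
                raise RuntimeError("corruption: range overlaps free extent on the left")
            if lstart + llen == start:  # contiguous, so merge with left neighbour
                start, lo = lstart, i - 1
        if i < len(self.extents):
            rstart, rlen = self.extents[i]
            if end > rstart:
                raise RuntimeError("corruption: range overlaps free extent on the right")
            if end == rstart:  # contiguous, so merge with right neighbour
                end, hi = rstart + rlen, i + 1
        self.extents[lo:hi] = [(start, end - start)]
```

In this model, freeing blocks 100-109 and then 110-114 leaves a single
free extent (100, 15); a later attempt to free (105, 2) raises, which is
the analogue of the filesystem shutting down in the trace above.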

> 
> Brief description:
> 
> 
>  * It happens only on restore from hibernation.

Interesting, could you elaborate a bit more on the behavior this system
is typically subjected to? i.e., is this a server that sees a constant
workload that is also frequently hibernated/awakened?

>  * It happens randomly, spaced a month or two.
>  * It happens always on the same partition, the one that holds /home
>    (I have 10 XFS partitions spread on 4 internal hard disks, and a few
>    more external). It is a new disk, 2 TB, traditional MBR partitions.
>  * Disk has no defects, or at least so says smartctl long test.
>  * When it happens, recovery is impossible: xfs_repair does not seem to
>    find anything, or maybe it does, silently; but on system reuse,
>    it crashes again, fast.
>  * Thus recovery procedure is to use "xfsdump" to get a backup copy,
>    reformat the partition, and recover the files with xfsrestore.
> 
> 
> The worst issue for me is that "xfs_repair" fails to repair it.
> 
> I do not have more info than what appears on the logs, but four times (two
> different kernels):
> 
> cer@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
> /var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> /var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> /var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> /var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
> cer@Telcontar:~>
> 
> 
> The first time that this happened I used a rescue usb stick (openSUSE 13.1
> xfce). xfs_repair said to mount the partition to force re-play the log. When
> I did, mount hung. It was unkillable. Reboot of system hung. I then used
> "xfs_repair -L" on that disk, which succeeded with no error report. On
> reuse, the system crashed soon: you can see above two entries on the same
> day.
> 
> This last time, I simply rebooted to runlevel 3, logon as root, perform the
> backup, format, restore. No testing, I was in a real hurry, and even so took
> hours.
> 

So you have reproduced this, reformatted with mkfs, restored from
backups and continued to reproduce the problem? And still only on this
particular partition?

This is interesting because the corruption appears to be associated with
post-eof space, which is generally transient. The worst case is that
this space is trimmed off files when they are evicted from cache, such
as during a umount. To me, that seems to correlate with a more
recent/runtime problem rather than something that might be lingering on
disk, but we don't really know for sure.

> 
> I suppose that to diagnose this further you will want data extracted from
> the filesystem: you have to tell me what operations to perform to obtain
> that data the next time it happens, without me having to ask here for your
> help. It may happen tomorrow, or in two months time, so I have to be
> prepared for it. And as usual, it may happen at the worst time, when I have
> work to be done in a hurry, as this last time (or I would have asked you).
> 
> The only data I have is the system logs.
> 
> I don't suppose that the "xfsdump" archive contains anything of interest?
> 
> - From what I have googled, one suspect is something wrong in that
> partition. It was created using gparted, as the rest of the disk. This last
> time I used "YaST" to reformat it, not mkfs.xfs.
> 
> 
> 
> Wait! I have a "dd" copy of the entire partition (500 GB), made on March
> 16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
> get something for you now:
> 
> 
> Telcontar:/data/storage_d/old_backup # xfs_info xfs_copy_home
> meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=489366272, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=238948, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> Telcontar:/data/storage_d/old_backup #
> 
> 
> I could do a "xfs_metadump" on it - just tell me what options to use, and
> where can the result be uploaded to, if big.
> 

A metadump would be helpful, though that only gives us the on-disk
state. What was the state of this fs at the time the dd image was
created? I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.

Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
obfuscate filenames by default. It should also be compressible. In the
future, it's probably worth grabbing a metadump as a first step (before
repair, zeroing the log, etc.) so we can look at the fs in the state
most recent to the crash.
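
If it helps to have this scripted ahead of time, a minimal sketch
follows. It assumes only the plain 'xfs_metadump <src> <tgtfile>'
invocation above; the function names and any paths passed in are
placeholders, and gzip is just one way to compress the result. The
filesystem should be unmounted (or mounted read-only) when this runs.

```python
import gzip
import shutil
import subprocess


def metadump_command(source, target):
    """Build the plain invocation described above; no extra options are assumed."""
    return ["xfs_metadump", source, target]


def capture_metadump(source, target):
    """Take the metadump first (before repair or log zeroing), then compress it."""
    subprocess.run(metadump_command(source, target), check=True)
    compressed = target + ".gz"
    with open(target, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```

For example, capture_metadump("/dev/sde5", "/tmp/home.metadump") would
leave /tmp/home.metadump.gz ready to upload; both paths are only
examples.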

Brian

> 
> 
> Current versions:
> 
> Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux
> 
> xfs_repair version 3.1.11
> 
> CPU:  Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
> 
> System:  openSUSE Linux 13.1, 64 bit.
> 
> 
> - -- Cheers
>        Carlos E. R.
> 
>        (from 13.1 x86_64 "Bottle" at Telcontar)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iEYEARECAAYFAlOz14UACgkQtTMYHG2NR9XWLgCfRXInLwE/FrToinuYjpgWQyu6
> dA4AnjAP0DdUvOnsdZfLVaI7wm+c7U0N
> =vxuS
> -----END PGP SIGNATURE-----
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-02 12:04 ` Brian Foster
@ 2014-07-02 13:07   ` Mark Tinguely
  2014-07-03  2:54     ` Carlos E. R.
  2014-07-03  3:00   ` Carlos E. R.
  1 sibling, 1 reply; 56+ messages in thread
From: Mark Tinguely @ 2014-07-02 13:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: Carlos E. R., XFS mail list

On 07/02/14 07:04, Brian Foster wrote:
> On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>>
>> Hi,
>>
>> I got this error:
>>
>>
>> <0.6>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
>> <0.6>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
>> <0.1>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
>> <0.1>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P           O 3.11.10-11-desktop #1
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390]  0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391]  ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403]  [<ffffffff81004a28>] dump_trace+0x88/0x310
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408]  [<ffffffff810061bc>] show_stack+0x1c/0x50
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411]  [<ffffffff815a0252>] dump_stack+0x50/0x89
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
>> <0.4>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771]  [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
>> <0.5>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
>> <0.1>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
>> <0.1>  2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
>>
>
> This is the background eofblocks scanner attempting to free preallocated
> space on a file. The scanner looks for files that have been recently
> grown and since been flushed to disk (i.e., no longer concurrently being
> written to) and trims the post-eof preallocation that comes along with
> growing files.
>
> The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> extent we are attempting to free is already accounted for in the
> by-block allocation btree. IOW, this is attempting to free an extent
> that the allocation metadata thinks is already free.
>
>>
>> Brief description:
>>
>>
>>   * It happens only on restore from hibernation.
>
> Interesting, could you elaborate a bit more on the behavior this system
> is typically subjected to? i.e., is this a server that sees a constant
> workload that is also frequently hibernated/awakened?
>
>>   * It happens randomly, spaced a month or two.
>>   * It happens always on the same partition, the one that holds /home
>>     (I have 10 XFS partitions spread on 4 internal hard disks, and a few
>>     more external). It is a new disk, 2 TB, traditional MBR partitions.
>>   * Disk has no defects, or at least so says smartctl long test.
>>   * When it happens, recovery is impossible: xfs_repair does not seem to
>>     find anything, or maybe it does, silently; but on system reuse,
>>     it crashes again, fast.
>>   * Thus recovery procedure is to use "xfsdump" to get a backup copy,
>>     reformat the partition, and recover the files with xfsrestore.
>>
>>
>> The worst issue for me is that "xfs_repair" fails to repair it.

what version of xfs_repair? Did you try to mount to replay the log 
before repair?

Besides Brian's good advice, is kdump configured to dump vmcore?

--Mark.


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-02 13:07   ` Mark Tinguely
@ 2014-07-03  2:54     ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-03  2:54 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Wednesday, 2014-07-02 at 08:07 -0500, Mark Tinguely wrote:
> On 07/02/14 07:04, Brian Foster wrote:

>>> The worst issue for me is that "xfs_repair" fails to repair it.
>
> what version of xfs_repair?

xfs_repair version 3.1.11


which is what comes with openSUSE 13.1.

> Did you try to mount to replay the log before 
> repair?

Sure.

This last time, I first tried "umount" the partition, which initially 
failed, because despite being read only, some applications thought they 
had opened files on it (I was already in runlevel 1). I found them with 
lsof, killed them, umounted, mounted, system crash. Had to hit reset 
button on machine.

Reboot machine, and partition is automatically mounted, so the log 
replayed here. umount, repair (finds nothing, as far as I can see), 
backup, format, restore.



> Besides Brian's good advice, is kdump configured to dump vmcore?

I'm not sure I understand the question :-?


If you want me to run the system for a month, waiting for this to happen 
again, in some special kernel debug mode... I don't know if that will be 
feasible :-}


- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO0xewACgkQtTMYHG2NR9WYsQCfTWjvcHB8IJfyXN4jVzHTnh5Q
lOEAn0TPwL03enbn8zrXbIQ9yMfknPi2
=39NE
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-02 12:04 ` Brian Foster
  2014-07-02 13:07   ` Mark Tinguely
@ 2014-07-03  3:00   ` Carlos E. R.
  2014-07-03  9:43     ` Dave Chinner
  2014-07-03 17:39     ` Brian Foster
  1 sibling, 2 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-03  3:00 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:

...

> This is the background eofblocks scanner attempting to free preallocated
> space on a file. The scanner looks for files that have been recently
> grown and since been flushed to disk (i.e., no longer concurrently being
> written to) and trims the post-eof preallocation that comes along with
> growing files.
>
> The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> extent we are attempting to free is already accounted for in the
> by-block allocation btree. IOW, this is attempting to free an extent
> that the allocation metadata thinks is already free.
>
>>
>> Brief description:
>>
>>
>>  * It happens only on restore from hibernation.
>
> Interesting, could you elaborate a bit more on the behavior this system
> is typically subjected to? i.e., is this a server that sees a constant
> workload that is also frequently hibernated/awakened?

It is a desktop machine I use for work at home. I typically have many 
applications open on different workspaces in XFCE. Say one has terminals, 
another has Thunderbird/Pine, another Firefox, another LibreOffice; 
another may have gimp, another may be kbabel or lokalize, another may have 
vmplayer, etc, whatever. When I go out or go to sleep, I hibernate the 
machine, instead of powering down, because it is much faster than reboot, 
login, and start the wanted applications, and I want to conserve some 
electricity.

I also use the machine for testing configurations, but these I try to do 
on virtual machines, instead of my work partition.


The machine may be used anywhere from 4 to 16 hours a day, and hibernated 
at least once a day, perhaps three times if I have to go out several 
times. It makes no sense to me to leave the machine powered doing nothing, 
if hibernating is so easy and reliable - till now. If I have to leave for 
more than a week, I tend to do a full "halt".



By the way, this started happening when I replaced an old 500 GB hard disk 
(Seagate ST3500418AS) with a new 2 TB unit (Seagate ST2000DM001-1CH164). 
The smartctl long test says it is fine (and seatools from Windows, too).



>> I do not have more info than what appears on the logs, but four times (two
>> different kernels):
>>
>> cer@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
>> /var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
>> /var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
>> /var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
>> /var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
>> cer@Telcontar:~>

> So you have reproduced this, reformatted with mkfs, restored from
> backups and continued to reproduce the problem? And still only on this
> particular partition?

Right. Exactly that.

Except that I cannot reproduce the issue at will; it happens about once a
month, at random.

AFAIK, xfsdump can not carry over a filesystem corruption, right?
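
The occurrences listed by zgrep above can also be tallied by date and 
source line with a short script. This is only a minimal sketch, assuming 
the rsyslog line format shown in this message; the names EVENT_RE, 
parse_events and count_by_line are illustrative and not part of any 
existing tool.

```python
import re
from collections import Counter

# Matches lines such as (format assumed from the excerpts in this message):
# <0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal
#   error XFS_WANT_CORRUPTED_GOTO at line 1629 of file ...
EVENT_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) .*?"
    r"XFS: Internal error (?P<kind>XFS_WANT_CORRUPTED_\w+) at line (?P<line>\d+)"
)


def parse_events(lines):
    """Return (date, time, kind, source_line) for every XFS internal error line."""
    events = []
    for text in lines:
        m = EVENT_RE.search(text)
        if m:
            events.append((m["date"], m["time"], m["kind"], int(m["line"])))
    return events


def count_by_line(events):
    """Count events per (kind, source_line) pair."""
    return Counter((kind, line) for _, _, kind, line in events)
```

Feeding it the decompressed log lines (for example the output of the 
zgrep command above) gives one tuple per event, which makes it easier to 
see how often each check in xfs_alloc.c fired.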



**** LONG DESCRIPTION and LOGS start here ********


The first time was on 2014-03-15 03:35:17, instantly after thawing:


<0.7> 2014-03-15 03:35:14 Telcontar kernel - - - [37682.109726] PM: Basic memory bitmaps freed
<3.6> 2014-03-15 03:35:14 Telcontar systemd 1 - -  Time has been changed
<3.4> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - -  The canary thread is apparently starving. Taking action.
<3.6> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - -  Demoting known real-time threads.
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - -  Successfully demoted thread 4175 of process 4168 (/usr/bin/pulseaudio).
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - -  Successfully demoted thread 4174 of process 4168 (/usr/bin/pulseaudio).
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - -  Successfully demoted thread 4168 of process 4168 (/usr/bin/pulseaudio).
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - -  Demoted 3 threads.
<3.6> 2014-03-15 03:35:16 Telcontar acpid - - -  1 client rule loaded
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] 
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111792] CPU: 1 PID: 5245 Comm: thunderbird-bin Tainted: P           O 3.11.10-7-desktop #1
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111793] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111795]  0000000000000002 ffffffff8159ff82 000000000027610d ffffffffa0c53996
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111799]  ffff8802303533c0 ffff8802344e4300 ffff8802263a1f20 0000000000000002
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111801]  0000000000000000 ffff8801a08bfa8c 0000000000000000 0027611300000001
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111804] Call Trace:
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111815]  [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111818]  [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111821]  [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111825]  [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111861]  [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111905]  [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111948]  [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111999]  [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112073]  [<ffffffffa0c4935b>] xfs_setattr_size+0x41b/0x4a0 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112107]  [<ffffffffa0c4940e>] xfs_vn_setattr+0x2e/0x40 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112130]  [<ffffffff811a060c>] notify_change+0x1dc/0x360
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112135]  [<ffffffff811845ee>] do_truncate+0x5e/0x90
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112139]  [<ffffffff81193c53>] do_last+0x253/0xec0
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112142]  [<ffffffff81194976>] path_openat+0xb6/0x670
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112145]  [<ffffffff81195cb5>] do_filp_open+0x35/0x80
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112147]  [<ffffffff81185599>] do_sys_open+0x129/0x210
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112151]  [<ffffffff815adbed>] system_call_fastpath+0x1a/0x1f
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112157]  [<00007f6ec359078d>] 0x7f6ec359078c
<0.5> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112976] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_b
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.163643] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.163648] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<0.4> 2014-03-15 03:35:18 Telcontar kernel - - - [37686.496013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.5> 2014-03-15 03:35:18 Telcontar dbus 1005 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<5.4> 2014-03-15 03:35:18 Telcontar pm-utils - - -  Thawing (95)...
<1.5> 2014-03-15 03:35:22 Telcontar network 11556 - -  redirecting to "systemctl  restart network.service"
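
How close each error sits to a resume, as in the excerpt above, can be 
checked mechanically. This is a minimal sketch, assuming the rsyslog line 
format shown here and that pm-utils logs a "Thawing" line on every resume; 
the function names are illustrative only. Syslog timestamps around a resume 
may not be exact, so the result is only a rough indication.

```python
import re
from datetime import datetime

# Leading "<facility.severity> YYYY-MM-DD HH:MM:SS " prefix, as in the excerpts.
STAMP_RE = re.compile(r"^<\d+\.\d+> (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")


def stamp(line):
    """Return the syslog timestamp of a line, or None if it has none."""
    m = STAMP_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None


def seconds_from_nearest_thaw(lines):
    """For each XFS internal error, return (time, seconds to nearest thaw line)."""
    thaws, errors = [], []
    for line in lines:
        when = stamp(line)
        if when is None:
            continue
        if "pm-utils" in line and "Thawing" in line:
            thaws.append(when)
        elif "XFS: Internal error" in line:
            errors.append(when)
    if not thaws:
        return []
    return [
        (err, min(abs((err - t).total_seconds()) for t in thaws))
        for err in errors
    ]
```

A small distance for every event would be consistent with the errors 
appearing only right after restoring from hibernation.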




I managed to halt somehow, and booted. The log says that the partition 
passes automatic boot tests (excerpted):


<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.173599] XFS (sdd5): Mounting Filesystem
<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.377918] XFS (sdd5): Starting recovery (logdev: internal)
<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.747914] XFS (sdd5): Ending recovery (logdev: internal)


But soon after, it oopses:


<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Starting Default.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Reached target Default.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Startup finished in 57ms.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 1 - -  Started User Manager for 9.
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] 
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857530] CPU: 3 PID: 57 Comm: kworker/3:1 Tainted: P           O 3.11.10-7-desktop #1
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857532] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857570] Workqueue: xfsalloc xfs_bmapi_allocate_worker [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857572]  0000000000000000 ffffffff8159ff82 ffff880192c89080 ffffffffa0c50ee9
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857576]  0000003d30691240 00000000a0c55781 ffff880234917d58 ffff880192c89080
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857579]  000000000000003d 000000000000003d 0000000000000002 0000000000022dab
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857583] Call Trace:
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857596]  [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857600]  [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857604]  [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857609]  [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857630]  [<ffffffffa0c50ee9>] xfs_alloc_fixup_trees+0x1f9/0x340 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857689]  [<ffffffffa0c5344e>] xfs_alloc_ag_vextent_near+0x9ee/0xcd0 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857751]  [<ffffffffa0c5408d>] xfs_alloc_ag_vextent+0xbd/0x100 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857810]  [<ffffffffa0c54cd6>] xfs_alloc_vextent+0x4e6/0x740 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857870]  [<ffffffffa0c60447>] xfs_bmap_btalloc+0x2a7/0x7a0 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857937]  [<ffffffffa0c63ecd>] __xfs_bmapi_allocate+0xbd/0x2d0 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.858002]  [<ffffffffa0c64107>] xfs_bmapi_allocate_worker+0x27/0x50 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.858069]  [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.858074]  [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.858079]  [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.858084]  [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.858095] XFS (sdd5): page discard on page ffffea0005357d98, inode 0x602084fd, offset 339968.
<0.1> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896051] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896051] 
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896056] CPU: 2 PID: 56 Comm: kworker/2:1 Tainted: P           O 3.11.10-7-desktop #1
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896057] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896091] Workqueue: xfsalloc xfs_bmapi_allocate_worker [xfs]
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896093]  0000000000000000 ffffffff8159ff82 ffff880192c89150 ffffffffa0c50ee9
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896096]  0000003c30691240 00000000a0c55781 ffff88023490fd58 ffff880192c89150
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896098]  000000000000003c 000000000000003c 0000000000000002 0000000000022dab
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [  326.896100] Call Trace:


and pages and pages of log entries (which I'm unsure I saw at the time).

Apparently, I logged in in text mode, without rebooting, and mounted home 
again (perhaps systemd mounted it automatically, I do not remember).  It is 
possible that I ran xfs_repair in the interval; it is not logged.



<0.4> 2014-03-15 04:06:09 Telcontar kernel - - - [ 1044.485279]  [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.1> 2014-03-15 04:06:09 Telcontar kernel - - - [ 1044.486104] XFS (sdd5): page discard on page ffffea00053b68e0, inode 0x602084fd, offset 749568.
<3.6> 2014-03-15 04:07:39 Telcontar systemd 1 - -  Starting Session 9 of user root.
<4.6> 2014-03-15 04:07:39 Telcontar systemd-logind 1002 - -  New session 9 of user root.
<10.5> 2014-03-15 04:07:39 Telcontar login - - -  ROOT LOGIN ON tty2
<3.6> 2014-03-15 04:08:01 Telcontar systemd 1 - -  Starting Session 10 of user news.
<0.5> 2014-03-15 04:09:55 Telcontar kernel - - - [ 1270.594691] XFS (sdd5): Mounting Filesystem
<0.6> 2014-03-15 04:09:55 Telcontar kernel - - - [ 1270.681282] XFS (sdd5): Ending clean mount
<3.6> 2014-03-15 04:10:02 Telcontar acpid - - -  1 client rule loaded
<3.6> 2014-03-15 04:11:41 Telcontar acpid - - -  1 client rule loaded
<3.6> 2014-03-15 04:11:47 Telcontar systemd 1 - -  Starting Session 11 of user cer.
<4.6> 2014-03-15 04:11:47 Telcontar systemd-logind 1002 - -  New session 11 of user cer.
<4.6> 2014-03-15 04:11:47 Telcontar systemd-logind 1002 - -  Linked /tmp/.X11-unix/X0 to /run/user/1000/X11-display.
<3.4> 2014-03-15 04:11:47 Telcontar kdm - - -  :0 '[5904]: Cannot update authorization file in home dir /home/cer
<3.3> 2014-03-15 04:11:47 Telcontar kdm - - -  :0 '[5904]: Cannot chdir to cer's home /home/cer: No such file or directory


But as you can see, despite it saying that it was a "clean mount", my 
"/home/cer/", ie, my HOME, is not visible.


<0.5> 2014-03-15 04:12:03 Telcontar kernel - - - [ 1397.853848] XFS (sdd5): Mounting Filesystem
<0.6> 2014-03-15 04:12:03 Telcontar kernel - - - [ 1397.932327] XFS (sdd5): Ending clean mount
<3.6> 2014-03-15 04:12:25 Telcontar systemd 1 - -  Starting Getty on tty3...
<3.6> 2014-03-15 04:12:25 Telcontar systemd 1 - -  Started Getty on tty3.
<3.6> 2014-03-15 04:12:29 Telcontar systemd 1 - -  Starting Session 12 of user cer.
<4.6> 2014-03-15 04:12:29 Telcontar systemd-logind 1002 - -  New session 12 of user cer.
<10.6> 2014-03-15 04:12:29 Telcontar login - - -  LOGIN ON tty3 BY cer


and this time I apparently managed to log in graphical mode:


<3.6> 2014-03-15 04:13:24 Telcontar systemd 1 - -  Starting Session 14 of user cer.
<4.6> 2014-03-15 04:13:24 Telcontar systemd-logind 1002 - -  New session 14 of user cer.
<4.6> 2014-03-15 04:13:24 Telcontar systemd-logind 1002 - -  Linked /tmp/.X11-unix/X0 to /run/user/1000/X11-display.
<23.4> 2014-03-15 04:13:24 Telcontar checkproc - - -  checkproc: can not get session id for process 4131!
<4.5> 2014-03-15 04:13:25 Telcontar gnome-keyring-daemon 6210 - -  Gkm: using old keyring directory: /home/cer/.gnome2/keyrings
<4.5> 2014-03-15 04:13:25 Telcontar gnome-keyring-daemon 6210 - -  Gkm: using old keyring directory: /home/cer/.gnome2/keyrings


As it was late, and I was confident that the issue was solved (wrongly; 
maybe I did not see those XFS_WANT_CORRUPTED_RETURN errors above), I 
hibernated:


<5.4> 2014-03-15 04:23:41 Telcontar pm-utils - - -  Hibernating (1)...
<1.5> 2014-03-15 04:23:41 Telcontar network 7779 - -  redirecting to "systemctl --signal=9 kill network.service"

... next morning:

<5.4> 2014-03-15 13:23:41 Telcontar pm-utils - - -  Thawing (95)...

... afternoon:

<5.4> 2014-03-15 17:50:45 Telcontar pm-utils - - -  Hibernating (1)...
...
<5.4> 2014-03-15 19:47:58 Telcontar pm-utils - - -  Thawing (95)...


... once more, and crash!


<5.4> 2014-03-15 20:20:56 Telcontar pm-utils - - -  Hibernating (1)...
...
<5.4> 2014-03-15 22:20:21 Telcontar pm-utils - - -  Thawing (95)...
<5.4> 2014-03-15 22:20:32 Telcontar pm-utils - - -  Thawing (1)...
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] 
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298351] CPU: 0 PID: 28877 Comm: kworker/0:7 Tainted: P           O 3.11.10-7-desktop #1
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298353] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298388] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298391]  0000000000000000 ffffffff8159ff82 0000000000007121 ffffffffa0c53996
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298395]  ffff880151e21cc0 ffff880234093600 ffff88023016bbe0 0000000000000000
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298398]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298402] Call Trace:
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298415]  [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298419]  [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298423]  [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298428]  [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298449]  [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298511]  [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298571]  [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298643]  [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298734]  [<ffffffffa0c4e633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298786]  [<ffffffffa0c441ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298828]  [<ffffffffa0c42f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298868]  [<ffffffffa0c43a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298909]  [<ffffffffa0c43d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298937]  [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298942]  [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298946]  [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298952]  [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.5> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298959] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_b
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.331745] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.331748] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<4.5> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  GLib-GObject: invalid unclassed pointer in cast to 'GkmObject'
<4.3> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  Gkm: gkm_object_expose_full: assertion 'GKM_IS_OBJECT (self)' failed
<4.5> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.5> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  Gkm: couldn't create temporary file for: /home/cer/.gnome2/keyrings/login_1.keyring: Input/output error
<4.4> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - -  couldn't create login keyring: An error occurred on the device
<10.3> 2014-03-15 22:20:40 Telcontar unix2_chkpwd - - -  gkr-pam: the password for the login keyring was invalid.
<0.4> 2014-03-15 22:20:50 Telcontar kernel - - - [20168.032019] XFS (sdd5): xfs_log_force: error 5 returned.
<5.4> 2014-03-15 22:20:57 Telcontar router - - -  (Thawing 1) Logging the current IP= 83.41.119.142
<0.4> 2014-03-15 22:21:20 Telcontar kernel - - - [20198.112018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:21:50 Telcontar kernel - - - [20228.192016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:22:21 Telcontar kernel - - - [20258.272013] XFS (sdd5): xfs_log_force: error 5 returned.
<10.5> 2014-03-15 22:22:31 Telcontar polkitd 4115 - -  Unregistered Authentication Agent for unix-session:14 (system bus name :1.93, object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8
<3.3> 2014-03-15 22:22:37 Telcontar kdm 3931 - -  X server for display :0 terminated unexpectedly
<3.4> 2014-03-15 22:22:37 Telcontar kdm - - -  :0[31291]: Cannot update authorization file in home dir /home/cer
<0.7> 2014-03-15 22:22:37 Telcontar kernel - - - [20275.208508] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
<3.6> 2014-03-15 22:22:38 Telcontar acpid - - -  1 client rule loaded
<0.4> 2014-03-15 22:22:51 Telcontar kernel - - - [20288.352018] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-03-15 22:23:01 Telcontar systemd 1 - -  Starting Session 126 of user news.
<0.4> 2014-03-15 22:23:21 Telcontar kernel - - - [20318.432014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:23:51 Telcontar kernel - - - [20348.512013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:24:21 Telcontar kernel - - - [20378.592014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:24:51 Telcontar kernel - - - [20408.672014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - -  Stopping User Manager for 9...
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - -  Stopping Disk Manager...
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - -  Stopping Daemon for power management...
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - -  Stopping Bluetooth service...


I was attempting to reboot, I think.


<3.6> 2014-03-15 22:25:20 Telcontar systemd 1 - -  Starting Rescue Shell...
<3.6> 2014-03-15 22:25:20 Telcontar systemd 1 - -  Started Rescue Shell.
<3.6> 2014-03-15 22:20:19 Telcontar systemd 3976 - -  message repeated 3 times: [ Time has been changed]
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - -  Stopping Default.
<3.6> 2014-03-15 22:20:19 Telcontar systemd 4987 - -  message repeated 3 times: [ Time has been changed]
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - -  Stopping Default.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - -  Stopped target Default.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - -  Stopped target Default.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - -  Starting Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - -  Starting Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - -  Reached target Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - -  Reached target Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - -  Starting Exit the Session...
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - -  Starting Exit the Session...
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920075] type=1131 audit(1394918720.685:1133): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920075]  msg=' comm="auditd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920273] type=1131 audit(1394918720.685:1134): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920273]  msg=' comm="systemd-logind" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920490] type=1131 audit(1394918720.685:1135): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920490]  msg=' comm="smb" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525253] type=1131 audit(1394918721.290:1136): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525253]  msg=' comm="cron" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525643] type=1131 audit(1394918721.290:1137): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525643]  msg=' comm="avahi-daemon" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525937] type=1131 audit(1394918721.290:1138): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525937]  msg=' comm="console-kit-daemon" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526359] type=1131 audit(1394918721.291:1139): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526359]  msg=' comm="polkit" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526577] type=1131 audit(1394918721.291:1140): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526577]  msg=' comm="rtkit-daemon" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.527021] type=1131 audit(1394918721.292:1141): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.527021]  msg=' comm="bluetooth" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.4> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.752008] XFS (sdd5): xfs_log_force: error 5 returned.
<5.6> 2014-03-15 22:25:22 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1067" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-03-15 22:25:23+01:00 - Halting the system now  =========================================== uptime:  22:25pm  up  18:36,  2 users,  load average: 2.08, 1.04, 0.78
2014-03-15 22:25:31+01:00 - Booting the system now  ================================================================================  Linux Telcontar 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC 
<5.6> 2014-03-15 22:25:39 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="32300" x-info="http://www.rsyslog.com"] start
<3.6> 2014-03-15 22:25:39 Telcontar systemd 1 - -  Stopping Rescue Shell...


This time, the system detects problems:


<0.4> 2014-03-15 22:25:51 Telcontar kernel - - - [20468.832024] XFS (sdd5): xfs_log_force: error 5 returned.
...
<3.6> 2014-03-15 22:26:16 Telcontar systemd 1 - -  Started Console Manager.
<10.5> 2014-03-15 22:26:16 Telcontar login - - -  ROOT LOGIN ON tty1
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - -  Mounted /sys/fs/fuse/connections.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - -  Stopped target Sound Card.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - -  Starting Default.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - -  Reached target Default.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - -  Startup finished in 316ms.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 1 - -  Started User Manager for 0.
<0.4> 2014-03-15 22:26:21 Telcontar kernel - - - [20498.912018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:26:51 Telcontar kernel - - - [20528.992014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:27:21 Telcontar kernel - - - [20559.072014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:27:51 Telcontar kernel - - - [20589.152013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-03-15 22:28:01 Telcontar systemd 1 - -  Starting user-9.slice.


But apparently I decided to abort:


2014-03-15 22:28:03+01:00 - Halting the system now  =========================================== uptime:  22:28pm  up  18:39,  0 users,  load average: 0.70, 1.40, 1.01
2014-03-16 14:07:21+01:00 - Booting the system now  ================================================================================  Linux Telcontar 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC


Judging from the time of the next boot, I guess that it was here that I 
decided to use the live system and reformat.

    The cloned image I have of the filesystem is dated Mar 16 05:42, so it
    was made somewhere here - at late hours, you see, if I started to
    attempt recovery at 22:30 (I used dd, rsync, and xfsdump, so that took
    time).

    Unfortunately, I do not remember where I placed my notes on the repair
    procedure, so I do not know for certain at which point in my attempts
    to repair I took the photo.  Seeing that I probably started around
    midnight, and the file is dated 05:42, I guess I did it too late.  But
    that surprises me, as I'm absolutely sure I took the photo to be able
    to provide it for investigation.

As it was evident by now that xfs_repair had failed to repair the 
partition, which crashed soon after the "repair", and as it was mountable, 
I decided to do both an rsync copy and an xfsdump copy.  I then reformatted 
the affected partition, but I don't remember if I used gparted (probably) 
or mkfs.xfs, and when done, I copied back the data from the backup made 
just an hour before, with xfsrestore.  I remember I also used rsync to 
verify the copy, and it was correct.



And the procedure succeeded:

<0.5> 2014-03-16 14:07:23 Telcontar kernel - - - [   20.239542] XFS (sdd5): Mounting Filesystem
<0.5> 2014-03-16 14:07:23 Telcontar kernel - - - [   20.280604] XFS (sdd8): Mounting Filesystem
<0.6> 2014-03-16 14:07:23 Telcontar kernel - - - [   20.450123] XFS (sdd8): Ending clean mount
<0.6> 2014-03-16 14:07:23 Telcontar kernel - - - [   20.459463] XFS (sdd5): Ending clean mount


Next log entry related to "sdd5" was days later, all normal:

<3.6> 2014-03-19 00:18:12 Telcontar dbus-daemon 1004 - -  **** ADDING /sys/devices/pci0000:00/0000:00:1f.2/ata10/host9/target9:0:0/9:0:0:0/block/sdd/sdd5







The next crash event happened on 2014-04-17 22:47:08, after 15 successful 
hibernation cycles:


<5.4> 2014-04-17 20:15:56 Telcontar pm-utils - - -  Hibernating (1)...
<1.5> 2014-04-17 20:15:56 Telcontar network 314 - -  redirecting to "systemctl --signal=9 kill network.service"
<3.5> 2014-04-17 20:15:56 Telcontar systemd 1 - -  network@eth0.service: main process exited, code=killed, status=9/KILL
<5.4> 2014-04-17 20:15:56 Telcontar pm-utils - - -  Hibernating (95)...
<0.7> 2014-04-17 20:15:59 Telcontar kernel - - - [280263.870791] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2014-04-17 20:15:59 Telcontar kernel - - - [280263.870797] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2014-04-17 20:15:59 Telcontar kernel - - - [280263.871414] PM: Basic memory bitmaps created
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280264.493703] Syncing filesystems ... done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280265.043237] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280265.046032] PM: Preallocating image memory... done (allocated 1140779 pages)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.609430] PM: Allocated 4563116 kbytes in 1.56 seconds (2925.07 MB/s)
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.609554] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.611525] Suspending console(s) (use no_console_suspend to debug)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.612352] serial 00:05: disabled
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812165] PM: freeze of devices complete after 200.520 msecs
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812452] PM: late freeze of devices complete after 0.285 msecs
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812999] PM: noirq freeze of devices complete after 0.544 msecs
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812999] Disabling non-boot CPUs ...
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.814329] smpboot: CPU 1 is now offline
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.816455] smpboot: CPU 2 is now offline
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.818199] smpboot: CPU 3 is now offline
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.818656] PM: Creating hibernation image:
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] PM: Need to copy 923283 pages
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] PM: Normal pages needed: 923283 + 1024, available pages: 1173501
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] Enabling non-boot CPUs ...
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832336] CPU1 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832467] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.845865] CPU2 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.846034] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.859609] CPU3 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.887223] PM: noirq restore of devices complete after 22.590 msecs
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.887356] PM: early restore of devices complete after 0.107 msecs
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059840] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059859] usb usb3: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059869] uhci_hcd 0000:00:1a.1: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059885] usb usb4: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059893] uhci_hcd 0000:00:1a.2: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059910] usb usb5: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059919] ehci-pci 0000:00:1a.7: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059937] usb usb1: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061145] uhci_hcd 0000:00:1d.0: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061167] usb usb6: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061177] uhci_hcd 0000:00:1d.1: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061196] usb usb7: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061205] uhci_hcd 0000:00:1d.2: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061225] usb usb8: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061236] ehci-pci 0000:00:1d.7: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061254] usb usb2: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.062031] pci 0000:00:1e.0: setting latency timer to 64
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.062123] ata_piix 0000:00:1f.2: setting latency timer to 64
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.062182] ata_piix 0000:00:1f.5: setting latency timer to 64
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.063832] ehci-pci 0000:00:1a.7: cache line size of 32 is not supported
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.065134] ehci-pci 0000:00:1d.7: cache line size of 32 is not supported
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162023] pciehp 0000:00:1c.4:pcie04: Device 0000:06:00.0 already exists at 0000:06:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162025] pciehp 0000:00:1c.4:pcie04: Cannot add device at 0000:06:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162047] pciehp 0000:00:1c.2:pcie04: Device 0000:04:00.0 already exists at 0000:04:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162049] pciehp 0000:00:1c.2:pcie04: Cannot add device at 0000:04:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162051] pciehp 0000:00:1c.5:pcie04: Device 0000:07:00.0 already exists at 0000:07:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162053] pciehp 0000:00:1c.5:pcie04: Cannot add device at 0000:07:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162098] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162100] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162123] pciehp 0000:00:1c.3:pcie04: Device 0000:05:00.0 already exists at 0000:05:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162125] pciehp 0000:00:1c.3:pcie04: Cannot add device at 0000:05:00
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162308] pata_jmicron 0000:05:00.1: setting latency timer to 64
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.163546] serial 00:05: activated
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.164041] pata_jmicron 0000:04:00.1: setting latency timer to 64
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.173271] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.386975] ata11: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.467054] ata2: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.468030] ata1: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.481019] usb 1-2: reset high-speed USB device number 3 using ehci-pci
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.485262] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.538037] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.541148] ata12.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.541149] ata12.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.541151] ata12.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.563113] ata12.00: configured for UDMA/100
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.621020] firewire_core 0000:08:02.0: rediscovered device fw0
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.622018] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.624027] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.624176] ata3.00: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.624207] sd 2:0:0:0: [sda] Starting disk
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.625665] ata4.00: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.626090] sd 3:0:0:0: [sdb] Starting disk
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.656005] /dev/vmmon[0]: HostIFReadUptimeWork: detected settimeofday: fixed uptimeBase old 18445346595345864640 new 18445346586286024561 attempts 1
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.833055] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.833064] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836117] ata9.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836119] ata9.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836296] ata9.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836298] ata9.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842067] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842082] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842175] ata9.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842176] ata9.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842344] ata9.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842345] ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845187] ata10.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845189] ata10.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845378] ata10.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845380] ata10.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.847015] usb 3-1: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851234] ata10.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851235] ata10.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851359] ata9.00: configured for UDMA/133
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851456] ata10.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851458] ata10.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.857339] ata9.01: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.857369] sd 8:0:0:0: [sdc] Starting disk
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.857371] sd 8:0:1:0: [sdd] Starting disk
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.879326] ata10.00: configured for UDMA/133
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.885331] ata10.01: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.885365] sd 9:0:0:0: [sde] Starting disk
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.885369] sd 9:0:1:0: [sdf] Starting disk
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280268.242014] usb 2-5: reset high-speed USB device number 2 using ehci-pci
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280268.608013] usb 8-2: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280268.959113] usb 2-5.4: reset high-speed USB device number 4 using ehci-pci
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.287977] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.796130] PM: restore of devices complete after 2736.343 msecs
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.115233] PM: Basic memory bitmaps freed
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.191345] bridge-eth0: disabling the bridge
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.196021] bridge-eth0: down
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.196026] bridge-eth0: detached
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762859] /dev/vmnet: open called by PID 3122 (vmnet-bridge)
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762873] /dev/vmnet: hub 0 does not exist, allocating memory.
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762888] /dev/vmnet: port on hub 0 successfully opened
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762899] bridge-eth0: up
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762904] bridge-eth0: attached
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.396460] userif-2: sent link down event.
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.396463] userif-2: sent link up event.
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] 
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851709] CPU: 0 PID: 27785 Comm: kworker/0:4 Tainted: P           O 3.11.10-7-desktop #1
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851864] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852074] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852211]  0000000000000000 ffffffff8159ff82 0000000000216bae ffffffffa0c53996
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852486]  ffff88019907e0c0 ffff880234160740 ffff88012e9e5cb0 0000000000000000
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852638]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852790] Call Trace:
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852847]  [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852947]  [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853063]  [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853164]  [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853275]  [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853439]  [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853594]  [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853761]  [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853950]  [<ffffffffa0c4e633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854110]  [<ffffffffa0c441ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854268]  [<ffffffffa0c42f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854428]  [<ffffffffa0c43a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854585]  [<ffffffffa0c43d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854725]  [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854835]  [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854941]  [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.855037]  [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.855142] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c673d8
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.901296] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.901447] XFS (sde5): Please umount the filesystem and rectify the problem(s)
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280272.480011] XFS (sde5): xfs_log_force: error 5 returned.
<3.4> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - -  The canary thread is apparently starving. Taking action.
<3.6> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - -  Demoting known real-time threads.
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - -  Successfully demoted thread 31337 of process 31334 (/usr/bin/pulseaudio).
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - -  Successfully demoted thread 31336 of process 31334 (/usr/bin/pulseaudio).
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - -  Successfully demoted thread 31334 of process 31334 (/usr/bin/pulseaudio).
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - -  Demoted 3 threads.
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - -  RTM_NEWLINK: name:eth0 index:2 flags:0x00001003
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - -  Removing interface eth0 index:2
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - -  Stopped bridge eth0 to virtual network 0.
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - -  RTM_NEWLINK: name:eth0 index:2 flags:0x00011043
<3.6> 2014-04-17 22:47:07 Telcontar vmnet-natd - - -  RTM_NEWLINK: name:eth0 index:2 flags:0x00001003
<3.6> 2014-04-17 22:47:08 Telcontar systemd 1 - -  Time has been changed
<3.6> 2014-04-17 22:47:11 Telcontar acpid - - -  1 client rule loaded
<3.5> 2014-04-17 22:47:12 Telcontar dbus 1013 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<5.4> 2014-04-17 22:47:12 Telcontar pm-utils - - -  Thawing (95)...
<3.5> 2014-04-17 22:47:14 Telcontar dbus 1013 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-04-17 22:47:16 Telcontar network 788 - -  redirecting to "systemctl  restart network.service"
<3.6> 2014-04-17 22:47:16 Telcontar systemd 1 - -  Stopping ifup managed network interface eth1...
<3.6> 2014-04-17 22:47:16 Telcontar systemd 1 - -  Stopping ifup managed network interface eth0...
<3.6> 2014-04-17 22:47:16 Telcontar systemd 1 - -  Stopping LSB: Configure network interfaces and set up routing...
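
To pick the relevant lines out of a long log like the one above, a small
filter can help.  The following is a minimal sketch, assuming the syslog
line layout shown in this mail ("<fac.sev> date time host kernel - - -
[uptime] message"); the function name and the returned dictionary keys are
made up for illustration.

```python
import re

# Kernel lines as they appear in the excerpts quoted in this mail.
LINE_RE = re.compile(
    r"^<\d+\.\d+> (?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"kernel - - - \[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$"
)
# "XFS: Internal error <tag> at line <n> of file <path>.  Caller 0x..."
ERROR_RE = re.compile(
    r"XFS: Internal error (?P<tag>\S+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+)\.\s+Caller"
)
# "XFS (<dev>): Corruption of in-memory data detected. ..."
SHUTDOWN_RE = re.compile(
    r"XFS \((?P<dev>\w+)\): Corruption of in-memory data detected"
)


def find_xfs_events(lines):
    """Return a list of dicts describing XFS internal errors and shutdowns."""
    events = []
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\n"))
        if not m:
            continue  # not a kernel line in the expected layout
        base = {"stamp": m.group("stamp"), "uptime": float(m.group("uptime"))}
        msg = m.group("msg")
        err = ERROR_RE.search(msg)
        if err:
            events.append(dict(
                base,
                kind="internal_error",
                tag=err.group("tag"),
                line=int(err.group("line")),
                # Keep only the source file name, not the build path.
                file=err.group("file").rsplit("/", 1)[-1],
            ))
        shut = SHUTDOWN_RE.search(msg)
        if shut:
            events.append(dict(base, kind="shutdown", dev=shut.group("dev")))
    return events
```

Calling find_xfs_events() on the lines of a log file gives one entry per
internal error and one per forced shutdown, which makes it easy to list
the affected device and the time of each event.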


Apparently, I rebooted:


2014-04-17 23:27:32+02:00 - Halting the system now  =========================================== uptime:  23:27pm  up 6 days 19:54,  1 user,  load average: 12.51, 3.63, 1.38
2014-04-17 23:32:17+02:00 - Booting the system now  ================================================================================  Linux Telcontar 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC

<10.5> 2014-04-17 23:33:13 Telcontar login - - -  ROOT LOGIN ON tty1
<10.5> 2014-04-17 23:39:17 Telcontar login - - -  ROOT LOGIN ON tty2
<10.5> 2014-04-17 23:43:14 Telcontar login - - -  ROOT LOGIN ON tty3
<10.5> 2014-04-17 23:43:21 Telcontar login - - -  ROOT LOGIN ON tty4



I have reason to believe, looking at my logs, that I restored my home
here, using the same procedure, but from this work system with text mode
tools, instead of from the rescue live stick (oS 13.1 XFCE).  Thus I guess
this time I used plain mkfs.xfs.  Later I see dozens of hibernate cycles,
until I halted normally about two weeks later, on 2014-05-02, so the
procedure succeeded.
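
Counts of hibernation cycles like the ones mentioned above can be
reproduced from the log with a few lines of script.  This is a rough
sketch under some assumptions: it uses the kernel message "PM: Creating
hibernation image" as the marker for one cycle, it expects the same syslog
layout as the excerpts in this mail, and the function name is made up.

```python
import re

# Kernel lines as they appear in the excerpts quoted in this mail.
KERNEL_RE = re.compile(
    r"^<\d+\.\d+> (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"kernel - - - \[\s*\d+\.\d+\] (.*)$"
)


def cycles_before_errors(lines):
    """Return [(timestamp, cycles), ...], one entry per XFS internal error.

    'cycles' counts the hibernation images created since the previous
    error (or since the start of the input).  The count includes the
    cycle during which the error showed up, so the number of clean
    cycles is one less.
    """
    results = []
    cycles = 0
    for raw in lines:
        m = KERNEL_RE.match(raw.rstrip("\n"))
        if not m:
            continue  # not a kernel line in the expected layout
        stamp, msg = m.groups()
        if msg.startswith("PM: Creating hibernation image"):
            cycles += 1
        elif msg.startswith("XFS: Internal error"):
            results.append((stamp, cycles))
            cycles = 0  # start counting again for the next event
    return results
```

The input should start at the point of interest (for instance just after
the restore), because the sketch does not try to detect reboots.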






Next crash event was this Sunday:


Hibernating and thawing sequence, complete:


<3.4> 2014-06-29 04:51:49 Telcontar pm-utils - - -  Hibernating the system now (04)...
<3.5> 2014-06-29 04:51:49 Telcontar pm-utils - - -  There appears not be any pending nntp post to be sent. I just checked :-)
<1.5> 2014-06-29 04:51:50 Telcontar network 29169 - -  redirecting to "systemctl --signal=9 kill network.service"
<3.5> 2014-06-29 04:51:50 Telcontar systemd 1 - -  network@eth0.service: main process exited, code=killed, status=9/KILL
<3.4> 2014-06-29 04:51:50 Telcontar pm-utils - - -  Hibernating (95)...
<0.7> 2014-06-29 04:51:53 Telcontar kernel - - - [212878.926048] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2014-06-29 04:51:53 Telcontar kernel - - - [212878.926052] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2014-06-29 04:51:53 Telcontar kernel - - - [212878.927502] PM: Basic memory bitmaps created
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212879.561676] Syncing filesystems ... done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212880.077132] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212880.080024] PM: Preallocating image memory... done (allocated 1140811 pages)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.351277] PM: Allocated 4563244 kbytes in 7.27 seconds (627.68 MB/s)
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.351400] Freezing remaining freezable tasks ... (elapsed 0.080 seconds) done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.432284] Suspending console(s) (use no_console_suspend to debug)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.433051] serial 00:05: disabled
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633138] PM: freeze of devices complete after 200.734 msecs
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633370] PM: late freeze of devices complete after 0.230 msecs
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633913] PM: noirq freeze of devices complete after 0.541 msecs
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633913] Disabling non-boot CPUs ...
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.635222] smpboot: CPU 1 is now offline
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.637153] smpboot: CPU 2 is now offline
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.639195] smpboot: CPU 3 is now offline
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.639658] PM: Creating hibernation image:
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] PM: Need to copy 923219 pages
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] PM: Normal pages needed: 923219 + 1024, available pages: 1173563
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] microcode: CPU0 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] Enabling non-boot CPUs ...
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.653119] microcode: CPU1 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.653307] CPU1 is up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.653440] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.666704] microcode: CPU2 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.666844] CPU2 is up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.667011] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.680398] microcode: CPU3 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.680598] CPU3 is up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.708225] PM: noirq restore of devices complete after 22.576 msecs
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.708358] PM: early restore of devices complete after 0.109 msecs
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880083] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880086] uhci_hcd 0000:00:1a.1: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880107] usb usb3: root hub lost power or was reset
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880110] usb usb4: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880120] uhci_hcd 0000:00:1a.2: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880124] ehci-pci 0000:00:1a.7: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880139] usb usb5: root hub lost power or was reset
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880188] usb usb1: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880243] uhci_hcd 0000:00:1d.0: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880265] usb usb6: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880275] uhci_hcd 0000:00:1d.1: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880296] usb usb7: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880306] uhci_hcd 0000:00:1d.2: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880326] usb usb8: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880338] ehci-pci 0000:00:1d.7: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880349] usb usb2: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.881094] pci 0000:00:1e.0: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.881199] ata_piix 0000:00:1f.2: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.881237] ata_piix 0000:00:1f.5: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.884086] ehci-pci 0000:00:1a.7: cache line size of 32 is not supported
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.884236] ehci-pci 0000:00:1d.7: cache line size of 32 is not supported
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981023] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981025] pciehp 0000:00:1c.2:pcie04: Device 0000:04:00.0 already exists at 0000:04:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981026] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981028] pciehp 0000:00:1c.2:pcie04: Cannot add device at 0000:04:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981032] pciehp 0000:00:1c.5:pcie04: Device 0000:07:00.0 already exists at 0000:07:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981034] pciehp 0000:00:1c.5:pcie04: Cannot add device at 0000:07:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981058] pciehp 0000:00:1c.3:pcie04: Device 0000:05:00.0 already exists at 0000:05:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981059] pciehp 0000:00:1c.3:pcie04: Cannot add device at 0000:05:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981089] pciehp 0000:00:1c.4:pcie04: Device 0000:06:00.0 already exists at 0000:06:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981090] pciehp 0000:00:1c.4:pcie04: Cannot add device at 0000:06:00
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981220] pata_jmicron 0000:04:00.1: setting latency timer to 64
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.982188] serial 00:05: activated
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.982714] pata_jmicron 0000:05:00.1: setting latency timer to 64
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.186275] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.192270] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.206012] ata11: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.286032] ata1: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.287030] ata4: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.357035] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.360116] ata12.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.360118] ata12.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.360119] ata12.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.366112] ata12.00: configured for UDMA/100
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.440022] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.440024] firewire_core 0000:08:02.0: rediscovered device fw0
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.442190] ata3.00: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.442223] sd 2:0:0:0: [sdb] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.450017] usb 8-2: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.659048] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.659058] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.661048] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.661058] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662114] ata9.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662115] ata9.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662293] ata9.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662295] ata9.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664113] ata10.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664114] ata10.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664326] ata10.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664327] ata10.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668112] ata9.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668113] ata9.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668293] ata9.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668294] ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670113] ata10.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670114] ata10.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670323] ata10.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670324] ata10.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.677300] ata9.00: configured for UDMA/133
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.683286] ata9.01: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.683311] sd 8:0:0:0: [sdc] Starting disk
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.683369] sd 8:0:1:0: [sdd] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.698321] ata10.00: configured for UDMA/133
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.704335] ata10.01: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.704361] sd 9:0:0:0: [sde] Starting disk
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.704418] sd 9:0:1:0: [sdf] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.829028] usb 2-5: reset high-speed USB device number 2 using ehci-pci
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.901026] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.903237] ata2.00: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.903279] sd 1:0:0:0: [sda] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212889.045020] usb 1-2: reset high-speed USB device number 3 using ehci-pci
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212889.411014] usb 3-1: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212889.778047] usb 2-5.4: reset high-speed USB device number 4 using ehci-pci
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] 
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P           O 3.11.10-11-desktop #1
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390]  0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391]  ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403]  [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408]  [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411]  [<ffffffff815a0252>] dump_stack+0x50/0x89
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771]  [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026207] usb 1-6: USB disconnect, device number 4
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.025944] Restarting kernel threads ... done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026371] Restarting tasks ... done.
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.079743] PM: Basic memory bitmaps freed
<3.4> 2014-06-29 12:32:19 Telcontar rtkit-daemon 4287 - -  The canary thread is apparently starving. Taking action.
<3.6> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - -  Demoting known real-time threads.
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - -  Successfully demoted thread 4293 of process 4286 (/usr/bin/pulseaudio).
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - -  Successfully demoted thread 4292 of process 4286 (/usr/bin/pulseaudio).
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - -  Successfully demoted thread 4286 of process 4286 (/usr/bin/pulseaudio).
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - -  Demoted 3 threads.
<3.6> 2014-06-29 12:32:20 Telcontar systemd 1 - -  Time has been changed
<3.3> 2014-06-29 12:32:21 Telcontar systemd-udevd 29550 - -  inotify_add_watch(7, /dev/sdg, 10) failed: No such file or directory
<3.3> 2014-06-29 12:32:21 Telcontar systemd-udevd 29551 - -  inotify_add_watch(7, /dev/sdh, 10) failed: No such file or directory
<0.4> 2014-06-29 12:32:25 Telcontar kernel - - - [212898.656011] XFS (sde5): xfs_log_force: error 5 returned.
<3.5> 2014-06-29 12:32:26 Telcontar dbus 1033 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<3.4> 2014-06-29 12:32:27 Telcontar pm-utils - - -  Thawing (95)...
<3.5> 2014-06-29 12:32:29 Telcontar dbus 1033 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-06-29 12:32:30 Telcontar network 29606 - -  redirecting to "systemctl  restart network.service"
<3.6> 2014-06-29 12:32:30 Telcontar systemd 1 - -  Stopping ifup managed network interface eth1...
<3.6> 2014-06-29 12:32:30 Telcontar systemd 1 - -  Stopping ifup managed network interface eth0...
<3.6> 2014-06-29 12:32:30 Telcontar systemd 1 - -  Stopping LSB: Configure network interfaces and set up routing...
<3.6> 2014-06-29 12:32:31 Telcontar systemd 1 - -  Starting LSB: Configure network interfaces and set up routing...
<3.6> 2014-06-29 12:32:32 Telcontar acpid - - -  1 client rule loaded
<3.6> 2014-06-29 12:32:32 Telcontar ifdown 29624 - -  eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-06-29 12:32:32 Telcontar ifdown 29625 - -  eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:32 Telcontar ifdown 29624 - -      eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:32 Telcontar ifdown 29625 - -      eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-06-29 12:32:32 Telcontar network 29638 - -  Setting up network interfaces:
<3.6> 2014-06-29 12:32:34 Telcontar network 29638 - -  lo
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - -      lo
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - -      lo
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - -  IP address: 127.0.0.1/8
<3.6> 2014-06-29 12:32:34 Telcontar network 29638 - -  lo        IP address: 127.0.0.1/8
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - - 
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212922.866033] Chrome_ChildThr[14100]: segfault at 0 ip 00007fd3d820d596 sp 00007fd3cbc5c410 error 6 in libmozalloc.so[7fd3d820c000+2000]
<16.3> 2014-06-29 12:32:49 Telcontar dhcpcd 30417 - -  eth1: dhcpcd not running
<16.6> 2014-06-29 12:32:49 Telcontar dhcpcd 30417 - -  eth1: exiting
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Interface eth0.IPv6 no longer relevant for mDNS.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Leaving mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Interface eth0.IPv4 no longer relevant for mDNS.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Withdrawing address record for fc00::14 on eth0.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Withdrawing address record for 192.168.1.14 on eth0.
<3.5> 2014-06-29 12:32:49 Telcontar systemd 1 - -  Unit network@eth0.service entered failed state.
<3.6> 2014-06-29 12:32:49 Telcontar systemd 1 - -  Starting ifup managed network interface eth0...
<3.6> 2014-06-29 12:32:49 Telcontar ifup 30485 - -  eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:49 Telcontar ifup 30485 - -      eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212923.549298] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212923.549323] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212923.549369] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  New relevant interface eth0.IPv4 for mDNS.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - -  Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-06-29 12:32:50 Telcontar systemd 1 - -  Starting ifup managed network interface eth1...
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - -  ifplugd 0.28 initializing.
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - -  Using interface eth1/00:21:85:16:2D:0C with driver <r8169> (version: 2.3LK-NAPI)
<0.6> 2014-06-29 12:32:50 Telcontar kernel - - - [212924.375304] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-06-29 12:32:50 Telcontar kernel - - - [212924.375373] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - -  Using detection mode: SIOCETHTOOL
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - -  Initialization complete, link beat not detected.
<3.6> 2014-06-29 12:32:50 Telcontar ifup 30780 - -  eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-06-29 12:32:50 Telcontar ifup 30780 - -  eth1      is controlled by ifplugd
<1.5> 2014-06-29 12:32:50 Telcontar ifup 30780 - -      eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:50 Telcontar ifup 30780 - -      eth1      is controlled by ifplugd
<3.6> 2014-06-29 12:32:50 Telcontar systemd 1 - -  Started ifup managed network interface eth1.
<0.6> 2014-06-29 12:32:52 Telcontar kernel - - - [212925.693147] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:52 Telcontar kernel - - - [212925.693155] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  Joining mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  New relevant interface eth0.IPv6 for mDNS.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  Registering new address record for fe80::221:85ff:fe16:2d0b on eth0.*.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  Joining mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - -  Withdrawing address record for fe80::221:85ff:fe16:2d0b on eth0.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Withdrawing workstation service for eth1.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Withdrawing address record for 192.168.1.14 on eth0.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Withdrawing workstation service for eth0.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Withdrawing workstation service for lo.
<3.4> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Host name conflict, retrying with Telcontar-2
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - -  Registering HINFO record with values 'X86_64'/'LINUX'.
<0.4> 2014-06-29 12:32:55 Telcontar kernel - - - [212928.736057] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:32:55 Telcontar avahi-daemon 1020 - -  Server startup complete. Host name is Telcontar-2.local. Local service cookie is 580789639.
<4.6> 2014-06-29 12:32:55 Telcontar SuSEfirewall2 - - -  Setting up rules from /etc/sysconfig/SuSEfirewall2 ...
<4.6> 2014-06-29 12:32:55 Telcontar SuSEfirewall2 - - -  using default zone 'ext' for interface eth1
<4.6> 2014-06-29 12:32:55 Telcontar SuSEfirewall2 - - -  Firewall customary rules loaded from /etc/sysconfig/scripts/SuSEfirewall2-custom
<3.6> 2014-06-29 12:32:56 Telcontar avahi-daemon 1020 - -  Service "Telcontar-2" (/etc/avahi/services/udisks.service) successfully established.
<3.6> 2014-06-29 12:32:56 Telcontar avahi-daemon 1020 - -  Service "Telcontar-2" (/etc/avahi/services/ssh.service) successfully established.
<3.6> 2014-06-29 12:32:56 Telcontar avahi-daemon 1020 - -  Service "Telcontar-2" (/etc/avahi/services/sftp-ssh.service) successfully established.
<4.6> 2014-06-29 12:32:58 Telcontar SuSEfirewall2 - - -  Firewall rules successfully set
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - -  Found user 'avahi-autoipd' (UID 495) and group 'avahi-autoipd' (GID 491).
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - -  Successfully called chroot().
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - -  Successfully dropped root privileges.
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - -  Starting with address 169.254.3.89
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - -  Routable address already assigned, sleeping.
<3.6> 2014-06-29 12:32:58 Telcontar systemd 1 - -  Started ifup managed network interface eth0.
<3.6> 2014-06-29 12:32:58 Telcontar systemd 1 - -  Started ifup managed network interface eth1.
<3.6> 2014-06-29 12:32:58 Telcontar network 29638 - -  ..done..done..done    ppp0      Startmode is 'manual' -> skipping
<1.5> 2014-06-29 12:32:58 Telcontar ifup 31756 - -      ppp0      Startmode is 'manual' -> skipping
<3.6> 2014-06-29 12:32:58 Telcontar network 29638 - -  ..skippedSetting up service network  .  .  .  .  .  .  .  .  .  .  .  .  ...done
<3.6> 2014-06-29 12:32:58 Telcontar systemd 1 - -  Started LSB: Configure network interfaces and set up routing.
<3.4> 2014-06-29 12:32:58 Telcontar pm-utils - - -  Thawing the system now (04)...
<3.6> 2014-06-29 12:33:01 Telcontar systemd 1 - -  Starting Session 1605 of user news.
<3.4> 2014-06-29 12:33:21 Telcontar router - - -  (Thawing 04) Logging the current IP= 79.150.228.90
<0.4> 2014-06-29 12:33:25 Telcontar kernel - - - [212958.816015] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:33:55 Telcontar kernel - - - [212988.896014] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:34:25 Telcontar kernel - - - [213018.976015] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:34:55 Telcontar kernel - - - [213049.056014] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:35:01 Telcontar systemd 1 - -  Starting Session 1606 of user news.
<0.4> 2014-06-29 12:35:25 Telcontar kernel - - - [213079.136015] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:35:55 Telcontar kernel - - - [213109.216011] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:36:25 Telcontar kernel - - - [213139.296014] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:36:55 Telcontar kernel - - - [213169.376016] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:37:25 Telcontar kernel - - - [213199.456013] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:37:55 Telcontar kernel - - - [213229.536014] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:38:01 Telcontar systemd 1 - -  Starting Session 1607 of user news.
<0.4> 2014-06-29 12:38:25 Telcontar kernel - - - [213259.616018] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:38:56 Telcontar kernel - - - [213289.696014] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:39:26 Telcontar kernel - - - [213319.776019] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:39:56 Telcontar kernel - - - [213349.856014] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:40:01 Telcontar systemd 1 - -  Starting Session 1608 of user cer.
...
<5.6> 2014-06-29 12:48:34 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1111" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-06-29 12:48:35+02:00 - Halting the system now  =========================================== uptime:  12:48pm  up 4 days  8:43,  33 users,  load average: 1.40, 0.53, 0.67
2014-06-29 12:57:41+02:00 - Booting the system now  ================================================================================  Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux

(it does not show in the log that I had to hit the hardware reset button, 
the machine refused to reboot normally, apparently)


    (If you ask why I took so long to notice the problem after thawing,
    my routine is to power up the machine, then go prepare tea.  :-)
    When I come back with the mug, I'm dismayed to see I can not
    start working; and this day I was in a hurry)


So I reboot (text mode, level 3), umount home, run xfs_repair, mount again,
do xfsdump, simultaneously do an rsync (a file-by-file copy, in case
of problems with the dump), umount, use YaST in text mode to reformat the
partition, mount, and then xfsrestore.  It did not occur to me to make a
'dd' photo this time: I was tired and busy.

Maybe next time I can take the photo with dd before doing anything else 
(it takes about 80 minutes), or simply do an "xfs_metadump", which should 
be faster.  And I might not have then 500 GiB of free space to make a dd 
copy, anyway.






Question.

As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap?  I thought that the procedure includes some
checksum, but I don't know for sure.






> This is interesting because the corruption appears to be associated with
> post-eof space, which is generally transient. The worst case is that
> this space is trimmed off files when they are evicted from cache, such
> as during a umount. To me, that seems to correlate with a more
> recent/runtime problem rather than something that might be lingering on
> disk, but we don't really know for sure.

Dunno.

To me, there are two problems:

  1) The corruption itself.
  2) That xfs_repair fails to repair the filesystem. In fact, I believe
     it does not detect it!

To me, #2 is the worst, and it is what makes me do the backup, format, 
restore cycle for recovery. An occasional kernel crash is somewhat
acceptable :-}



>> Wait! I have a "dd" copy of the entire partition (500 GB), made on March
>> 16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
>> get something for you now:

...

>> I could do a "xfs_metadump" on it - just tell me what options to use, and
>> where can the result be uploaded to, if big.
>>
>
> A metadump would be helpful, though that only gives us the on-disk
> state. What was the state of this fs at the time the dd image was
> created?

I'm sorry, I'm not absolutely sure. I believe it is corrupted, but I
cannot vouch for it.

> I'm curious if something like an 'rm -rf *' on the metadump
> would catch any other corruptions or if this is indeed limited to
> something associated with recent (pre)allocations.

Sorry, run 'rm -rf *' where???


> Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
> obfuscate filenames by default. It should also be compressible. In the
> future, it's probably worth grabbing a metadump as a first step (before
> repair, zeroing the log, etc.) so we can look at the fs in the state
> most recent to the crash.

I will take that photo next time, using a rescue system in order to prevent
the system from mounting the partition and replaying the log. Dunno how
long that will take to happen, though... usually a month - but at least 
now I know how to do it.
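
For reference, the capture sequence described above might look like the
sketch below. The device and output paths are placeholders, and
metadump_cmds is a hypothetical helper, not something from this thread.
It only prints the commands, so they can be reviewed before being run as
root from a rescue system with the filesystem unmounted. The -g option of
xfs_metadump shows progress; filenames are obfuscated by default.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for capturing a metadump
# before any repair or log replay touches the filesystem.
# $1 = block device (must NOT be mounted), $2 = output file.
metadump_cmds() {
    dev=$1
    out=$2
    # Dump metadata only; -g reports progress while it runs.
    printf 'xfs_metadump -g %s %s\n' "$dev" "$out"
    # Metadumps compress well; xz is what was used in this thread.
    printf 'xz %s\n' "$out"
}

# Placeholder paths (assumptions): adjust to the real device and scratch space.
metadump_cmds /dev/sdX5 /mnt/scratch/home.metadump
```

Piping the printed lines to "sh" would execute them; checking them by eye
first is the point of printing rather than running directly.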




Meanwhile, I have done a xfs_metadump of the image, and compressed it with 
xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email 
that, and even less to a mail list.

Do you still have a bugzilla system where I can upload it? I had an 
account at <http://oss.sgi.com/bugzilla/>, made in 2010. I don't know if
it still runs :-?

If you don't, I can try to create a bugzilla on openSUSE instead, and
tell you the number... but I don't know if it takes files that big. If it 
doesn't, I'll fragment the file. You need to have an account there, I 
think, to retrieve the attachment, and I would prefer to mark the bug 
private, or at least the attachment.




I did the following.

First I made a copy, with "dd", of the partition image, all 489G of it. On 
this copy I ran "xfs_check", "xfs_repair -n", and "xfs_repair", with these 
results:


Telcontar:/data/storage_d/old_backup # xfs_check xfs_copy_home_workonit
xfs_check is deprecated and scheduled for removal in June 2014.
Please use xfs_repair -n <dev> instead.
Telcontar:/data/storage_d/old_backup # xfs_repair -n xfs_copy_home_workonit
Phase 1 - find and verify superblock...
Phase 2 - using internal log
         - scan filesystem freespace and inode maps...
         - found root inode chunk
Phase 3 - for each AG...
         - scan (but don't clear) agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
         - traversing filesystem ...
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
Telcontar:/data/storage_d/old_backup # time xfs_repair xfs_copy_home_workonit
Phase 1 - find and verify superblock...
Phase 2 - using internal log
         - zero log...
         - scan filesystem freespace and inode maps...
         - found root inode chunk
Phase 3 - for each AG...
         - scan and clear agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
Phase 5 - rebuild AG headers and trees...
         - reset superblock...
Phase 6 - check inode connectivity...
         - resetting contents of realtime bitmap and summary inodes
         - traversing filesystem ...
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

real    0m28.058s
user    0m1.692s
sys     0m2.265s
Telcontar:/data/storage_d/old_backup #


Maybe the image was made after repair, or maybe xfs_repair doesn't detect 
anything, which as far as I remember, was the case.



I recreate the copy, to try "mount" on an unaltered copy.


Telcontar:/data/storage_d/old_backup # time dd if=xfs_copy_home 
of=xfs_copy_home_workonit && mount -v xfs_copy_home_workonit mount/
1024000000+0 records in
1024000000+0 records out
524288000000 bytes (524 GB) copied, 4662.7 s, 112 MB/s

real    77m43.697s
user    3m1.420s
sys     28m41.958s
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
(reverse-i-search)`mount': time dd if=xfs_copy_home 
Telcontar:/data/storage_d/old_backup #


So it mounts...





- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO0x18ACgkQtTMYHG2NR9X6QwCcD8r5qXIHVh4ELklM/tzXASds
yskAoIcwxYNC2tKsS7wE9Jp+g4MNUdpd
=pIZI
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-03  3:00   ` Carlos E. R.
@ 2014-07-03  9:43     ` Dave Chinner
  2014-07-03 17:40       ` Brian Foster
  2014-07-03 23:34       ` Carlos E. R.
  2014-07-03 17:39     ` Brian Foster
  1 sibling, 2 replies; 56+ messages in thread
From: Dave Chinner @ 2014-07-03  9:43 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> >On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> 
> ...
> 
> >This is the background eofblocks scanner attempting to free preallocated
> >space on a file. The scanner looks for files that have been recently
> >grown and since been flushed to disk (i.e., no longer concurrently being
> >written to) and trims the post-eof preallocation that comes along with
> >growing files.
> >
> >The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> >extent we are attempting to free is already accounted for in the
> >by-block allocation btree. IOW, this is attempting to free an extent
> >that the allocation metadata thinks is already free.
> >
> >>
> >>Brief description:
> >>
> >>
> >> * It happens only on restore from hibernation.
> >
> >Interesting, could you elaborate a bit more on the behavior this system
> >is typically subjected to? i.e., is this a server that sees a constant
> >workload that is also frequently hibernated/awakened?

....

> The machine may be used anywhere from 4 to 16 hours a day, and
> hibernated at least once a day, perhaps three times if I have to go
> out several times. It makes no sense to me to leave the machine
> powered doing nothing, if hibernating is so easy and reliable - till
> now. If I have to leave for more than a week, I tend to do a full
> "halt".

Hibernation has always been suspect w.r.t. flushing filesystem
metadata. It does not guarantee that the filesystem is quiesced
and idle, it just does a sync() and hopes that is sufficient to get
the filesystem into a consistent state. The mess that this leaves is
then left to filesystem developers to play whack-a-mole with when
users have problems.

> But soon after, it oopses:

Point of note: there is no oops or crash occurring. XFS dumps the
stack when a corruption occurs to tell us where it was detected
and then shuts down the filesystem. Your system is still just fine
apart from not being able to access that filesystem until you
unmount it, repair it and mount it again.

> 3 PID: 57 Comm: kworker/3:1 Tainted: P           O 3.11.10-7-desktop

What's tainting your kernel? If you remove that taint, does the
problem still occur?

....
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] Enabling non-boot CPUs ...
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] smpboot: Booting Node 0 Processor 1 APIC 0x1
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832336] CPU1 is up
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832467] smpboot: Booting Node 0 Processor 2 APIC 0x2
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.845865] CPU2 is up
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.846034] smpboot: Booting Node 0 Processor 3 APIC 0x3
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.859609] CPU3 is up
....
> <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.796130] PM: restore of devices complete after 2736.343 msecs
> <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
> <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
.....
> <0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9

So the corruption occurred within 2s of the kernel restarting tasks
after a hibernation. It's really looking like a hibernation issue.
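
The gap can be read straight off the bracketed kernel timestamps in the
two lines quoted above. A minimal sketch in Python, assuming the syslog
line layout shown in this thread (the helper name kernel_seconds is made
up for illustration):

```python
import re

# Matches the bracketed kernel timestamp, e.g. "[280270.086714]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_seconds(line):
    """Return the bracketed kernel timestamp of a log line as a float."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: %r" % line)
    return float(match.group(1))


# Two lines from the 2014-04-17 log quoted above (error line shortened).
tasks_restarted = ("<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - "
                   "[280270.086714] Restarting tasks ... done.")
xfs_error = ("<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - "
             "[280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO")

delta = kernel_seconds(xfs_error) - kernel_seconds(tasks_restarted)
print("XFS error fired %.3f s after tasks were restarted" % delta)
```

That is roughly 1.76 seconds between the two messages.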

> <3.4> 2014-06-29 04:51:50 Telcontar pm-utils - - -  Hibernating (95)...
.....
> <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] Enabling non-boot CPUs ...
.....
> <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
.....
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
> <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
> <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026207] usb 1-6: USB disconnect, device number 4
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.025944] Restarting kernel threads ... done.
> <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026371] Restarting tasks ... done.

Well, there's the smoking gun. The XFS kworker is running and
reporting errors before the thawing process has restarted
the frozen workqueues:

void thaw_kernel_threads(void)
{
        struct task_struct *g, *p;

        pm_nosig_freezing = false;
        printk("Restarting kernel threads ... ");

        thaw_workqueues();
....

Which points to the fact that we probably need WQ_FREEZABLE on some
of our workqueues. Brian, do you want to have a look at this?
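
The ordering argument can also be checked mechanically against a saved
log. A rough sketch in Python, assuming the messages keep the wording
seen in this thread; the function names are hypothetical and the log
lines are shortened copies of the ones quoted above:

```python
import re

STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def first_stamp(lines, needle):
    """Kernel timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            match = STAMP.search(line)
            if match:
                return float(match.group(1))
    return None


def ran_before_thaw(lines, event="XFS: Internal error",
                    thaw="Restarting kernel threads"):
    """True if the event was logged before the thaw message,
    False if after, None if either message is missing."""
    event_at = first_stamp(lines, event)
    thaw_at = first_stamp(lines, thaw)
    if event_at is None or thaw_at is None:
        return None
    return event_at < thaw_at


june_29 = [
    "[212890.615073] PM: restore of devices complete after 2735.034 msecs",
    "[212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602",
    "[212890.706440] XFS (sde5): Corruption of in-memory data detected.",
    "[212891.025944] Restarting kernel threads ... done.",
    "[212891.026371] Restarting tasks ... done.",
]
april_17 = [
    "[280270.081655] Restarting kernel threads ... done.",
    "[280270.086714] Restarting tasks ... done.",
    "[280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602",
]

print("2014-06-29:", ran_before_thaw(june_29))
print("2014-04-17:", ran_before_thaw(april_17))
```

Only the June 29 log shows the error ahead of the thaw message; in the
April 17 log the error comes after both restart messages.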

> Question.
> 
> As this always happens on recovery from hibernation, and seeing the message
> "Corruption of in-memory data detected", could it be that thawing does a bad
> memory recovery from the swap?  I thought that the procedure includes some
> checksum, but I don't know for sure.

It's the fact that the filesystem is still running and modifying
state when the snapshot is being taken that results in the snapshot
image containing inconsistent state. That then gets loaded
on thaw and it goes boom.

> To me, there are two problems:
> 
>  1) The corruption itself.
>  2) That xfs_repair fails to repair the filesystem. In fact, I believe
>     it does not detect it!

That's because the filesystem is likely to be consistent on disk.
The issue is in-memory corruption, not on-disk corruption, like
the messages are telling us:

XFS (sde5): Corruption of in-memory data detected.

Basically, XFS is catching a bad state in memory and preventing it
from being propagated to disk. If it gets to disk, then you are
likely to lose data. IOWs, XFS is behaving as designed and is
actually preventing data loss in this situation.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-03  3:00   ` Carlos E. R.
  2014-07-03  9:43     ` Dave Chinner
@ 2014-07-03 17:39     ` Brian Foster
  2014-07-04 21:32       ` Carlos E. R.
  1 sibling, 1 reply; 56+ messages in thread
From: Brian Foster @ 2014-07-03 17:39 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> >On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> 
> ...
> 
> >This is the background eofblocks scanner attempting to free preallocated
> >space on a file. The scanner looks for files that have been recently
> >grown and since been flushed to disk (i.e., no longer concurrently being
> >written to) and trims the post-eof preallocation that comes along with
> >growing files.
> >
> >The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> >extent we are attempting to free is already accounted for in the
> >by-block allocation btree. IOW, this is attempting to free an extent
> >that the allocation metadata thinks is already free.
> >
> >>
> >>Brief description:
> >>
> >>
> >> * It happens only on restore from hibernation.
> >
> >Interesting, could you elaborate a bit more on the behavior this system
> >is typically subjected to? i.e., is this a server that sees a constant
> >workload that is also frequently hibernated/awakened?
> 
> It is a desktop machine I use for work at home. I typically have many
> applications opened on different workspaces in XFCE. Say one has terminals,
> another has Thunderbird/Pine, another Firefox, another LibreOffice; another
> may have gimp, another may be kbabel or lokalize, another may have vmplayer,
> etc, whatever. When I go out or go to sleep, I hibernate the machine,
> instead of powering down, because it is much faster than reboot, login, and
> start the wanted applications, and I want to conserve some electricity.
> 
> I also use the machine for testing configurations, but these I try to do on
> virtual machines, instead of my work partition.
> 
> 
> The machine may be used anywhere from 4 to 16 hours a day, and hibernated at
> least once a day, perhaps three times if I have to go out several times. It
> makes no sense to me to leave the machine powered doing nothing, if
> hibernating is so easy and reliable - till now. If I have to leave for more
> than a week, I tend to do a full "halt".
> 
> 
> 
> By the way, this started happening when I replaced an old 500 GB hard disk
> (Seagate ST3500418AS) with a 2 TB new unit (Seagate ST2000DM001-1CH164).
> Smartctl long test says fine (and seatools from Windows, too).
> 

Ok, so there's a lot going on. I was mainly curious to see what was
causing lingering preallocations, but it could be anything extending a
file multiple times.

> 
> 
> >>I do not have more info than what appears on the logs, but four times (two
> >>different kernels):
> >>
> >>cer@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
> >>/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
> >>cer@Telcontar:~>
> 
> >So you have reproduced this, reformatted with mkfs, restored from
> >backups and continued to reproduce the problem? And still only on this
> >particular partition?
> 
> Right. Exactly that.
> 
> Only that I can not reproduce the issue at will, but about once a month,
> randomly.
> 
> AFAIK, xfsdump can not carry over a filesystem corruption, right?
> 

I think that's accurate, though it might complain/fail in the act of
dumping an fs that is corrupted. The behavior here suggests there might
not be on disk corruption, however.

> 
> 
> **** LONG DESCRIPTION and LOGS start here ********
> 
...
> <5.6> 2014-06-29 12:48:34 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1111" x-info="http://www.rsyslog.com"] exiting on signal 15.
> 2014-06-29 12:48:35+02:00 - Halting the system now  =========================================== uptime:  12:48pm  up 4 days  8:43,  33 users,  load average: 1.40, 0.53, 0.67
> 2014-06-29 12:57:41+02:00 - Booting the system now  ================================================================================  Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux
> 
> (it does not show in the log that I had to hit the hardware reset button,
> the machine refused to reboot normally, apparently)
> 
> 
>    (If you ask why I took so long to notice the problem after thawing,
>    my routine is to power up the machine, then go prepare tea.  :-)
>    When I come back with the mug, I'm dismayed to see I can not
>    start working; and this day I was in a hurry)
> 
> 
> So I reboot (text mode, level 3), umount home, run xfs_repair, mount again,
> do xfsdump, simultaneously do an rsync (it is a file by file copy, in case
> of problems with dump), umount, use YaST in text mode to reformat the
> partition, mount, and then xfsrestore.  It did not occur to me to make a
> 'dd' photo this time: I was tired and busy.
> 
> Maybe next time I can take the photo with dd before doing anything else (it
> takes about 80 minutes), or simply do an "xfs_metadump", which should be
> faster.  And I might not have then 500 GiB of free space to make a dd copy,
> anyway.
> 

xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.

> 
> 
> 
> 
> 
> Question.
> 
> As this always happens on recovery from hibernation, and seeing the message
> "Corruption of in-memory data detected", could it be that thawing does a bad
> memory recovery from the swap?  I thought that the procedure includes some
> checksum, but I don't know for sure.
> 

Not sure, though if so I would think that might be a more common source
of problems. 

> 
> 
> 
> 
> 
> >This is interesting because the corruption appears to be associated with
> >post-eof space, which is generally transient. The worst case is that
> >this space is trimmed off files when they are evicted from cache, such
> >as during a umount. To me, that seems to correlate with a more
> >recent/runtime problem rather than something that might be lingering on
> >disk, but we don't really know for sure.
> 
> Dunno.
> 
> To me, there are two problems:
> 
>  1) The corruption itself.
>  2) That xfs_repair fails to repair the filesystem. In fact, I believe
>     it does not detect it!
> 
> To me, #2 is the worst, and it is what makes me do the backup, format,
> restore cycle for recovery. An occasional kernel crash is somewhat
> acceptable :-}
> 

Well it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, the fs detects
it and shuts down before going any further. That's actually a positive.
;)

That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
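
To make "free blocks allocated to a file" concrete, here is a toy
cross-check in Python. It is only an illustration of the invariant, not
how xfs_repair is implemented; extents are (start block, length) pairs
within one allocation group, and the inode labels and numbers are made
up:

```python
def overlaps(a, b):
    """True if two (start, length) block extents share at least one block."""
    a_start, a_len = a
    b_start, b_len = b
    return a_start < b_start + b_len and b_start < a_start + a_len


def free_blocks_owned_by_files(free_extents, file_extents):
    """List every (owner, file extent, free extent) triple that overlaps.

    An empty result means no block is both free and owned by a file.
    """
    return [
        (owner, extent, free)
        for owner, extent in file_extents
        for free in free_extents
        if overlaps(extent, free)
    ]


# Hypothetical data: blocks 100-149 and 400-419 are recorded as free.
free_extents = [(100, 50), (400, 20)]
file_extents = [
    ("inode 131", (0, 100)),   # blocks 0-99, does not touch free space
    ("inode 132", (140, 16)),  # blocks 140-155, runs into the free extent
]

conflicts = free_blocks_owned_by_files(free_extents, file_extents)
for owner, extent, free in conflicts:
    print("%s extent %s overlaps free extent %s" % (owner, extent, free))
```

Freeing the second extent would be the "already free" case the runtime
check at xfs_alloc.c:1602 complains about.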

> 
> 
> >>Wait! I have a "dd" copy of the entire partition (500 GB), made on March
> >>16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
> >>get something for you now:
> 
> ...
> 
> >>I could do a "xfs_metadump" on it - just tell me what options to use, and
> >>where can the result be uploaded to, if big.
> >>
> >
> >A metadump would be helpful, though that only gives us the on-disk
> >state. What was the state of this fs at the time the dd image was
> >created?
> 
> I'm sorry, I'm not absolutely sure. I believe it is corrupted, but I can not
> vouch for it.
> 
> >I'm curious if something like an 'rm -rf *' on the metadump
> >would catch any other corruptions or if this is indeed limited to
> >something associated with recent (pre)allocations.
> 
> Sorry, run 'rm -rf *' where???
> 

On the metadump... mainly just to see whether freeing all of the used
blocks in the fs triggered any other errors (i.e., a brute force way to
check for further corruptions).

> 
> >Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
> >obfuscate filenames by default. It should also be compressible. In the
> >future, it's probably worth grabbing a metadump as a first step (before
> >repair, zeroing the log, etc.) so we can look at the fs in the state
> >most recent to the crash.
> 
> I will take that photo next time, using a rescue system in order to impede
> the system from mounting the partition and replaying the log. Dunno how long
> that will take to happen, though... usually a month - but at least now I
> know how to do it.
> 
> 
> 
> 
> Meanwhile, I have done a xfs_metadump of the image, and compressed it with
> xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
> that, and even less to a mail list.
> 
> Do you still have a bugzilla system where I can upload it? I had an account
> at <http://oss.sgi.com/bugzilla/>, made on 2010. I don't know if it still
> runs :-?
> 

I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.

Brian

> If you don't, I can try to create it a bugzilla on openSUSE instead, and
> tell you the number... but I don't know if it takes files that big. If it
> doesn't, I'll fragment the file. You need to have an account there, I think,
> to retrieve the attachment, and I would prefer to mark the bug private, or
> at least the attachment.
> 
> 
> 
> 
> I did the following.
> 
> First I made a copy, with "dd", of the partition image, all 489G of it. On
> this copy I ran "xfs_check", "xfs_repair -n", and "xfs_repair", with these
> results:
> 
> 
> Telcontar:/data/storage_d/old_backup # xfs_check xfs_copy_home_workonit
> xfs_check is deprecated and scheduled for removal in June 2014.
> Please use xfs_repair -n <dev> instead.
> Telcontar:/data/storage_d/old_backup # xfs_repair -n xfs_copy_home_workonit
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> Telcontar:/data/storage_d/old_backup # time xfs_repair xfs_copy_home_workonit
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> real    0m28.058s
> user    0m1.692s
> sys     0m2.265s
> Telcontar:/data/storage_d/old_backup #
> 
> 
> Maybe the image was made after repair, or maybe xfs_repair doesn't detect
> anything, which as far as I remember, was the case.
> 
> 
> 
> I recreate the copy, to try "mount" on an unaltered copy.
> 
> 
> Telcontar:/data/storage_d/old_backup # time dd if=xfs_copy_home of=xfs_copy_home_workonit && mount -v xfs_copy_home_workonit mount/
> 1024000000+0 records in
> 1024000000+0 records out
> 524288000000 bytes (524 GB) copied, 4662.7 s, 112 MB/s
> 
> real    77m43.697s
> user    3m1.420s
> sys     28m41.958s
> mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
> (reverse-i-search)`mount': time dd if=xfs_copy_home
> Telcontar:/data/storage_d/old_backup #
> 
> 
> So it mounts...
> 
> 
> 
> 
> 
> - -- Cheers,
>        Carlos E. R.
>        (from 13.1 x86_64 "Bottle" at Telcontar)
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iEYEARECAAYFAlO0x18ACgkQtTMYHG2NR9X6QwCcD8r5qXIHVh4ELklM/tzXASds
> yskAoIcwxYNC2tKsS7wE9Jp+g4MNUdpd
> =pIZI
> -----END PGP SIGNATURE-----
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-03  9:43     ` Dave Chinner
@ 2014-07-03 17:40       ` Brian Foster
  2014-07-03 23:34       ` Carlos E. R.
  1 sibling, 0 replies; 56+ messages in thread
From: Brian Foster @ 2014-07-03 17:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Carlos E. R., XFS mailing list

On Thu, Jul 03, 2014 at 07:43:47PM +1000, Dave Chinner wrote:
> On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> > On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> > >On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> > 
> > ...
> > 
> > >This is the background eofblocks scanner attempting to free preallocated
> > >space on a file. The scanner looks for files that have been recently
> > >grown and since been flushed to disk (i.e., no longer concurrently being
> > >written to) and trims the post-eof preallocation that comes along with
> > >growing files.
> > >
> > >The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> > >extent we are attempting to free is already accounted for in the
> > >by-block allocation btree. IOW, this is attempting to free an extent
> > >that the allocation metadata thinks is already free.
> > >
> > >>
> > >>Brief description:
> > >>
> > >>
> > >> * It happens only on restore from hibernation.
> > >
> > >Interesting, could you elaborate a bit more on the behavior this system
> > >is typically subjected to? i.e., is this a server that sees a constant
> > >workload that is also frequently hibernated/awakened?
> 
> ....
> 
> > The machine may be used anywhere from 4 to 16 hours a day, and
> > hibernated at least once a day, perhaps three times if I have to go
> > out several times. It makes no sense to me to leave the machine
> > powered doing nothing, if hibernating is so easy and reliable - till
> > now. If I have to leave for more than a week, I tend to do a full
> > "halt".
> 
> Hibernation has always been suspect w.r.t. flushing filesystem
> metadata. It does not guarantee that the filesystem is quiesced
> and idle, it just does a sync() and hopes that is sufficient to get
> the filesystem into a consistent state. The mess that this leaves is
> then left to filesystem developers to play whack-a-mole with when
> users have problems.
> 
> > But soon after, it oopses:
> 
> Point of note: there is no oops or crash occurring. XFS dumps the
> stack when a corruption occurs to tell us where it was detected
> and then shuts down the filesystem. Your system is still just fine
> apart from not being able to access that filesystem until you
> unmount it, repair it and mount it again.
> 
> > 3 PID: 57 Comm: kworker/3:1 Tainted: P           O 3.11.10-7-desktop
> 
> What's tainting your kernel? If you remove that taint, does the
> problem still occur?
> 
> ....
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] Enabling non-boot CPUs ...
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] smpboot: Booting Node 0 Processor 1 APIC 0x1
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832336] CPU1 is up
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832467] smpboot: Booting Node 0 Processor 2 APIC 0x2
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.845865] CPU2 is up
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.846034] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.859609] CPU3 is up
> ....
> > <0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.796130] PM: restore of devices complete after 2736.343 msecs
> > <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
> > <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
> .....
> > <0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
> 
> So the corruption occurred within 2s of the kernel restarting tasks
> after a hibernation. It's really looking like a hibernation issue.
> 
> > <3.4> 2014-06-29 04:51:50 Telcontar pm-utils - - -  Hibernating (95)...
> .....
> > <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] Enabling non-boot CPUs ...
> .....
> > <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
> > <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
> .....
> > <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected.  Shutting down filesystem
> > <0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
> > <0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026207] usb 1-6: USB disconnect, device number 4
> > <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.025944] Restarting kernel threads ... done.
> > <0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026371] Restarting tasks ... done.
> 
> Well, there's the smoking gun. The XFS kworker is running and
> reporting errors before the thawing process has restarted
> the frozen workqueues:
> 
> void thaw_kernel_threads(void)
> {
>         struct task_struct *g, *p;
> 
>         pm_nosig_freezing = false;
>         printk("Restarting kernel threads ... ");
> 
>         thaw_workqueues();
> ....
> 
> Which points to the fact that we probably need WQ_FREEZABLE on some
> of our workqueues. Brian, do you want to have a look at this?
> 

Yeah, I'll look into it. I might see if I can try to reproduce this by
suspending a vm. It sounds like a preallocating workload and a reduced
eofblocks scan timer test might be worth a shot. Thanks Dave.
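
A preallocating workload could look something like the sketch below
(Python, standard library only). It just keeps several files open and
extends them in rounds. Whether that actually leaves post-EOF
preallocation behind for the scanner depends on the filesystem and
kernel, so treat that as an assumption to verify. The function name,
file names and sizes are arbitrary, and the scan interval is a separate
kernel setting (assumed here to be the fs.xfs.speculative_prealloc_lifetime
sysctl) that this sketch does not touch:

```python
import os
import tempfile
import time


def extend_files(directory, files=4, rounds=8, chunk=64 * 1024, pause=0.0):
    """Append chunk bytes to each of several files, rounds times.

    The files stay open between rounds so that each write extends a file
    that is still in use. Returns the list of file paths.
    """
    payload = b"\0" * chunk
    paths = [os.path.join(directory, "grow-%d.dat" % i) for i in range(files)]
    handles = [open(path, "ab") for path in paths]
    try:
        for _ in range(rounds):
            for handle in handles:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # push the data to disk
            time.sleep(pause)
    finally:
        for handle in handles:
            handle.close()
    return paths


if __name__ == "__main__":
    # Point this at a directory on the filesystem under test instead of
    # a temporary directory when using it for real.
    with tempfile.TemporaryDirectory() as scratch:
        for path in extend_files(scratch):
            print(path, os.path.getsize(path))
```

Running it in a loop while suspending and resuming the vm would be the
rough idea.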

Brian

> > Question.
> > 
> > As this always happens on recovery from hibernation, and seeing the message
> > "Corruption of in-memory data detected", could it be that thawing does a bad
> > memory recovery from the swap?  I thought that the procedure includes some
> > checksum, but I don't know for sure.
> 
> It's the fact that the filesystem is still running and modifying
> state when the snapshot is being taken that results in the snapshot
> image containing inconsistent state. That then gets loaded
> on thaw and it goes boom.
> 
> > To me, there are two problems:
> > 
> >  1) The corruption itself.
> >  2) That xfs_repair fails to repair the filesystem. In fact, I believe
> >     it does not detect it!
> 
> That's because the filesystem is likely to be consistent on disk.
> The issue is in-memory corruption, not on-disk corruption, like
> the messages are telling us:
> 
> XFS (sde5): Corruption of in-memory data detected.
> 
> Basically, XFS is catching a bad state in memory and preventing it
> from being propagated to disk. If it gets to disk, then you are
> likely to lose data. IOWs, XFS is behaving as designed and is
> actually preventing data loss in this situation.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-03  9:43     ` Dave Chinner
  2014-07-03 17:40       ` Brian Foster
@ 2014-07-03 23:34       ` Carlos E. R.
  2014-07-04  0:04         ` Dave Chinner
  1 sibling, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-07-03 23:34 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Thursday, 2014-07-03 at 19:43 +1000, Dave Chinner wrote:
> On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
>> On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
>>> On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
>>
>> ...

>> hibernated at least once a day, perhaps three times if I have to go
>> out several times. It makes no sense to me to leave the machine
>> powered doing nothing, if hibernating is so easy and reliable - till
>> now. If I have to leave for more than a week, I tend to do a full
>> "halt".
>
> Hibernation has always been suspect w.r.t. flushing filesystem
> metadata. It does not guarantee that the filesystem is quiesced
> and idle, it just does a sync() and hopes that is sufficient to get
> the filesystem into a consistent state. The mess that this leaves is
> then left to filesystem developers to play whack-a-mole with when
> users have problems.


Ah, but then my problem would not always happen on the same partition. It 
would affect others, would it not?




>> But soon after, it oopses:
>
> Point of note: there is no oops or crash occurring. XFS dumps the
> stack when a corruption occurs to tell us where it was detected
> and then shuts down the filesystem. Your system is still just fine
> apart from not being able to access that filesystem until you
> unmount it, repair it and mount it again.

Ok, true, there is no formal "Oops".

But no, the system does not remain fine: I had to hit the hardware reset 
or power off button to get out.



>> 3 PID: 57 Comm: kworker/3:1 Tainted: P           O 3.11.10-7-desktop
>
> What's tainting your kernel? If you remove that taint, does the
> problem still occur?

Sorry, I can't find that out. It is either the nvidia driver, or the 
vmware kernel module. I can temporarily remove it for some days, but 
hardly for a month. I agree that it might have unknown influence on the 
initial corruption, but not on doing the repair, which I do in text mode, 
or with another boot partition that doesn't have that driver.

That is, it would not have influence on "xfs_repair", when done on a non 
tainted system.


I don't know of a way to provoke the problem at will, in order to remove 
the taint for a brief period :-?


>> <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
>> <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
> .....
>> <0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
>
> So the corruption occurred within 2s of the kernel restarting tasks
> after a hibernation. It's really looking like a hibernation issue.

It's got to be related, of course.



>> Question.
>>
>> As this always happens on recovery from hibernation, and seeing the message
>> "Corruption of in-memory data detected", could it be that thawing does a bad
>> memory recovery from the swap?  I thought that the procedure includes some
>> checksum, but I don't know for sure.
>
> It's the fact that the filesystem is still running and modifying
> state when the snapshot is being taken that results in the snapshot
> image containing inconsistent state. That then gets loaded
> on thaw and it goes boom.

But it only happens on the /home partition, not on the email partition, 
for instance, also in the same hard disk.

Unless... there are probably more things writing on the home partition 
than on the mail partition any time.



>> To me, there are two problems:
>>
>>  1) The corruption itself.
>>  2) That xfs_repair fails to repair the filesystem. In fact, I believe
>>     it does not detect it!
>
> That's because the filesystem is likely to be consistent on disk.
> The issue is in-memory corruption, not on-disk corruption, like
> the messages are telling us:

No, the on disk filesystem is not healthy. If I continue using it, after 
reboot and using "xfs_repair" several times, it fails again within a day.

I got after booting (the first event):

<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all


And some hours later:

<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo


So, instead of using xfs_repair, I re-formatted and restored backup, which 
worked for a month till next event.



- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO16JwACgkQtTMYHG2NR9VmzQCdHaeuKC3UkLWWzHRewx7wTC/N
zKAAn3VKi2bBYLrUA4edokFQ8RWXGm5z
=F5YK
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-03 23:34       ` Carlos E. R.
@ 2014-07-04  0:04         ` Dave Chinner
  2014-07-04  1:29           ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2014-07-04  0:04 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Fri, Jul 04, 2014 at 01:34:52AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> On Thursday, 2014-07-03 at 19:43 +1000, Dave Chinner wrote:
> >On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> >>On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> >>>On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> >>
> >>...
> 
> >>hibernated at least once a day, perhaps three times if I have to go
> >>out several times. It makes no sense to me to leave the machine
> >>powered doing nothing, if hibernating is so easy and reliable - till
> >>now. If I have to leave for more than a week, I tend to do a full
> >>"halt".
> >
> >Hibernation has always been suspect w.r.t. flushing filesystem
> >metadata. It does not guarantee that the filesystem is quiesced
> >and idle, it just does a sync() and hopes that is sufficient to get
> >the filesystem into a consistent state. The mess that this leaves is
> >then left to filesystem developers to play whack-a-mole with when
> >users have problems.
> 
> 
> Ah, but my problem would then not happen always on the same
> partition. It would affect others, would not?

It needs a busy/dirty filesystem. If the other filesystems are
mostly idle, then they are unlikely to trip over the problem.

> >>But soon after, it oopses:
> >
> >Point of note: there is no oops or crash occurring. XFS dumps the
> >stack when a corruption occurs to tell us where it was detected
> >and then shuts down the filesystem. Your system is still just fine
> >apart from not being able to access that filesystem until you
> >unmount it, repair it and mount it again.
> 
> Ok, true, there is no formal "Oops".
> 
> But no, the system does not remain fine, I had to hit the hardware
> reset or power off button to get out.

That usually only happens when the root filesystem is shut down and
you can't access any of the binaries needed to run the system. Is
the filesystem that is shutting down the root?

> >>Question.
> >>
> >>As this always happens on recovery from hibernation, and seeing the message
> >>"Corruption of in-memory data detected", could it be that thawing does a bad
> >>memory recovery from the swap?  I thought that the procedure includes some
> >>checksum, but I don't know for sure.
> >
> >It's the fact that the filesystem is still running and modifying
> >state when the snapshot is being taken that results in the snapshot
> >image containing an inconsistent snapshot. That then gets loaded
> >on thaw and it goes boom.
> 
> But it only happens on the /home partition, not on the email
> partition, for instance, also in the same hard disk.

/home is typically where all the applications have open files and are
writing data to.

Email partitions are unlikely to have problems because email
programs are pretty good about using fsync() to ensure your email
doesn't go missing and so aren't dirty at the time of a hibernation.
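
To illustrate the write-then-fsync() pattern referred to above, here is a minimal sketch in Python. It is not taken from any particular mail program; the function name durable_write and the file name are assumptions for the example:

```python
import os
import tempfile

# Hypothetical helper: write a message and ask the kernel to flush it to
# stable storage before returning, so the data is not left dirty in memory.
def durable_write(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Block until the file's data and metadata have reached the device.
        os.fsync(fd)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as spool:
    message = os.path.join(spool, "message.eml")
    durable_write(message, b"Subject: test\n\nhello\n")
    print(os.path.getsize(message))
```

A file written this way has nothing pending for it in the page cache once the call returns, which is why such partitions are less likely to be caught mid-modification.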

> Unless... there are probably more things writing on the home
> partition than on the mail partition any time.

*nod*

> >>To me, there are two problems:
> >>
> >> 1) The corruption itself.
> >> 2) That xfs_repair fails to repair the filesystem. In fact, I believe
> >>    it does not detect it!
> >
> >That's because the filesystem is likely to be consistent on disk.
> >The issue is in-memory corruption, not on-disk corruption, like
> >the messages are telling us:
> 
> No, the on disk filesystem is not healthy. If I continue using it,
> after reboot and using "xfs_repair" several times, it fails again
> within a day.

After at least one hibernation and thaw cycle, right?

FWIW, to rule out other issues with repair, you should probably
upgrade to the 3.2.0 xfsprogs release...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04  0:04         ` Dave Chinner
@ 2014-07-04  1:29           ` Carlos E. R.
  2014-07-04  1:40             ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-07-04  1:29 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Friday, 2014-07-04 at 10:04 +1000, Dave Chinner wrote:
> On Fri, Jul 04, 2014 at 01:34:52AM +0200, Carlos E. R. wrote:

>> Ah, but my problem would then not happen always on the same
>> partition. It would affect others, would not?
>
> It needs a busy/dirty filesystem. If the other filesystems are
> mostly idle, then they are unlikely to trip over the problem.

Right...

>> Ok, true, there is no formal "Oops".
>>
>> But no, the system does not remain fine, I had to hit the hardware
>> reset or power off button to get out.
>
> That usually only happens when the root filesystem is shut down and
> you can't access any of the binaries needed to run the system. Is
> the filesystem that is shutting down the root?

No, it is not. Root is separate and using ext4. The problematic one is 
/home.


What I did, as far as I remember, was, when I noticed that home had failed 
and was read only, to switch to runlevel 1, umount /home (killing the apps 
that were still using it), then try to mount it again to replay the log, 
prior to using xfs_repair on it. Mount hung. ctrl-alt-del failed, or 
appeared to fail. So reset button...



>> But it only happens on the /home partition, not on the email
>> partition, for instance, also in the same hard disk.
>
> /home is typically where all the applications have open files and are
> writing data to.
>
> Email partitions are unlikely to have problems because email
> programs are pretty good about using fsync() to ensure your email
> doesn't go missing and so aren't dirty at the time of a hibernation.

Ok, understood.


>> No, the on disk filesystem is not healthy. If I continue using it,
>> after reboot and using "xfs_repair" several times, it fails again
>> within a day.
>
> After at least one hibernation and thaw cycle, right?

Yes. 3, I think.

But there were kernel errors right after boot (XFS_WANT_CORRUPTED_RETURN).


> FWIW, to rule out other issues with repair, you should probably
> upgrade to the 3.2.0 xfsprogs release...

I may try that... I see it is available on http://download.opensuse.org/repositories/filesystems/openSUSE_13.1/,
version xfsprogs-3.2.0


Ok, I'll work on it.


- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO2A4cACgkQtTMYHG2NR9UABgCfZm0bbTGbOU80+V7BKyCi9cdB
yqAAn16udhFKpvx+ABdb/rplzZV7Kal+
=jaUl
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04  1:29           ` Carlos E. R.
@ 2014-07-04  1:40             ` Dave Chinner
  2014-07-04  2:42               ` Carlos E. R.
  2014-07-04 12:40               ` Brian Foster
  0 siblings, 2 replies; 56+ messages in thread
From: Dave Chinner @ 2014-07-04  1:40 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Fri, Jul 04, 2014 at 03:29:31AM +0200, Carlos E. R. wrote:
> On Friday, 2014-07-04 at 10:04 +1000, Dave Chinner wrote:
> >On Fri, Jul 04, 2014 at 01:34:52AM +0200, Carlos E. R. wrote:
> >>Ok, true, there is no formal "Oops".
> >>
> >>But no, the system does not remain fine, I had to hit the hardware
> >>reset or power off button to get out.
> >
> >That usually only happens when the root filesystem is shut down and
> >you can't access any of the binaries needed to run the system. Is
> >the filesystem that is shutting down the root?
> 
> No, it is not. Root is separate and using ext4. The problematic one
> is /home.
> 
> 
> What I did, as far as I remember, was, when I noticed that home had
> failed and was read only, to switch to runlevel 1, umount /home
> (killing the apps that were still using it), then try to mount it
> again to replay the log, prior to using xfs_repair on it. Mount
> hung. ctrl-alt-del failed, or appeared to fail. So reset button...

That's a completely different issue to having a shutdown filesystem
hang your system. That's a mount problem, and likely a known issue.
You need to be specific when describing a problem, otherwise we
waste time going down the wrong paths.

> >>No, the on disk filesystem is not healthy. If I continue using it,
> >>after reboot and using "xfs_repair" several times, it fails again
> >>within a day.
> >
> >After at least one hibernation and thaw cycle, right?
> 
> Yes. 3, I think.

Then hibernation has caused the corruption. It may take some time
for the corruption to be detected, but there isn't any doubt in my
mind that hibernation is the cause of your problems.

So, until we have kernel fixes, you'd do best to turn off
hibernation. If you can't live with leaving your machine powered up
or switching it off, then use suspend-to-ram rather than
suspend-to-disk to avoid the problematic snapshot/restore
situation....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04  1:40             ` Dave Chinner
@ 2014-07-04  2:42               ` Carlos E. R.
  2014-07-04  3:12                 ` Carlos E. R.
  2014-07-04 12:40               ` Brian Foster
  1 sibling, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-07-04  2:42 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Friday, 2014-07-04 at 11:40 +1000, Dave Chinner wrote:

> On Fri, Jul 04, 2014 at 03:29:31AM +0200, Carlos E. R. wrote:

>> No, it is not. Root is separate and using ext4. The problematic one
>> is /home.
>>
>>
>> What I did, as far as I remember, was, when I noticed that home had
>> failed and was read only, to switch to runlevel 1, umount /home
>> (killing the apps that were still using it), then try to mount it
>> again to replay the log, prior to using xfs_repair on it. Mount
>> hung. ctrl-alt-del failed, or appeared to fail. So reset button...
>
> That's a completely different issue to having a shutdown filesystem
> hang your system. That's a mount problem, and likely a known issue.
> You need to be specific when describing a problem, otherwise we
> waste time going down the wrong paths.

Sorry for the misunderstanding.

But halt/reboot did hang, even if it was after a failed mount. I was 
trying to recover the system, remember, and I'm trying to remember what 
exactly I did do, from memory, not written records.

>>>> No, the on disk filesystem is not healthy. If I continue using it,
>>>> after reboot and using "xfs_repair" several times, it fails again
>>>> within a day.
>>>
>>> After at least one hibernation and thaw cycle, right?
>>
>> Yes. 3, I think.
>
> Then hibernation has caused the corruption. It may take some time
> for the corruption to be detected, but there isn't any doubt in my
> mind that hibernation is the cause of your problems.

Wait.

The sequence was:

   healthy system
   several hibernation cycles.
   failure on coming back from hibernation, with kernel error: XFS_WANT_CORRUPTED_GOTO.

   reboot - kernel error messages: XFS_WANT_CORRUPTED_RETURN, which I probably did not see.
   repair filesystem
   several hibernation cycles during some hours.
   failure on coming back from hibernation, with kernel error: XFS_WANT_CORRUPTED_GOTO


See that there were kernel error messages right after rebooting, which I 
think I did not see at the time, because had I seen them I would have 
rebooted again, and I did not.


- From the log, already posted:

   <0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.173599] XFS (sdd5): Mounting Filesystem
   <0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.377918] XFS (sdd5): Starting recovery (logdev: internal)
   <0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.747914] XFS (sdd5): Ending recovery (logdev: internal)

   <3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Starting Default.
   <3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Reached target Default.
   <3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Startup finished in 57ms.
   <3.6> 2014-03-15 03:53:01 Telcontar systemd 1 - -  Started User Manager for 9.
   <0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all



Then I think I ran xfs_repair, which did not complain, and I continued 
working. Within the day, after 3 hibernations, it failed again with 
XFS_WANT_CORRUPTED_GOTO, and I decided I had to reboot, backup, reformat, 
restore.




> So, until we have kernel fixes, you'd do best to turn off
> hibernation. If you can't live with leaving your machine powered up
> or switching it off, then use suspend-to-ram rather than
> suspend-to-disk to avoid the problematic snapshot/restore
> situation....

Impossible... this is a desktop, not a laptop. Suspend to ram is high 
risk, even if it works (which I think it doesn't).

If the failure is unavoidable, I'll reformat the partition as ext4 
instead... which I do not like, but such is life.


But before that, I'll try upgrading xfsprogs.


- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO2FKQACgkQtTMYHG2NR9USxgCeOdJeJORl2JpnsnhqtXDj2ZCL
3IIAniMFd9X+ETWr3gVPHYq7SFwIPKSt
=WPe7
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04  2:42               ` Carlos E. R.
@ 2014-07-04  3:12                 ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-04  3:12 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Friday, 2014-07-04 at 04:42 +0200, Carlos E. R. wrote:
> On Friday, 2014-07-04 at 11:40 +1000, Dave Chinner wrote:

>> So, until we have kernel fixes, you'd do best to turn off
>> hibernation. If you can't live with leaving your machine powered up
>> or switching it off, then use suspend-to-ram rather than
>> suspend-to-disk to avoid the problematic snapshot/restore
>> situation....

Forgot to mention:

I have been working the same way for years on this same machine, and with 
the same software versions for some months. Only when I replaced the hard 
disk that contains home, mail, and some other things, the problem started.

The partitions were not cloned; I partitioned and formatted fresh (much bigger 
partitions), with gparted, and copied files over with rsync.

- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO2G4gACgkQtTMYHG2NR9UtpACdFnxw8/nZKAVI/Hy7s2bVF41j
/+8AoJLmY2ZuyX+kKeXNzo9/6BOVx4T0
=pZAc
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04  1:40             ` Dave Chinner
  2014-07-04  2:42               ` Carlos E. R.
@ 2014-07-04 12:40               ` Brian Foster
  2014-07-04 13:36                 ` Carlos E. R.
  1 sibling, 1 reply; 56+ messages in thread
From: Brian Foster @ 2014-07-04 12:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Carlos E. R., XFS mailing list

On Fri, Jul 04, 2014 at 11:40:08AM +1000, Dave Chinner wrote:
> On Fri, Jul 04, 2014 at 03:29:31AM +0200, Carlos E. R. wrote:
> > On Friday, 2014-07-04 at 10:04 +1000, Dave Chinner wrote:
> > >On Fri, Jul 04, 2014 at 01:34:52AM +0200, Carlos E. R. wrote:
> > >>Ok, true, there is no formal "Oops".
> > >>
> > >>But no, the system does not remain fine, I had to hit the hardware
> > >>reset or power off button to get out.
> > >
> > >That usually only happens when the root filesystem is shut down and
> > >you can't access any of the binaries needed to run the system. Is
> > >the filesystem that is shutting down the root?
> > 
> > No, it is not. Root is separate and using ext4. The problematic one
> > is /home.
> > 
> > 
> > What I did, as far as I remember, was, when I noticed that home had
> > failed and was read only, to switch to runlevel 1, umount /home
> > (killing the apps that were still using it), then try to mount it
> > again to replay the log, prior to using xfs_repair on it. Mount
> > hung. ctrl-alt-del failed, or appeared to fail. So reset button...
> 
> That's a completely different issue to having a shutdown filesystem
> hang your system. That's a mount problem, and likely a known issue.
> You need to be specific when describing a problem, otherwise we
> waste time going down the wrong paths.
> 
> > >>No, the on disk filesystem is not healthy. If I continue using it,
> > >>after reboot and using "xfs_repair" several times, it fails again
> > >>within a day.
> > >
> > >After at least one hibernation and thaw cycle, right?
> > 
> > Yes. 3, I think.
> 
> Then hibernation has caused the corruption. It may take some time
> for the corruption to be detected, but there isn't any doubt in my
> mind that hibernation is the cause of your problems.
> 
> So, until we have kernel fixes, you'd do best to turn off
> hibernation. If you can't live with leaving your machine powered up
> or switching it off, then use suspend-to-ram rather than
> suspend-to-disk to avoid the problematic snapshot/restore
> situation....
> 

FWIW, I ran through a bunch of hibernation tests yesterday and couldn't
seem to reproduce anything interesting. I ran a preallocating workload
while constantly hibernating and waking a vm. I also tried using a hack
to avoid the eofblocks trim on release to make the test more effective,
and another to invoke the hibernation from the eofblocks background
scanner to "improve" the chances of conflict. I also ran a truncate test
to stress xfs_itruncate_extents() during hibernation cycles (there's
actually an instance of this in Carlos' reported output that doesn't
seem to involve a workqueue, attributed to thunderbird iirc) and ran
these similar tests going back to v3.11.0 as well as the latest
3.16.0-rc2.

None of this really means anything outside of there isn't quite enough
information to reproduce. It looks simple enough to enable freezing on
the eofblocks (or other xfs) workqueues by setting a flag, so we could
go and do that, but that still isn't definite. E.g., that thunderbird
truncate instance of failure stands out a bit to me.
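
As a rough illustration of the kind of extend-then-truncate activity described above, a user-space loop could look like the sketch below. This is not the test that was actually run; the function name, round count and chunk size are all made up for the example, and it would need to be running on the affected filesystem while hibernation cycles are triggered separately:

```python
import os
import tempfile

# Hypothetical workload sketch: grow a file with appending writes, then cut
# it back, several times over. Repeated extension is the sort of pattern
# that can leave preallocated blocks beyond EOF for a while.
def append_truncate_cycle(path, rounds=4, chunk=64 * 1024):
    sizes = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(rounds):
            for _ in range(8):
                os.write(fd, b"\0" * chunk)   # extend the file
            os.ftruncate(fd, chunk)           # shrink it back to one chunk
            sizes.append(os.fstat(fd).st_size)
    finally:
        os.close(fd)
    return sizes

with tempfile.TemporaryDirectory() as workdir:
    print(append_truncate_cycle(os.path.join(workdir, "grow.dat")))
```

Pointing the path at a directory on /home instead of a temporary directory would exercise the filesystem in question.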

Carlos,

You've indicated in your previous replies that you have reproduced this
repeatedly or more easily after you hit the problem and before you run a
reformat and restore sequence, enough to give you the impression at
least that the reformat is necessary. If you have the time, could you
run some of your typical activities through some hibernation cycles in
an attempt to narrow down what might contribute to this? E.g., perhaps
this only occurs with thunderbird or some other particular application
running, etc. If you have the ability to try a more recent kernel for a
period of time, that could be interesting as well.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04 12:40               ` Brian Foster
@ 2014-07-04 13:36                 ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-04 13:36 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 2014-07-04 14:40, Brian Foster wrote:


Thanks.



Yes, that's right.



Yes, certainly. I can do more hibernation cycles to try to trigger it
again. Thunderbird is an application that I use a lot, it is always
open. I have several remote imap accounts, and one local imap account,
using a local dovecot daemon on another partition (which has not been
affected so far). It also pulls nntp from a local daemon (leafnode),
which uses a different partition, on reiserfs.

It is a complex setup, you see :-)



I'll investigate if it is possible.

Meanwhile, I have upgraded the xfsprogs package to version 3.2.0, and
the kernel has got an update to 3.11.10 (openSUSE policy is to
backport security patches, while maintaining the same kernel version
through the lifetime of a release, so that this kernel has in fact
additions and patches from more advanced versions).

Having upgraded xfsprogs, I'm right now in the process of
backup-format-restore this home partition again, to take advantage of
any modification this new xfsprogs package may have. I think I will
use this time rsync instead of xfsrestore, although it is much slower
- - unless you ask me to use xfsrestore.

- -- 
Cheers / Saludos,

		Carlos E. R.

  (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF4EAREIAAYFAlO2re8ACgkQja8UbcUWM1yi7QD/b0V+gASfApDWNqIaf6nceWvr
IAGUb+jFwqGeZppqdEUA/1hqknkWDC7St4kpR4SiYfdt9gzuKMX4abQ3nU2SlVlA
=mgSa
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-03 17:39     ` Brian Foster
@ 2014-07-04 21:32       ` Carlos E. R.
  2014-07-05 12:28         ` Brian Foster
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-07-04 21:32 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



[This email has been delayed, while I thought about where to upload 
metadata file - see near the end]


On Thursday, 2014-07-03 at 13:39 -0400, Brian Foster wrote:
> On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:


> Ok, so there's a lot going on. I was mainly curious to see what was
> causing lingering preallocations, but it could be anything extending a
> file multiple times.

Right.


>> AFAIK, xfsdump can not carry over a filesystem corruption, right?
>
> I think that's accurate, though it might complain/fail in the act of
> dumping an fs that is corrupted. The behavior here suggests there might
> not be on disk corruption, however.

At least, not a detectable one.

If I don't do that backup-format-restore, I get issues soon, and it 
crashes within a day - I got after booting (the first event):

<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all

And some hours later:

<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo


It was here that I decided to backup-format-restore instead.


>> Maybe next time I can take the photo with dd before doing anything else (it
>> takes about 80 minutes), or simply do an "xfs_metadump", which should be
>> faster.  And I might not have then 500 GiB of free space to make a dd copy,
>> anyway.
>>
>
> xfs_metadump should be faster. It will grab the metadata only and
> obfuscate filenames so as to hide sensitive information.


Ok, I have a post-it label on the monitor so that I remember - my notes 
are typically stored in the home partition :-)


But the obfuscation is not complete, I can recognize file names:


00008DC0   .leeme.kfPTgt . ....... .2aujzfJ.%;u. .   .0...
00008DF0    .pepe_after_gnome.tar.bz2.vcTJ8c.@.. . .......
00008E20   .amyN3xYjaldFXYpeUry. 3;&.K.. ..  .0... !.pepe_j
00008E50   ust_created.tar.bz2.JlyD0W .. .@....... .NGb0URO
00008E80   C0Bh9cHwp-hBh.6wMS .. .p  . ... ..registro.0DPzS
00008EB0   G  .. . ....... .8n-.w$.9. .. .   .8... +.suse_u
00008EE0   pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
00008F10   #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. ..  .8...
00008F40   '.suse_upgrade_to_102_pkglist.txt.0KTuDa  7.. .8


I just had a quick look with 'mc', the dump is too large to inspect it 
all.
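
A dump that is too large to browse by hand could be scanned for readable names with something like the following sketch, which works much like the "strings" utility. The function name and the minimum length are assumptions; the sample bytes reuse two of the names visible in the hex view above:

```python
import re

# Hypothetical helper: return every run of printable ASCII that is at least
# min_len bytes long, which is where unobfuscated names would show up.
def printable_runs(blob, min_len=8):
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]

sample = (b"\x00\x01pepe_after_gnome.tar.bz2\x00vcTJ8c\x00\x02"
          b"suse_upgrade_to_102_pkglist.txt\x00")
for name in printable_runs(sample):
    print(name)
```

Short obfuscated fragments fall below the length cut-off, so what remains is mostly the names that survived obfuscation. For a real dump the file would have to be read in chunks rather than loaded whole.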


>> Question.
>>
>> As this always happens on recovery from hibernation, and seeing the message
>> "Corruption of in-memory data detected", could it be that thawing does a bad
>> memory recovery from the swap?  I thought that the procedure includes some
>> checksum, but I don't know for sure.
>>
>
> Not sure, though if so I would think that might be a more common source
> of problems.

And it only affects my /home partition - although it may be the busiest 
one.


>> To me, there are two problems:
>>
>>  1) The corruption itself.
>>  2) That xfs_repair fails to repair the filesystem. In fact, I believe
>>     it does not detect it!
>>
>> To me, #2 is the worst, and it is what makes me do the backup, format,
>> restore cycle for recovery. An occasional kernel crash is somewhat
>> acceptable :-}
>>
>
> Well it could be that the "corruption" is gone at the point of a
> remount. E.g., something becomes inconsistent in memory, the fs detects
> it and shuts down before going any further. That's actually a positive.
> ;)
>
> That also means it's probably not necessary to do a full backup,
> reformat and restore sequence as part of your routine here. xfs_repair
> should scour through all of the allocation metadata and yell if it finds
> something like free blocks allocated to a file.

No, if I don't backup-format-restore it happens again within a day. There 
is something lingering. Unless that was just chance... :-?

It is true that during that day I hibernated several times more than 
needed to see if it happened again - and it did.



>>> I'm curious if something like an 'rm -rf *' on the metadump
>>> would catch any other corruptions or if this is indeed limited to
>>> something associated with recent (pre)allocations.
>>
>> Sorry, run 'rm -rf *' where???
>>
>
> On the metadump... mainly just to see whether freeing all of the used
> blocks in the fs triggered any other errors (i.e., a brute force way to
> check for further corruptions).

Sorry, but I fail to see how to do it. I may be thick, or I lack the context.

If I run:

Telcontar:/data/storage_d/old_backup # ls -lh
total 604G
drwxr-xr-x 22 root root  4.0K Mar  8 20:30 home
drwxr-xr-x  3 root root    16 Sep 25  2010 home1
drwxr-xr-x  2 root root     6 Jul  3 02:36 mount
- -rw-r--r--  1 root root    45 Jul  3 04:25 procedure
- -rw-r--r--  1 root root  388M Jul  3 02:42 tgtfile
- -rw-r--r--  1 root root   11M Jul  3 02:50 tgtfile2.xz
- -rw-r--r--  1 root users 489G Mar 16 05:42 xfs_copy_home
- -rw-r--r--  1 root root  489G Jul  3 04:40 xfs_copy_home_workonit
- -rw-r--r--  1 root users  39G Mar 16 05:49 xfsdump__home
- -rw-r--r--  1 root users  39G Mar 16 05:57 xfsdump__home1
Telcontar:/data/storage_d/old_backup # rm -rf *


that would destroy my entire backup!


If you mean:

  rm -rf tgtfile

I fail to see what that would accomplish, except to remove a file that is actually on a different partition, not home.

However, I can do:

Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup # cd mount
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*

real    2m45.380s
user    0m0.265s
sys     0m6.878s
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # ls -la
total 4
drwxr-xr-x 2 root root    6 Jul  4 01:56 .
drwxr-xr-x 5 root root 4096 Jul  3 04:25 ..
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      489G   33M  489G   1% /data/storage_d/old_backup/mount
Telcontar:/data/storage_d/old_backup/mount #


And I do not see anything on the log, only that it mounted cleanly.



>> Meanwhile, I have done a xfs_metadump of the image, and compressed it with
>> xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
>> that, and even less to a mail list.
>>
>> Do you still have a bugzilla system where I can upload it? I had an account
>> at <http://oss.sgi.com/bugzilla/>, made on 2010. I don't know if it still
>> runs :-?


I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm 
logged in there now. I haven't checked if I can create a bug, not being 
sure what parameters to use (product, component, whom to assign to). I 
think that would be the most appropriate place.

Meanwhile, I have uploaded the file to my google drive account, so I can 
share it with anybody on request - ie, it is not public, I need to add a 
gmail address to the list of people that can read the file.

Alternatively, I could just email the file to people asking for it, 
offlist, but not in a single email, in chunks limited to 1.5 MB per 
email.


> I think http://bugzilla.redhat.com should allow you to file a bug and
> attach the file.

Sorry, I don't have an account there...

I do have one at openSUSE, though, and it does allow me to attach files, up 
to a limit. If the file is too big, it can be split into pieces. But I 
will not use it unless you people say that you have an account there.

For using a bugzilla, the most appropriate one would be at SGI, IMHO, if 
they are still supporting this project.

- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO3HXUACgkQtTMYHG2NR9VndgCgillZYmQCvUynytO/7YALlUyv
c9gAnj8GmFfnMHGd+P9GaWm9ScVVTH81
=GEXl
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-04 21:32       ` Carlos E. R.
@ 2014-07-05 12:28         ` Brian Foster
  2014-07-12  0:30           ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Brian Foster @ 2014-07-05 12:28 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Fri, Jul 04, 2014 at 11:32:26PM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> [This email has been delayed, while I thought about where to upload metadata
> file - see near the end]
> 
> 
> On Thursday, 2014-07-03 at 13:39 -0400, Brian Foster wrote:
> >On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> 
> 
> >Ok, so there's a lot going on. I was mainly curious to see what was
> >causing lingering preallocations, but it could be anything extending a
> >file multiple times.
> 
> Right.
> 
> 
> >>AFAIK, xfsdump can not carry over a filesystem corruption, right?
> >
> >I think that's accurate, though it might complain/fail in the act of
> >dumping an fs that is corrupted. The behavior here suggests there might
> >not be on disk corruption, however.
> 
> At least, not a detectable one.
> 
> If I don't do that backup-format-restore, I get issues soon, and it crashes
> within a day - I got after booting (the first event):
> 

I echo Dave's previous question... within a day of doing what? Just
using the system or doing more hibernation cycles?

> 0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
> 
> And some hours later:
> 
> <0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
> 
> 
> It was here that I decided to backup-format-restore instead.
> 
> 
> >>Maybe next time I can take the photo with dd before doing anything else (it
> >>takes about 80 minutes), or simply do an "xfs_metadump", which should be
> >>faster.  And I might not have then 500 GiB of free space to make a dd copy,
> >>anyway.
> >>
> >
> >xfs_metadump should be faster. It will grab the metadata only and
> >obfuscate filenames so as to hide sensitive information.
> 
> 
> Ok, I have a post-it label on the monitor so that I remember - my notes are
> typically stored in the home partition :-)
> 
> 
> But the obfuscation is not complete, I can recognize file names:
> 
> 
> 00008DC0   .leeme.kfPTgt . ....... .2aujzfJ.%;u. .   .0...
> 00008DF0    .pepe_after_gnome.tar.bz2.vcTJ8c.@.. . .......
> 00008E20   .amyN3xYjaldFXYpeUry. 3;&.K.. ..  .0... !.pepe_j
> 00008E50   ust_created.tar.bz2.JlyD0W .. .@....... .NGb0URO
> 00008E80   C0Bh9cHwp-hBh.6wMS .. .p  . ... ..registro.0DPzS
> 00008EB0   G  .. . ....... .8n-.w$.9. .. .   .8... +.suse_u
> 00008EE0   pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
> 00008F10   #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. ..  .8...
> 00008F40   '.suse_upgrade_to_102_pkglist.txt.0KTuDa  7.. .8
> 
> 
> I just had a quick look with 'mc', the dump is too large to inspect it all.
> 
> 
> >>Question.
> >>
> >>As this always happens on recovery from hibernation, and seeing the message
> >>"Corruption of in-memory data detected", could it be that thawing does a bad
> >>memory recovery from the swap?  I thought that the procedure includes some
> >>checksum, but I don't know for sure.
> >>
> >
> >Not sure, though if so I would think that might be a more common source
> >of problems.
> 
> And it only affects my /home partition - although it may be the busiest one.
> 
> 
> >>To me, there are two problems:
> >>
> >> 1) The corruption itself.
> >> 2) That xfs_repair fails to repair the filesystem. In fact, I believe
> >>    it does not detect it!
> >>
> >>To me, #2 is the worst, and it is what makes me do the backup, format,
> >>restore cycle for recovery. An occasional kernel crash is somewhat
> >>acceptable :-}
> >>
> >
> >Well it could be that the "corruption" is gone at the point of a
> >remount. E.g., something becomes inconsistent in memory, the fs detects
> >it and shuts down before going any further. That's actually a positive.
> >;)
> >
> >That also means it's probably not necessary to do a full backup,
> >reformat and restore sequence as part of your routine here. xfs_repair
> >should scour through all of the allocation metadata and yell if it finds
> >something like free blocks allocated to a file.
> 
> No, if I don't backup-format-restore it happens again within a day. There is
> something lingering. Unless that was just chance... :-?
> 
> It is true that during that day I hibernated several times more than needed
> to see if it happened again - and it did.
> 

This depends on what causes this to happen, not on how frequently it
happens. Does it continue to happen along with hibernation, or do you
start seeing these kinds of errors during normal use?

If the latter, that could suggest something broken on disk. If the
former, that could simply suggest the fs (perhaps on-disk) has made it
into some kind of state that makes this easier to reproduce, for
whatever reason. It could be timing, location of metadata,
fragmentation, or anything really for that matter, but it doesn't
necessarily mean corruption (even though it doesn't rule it out).
Perhaps the clean regeneration of everything by a from-scratch recovery
simply makes this more difficult to reproduce until the fs naturally
becomes more aged/fragmented, for example.

This probably makes a pristine, pre-repair metadump of the reproducing
fs more interesting. I could try some of my previous tests against a
restore of that metadump.

> 
> 
> >>>I'm curious if something like an 'rm -rf *' on the metadump
> >>>would catch any other corruptions or if this is indeed limited to
> >>>something associated with recent (pre)allocations.
> >>
> >>Sorry, run 'rm -rf *' where???
> >>
> >
> >On the metadump... mainly just to see whether freeing all of the used
> >blocks in the fs triggered any other errors (i.e., a brute force way to
> >check for further corruptions).
> 
> Sorry, but I fail to see how to do it. I may be thick, or I lack the context.
> 
> If I run:
> 
> Telcontar:/data/storage_d/old_backup # ls -lh
> total 604G
> drwxr-xr-x 22 root root  4.0K Mar  8 20:30 home
> drwxr-xr-x  3 root root    16 Sep 25  2010 home1
> drwxr-xr-x  2 root root     6 Jul  3 02:36 mount
> -rw-r--r--  1 root root    45 Jul  3 04:25 procedure
> -rw-r--r--  1 root root  388M Jul  3 02:42 tgtfile
> -rw-r--r--  1 root root   11M Jul  3 02:50 tgtfile2.xz
> -rw-r--r--  1 root users 489G Mar 16 05:42 xfs_copy_home
> -rw-r--r--  1 root root  489G Jul  3 04:40 xfs_copy_home_workonit
> -rw-r--r--  1 root users  39G Mar 16 05:49 xfsdump__home
> -rw-r--r--  1 root users  39G Mar 16 05:57 xfsdump__home1
> Telcontar:/data/storage_d/old_backup # rm -rf *
> 
> 
> that would destroy my entire backup!
> 

I was somewhat thinking out loud originally discussing this topic. I was
suggesting to run this against a restored metadump, not the primary
dataset or a backup.

The metadump creates an image of the metadata of the source fs in a file
(no data is copied). This metadump image can be restored at will via
'xfs_mdrestore.' This allows restoring to a file, mounting the file
loopback, and performing experiments or investigation on the fs
generally as it existed when the shutdown was reproducible.

So basically:

- xfs_mdrestore <mdimgfile> <tmpfileimg>
- mount <tmpfileimg> /mnt
- rm -rf /mnt/*

... was what I was suggesting. <tmpfileimg> can be recreated from the
metadump image afterwards to get back to square one.
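
Spelled out as a script, the sequence above might look like the sketch
below. It only prints the commands unless RUN=1 is set. The function
name, the RUN switch, the "-o loop" mount option and the final umount
are assumptions added for illustration, and the paths must not contain
spaces.

```shell
# Sketch only: print (or, with RUN=1, execute as root) the restore,
# mount and delete-everything test against a scratch image.
# MDIMG is the metadump file, IMG the scratch image, MNT the mount point.
mdrestore_cycle() {
    mdimg=${1:-}
    img=${2:-}
    mnt=${3:-}
    # Refuse to continue with a missing argument; an empty MNT would
    # otherwise turn the rm step into "rm -rf /*".
    [ -n "$mdimg" ] && [ -n "$img" ] && [ -n "$mnt" ] || {
        echo "usage: mdrestore_cycle MDIMG IMG MNT" >&2
        return 2
    }
    for cmd in \
        "xfs_mdrestore $mdimg $img" \
        "mount -o loop $img $mnt" \
        "rm -rf $mnt/*" \
        "umount $mnt"
    do
        if [ "${RUN:-0}" = 1 ]; then
            sh -c "$cmd" || return 1
        else
            echo "$cmd"
        fi
    done
}
```

Rerunning it recreates the scratch image from the metadump, so every
run starts from the same state.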

> 
> If you mean:
> 
>  rm -rf tgtfile
> 
> I fail to see what that would accomplish, except to remove a file that is actually on a different partition, not home.
> 
> However, I can do:
> 
> Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
> mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
> Telcontar:/data/storage_d/old_backup # cd mount
> Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*
> 
> real    2m45.380s
> user    0m0.265s
> sys     0m6.878s
> Telcontar:/data/storage_d/old_backup/mount #
> Telcontar:/data/storage_d/old_backup/mount # ls -la
> total 4
> drwxr-xr-x 2 root root    6 Jul  4 01:56 .
> drwxr-xr-x 5 root root 4096 Jul  3 04:25 ..
> Telcontar:/data/storage_d/old_backup/mount #
> Telcontar:/data/storage_d/old_backup/mount # df -h .
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/loop0      489G   33M  489G   1% /data/storage_d/old_backup/mount
> Telcontar:/data/storage_d/old_backup/mount #
> 
> 
> And I do not see anything on the log, only that it mounted cleanly.
> 
> 
> 
> >>Meanwhile, I have done a xfs_metadump of the image, and compressed it with
> >>xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
> >>that, and even less to a mail list.
> >>
> >>Do you still have a bugzilla system where I can upload it? I had an account
> >>at <http://oss.sgi.com/bugzilla/>, made on 2010. I don't know if it still
> >>runs :-?
> 
> 
> I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
> logged in there now. I haven't checked if I can create a bug, not been sure
> what parameters to use (product, component, whom to assign to). I think that
> would be the most appropriate place.
> 
> Meanwhile, I have uploaded the file to my google drive account, so I can
> share it with anybody on request - ie, it is not public, I need to add a
> gmail address to the list of people that can read the file.
> 
> Alternatively, I could just email the file to people asking for it, offlist,
> but not in a single email, in chunks limited to 1.5 MB per email.
> 

Either of the bugzilla or google drive options works Ok for me.

Brian

> 
> >I think http://bugzilla.redhat.com should allow you to file a bug and
> >attach the file.
> 
> Sorry, I don't have an account there...
> 
> I do have one at openSUSE, though, and it does allow me to attach files, up
> to a limit. If the file is too big, it can be split into pieces. But I
> will not use it unless you people say that you have an account there.
> 
> For using a bugzilla, the most appropriate one would be at SGI, IMHO, if
> they are still supporting this project.
> 
> - -- Cheers,
>        Carlos E. R.
>        (from 13.1 x86_64 "Bottle" at Telcontar)
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iEYEARECAAYFAlO3HXUACgkQtTMYHG2NR9VndgCgillZYmQCvUynytO/7YALlUyv
> c9gAnj8GmFfnMHGd+P9GaWm9ScVVTH81
> =GEXl
> -----END PGP SIGNATURE-----
> 

* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-05 12:28         ` Brian Foster
@ 2014-07-12  0:30           ` Carlos E. R.
  2014-07-12  1:30             ` Carlos E. R.
  2014-07-12 14:19             ` Brian Foster
  0 siblings, 2 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-07-12  0:30 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Saturday, 2014-07-05 at 08:28 -0400, Brian Foster wrote:
> On Fri, Jul 04, 2014 at 11:32:26PM +0200, Carlos E. R. wrote:


>> If I don't do that backup-format-restore, I get issues soon, and it crashes
>> within a day - I got after booting (the first event):
>>
>
> I echo Dave's previous question... within a day of doing what? Just
> using the system or doing more hibernation cycles?

It is in the long post with the logs I posted.

The first time it crashed, I rebooted, got some errors I probably did not 
see, managed to mount the device, and I used the machine normally, doing 
several hibernation cycles. On one of these, it crashed, within the day.


As explained in this part of the previous post:

>> 0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
>>
>> And some hours later:
>>
>> <0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
>>
>>
>> It was here that I decided to backup-format-restore instead.





>>> That also means it's probably not necessary to do a full backup,
>>> reformat and restore sequence as part of your routine here. xfs_repair
>>> should scour through all of the allocation metadata and yell if it finds
>>> something like free blocks allocated to a file.
>>
>> No, if I don't backup-format-restore it happens again within a day. There is
>> something lingering. Unless that was just chance... :-?
>>
>> It is true that during that day I hibernated several times more than needed
>> to see if it happened again - and it did.
>>
>
> This depends on what causes this to happen, not on how frequently it
> happens. Does it continue to happen along with hibernation, or do you
> start seeing these kinds of errors during normal use?


Except the first time that this happened, the sequence is this:

I use the machine for weeks, without event, booting once, then hibernating 
at least once per day. I finally reboot when I have to apply some 
system update, or something special.

Till one day, this "thing" happens. It happens immediately after coming 
out from hibernation, and puts the affected partition, always /home, in 
read only mode. When it happens, I reboot, repair partition manually if 
needed, then I back up the files, format it, and replace all the files 
from the backup just made, with xfsdump. Well, this last time, I used 
rsync instead.


It has happened "only" four times:

2014-03-15 03:35:17
2014-03-15 22:20:34
2014-04-17 22:47:08
2014-06-29 12:32:18


> If the latter, that could suggest something broken on disk.

That was my first thought, because it started happening after replacing the 
hard disk, but also after a kernel update. But I have tested that disk 
several times, with smartctl and with the manufacturer test tool, and 
nothing came out.


> If the
> former, that could simply suggest the fs (perhaps on-disk) has made it
> into some kind of state that makes this easier to reproduce, for
> whatever reason. It could be timing, location of metadata,
> fragmentation, or anything really for that matter, but it doesn't
> necessarily mean corruption (even though it doesn't rule it out).
> Perhaps the clean regeneration of everything by a from-scratch recovery
> simply makes this more difficult to reproduce until the fs naturally
> becomes more aged/fragmented, for example.
>
> This probably makes a pristine, pre-repair metadump of the reproducing
> fs more interesting. I could try some of my previous tests against a
> restore of that metadump.


Well, I suggest that, unless you can find something in the metadata (I 
just sent you the link via email from google), we wait till the next 
event. At that time I will take a metadump of the intact filesystem, 
before repairing it. But this can take a month or two to happen again, 
if the pattern holds.




> I was somewhat thinking out loud originally discussing this topic. I was
> suggesting to run this against a restored metadump, not the primary
> dataset or a backup.
>
> The metadump creates an image of the metadata of the source fs in a file
> (no data is copied). This metadump image can be restored at will via
> 'xfs_mdrestore.' This allows restoring to a file, mounting the file
> loopback, and performing experiments or investigation on the fs
> generally as it existed when the shutdown was reproducible.

Ah... I see.


> So basically:
>
> - xfs_mdrestore <mdimgfile> <tmpfileimg>
> - mount <tmpfileimg> /mnt
> - rm -rf /mnt/*
>
> ... was what I was suggesting. <tmpfileimg> can be recreated from the
> metadump image afterwards to get back to square one.

I see.

Well, I tried this on a copy of the 'dd' image days ago, and nothing 
happened. I guess the procedure above would be the same.





>> I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
>> logged in there now. I haven't checked if I can create a bug, not been sure
>> what parameters to use (product, component, whom to assign to). I think that
>> would be the most appropriate place.
>>
>> Meanwhile, I have uploaded the file to my google drive account, so I can
>> share it with anybody on request - ie, it is not public, I need to add a
>> gmail address to the list of people that can read the file.
>>
>> Alternatively, I could just email the file to people asking for it, offlist,
>> but not in a single email, in chunks limited to 1.5 MB per email.
>>
>
> Either of the bugzilla or google drive options works Ok for me.

It's here:

<https://drive.google.com/file/d/0Bx2OgfTa-XC9UDBnQzZIMTVyN0k/edit?usp=sharing>

Whoever wants to read it, has to tell me the address to add to it, access 
is not public.


- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlPAgb0ACgkQtTMYHG2NR9U/FQCgjtwuDC0HTSG3i7DrEV8+qZeT
6mUAn0FGf42SsU1WeRx/AAk4X2oqV4Bc
=pASJ
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-12  0:30           ` Carlos E. R.
@ 2014-07-12  1:30             ` Carlos E. R.
  2014-07-12  1:45               ` Carlos E. R.
  2014-07-12 14:19             ` Brian Foster
  1 sibling, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-07-12  1:30 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Saturday, 2014-07-12 at 02:30 +0200, Carlos E. R. wrote:
> On Saturday, 2014-07-05 at 08:28 -0400, Brian Foster wrote:

[xfs_metadump]

> It's here:
>
> <https://drive.google.com/file/d/0Bx2OgfTa-XC9UDBnQzZIMTVyN0k/edit?usp=sharing>
>
> Whoever wants to read it, has to tell me the address to add to it, access
> is not public.

Wait.

I just found out that I did something very wrong. That xfs_metadump file 
looks very wrong; it does not seem to be the correct one.


The info on it says:


Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
          =                       sectsz=512   attr=2, projid32bit=0
          =                       crc=0
data     =                       bsize=4096   blocks=489366272, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=238948, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


while the currently mounted home says:


Telcontar:~ # mount | grep home
/dev/sde5 on /home type xfs (rw,noatime,attr2,inode64,noquota)

Telcontar:~ # xfs_info /dev/sde5
meta-data=/dev/sde5              isize=256    agcount=4, agsize=32000000 blks
          =                       sectsz=512   attr=2, projid32bit=1
          =                       crc=0
data     =                       bsize=4096   blocks=128000000, imaxpct=25
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=62500, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:~ # mount | grep /home




So, please wait till I verify things again. Tomorrow, it is 3 AM here. 
Sorry :-(


Unless "xfs_info tgtfile" gives the information about the device where 
"tgtfile" is stored (/dev/sdf2), not on the image file itself :-?


I'm very confused.


- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlPAj6kACgkQtTMYHG2NR9X+IwCeJtpsS6XJJ1xNeLPmb6PlXA8D
C9IAn1hn1g/ty/41dG5h4ijQoXqs1N7G
=Xtcq
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-12  1:30             ` Carlos E. R.
@ 2014-07-12  1:45               ` Carlos E. R.
  2014-07-12 14:26                 ` Brian Foster
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-07-12  1:45 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Saturday, 2014-07-12 at 03:30 +0200, Carlos E. R. wrote:

> So, please wait till I verify things again. Tomorrow, it is 3 AM here.
> Sorry :-(
>
> Unless "xfs_info tgtfile" gives the information about the device where
> "tgtfile" is stored (/dev/sdf2), not on the image file itself :-?
>
> I'm very confused.

False alarm. See:


Telcontar:/data/storage_c/tmp_borrar # xfs_info tgtfile
meta-data=/dev/sde18             isize=256    agcount=4, agsize=35770496 blks
          =                       sectsz=512   attr=2, projid32bit=0
          =                       crc=0
data     =                       bsize=4096   blocks=143081984, imaxpct=25
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=69864, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_c/tmp_borrar #

Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
          =                       sectsz=512   attr=2, projid32bit=0
          =                       crc=0
data     =                       bsize=4096   blocks=489366272, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=238948, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #

Telcontar:/data/storage_d/old_backup # file tgtfile
tgtfile: XFS filesystem metadump image
Telcontar:/data/storage_d/old_backup #


It appears that "xfs_info" reports on the mounted filesystem that contains 
the given path (here, the partition where tgtfile is stored), not on the 
image file named on the command line. Or something along those lines, I'm 
too sleepy. I hope you can understand my meaning better than my words...


So the uploaded file is the correct one.
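
A cheap cross-check of which filesystem an xfs_info output describes is 
to turn its "data" line into a size. A small sketch (the helper name 
fs_size_gib is made up here; the numbers are the ones from the outputs 
above):

```shell
# Size in whole GiB of a filesystem with BLOCKS blocks of BSIZE bytes,
# taken from the "data" line of xfs_info (bsize=..., blocks=...).
fs_size_gib() {
    blocks=$1
    bsize=${2:-4096}
    echo $(( blocks * bsize / 1024 / 1024 / 1024 ))
}

fs_size_gib 128000000 4096   # /dev/sde5, mounted on /home
fs_size_gib 489366272 4096   # /dev/sdf2, the disk that holds tgtfile
```

This prints 488 and 1866. The first is consistent with the 489G image 
of /home; the second is far too large for it, so that output has to 
describe the disk holding tgtfile rather than the metadump itself.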


- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlPAkyMACgkQtTMYHG2NR9U2uACfTdPx8DGCkBzLGiSVGn3XCcSV
7ukAnAvR1CjR9Jx3rPosLYNceBtQjJjf
=/odv
-----END PGP SIGNATURE-----


* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-12  0:30           ` Carlos E. R.
  2014-07-12  1:30             ` Carlos E. R.
@ 2014-07-12 14:19             ` Brian Foster
  1 sibling, 0 replies; 56+ messages in thread
From: Brian Foster @ 2014-07-12 14:19 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Sat, Jul 12, 2014 at 02:30:45AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> On Saturday, 2014-07-05 at 08:28 -0400, Brian Foster wrote:
> >On Fri, Jul 04, 2014 at 11:32:26PM +0200, Carlos E. R. wrote:
> 
> 
> >>If I don't do that backup-format-restore, I get issues soon, and it crashes
> >>within a day - I got after booting (the first event):
> >>
> >
> >I echo Dave's previous question... within a day of doing what? Just
> >using the system or doing more hibernation cycles?
> 
> It is in the long post with the logs I posted.
> 
> The first time it crashed, I rebooted, got some errors I probably did not
> see, managed to mount the device, and I used the machine normally, doing
> several hibernation cycles. On one of these, it crashed, within the day.
> 

That still suggests something could be going on at runtime during the
hibernation or wakeup cycle. Identifying some kind of runtime error or
metadata inconsistency without involving hibernation would be a smoking
gun for a general corruption. So far we have no evidence of reproduction
without hibernation and no evidence of a persistent corruption. That
doesn't rule out something going on on-disk, but it certainly suggests a
runtime corruption during hibernation/wake is more likely.

> 
> As explained in this part of the previous post:
> 
> >>0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
> >>
> >>And some hours later:
> >>
> >><0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
> >>
> >>
> >>It was here that I decided to backup-format-restore instead.
> 
> 
> 
> 
> 
> >>>That also means it's probably not necessary to do a full backup,
> >>>reformat and restore sequence as part of your routine here. xfs_repair
> >>>should scour through all of the allocation metadata and yell if it finds
> >>>something like free blocks allocated to a file.
> >>
> >>No, if I don't backup-format-restore it happens again within a day. There is
> >>something lingering. Unless that was just chance... :-?
> >>
> >>It is true that during that day I hibernated several times more than needed
> >>to see if it happened again - and it did.
> >>
> >
> >This depends on what causes this to happen, not on how frequently it
> >happens. Does it continue to happen along with hibernation, or do you
> >start seeing these kinds of errors during normal use?
> 
> 
> Except the first time that this happened, the sequence is this:
> 
> I use the machine for weeks, without event, booting once, then hibernating
> at least once per day. I finally reboot when I have to apply some system
> update, or something special.
> 
> Till one day, this "thing" happens. It happens immediately after coming out
> from hibernation, and puts the affected partition, always /home, in read
> only mode. When it happens, I reboot, repair partition manually if needed,
> then I back up the files, format it, and replace all the files from the
> backup just made, with xfsdump. Well, this last time, I used rsync instead.
> 
> 
> It has happened "only" four times:
> 
> 2014-03-15 03:35:17
> 2014-03-15 22:20:34
> 2014-04-17 22:47:08
> 2014-06-29 12:32:18
> 
> 
> >If the latter, that could suggest something broken on disk.
> 
> That was my first thought, because it started happening after replacing the
> hard disk, but also after a kernel update. But I have tested that disk
> several times, with smartctl and with the manufacturer test tool, and
> nothing came out.
> 

I was referring to a potential on-disk corruption, but that's good to
know as well.

> 
> >If the
> >former, that could simply suggest the fs (perhaps on-disk) has made it
> >into some kind of state that makes this easier to reproduce, for
> >whatever reason. It could be timing, location of metadata,
> >fragmentation, or anything really for that matter, but it doesn't
> >necessarily mean corruption (even though it doesn't rule it out).
> >Perhaps the clean regeneration of everything by a from-scratch recovery
> >simply makes this more difficult to reproduce until the fs naturally
> >becomes more aged/fragmented, for example.
> >
> >This probably makes a pristine, pre-repair metadump of the reproducing
> >fs more interesting. I could try some of my previous tests against a
> >restore of that metadump.
> 
> 
> Well, I suggest that, unless you can find something on the metadata (I just
> sent you the link via email from google), we wait till the next event. I
> will at that time take an intact metadata photo. But this can take a month
> or two to happen again, if the pattern keeps.
> 

That would be a good idea. I'll take a look at the metadump when I have
a chance. If there is nothing out of the ordinary, the next best option
is to metadump the fs that reproduces the behavior. I could retry some
of my previous vm hibernation tests against that. As mentioned
previously, once you have a more reliably reproducing state, that's also
a good opportunity to see if you can narrow down which of the things you
have running against the fs appear to trigger this.

> 
> 
> 
> >I was somewhat thinking out loud originally discussing this topic. I was
> >suggesting to run this against a restored metadump, not the primary
> >dataset or a backup.
> >
> >The metadump creates an image of the metadata of the source fs in a file
> >(no data is copied). This metadump image can be restored at will via
> >'xfs_mdrestore.' This allows restoring to a file, mounting the file
> >loopback, and performing experiments or investigation on the fs
> >generally as it existed when the shutdown was reproducible.
> 
> Ah... I see.
> 
> 
> >So basically:
> >
> >- xfs_mdrestore <mdimgfile> <tmpfileimg>
> >- mount <tmpfileimg> /mnt
> >- rm -rf /mnt/*
> >
> >... was what I was suggesting. <tmpfileimg> can be recreated from the
> >metadump image afterwards to get back to square one.
> 
> I see.
> 
> Well, I tried this on a copy of the 'dd' image days ago, and nothing
> happened. I guess the procedure above would be the same.
> 

A dd of the raw block device will preserve the metadata, so yeah that's
effectively the same test. If there were an obvious free space
corruption, the fs probably would have shut down. I can retry the same
test via the metadump on a debug kernel as well.

Brian

> 
> 
> 
> 
> >>I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
> >>logged in there now. I haven't checked if I can create a bug, not been sure
> >>what parameters to use (product, component, whom to assign to). I think that
> >>would be the most appropriate place.
> >>
> >>Meanwhile, I have uploaded the file to my google drive account, so I can
> >>share it with anybody on request - ie, it is not public, I need to add a
> >>gmail address to the list of people that can read the file.
> >>
> >>Alternatively, I could just email the file to people asking for it, offlist,
> >>but not in a single email, in chunks limited to 1.5 MB per email.
> >>
> >
> >Either of the bugzilla or google drive options works Ok for me.
> 
> It's here:
> 
> <https://drive.google.com/file/d/0Bx2OgfTa-XC9UDBnQzZIMTVyN0k/edit?usp=sharing>
> 
> Whoever wants to read it has to tell me the address to add, since access
> is not public.
> 
> 
> - -- Cheers,
>        Carlos E. R.
>        (from 13.1 x86_64 "Bottle" at Telcontar)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iEYEARECAAYFAlPAgb0ACgkQtTMYHG2NR9U/FQCgjtwuDC0HTSG3i7DrEV8+qZeT
> 6mUAn0FGf42SsU1WeRx/AAk4X2oqV4Bc
> =pASJ
> -----END PGP SIGNATURE-----
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-12  1:45               ` Carlos E. R.
@ 2014-07-12 14:26                 ` Brian Foster
  0 siblings, 0 replies; 56+ messages in thread
From: Brian Foster @ 2014-07-12 14:26 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Sat, Jul 12, 2014 at 03:45:07AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> On Saturday, 2014-07-12 at 03:30 +0200, Carlos E. R. wrote:
> 
> >So, please wait till I verify things again. Tomorrow, it is 3 AM here.
> >Sorry :-(
> >
> >Unless "xfs_info tgtfile" gives the information about the device where
> >"tgtfile" is stored (/dev/sdf2), not on the image file itself :-?
> >
> >I'm very confused.
> 
> False alarm. See:
> 
> 
> Telcontar:/data/storage_c/tmp_borrar # xfs_info tgtfile
> meta-data=/dev/sde18             isize=256    agcount=4, agsize=35770496 blks
>          =                       sectsz=512   attr=2, projid32bit=0
>          =                       crc=0
> data     =                       bsize=4096   blocks=143081984, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal               bsize=4096   blocks=69864, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> Telcontar:/data/storage_c/tmp_borrar #
> 
> Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
> meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
>          =                       sectsz=512   attr=2, projid32bit=0
>          =                       crc=0
> data     =                       bsize=4096   blocks=489366272, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal               bsize=4096   blocks=238948, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> Telcontar:/data/storage_d/old_backup #
> 
> Telcontar:/data/storage_d/old_backup # file tgtfile
> tgtfile: XFS filesystem metadump image
> Telcontar:/data/storage_d/old_backup #
> 
> 
> It appears that the command "xfs_info" analyzes the current, underlying
> filesystem, not the one given on the command line. Or something along those
> lines, I'm too sleepy. I hope you can understand my meaning better than my
> words...
> 

xfs_info reports on a mounted fs. If you check out 'man xfs_info,'
you'll see it takes the mountpoint as a parameter, but it can query
the fs info from the mountpoint itself or from any file therein. So it
doesn't know anything about a metadump file, and pointing it at one will
just report on the fs that contains the file.

If you wanted to test an actual metadump image, restore the metadump to
an fs image, mount and test that:

xfs_mdrestore ./metadump ./mynewfsimage
mount ./mynewfsimage /mnt -o loop
xfs_info /mnt/
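
If you end up repeating this cycle, the steps above could be wrapped
roughly as follows. This is only a sketch: the function names, the
DRY_RUN switch and the example paths are made up for illustration; the
only real commands involved are xfs_mdrestore, mount, xfs_info, umount
and rm.

```shell
#!/bin/sh
# Sketch: restore a metadump to a scratch image, loop-mount it and inspect it.
# DRY_RUN=1 only prints the commands (hypothetical convenience switch).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_and_inspect <metadump> <scratch image> <mountpoint>
restore_and_inspect() {
    md=$1
    img=$2
    mnt=$3
    run xfs_mdrestore "$md" "$img" &&
    run mount -o loop "$img" "$mnt" &&
    run xfs_info "$mnt"
}

# reset_image <scratch image> <mountpoint>
# Unmount and discard the scratch image, so the next restore starts again
# from the state recorded in the metadump.
reset_image() {
    img=$1
    mnt=$2
    run umount "$mnt" &&
    run rm -f "$img"
}
```

With DRY_RUN=1 set, 'restore_and_inspect ./metadump ./mynewfsimage /mnt'
just prints the three commands, which is a cheap way to check the paths
before running anything as root.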

Brian

> 
> So the uploaded file is the correct one.
> 
> 
> - -- Cheers,
>        Carlos E. R.
>        (from 13.1 x86_64 "Bottle" at Telcontar)
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iEYEARECAAYFAlPAkyMACgkQtTMYHG2NR9U2uACfTdPx8DGCkBzLGiSVGn3XCcSV
> 7ukAnAvR1CjR9Jx3rPosLYNceBtQjJjf
> =/odv
> -----END PGP SIGNATURE-----
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-07-02  9:57 Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue Carlos E. R.
  2014-07-02 12:04 ` Brian Foster
@ 2014-08-11 14:23 ` Carlos E. R.
  2014-08-11 14:44   ` Brian Foster
                     ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 14:23 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1320 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



On 2014-07-02 at 11:57 +0200, Carlos E. R. wrote:

> I got this error:

It happened again. I'm in the middle of recovery procedures, and using my
laptop to post.


The system did not "die"; I could still use xterms owned by root. So I
tried to use xfs_metadump before rebooting, but it refused, saying that the
partition was mounted (and I know from previous times that umounting fails
or locks the machine). It also said that it could not initialize the XFS
library.

So I logged out, and issued "reboot" on tty1 as root. No go, it got stuck
somewhere, and I had to hit the physical reset button on the machine. I
have not looked at the logs yet.

I am now running the machine off a live USB stick (13.1 XFCE rescue
system) to avoid the automatic fsck of the home partition, and I have
already obtained an xfs_metadump of it.

I post this in case you have some suggestion before I nuke the partition
(rsync, reformat, etc). It should take some hours.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPo0c4ACgkQja8UbcUWM1wIewD/eEwnzZpDjJLuytDOD9bqiypF
ly6QCDckRvc2rVuCbwcA/0IX5tXGhAHr6izQvWol3F4RoxLk0uf74Ayn8lvSlDU0
=WAIZ
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
@ 2014-08-11 14:44   ` Brian Foster
  2014-08-11 14:58     ` Carlos E. R.
  2014-08-11 14:57   ` Mark Tinguely
  2014-09-30 22:27   ` Happened again, 20140930 " Carlos E. R.
  2 siblings, 1 reply; 56+ messages in thread
From: Brian Foster @ 2014-08-11 14:44 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Mon, Aug 11, 2014 at 04:23:01PM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> 
> 
> On 2014-07-02 at 11:57 +0200, Carlos E. R. wrote:
> 
> >I got this error:
> 
> It happened again. I'm in the middle of recovery procedures, and using my
> laptop to post.
> 
> 
> The system did not "die"; I could still use xterms owned by root. So I tried
> to use xfs_metadump before rebooting, but it refused, saying that the
> partition was mounted (and I know from previous times that umounting fails
> or locks the machine). It also said that it could not initialize the XFS
> library.
> 
> So I logged out, and issued "reboot" on tty1 as root. No go, it got stuck
> somewhere, and I had to hit the physical reset button on the machine. I have
> not looked at the logs yet.
> 
> I am now running the machine off a live USB stick (13.1 XFCE rescue system)
> to avoid the automatic fsck of the home partition, and I have already
> obtained an xfs_metadump of it.
> 
> I post this in case you have some suggestion before I nuke the partition
> (rsync, reformat, etc). It should take some hours.
> 

Assuming you already have a pre-repair metadump, I'd suggest running
xfs_repair, capturing and posting the repair output to the list, and
leaving it at that (for now at least). I think you mentioned previously
that the problem hits more frequently at this point, so I wonder if you
could try to reproduce and get a better idea of what might contribute to
the failure.

For example, can you actively reproduce at this point? Perhaps get some
work going on all of the applications you typically have running and run
some hibernation cycles..? While a reformat might spare you from the
issue for a bit, it's going to make it that much harder to get more
information on what's going on.
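
To make the repair output easy to post, something along these lines
would do. It is only a sketch: the helper names and the log file naming
are invented here, and /dev/sdd5 in the comment is just an example
device (the name may differ when booted from the rescue system). The fs
must be unmounted when xfs_repair runs.

```shell
#!/bin/sh
# Sketch: run xfs_repair on an unmounted device and keep a copy of its output.
# The log naming scheme is an assumption, not an XFS convention.

# repair_log_name <device> <timestamp>
# e.g. repair_log_name /dev/sdd5 20140811-1700
repair_log_name() {
    printf 'xfs_repair-%s-%s.log\n' "$(basename "$1")" "$2"
}

# repair_and_capture <device>
repair_and_capture() {
    dev=$1
    log=$(repair_log_name "$dev" "$(date +%Y%m%d-%H%M)")
    # 2>&1 merges stderr into stdout so that tee sees everything xfs_repair
    # prints; tee shows it on screen and writes the same text to $log.
    xfs_repair "$dev" 2>&1 | tee "$log"
}
```

Note that the exit status of the pipeline is that of tee, so check the
log itself for what xfs_repair reported.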

Brian

> - -- Cheers
>        Carlos E. R.
> 
>        (from 13.1 x86_64 "Bottle" (Minas Tirith))
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iF4EAREIAAYFAlPo0c4ACgkQja8UbcUWM1wIewD/eEwnzZpDjJLuytDOD9bqiypF
> ly6QCDckRvc2rVuCbwcA/0IX5tXGhAHr6izQvWol3F4RoxLk0uf74Ayn8lvSlDU0
> =WAIZ
> -----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
  2014-08-11 14:44   ` Brian Foster
@ 2014-08-11 14:57   ` Mark Tinguely
  2014-08-11 15:34     ` Carlos E. R.
  2014-09-30 22:27   ` Happened again, 20140930 " Carlos E. R.
  2 siblings, 1 reply; 56+ messages in thread
From: Mark Tinguely @ 2014-08-11 14:57 UTC (permalink / raw)
  To: xfs

On 08/11/14 09:23, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
>
>
> On 2014-07-02 at 11:57 +0200, Carlos E. R. wrote:
>
>> I got this error:
>
> It happened again. I'm in the middle of recovery procedures, and using my
> laptop to post.
>
>
> The system did not "die"; I could still use xterms owned by root. So I
> tried to use xfs_metadump before rebooting, but it refused, saying that
> the partition was mounted (and I know from previous times that umounting
> fails or locks the machine). It also said that it could not initialize
> the XFS library.
>
> So I logged out, and issued "reboot" on tty1 as root. No go, it got
> stuck somewhere, and I had to hit the physical reset button on the
> machine. I have not looked at the logs yet.
>
> I am now running the machine off a live USB stick (13.1 XFCE rescue
> system) to avoid the automatic fsck of the home partition, and I have
> already obtained an xfs_metadump of it.
>
> I post this in case you have some suggestion before I nuke the partition
> (rsync, reformat, etc). It should take some hours.
>
> - -- Cheers
> Carlos E. R.

Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?

I am interested in the metadata dump.

Also, someone hit back-to-back duplicate block allocation
XFS_WANT_CORRUPTED_GOTO bugs, so you may want to take a metadata dump
before and after the xfs_repair in case you hit it again soon.

If this is a duplicate block allocation, some user blocks will have 
overwritten the metadata.
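
A rough sketch of the before/after dumps, in case it helps. The function
names and the file naming are made up for illustration; only
xfs_metadump and xfs_repair are real tools, and both need the fs to be
unmounted (as you saw, xfs_metadump refuses a mounted partition).

```shell
#!/bin/sh
# Sketch: take a metadump before and after xfs_repair so that both states
# can be examined later. File names are an arbitrary convention.

# metadump_name <device> <label>
metadump_name() {
    printf '%s.%s.metadump\n' "$(basename "$1")" "$2"
}

# dump_repair_dump <device>
dump_repair_dump() {
    dev=$1
    # Without a pre-repair dump there is nothing to compare against: stop.
    xfs_metadump "$dev" "$(metadump_name "$dev" pre-repair)" || return 1
    xfs_repair "$dev"
    repair_status=$?
    # Dump again even if xfs_repair complained; that state is still of interest.
    xfs_metadump "$dev" "$(metadump_name "$dev" post-repair)" || return 1
    return "$repair_status"
}
```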

--Mark.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 14:44   ` Brian Foster
@ 2014-08-11 14:58     ` Carlos E. R.
  2014-08-11 17:05       ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 14:58 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1570 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Content-ID: <alpine.LSU.2.11.1408111658020.7326@minas-tirith.valinor>


On 2014-08-11 at 10:44 -0400, Brian Foster wrote:

>> I post this in case you have some suggestion before I nuke the partition
>> (rsync, reformat, etc). It should take some hours.
>>
>
> Assuming you already have a pre-repair metadump, I'd suggest running
> xfs_repair, capturing and posting the repair output to the list, and
> leaving it at that (for now at least). I think you mentioned previously
> that the problem hits more frequently at this point, so I wonder if you
> could try to reproduce and get a better idea of what might contribute to
> the failure.
>
> For example, can you actively reproduce at this point? Perhaps get some
> work going on all of the applications you typically have running and run
> some hibernation cycles..? While a reformat might spare you from the
> issue for a bit, it's going to make it that much harder to get more
> information on what's going on.

Ok, will do.

I will create a backup of my partition, with xfsdump, after attempting
repair of the partition, and reboot, and see (without the reformat cycle).

At this instant I'm doing a full dd of the partition, just in case it
becomes useful.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPo2i8ACgkQja8UbcUWM1wj3AD/T1eMkxiQUYSa6OBHvmTNj64g
cF6Gi5Gjv/dsF4aIcL0A/0vHz3bFmvhzh7D/2ugvb84tj8NtHC1QWPrY6Lbw4FmW
=8+lJ
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 14:57   ` Mark Tinguely
@ 2014-08-11 15:34     ` Carlos E. R.
  2014-08-11 16:14       ` Brian Foster
  2014-08-11 21:27       ` Mark Tinguely
  0 siblings, 2 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 15:34 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 53330 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



On 2014-08-11 at 09:57 -0500, Mark Tinguely wrote:
> On 08/11/14 09:23, Carlos E. R. wrote:

> Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?

This time?
Did not look at the log yet. Let me see...

Here is the full log of the event. It starts prior to hibernating, all 
things nominal. And ends on shutdown (had to hit reset button, despite 
what log says). If you want to see entries prior to that, since boot, I 
can do that.


<3.6> 2014-08-11 05:15:01 Telcontar systemd 1 - -  Starting Session 556 of user news.
<3.6> 2014-08-11 05:18:01 Telcontar systemd 1 - -  Starting Session 557 of user news.
<3.6> 2014-08-11 05:20:01 Telcontar systemd 1 - -  Starting Session 558 of user cer.
<3.4> 2014-08-11 05:22:25 Telcontar pm-utils - - -  Hibernating the system now (04)...
<3.5> 2014-08-11 05:22:25 Telcontar pm-utils - - -  There appears not be any pending nntp post to be sent. I just checked :-)
<1.5> 2014-08-11 05:22:25 Telcontar network 5840 - -  redirecting to "systemctl --signal=9 kill network.service"
<3.5> 2014-08-11 05:22:25 Telcontar systemd 1 - -  network@eth0.service: main process exited, code=killed, status=9/KILL
<3.6> 2014-08-11 05:22:25 Telcontar systemd 1 - -  Stopping LSB: Network time protocol daemon (ntpd)...
<3.6> 2014-08-11 05:22:25 Telcontar ntp 5867 - -  Shutting down network time protocol daemon (NTPD)..done
<3.6> 2014-08-11 05:22:25 Telcontar systemd 1 - -  Stopped LSB: Network time protocol daemon (ntpd).
<3.4> 2014-08-11 05:22:25 Telcontar pm-utils - - -  Hibernating (95)...
<0.7> 2014-08-11 05:22:30 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2014-08-11 05:22:30 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2014-08-11 05:22:30 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.046524] PM: Allocated 4562980 kbytes in 5.36 seconds (851.30 MB/s)
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.046645] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.048553] Suspending console(s) (use no_console_suspend to debug)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.049663] serial 00:05: disabled
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260091] PM: freeze of devices complete after 211.420 msecs
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260391] PM: late freeze of devices complete after 0.298 msecs
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260939] PM: noirq freeze of devices complete after 0.545 msecs
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260940] Disabling non-boot CPUs ...
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.262294] smpboot: CPU 1 is now offline
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.264134] smpboot: CPU 2 is now offline
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.265056] Broke affinity for irq 16
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.266103] smpboot: CPU 3 is now offline
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.266614] PM: Creating hibernation image:
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] PM: Need to copy 920142 pages
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] PM: Normal pages needed: 920142 + 1024, available pages: 1176633
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] microcode: CPU0 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] Enabling non-boot CPUs ...
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.280111] microcode: CPU1 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.280300] CPU1 is up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.280425] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.293688] microcode: CPU2 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.293828] CPU2 is up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.293918] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.307216] microcode: CPU3 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.307358] CPU3 is up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508819] uhci_hcd 0000:00:1a.1: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508836] usb usb4: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508844] uhci_hcd 0000:00:1a.2: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508861] usb usb5: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508871] ehci-pci 0000:00:1a.7: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508889] usb usb1: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510138] uhci_hcd 0000:00:1d.0: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510159] usb usb6: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510168] uhci_hcd 0000:00:1d.1: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510187] usb usb7: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510196] uhci_hcd 0000:00:1d.2: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510215] usb usb8: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510225] ehci-pci 0000:00:1d.7: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510235] usb usb2: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512778] ehci-pci 0000:00:1a.7: cache line size of 32 is not supported
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512784] pci 0000:00:1e.0: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512879] ata_piix 0000:00:1f.2: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512934] ata_piix 0000:00:1f.5: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.514123] ehci-pci 0000:00:1d.7: cache line size of 32 is not supported
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611029] pciehp 0000:00:1c.3:pcie04: Device 0000:05:00.0 already exists at 0000:05:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611032] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611035] pciehp 0000:00:1c.2:pcie04: Device 0000:04:00.0 already exists at 0000:04:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611036] pciehp 0000:00:1c.3:pcie04: Cannot add device at 0000:05:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611037] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611039] pciehp 0000:00:1c.2:pcie04: Cannot add device at 0000:04:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611061] pciehp 0000:00:1c.4:pcie04: Device 0000:06:00.0 already exists at 0000:06:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611062] pciehp 0000:00:1c.4:pcie04: Cannot add device at 0000:06:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611086] pciehp 0000:00:1c.5:pcie04: Device 0000:07:00.0 already exists at 0000:07:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611087] pciehp 0000:00:1c.5:pcie04: Cannot add device at 0000:07:00
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611172] pata_jmicron 0000:05:00.1: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611249] pata_jmicron 0000:04:00.1: setting latency timer to 64
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.614064] serial 00:05: activated
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.775267] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.837013] ata11: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.837220] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.916030] ata2: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.916069] ata1: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.920031] ata4: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.930018] usb 1-2: reset high-speed USB device number 3 using ehci-pci
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.988036] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.991149] ata12.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.991151] ata12.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.991152] ata12.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.997133] ata12.00: configured for UDMA/100
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.069020] firewire_core 0000:08:02.0: rediscovered device fw0
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.074017] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.076182] ata3.00: configured for UDMA/133
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.076210] sd 2:0:0:0: [sda] Starting disk
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.146014] usb 2-5: reset high-speed USB device number 2 using ehci-pci
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284050] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284060] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284167] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284177] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287190] ata9.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287191] ata9.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287241] ata10.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287242] ata10.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287362] ata9.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287364] ata9.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287457] ata10.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287459] ata10.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293185] ata10.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293186] ata10.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293236] ata9.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293237] ata9.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293378] ata10.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293379] ata10.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293443] ata9.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293445] ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.302319] ata9.00: configured for UDMA/133
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.308304] ata9.01: configured for UDMA/133
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.308337] sd 8:0:0:0: [sdb] Starting disk
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.308338] sd 8:0:1:0: [sdc] Starting disk
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.318322] ata10.00: configured for UDMA/133
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.324321] ata10.01: configured for UDMA/133
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.324351] sd 9:0:1:0: [sde] Starting disk
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.324352] sd 9:0:0:0: [sdd] Starting disk
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.512018] usb 3-1: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73230.057013] usb 8-2: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73230.408094] usb 2-5.4: reset high-speed USB device number 4 using ehci-pci
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
<3.4> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - -  The canary thread is apparently starving. Taking action.
<3.6> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - -  Demoting known real-time threads.
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - -  Demoted 3 threads.
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809].
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440155] CPU: 0 PID: 6255 Comm: kworker/0:7 Tainted: P           O 3.11.10-17-desktop #1
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440322] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440361] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440364]  0000000000000001 ffffffff815a0402 000000000010c9d3 ffffffffa0c38996
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440365]  ffff880211412b00 ffff88023448dd80 ffff88023fb95cb0 0000000000000001
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440366]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440367] Call Trace:
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440377]  [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440380]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440382]  [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440385]  [<ffffffff815a0402>] dump_stack+0x50/0x89
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440399]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440442]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440484]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440534]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440597]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440633]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440662]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440690]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440718]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440737]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440739]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440742]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440746]  [<ffffffff815adfbc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<3.6> 2014-08-11 15:17:22 Telcontar systemd 1 - -  Time has been changed
<3.6> 2014-08-11 15:17:27 Telcontar acpid - - -  1 client rule loaded
<3.4> 2014-08-11 15:17:29 Telcontar pm-utils - - -  Thawing (95)...
<3.5> 2014-08-11 15:17:30 Telcontar dbus 1020 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<3.6> 2014-08-11 15:17:30 Telcontar systemd 1 - -  Starting LSB: Network time protocol daemon (ntpd)...
<0.4> 2014-08-11 15:17:30 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<3.5> 2014-08-11 15:17:31 Telcontar dbus 1020 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-08-11 15:17:31 Telcontar network 6315 - -  redirecting to "systemctl  restart network.service"
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Stopping ifup managed network interface eth1...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Stopping ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Stopping LSB: Configure network interfaces and set up routing...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Starting LSB: Configure network interfaces and set up routing...
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  touch: cannot touch ‘/dev/.sysconfig/network/tmp/if-eth0.6352’: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  scripts/functions: line 1221: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  scripts/functions: line 1239: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  cat: /dev/.sysconfig/network/tmp/if-eth0.6352: No such file or directory
<3.6> 2014-08-11 15:17:34 Telcontar ntp 6314 - -  11 Aug 15:17:34 sntp[6505]: Started sntp
<3.6> 2014-08-11 15:17:34 Telcontar ifdown 6352 - -  eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:34 Telcontar ifdown 6352 - -      eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:34 Telcontar ifdown 6351 - -  eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:34 Telcontar ifdown 6351 - -      eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:34 Telcontar network 6384 - -  Setting up network interfaces:
<3.6> 2014-08-11 15:17:34 Telcontar network 6384 - -  lo
<1.5> 2014-08-11 15:17:34 Telcontar ifup 6924 - -      lo
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -      lo
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -  IP address: 127.0.0.1/8
<3.6> 2014-08-11 15:17:35 Telcontar network 6384 - -  lo        IP address: 127.0.0.1/8
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -.
<16.3> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - -  eth1: dhcpcd not running
<16.6> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - -  eth1: exiting
<3.5> 2014-08-11 15:17:38 Telcontar systemd 1 - -  Unit network@eth0.service entered failed state.
<3.6> 2014-08-11 15:17:38 Telcontar systemd 1 - -  Starting ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Interface eth0.IPv6 no longer relevant for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Leaving mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Interface eth0.IPv4 no longer relevant for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Withdrawing address record for fc00::14 on eth0.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Withdrawing address record for 192.168.1.14 on eth0.
<3.6> 2014-08-11 15:17:38 Telcontar ifup 7226 - -  eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:38 Telcontar ifup 7226 - -      eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792336] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792353] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792366] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  New relevant interface eth0.IPv4 for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - -  Starting ifup managed network interface eth1...
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  ifplugd 0.28 initializing.
<0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646313] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646341] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  Using interface eth1/00:21:85:16:2D:0C with driver <r8169> (version: 2.3LK-NAPI)
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  Using detection mode: SIOCETHTOOL
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  Initialization complete, link beat not detected.
<1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - -      eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - -      eth1      is controlled by ifplugd
<3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - -  eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - -  eth1      is controlled by ifplugd
<3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - -  Started ifup managed network interface eth1.
<0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958299] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958306] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - -  Joining mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - -  New relevant interface eth0.IPv6 for mDNS.
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - -  Registering new address record for fe80::221:85ff:fe16:2d0b on eth0.*.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Joining mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Withdrawing address record for fe80::221:85ff:fe16:2d0b on eth0.
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  11 Aug 15:17:44 sntp[6505]: Received no useable packet from 192.168.1.15!
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  11 Aug 15:17:44 sntp[7926]: Started sntp
<3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - -  Time has been changed
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  2014-08-11 15:17:44.656291 (-0100) -0.112718 +/- 0.037338 secs
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  2014-08-11 15:17:44.604369 (-0100) +0.0081 +/- 0.069473 secs
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  Time synchronized with  0.pool.ntp.org
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - -  Setting up rules from /etc/sysconfig/SuSEfirewall2 ...
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - -  using default zone 'ext' for interface eth1
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - -  Firewall customary rules loaded from /etc/sysconfig/scripts/SuSEfirewall2-custom
<3.5> 2014-08-11 15:17:45 Telcontar ntpd 7991 - -  ntpd 4.2.6p5@1.2349-o Tue Jul 22 08:26:41 UTC 2014 (1)
<3.6> 2014-08-11 15:17:45 Telcontar ntp 6314 - -  Starting network time protocol daemon (NTPD)..done
<3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - -  Time has been changed
<3.6> 2014-08-11 15:17:45 Telcontar systemd 1 - -  Started LSB: Network time protocol daemon (ntpd).
<3.5> 2014-08-11 15:17:45 Telcontar ntpd 8017 - -  proto: precision = 1.613 usec
<3.7> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen and drop on 1 v6wildcard :: UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 2 lo 127.0.0.1 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 3 eth0 192.168.1.14 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 4 lo ::1 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 5 eth0 fe80::221:85ff:fe16:2d0b UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 6 eth0 fc00::14 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  peers refreshed
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listening on routing socket on fd #23 for interface updates
<3.5> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  logging to file /var/log/ntp
<4.6> 2014-08-11 15:17:48 Telcontar SuSEfirewall2 - - -  Firewall rules successfully set
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Found user 'avahi-autoipd' (UID 495) and group 'avahi-autoipd' (GID 491).
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Successfully called chroot().
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Successfully dropped root privileges.
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Starting with address 169.254.3.89
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Routable address already assigned, sleeping.
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - -  Started ifup managed network interface eth0.
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - -  Started ifup managed network interface eth1.
<3.6> 2014-08-11 15:17:50 Telcontar network 6384 - -  ..done..done..done    ppp0      Startmode is 'manual' -> skipping
<1.5> 2014-08-11 15:17:50 Telcontar ifup 8500 - -      ppp0      Startmode is 'manual' -> skipping
<3.6> 2014-08-11 15:17:50 Telcontar network 6384 - -  ..skippedSetting up service network  .  .  .  .  .  .  .  .  .  .  .  .  ...done
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - -  Started LSB: Configure network interfaces and set up routing.
<3.4> 2014-08-11 15:17:52 Telcontar pm-utils - - -  Thawing the system now (04)...
<0.6> 2014-08-11 15:17:55 Telcontar kernel - - - [73268.481672] Chrome_ChildThr[5680]: segfault at 0 ip 00007ffcedf71598 sp 00007ffce1821410 error 6 in libmozalloc.so[7ffcedf7
<0.4> 2014-08-11 15:18:00 Telcontar kernel - - - [73274.336014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:18:01 Telcontar systemd 1 - -  Starting Session 559 of user news.
<3.4> 2014-08-11 15:18:16 Telcontar router - - -  (Thawing 04) Logging the current IP= 79.159.63.177
<0.4> 2014-08-11 15:18:31 Telcontar kernel - - - [73304.416012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:19:01 Telcontar kernel - - - [73334.496014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:19:31 Telcontar kernel - - - [73364.576016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:20:01 Telcontar kernel - - - [73394.656015] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:20:01 Telcontar systemd 1 - -  Starting Session 560 of user cer.
<0.4> 2014-08-11 15:20:31 Telcontar kernel - - - [73424.736049] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:21:01 Telcontar kernel - - - [73454.816016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:21:31 Telcontar kernel - - - [73484.896015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:22:01 Telcontar kernel - - - [73514.976016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:22:31 Telcontar kernel - - - [73545.056018] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:23:01 Telcontar systemd 1 - -  Starting Session 561 of user news.
<0.4> 2014-08-11 15:23:01 Telcontar kernel - - - [73575.136025] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:23:31 Telcontar kernel - - - [73605.216014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:23:52 Telcontar smartd 1013 - -  Device: /dev/sdb [SAT], Temperature changed -5 Celsius to 33 Celsius (Min/Max 19/38)
<0.4> 2014-08-11 15:24:01 Telcontar kernel - - - [73635.296078] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:24:32 Telcontar kernel - - - [73665.376020] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:25:01 Telcontar systemd 1 - -  Starting Session 562 of user news.
<0.4> 2014-08-11 15:25:02 Telcontar kernel - - - [73695.456011] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:25:32 Telcontar kernel - - - [73725.536015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:26:02 Telcontar kernel - - - [73755.616017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:26:32 Telcontar kernel - - - [73785.696017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:27:02 Telcontar kernel - - - [73815.776016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:27:32 Telcontar kernel - - - [73845.856021] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:28:01 Telcontar systemd 1 - -  Starting Session 563 of user news.
<0.4> 2014-08-11 15:28:02 Telcontar kernel - - - [73875.936014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:28:32 Telcontar kernel - - - [73906.016015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:29:02 Telcontar kernel - - - [73936.096017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:29:32 Telcontar kernel - - - [73966.176012] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - -  Starting Session 564 of user root.
<3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - -  Starting Session 565 of user cer.
<1.6> 2014-08-11 15:30:01 Telcontar run-crons 8974 - -  suse.de-snapper: OK
<4.5> 2014-08-11 15:30:01 Telcontar su - - -  (to root) root on (null)
<10.3> 2014-08-11 15:30:01 Telcontar su - - -  pam_systemd(su-l:session): pam_putenv: delete non-existent entry; XDG_RUNTIME_DIR
<0.4> 2014-08-11 15:30:02 Telcontar kernel - - - [73996.256010] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:30:32 Telcontar kernel - - - [74026.336013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:31:03 Telcontar kernel - - - [74056.416012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:31:33 Telcontar kernel - - - [74086.496011] XFS (sdd5): xfs_log_force: error 5 returned.
<4.5> 2014-08-11 15:31:59 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  GLib-GObject: invalid unclassed pointer in cast to 'GkmObject'
<4.3> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: gkm_object_expose_full: assertion 'GKM_IS_OBJECT (self)' failed
<4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't create temporary file for: /home/cer/.gnome2/keyrings/login.keyring: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  couldn't create login keyring: An error occurred on the device
<10.3> 2014-08-11 15:32:00 Telcontar unix2_chkpwd - - -  gkr-pam: the password for the login keyring was invalid.
<0.4> 2014-08-11 15:32:03 Telcontar kernel - - - [74116.576018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:32:33 Telcontar kernel - - - [74146.656011] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:33:01 Telcontar systemd 1 - -  Starting Session 566 of user news.
<0.4> 2014-08-11 15:33:03 Telcontar kernel - - - [74176.736068] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:33:33 Telcontar kernel - - - [74206.816012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:34:03 Telcontar kernel - - - [74236.896017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:34:33 Telcontar kernel - - - [74266.976014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:35:01 Telcontar systemd 1 - -  Starting Session 567 of user news.
<0.4> 2014-08-11 15:35:03 Telcontar kernel - - - [74297.056012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:35:33 Telcontar kernel - - - [74327.136015] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:35:56 Telcontar run-crons 8974 - -  leafnode: OK
<3.6> 2014-08-11 15:35:56 Telcontar systemd 1 - -  Reloading System Logging Service.
<3.6> 2014-08-11 15:35:57 Telcontar systemd 1 - -  Reloaded System Logging Service.
<5.6> 2014-08-11 15:35:57 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
<3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - -  Reloading System Logging Service.
<3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - -  Reloaded System Logging Service.
<0.4> 2014-08-11 15:36:03 Telcontar kernel - - - [74357.216013] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - -  logrotate: OK
<3.2> 2014-08-11 15:36:06 Telcontar mdadm 9290 - -  DegradedArray event detected on md device /dev/md0
<1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - -  mdadm: OK
<4.5> 2014-08-11 15:36:06 Telcontar su - - -  (to root) root on (null)
<1.4> 2014-08-11 15:36:25 Telcontar run-crons 8974 - -  mlocate.cron returned 143
<1.6> 2014-08-11 15:36:25 Telcontar run-crons 8974 - -  packagekit-background.cron: OK
<1.6> 2014-08-11 15:36:26 Telcontar run-crons 8974 - -  suse-clean_catman: OK
<0.4> 2014-08-11 15:36:33 Telcontar kernel - - - [74387.296018] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:36:41 Telcontar run-crons 8974 - -  suse-do_mandb: OK
<1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - -  suse-texlive: OK
<1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - -  suse.cron-sa-update: OK
<1.6> 2014-08-11 15:36:58 Telcontar run-crons 8974 - -  suse.de-backup-rc.config: OK
<0.4> 2014-08-11 15:37:04 Telcontar kernel - - - [74417.376010] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-backup-rpmdb: OK
<0.4> 2014-08-11 15:37:34 Telcontar kernel - - - [74447.456013] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-check-battery: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-cron-local: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-faxcron: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-snapper: OK
<3.6> 2014-08-11 15:38:01 Telcontar systemd 1 - -  Starting Session 568 of user news.
<0.4> 2014-08-11 15:38:04 Telcontar kernel - - - [74477.536013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:38:34 Telcontar kernel - - - [74507.616019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:39:04 Telcontar kernel - - - [74537.696013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:39:34 Telcontar kernel - - - [74567.776014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:40:01 Telcontar systemd 1 - -  Starting Session 569 of user cer.
<0.4> 2014-08-11 15:40:04 Telcontar kernel - - - [74597.856013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:40:34 Telcontar kernel - - - [74627.936021] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:41:04 Telcontar kernel - - - [74658.016012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:41:34 Telcontar kernel - - - [74688.096019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:42:04 Telcontar kernel - - - [74718.176018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:42:34 Telcontar kernel - - - [74748.256017] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:43:01 Telcontar systemd 1 - -  Starting Session 570 of user news.
<0.4> 2014-08-11 15:43:04 Telcontar kernel - - - [74778.336012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:43:35 Telcontar kernel - - - [74808.416013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:44:05 Telcontar kernel - - - [74838.496014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:44:35 Telcontar kernel - - - [74868.576013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - -  Starting Session 571 of user root.
<3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - -  Starting Session 572 of user news.
<0.4> 2014-08-11 15:45:05 Telcontar kernel - - - [74898.656019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:45:35 Telcontar kernel - - - [74928.736017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:46:05 Telcontar kernel - - - [74958.816015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:46:35 Telcontar kernel - - - [74988.896026] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:47:05 Telcontar kernel - - - [75018.976014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:47:35 Telcontar kernel - - - [75049.056013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:48:01 Telcontar systemd 1 - -  Starting Session 573 of user news.
<0.4> 2014-08-11 15:48:05 Telcontar kernel - - - [75079.136016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:48:35 Telcontar kernel - - - [75109.216014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:49:05 Telcontar kernel - - - [75139.296014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:49:36 Telcontar kernel - - - [75169.376013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:50:06 Telcontar kernel - - - [75199.456012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:50:36 Telcontar kernel - - - [75229.536011] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:51:06 Telcontar kernel - - - [75259.616013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.6> 2014-08-11 15:51:09 Telcontar kernel - - - [75262.721354] xfce4-session[4520]: segfault at 8 ip 00000000004164dc sp 00007fffdc291dc0 error 4 in xfce4-session[400000+2b00
<4.6> 2014-08-11 15:51:18 Telcontar systemd-logind 1021 - -  Removed session 8.
<10.5> 2014-08-11 15:51:18 Telcontar polkitd 4314 - -  Unregistered Authentication Agent for unix-session:10 (system bus name :1.69, object path /org/gnome/PolicyKit1/Authenti
<0.7> 2014-08-11 15:51:28 Telcontar kernel - - - [75282.132776] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
<3.6> 2014-08-11 15:51:29 Telcontar acpid - - -  1 client rule loaded
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - -  Removed session 10.
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.176020] usb 1-6: new high-speed USB device number 4 using ehci-pci
<3.6> 2014-08-11 15:51:30 Telcontar systemd 1 - -  Starting Session 574 of user lightdm.
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - -  New session 574 of user lightdm.
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - -  Linked /tmp/.X11-unix/X0 to /run/user/127/X11-display.
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291822] usb 1-6: New USB device found, idVendor=8564, idProduct=1000
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291825] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291828] usb 1-6: Product: Mass Storage Device
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291829] usb 1-6: Manufacturer: JetFlash
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291831] usb 1-6: SerialNumber: 346YLQ4L0G5H8S2F
<1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - -  checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:1a.7/usb1/1-6"
<1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - -  bus: 1, device: 4 was not an MTP device
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667399] usb-storage 1-6:1.0: USB Mass Storage device detected
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667502] scsi12 : usb-storage 1-6:1.0
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667606] usbcore: registered new interface driver usb-storage
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794904] scsi 12:0:0:0: Direct-Access     JetFlash Transcend 4GB    1100 PQ: 0 ANSI: 4
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794976] scsi 12:0:0:0: alua: supports implicit and explicit TPGS
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796262] scsi 12:0:0:0: alua: No target port descriptors found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796265] scsi 12:0:0:0: alua: not attached
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796396] sd 12:0:0:0: Attached scsi generic sg6 type 0
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796888] sd 12:0:0:0: [sdf] 7913472 512-byte logical blocks: (4.05 GB/3.77 GiB)
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797634] sd 12:0:0:0: [sdf] Write Protect is off
<0.7> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797637] sd 12:0:0:0: [sdf] Mode Sense: 43 00 00 00
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798386] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798388] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801508] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801511] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.802147]  sdf: sdf1 sdf2 sdf3
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805642] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805645] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805648] sd 12:0:0:0: [sdf] Attached SCSI removable disk
<0.4> 2014-08-11 15:51:36 Telcontar kernel - - - [75289.696019] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - -  Starting Getty on tty2...
<3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - -  Started Getty on tty2.
<3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - -  Starting Getty on tty3...
<3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - -  Started Getty on tty3.
<3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - -  Starting Getty on tty6...
<3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - -  Started Getty on tty6.
<3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - -  Starting Getty on tty5...
<3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - -  Started Getty on tty5.
<3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - -  Starting Getty on tty4...
<3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - -  Started Getty on tty4.
<0.4> 2014-08-11 15:52:06 Telcontar kernel - - - [75319.776023] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Unmounting /data/raid...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Unmounting /data/cripta...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping /sys/devices/virtual/block/dm-0.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  message repeated 5 times: [ Stopping /sys/devices/virtual/block/dm-0.]
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Session 574 of user lightdm.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped Session 574 of user lightdm.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Session 7 of user root.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped Session 7 of user root.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping user-0.slice.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Removed slice user-0.slice.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Stop Read-Ahead Data Collection 10s After Completed Startup.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped Stop Read-Ahead Data Collection 10s After Completed Startup.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping User Manager for 1000...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping User Manager for 9...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping User Manager for 127...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping CUPS Printing Service...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping ifup managed network interface eth1...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:44 Telcontar systemd 4377 - -  message repeated 14 times: [ Time has been changed]
<3.3> 2014-08-11 15:52:08 Telcontar systemd 4377 - -  Failed to enqueue exit.target job: Unit exit.target failed to load: Input/output error.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Graphical Interface.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped target Graphical Interface.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: X Display Manager...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping helloworld.service...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped helloworld.service.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Multi-User System.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped target Multi-User System.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: virus scanner daemon...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: Start the hddtemp daemon...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: mdadmd daemon monitoring MD devices...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: This services starts and stops the USB Arbitrator....
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: Supports the direct execution of binary formats....
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: irqbalance daemon providing irq balancing on MP-machines...
<3.6> 2014-08-11 15:52:09 Telcontar systemd 1 - -  Stopping LSB: Set up analog joysticks...
<0.4> 2014-08-11 15:52:10 Telcontar kernel - - - [75323.547122] nfsd: last server has exited, flushing export cache
<5.6> 2014-08-11 15:36:02 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
<5.6> 2014-08-11 15:52:11 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-08-11 15:52:12+02:00 - Halting the system now  =========================================== uptime:  15:52pm  up 1 day 20:54,  1 user,  load average: 5.94, 2.47, 1.22



> I am interested in the metadata dump.

Ok, sure, no problem. I'm working on that, but I need to have lunch first ;-)


> Also, someone hit back-to-back duplicate block allocation
> XFS_WANT_CORRUPTED_GOTO bugs; you may want to do a metadata dump before and
> after the xfs_repair in case you hit it again soon.

I already have a metadata dump, and I have not attempted the repair yet
(I'm doing a full dd copy of the partition, and it is 400 Gigs). I will obtain
another metadata dump after the repair, and I can upload both to Google Drive.
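
The before/after capture described above could be scripted roughly as follows. This is only a sketch: DEV and OUT are placeholder values (assumptions, not taken from this thread), the function name is made up for illustration, and the filesystem is assumed to be unmounted for all three steps.

```shell
#!/bin/sh
# Sketch only: DEV and OUT are placeholders (assumptions), set them to the
# real partition and to a directory on a *different*, healthy filesystem.
DEV="${DEV:-/dev/sdXN}"
OUT="${OUT:-/var/tmp}"

# capture_and_repair is a hypothetical helper name, not an xfsprogs command.
capture_and_repair() {
    # 1. Metadata-only image of the filesystem as found (no file contents).
    xfs_metadump "$DEV" "$OUT/before-repair.metadump" || return 1
    # 2. The repair itself; stop here if it fails.
    xfs_repair "$DEV" || return 1
    # 3. Second image, so the two dumps can be compared or uploaded together.
    xfs_metadump "$DEV" "$OUT/after-repair.metadump"
}

# Usage (as root, with the filesystem unmounted):
#   DEV=/dev/sdXN OUT=/some/other/disk capture_and_repair
```

Taking the first dump before xfs_repair touches anything keeps the original on-disk state available for analysis even if the repair changes the structures of interest.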

But first I need sustenance :-)

(At least this time I do not have any pressing thing to do on the 
computer...)

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF0EAREIAAYFAlPo4p4ACgkQja8UbcUWM1wGAADxAVuTUPkxG+LO29VzehJ8cSPV
uItG/Puu2KbqUeCyXwD/cgu/+F7vhEeU9WEbNP5eifhmyu0T3ByDMtuKp55Rj7A=
=CgSx
-----END PGP SIGNATURE-----

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 15:34     ` Carlos E. R.
@ 2014-08-11 16:14       ` Brian Foster
  2014-08-11 17:08         ` Carlos E. R.
  2014-08-11 21:27       ` Mark Tinguely
  1 sibling, 1 reply; 56+ messages in thread
From: Brian Foster @ 2014-08-11 16:14 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Mon, Aug 11, 2014 at 05:34:46PM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> 
> 
> On 2014-08-11 at 09:57 -0500, Mark Tinguely wrote:
> >On 08/11/14 09:23, Carlos E. R. wrote:
> 
> >Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?
> 
> This time?
> Did not look at the log yet. Let me see...
> 
> Here is the full log of the event. It starts before hibernation, with
> everything nominal, and ends at shutdown (I had to hit the reset button,
> despite what the log says). If you want to see the entries before that,
> since boot, I can provide them.
> 
> 
...
> <0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1
> <0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809].
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440155] CPU: 0 PID: 6255 Comm: kworker/0:7 Tainted: P           O 3.11.10-17-desktop #1
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440322] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440361] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440364]  0000000000000001 ffffffff815a0402 000000000010c9d3 ffffffffa0c38996
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440365]  ffff880211412b00 ffff88023448dd80 ffff88023fb95cb0 0000000000000001
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440366]  0000000000000000 0000000100000000 0000000000000000 0000000000000001
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440367] Call Trace:
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440377]  [<ffffffff81004a28>] dump_trace+0x88/0x310
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440380]  [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440382]  [<ffffffff810061bc>] show_stack+0x1c/0x50
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440385]  [<ffffffff815a0402>] dump_stack+0x50/0x89
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440399]  [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440442]  [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440484]  [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440534]  [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440597]  [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440633]  [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440662]  [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440690]  [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440718]  [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440737]  [<ffffffff8106ac78>] process_one_work+0x168/0x490
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440739]  [<ffffffff8106b914>] worker_thread+0x114/0x3a0
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440742]  [<ffffffff81071c3f>] kthread+0xaf/0xc0
> <0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440746]  [<ffffffff815adfbc>] ret_from_fork+0x7c/0xb0
> <0.5> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-
> <0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
> <0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)

This reminds me that it might be interesting to tune the eofblocks
scanner to be more aggressive and see if that helps reproduce the problem.
The thread that's running here runs every 5 minutes by default, but
it can be tuned to run at a user-defined interval via the following
/proc file:

# cat /proc/sys/fs/xfs/speculative_prealloc_lifetime 
300

I wonder if setting it to 30s or so ('echo 30 > /proc/...') and running
some hibernation cycles would help...
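
Something along these lines might do for the experiment; it is a sketch only. The KNOB override and the set_scan_interval helper are hypothetical conveniences, not part of XFS, and the commands assume root.

```shell
#!/bin/sh
# Sketch only. KNOB defaults to the /proc file quoted above; being able to
# override it is an assumption made purely so the helper can be tried on a
# scratch file first.
KNOB="${KNOB:-/proc/sys/fs/xfs/speculative_prealloc_lifetime}"

# set_scan_interval is a hypothetical helper name.
# $1 = new interval in seconds. Prints the previous value so that it can be
# restored once the hibernation tests are done.
set_scan_interval() {
    old=$(cat "$KNOB") || return 1
    echo "$1" > "$KNOB" || return 1
    echo "$old"
}

# Usage (as root):
#   previous=$(set_scan_interval 30)
#   ... run a few hibernate/resume cycles ...
#   set_scan_interval "$previous" > /dev/null
```

Saving the previous value first means the default (300 here) goes back in place afterwards without having to remember it.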

Brian

> <3.6> 2014-08-11 15:17:22 Telcontar systemd 1 - -  Time has been changed
> <3.6> 2014-08-11 15:17:27 Telcontar acpid - - -  1 client rule loaded
> <3.4> 2014-08-11 15:17:29 Telcontar pm-utils - - -  Thawing (95)...
> <3.5> 2014-08-11 15:17:30 Telcontar dbus 1020 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
> <3.6> 2014-08-11 15:17:30 Telcontar systemd 1 - -  Starting LSB: Network time protocol daemon (ntpd)...
> <0.4> 2014-08-11 15:17:30 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.5> 2014-08-11 15:17:31 Telcontar dbus 1020 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
> <1.5> 2014-08-11 15:17:31 Telcontar network 6315 - -  redirecting to "systemctl  restart network.service"
> <3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Stopping ifup managed network interface eth1...
> <3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Stopping ifup managed network interface eth0...
> <3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Stopping LSB: Configure network interfaces and set up routing...
> <3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - -  Starting LSB: Configure network interfaces and set up routing...
> <3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  touch: cannot touch ‘/dev/.sysconfig/network/tmp/if-eth0.6352’: No such file or directory
> <3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  scripts/functions: line 1221: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
> <3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  scripts/functions: line 1239: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
> <3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - -  cat: /dev/.sysconfig/network/tmp/if-eth0.6352: No such file or directory
> <3.6> 2014-08-11 15:17:34 Telcontar ntp 6314 - -  11 Aug 15:17:34 sntp[6505]: Started sntp
> <3.6> 2014-08-11 15:17:34 Telcontar ifdown 6352 - -  eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <1.5> 2014-08-11 15:17:34 Telcontar ifdown 6352 - -      eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <3.6> 2014-08-11 15:17:34 Telcontar ifdown 6351 - -  eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <1.5> 2014-08-11 15:17:34 Telcontar ifdown 6351 - -      eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <3.6> 2014-08-11 15:17:34 Telcontar network 6384 - -  Setting up network interfaces:
> <3.6> 2014-08-11 15:17:34 Telcontar network 6384 - -  lo
> <1.5> 2014-08-11 15:17:34 Telcontar ifup 6924 - -      lo
> <1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -      lo
> <1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -  IP address: 127.0.0.1/8
> <3.6> 2014-08-11 15:17:35 Telcontar network 6384 - -  lo        IP address: 127.0.0.1/8
> <1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -.
> <16.3> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - -  eth1: dhcpcd not running
> <16.6> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - -  eth1: exiting
> <3.5> 2014-08-11 15:17:38 Telcontar systemd 1 - -  Unit network@eth0.service entered failed state.
> <3.6> 2014-08-11 15:17:38 Telcontar systemd 1 - -  Starting ifup managed network interface eth0...
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Interface eth0.IPv6 no longer relevant for mDNS.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Leaving mDNS multicast group on interface eth0.IPv6 with address fc00::14.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Interface eth0.IPv4 no longer relevant for mDNS.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Withdrawing address record for fc00::14 on eth0.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Withdrawing address record for 192.168.1.14 on eth0.
> <3.6> 2014-08-11 15:17:38 Telcontar ifup 7226 - -  eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <1.5> 2014-08-11 15:17:38 Telcontar ifup 7226 - -      eth0      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792336] r8169 0000:06:00.0 eth0: link down
> <0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792353] r8169 0000:06:00.0 eth0: link down
> <0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792366] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  New relevant interface eth0.IPv4 for mDNS.
> <3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - -  Registering new address record for 192.168.1.14 on eth0.IPv4.
> <3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - -  Starting ifup managed network interface eth1...
> <3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  ifplugd 0.28 initializing.
> <0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646313] r8169 0000:07:00.0 eth1: link down
> <0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646341] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> <3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  Using interface eth1/00:21:85:16:2D:0C with driver <r8169> (version: 2.3LK-NAPI)
> <3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  Using detection mode: SIOCETHTOOL
> <3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - -  Initialization complete, link beat not detected.
> <1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - -      eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - -      eth1      is controlled by ifplugd
> <3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - -  eth1      device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
> <3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - -  eth1      is controlled by ifplugd
> <3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - -  Started ifup managed network interface eth1.
> <0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958299] r8169 0000:06:00.0 eth0: link up
> <0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958306] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> <3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - -  Joining mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
> <3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - -  New relevant interface eth0.IPv6 for mDNS.
> <3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - -  Registering new address record for fe80::221:85ff:fe16:2d0b on eth0.*.
> <3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
> <3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Joining mDNS multicast group on interface eth0.IPv6 with address fc00::14.
> <3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Registering new address record for fc00::14 on eth0.*.
> <3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - -  Withdrawing address record for fe80::221:85ff:fe16:2d0b on eth0.
> <3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  11 Aug 15:17:44 sntp[6505]: Received no useable packet from 192.168.1.15!
> <3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  11 Aug 15:17:44 sntp[7926]: Started sntp
> <3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - -  Time has been changed
> <3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  2014-08-11 15:17:44.656291 (-0100) -0.112718 +/- 0.037338 secs
> <3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  2014-08-11 15:17:44.604369 (-0100) +0.0081 +/- 0.069473 secs
> <3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - -  Time synchronized with  0.pool.ntp.org
> <4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - -  Setting up rules from /etc/sysconfig/SuSEfirewall2 ...
> <4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - -  using default zone 'ext' for interface eth1
> <4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - -  Firewall customary rules loaded from /etc/sysconfig/scripts/SuSEfirewall2-custom
> <3.5> 2014-08-11 15:17:45 Telcontar ntpd 7991 - -  ntpd 4.2.6p5@1.2349-o Tue Jul 22 08:26:41 UTC 2014 (1)
> <3.6> 2014-08-11 15:17:45 Telcontar ntp 6314 - -  Starting network time protocol daemon (NTPD)..done
> <3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - -  Time has been changed
> <3.6> 2014-08-11 15:17:45 Telcontar systemd 1 - -  Started LSB: Network time protocol daemon (ntpd).
> <3.5> 2014-08-11 15:17:45 Telcontar ntpd 8017 - -  proto: precision = 1.613 usec
> <3.7> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen and drop on 1 v6wildcard :: UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 2 lo 127.0.0.1 UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 3 eth0 192.168.1.14 UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 4 lo ::1 UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 5 eth0 fe80::221:85ff:fe16:2d0b UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listen normally on 6 eth0 fc00::14 UDP 123
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  peers refreshed
> <3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  Listening on routing socket on fd #23 for interface updates
> <3.5> 2014-08-11 15:17:46 Telcontar ntpd 8017 - -  logging to file /var/log/ntp
> <4.6> 2014-08-11 15:17:48 Telcontar SuSEfirewall2 - - -  Firewall rules successfully set
> <3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Found user 'avahi-autoipd' (UID 495) and group 'avahi-autoipd' (GID 491).
> <3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Successfully called chroot().
> <3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Successfully dropped root privileges.
> <3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Starting with address 169.254.3.89
> <3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - -  Routable address already assigned, sleeping.
> <3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - -  Started ifup managed network interface eth0.
> <3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - -  Started ifup managed network interface eth1.
> <3.6> 2014-08-11 15:17:50 Telcontar network 6384 - -  ..done..done..done    ppp0      Startmode is 'manual' -> skipping
> <1.5> 2014-08-11 15:17:50 Telcontar ifup 8500 - -      ppp0      Startmode is 'manual' -> skipping
> <3.6> 2014-08-11 15:17:50 Telcontar network 6384 - -  ..skippedSetting up service network  .  .  .  .  .  .  .  .  .  .  .  .  ...done
> <3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - -  Started LSB: Configure network interfaces and set up routing.
> <3.4> 2014-08-11 15:17:52 Telcontar pm-utils - - -  Thawing the system now (04)...
> <0.6> 2014-08-11 15:17:55 Telcontar kernel - - - [73268.481672] Chrome_ChildThr[5680]: segfault at 0 ip 00007ffcedf71598 sp 00007ffce1821410 error 6 in libmozalloc.so[7ffcedf7
> <0.4> 2014-08-11 15:18:00 Telcontar kernel - - - [73274.336014] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:18:01 Telcontar systemd 1 - -  Starting Session 559 of user news.
> <3.4> 2014-08-11 15:18:16 Telcontar router - - -  (Thawing 04) Logging the current IP= 79.159.63.177
> <0.4> 2014-08-11 15:18:31 Telcontar kernel - - - [73304.416012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:19:01 Telcontar kernel - - - [73334.496014] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:19:31 Telcontar kernel - - - [73364.576016] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:20:01 Telcontar kernel - - - [73394.656015] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:20:01 Telcontar systemd 1 - -  Starting Session 560 of user cer.
> <0.4> 2014-08-11 15:20:31 Telcontar kernel - - - [73424.736049] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:21:01 Telcontar kernel - - - [73454.816016] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:21:31 Telcontar kernel - - - [73484.896015] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:22:01 Telcontar kernel - - - [73514.976016] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:22:31 Telcontar kernel - - - [73545.056018] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:23:01 Telcontar systemd 1 - -  Starting Session 561 of user news.
> <0.4> 2014-08-11 15:23:01 Telcontar kernel - - - [73575.136025] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:23:31 Telcontar kernel - - - [73605.216014] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:23:52 Telcontar smartd 1013 - -  Device: /dev/sdb [SAT], Temperature changed -5 Celsius to 33 Celsius (Min/Max 19/38)
> <0.4> 2014-08-11 15:24:01 Telcontar kernel - - - [73635.296078] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:24:32 Telcontar kernel - - - [73665.376020] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:25:01 Telcontar systemd 1 - -  Starting Session 562 of user news.
> <0.4> 2014-08-11 15:25:02 Telcontar kernel - - - [73695.456011] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:25:32 Telcontar kernel - - - [73725.536015] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:26:02 Telcontar kernel - - - [73755.616017] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:26:32 Telcontar kernel - - - [73785.696017] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:27:02 Telcontar kernel - - - [73815.776016] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:27:32 Telcontar kernel - - - [73845.856021] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:28:01 Telcontar systemd 1 - -  Starting Session 563 of user news.
> <0.4> 2014-08-11 15:28:02 Telcontar kernel - - - [73875.936014] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:28:32 Telcontar kernel - - - [73906.016015] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:29:02 Telcontar kernel - - - [73936.096017] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:29:32 Telcontar kernel - - - [73966.176012] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - -  Starting Session 564 of user root.
> <3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - -  Starting Session 565 of user cer.
> <1.6> 2014-08-11 15:30:01 Telcontar run-crons 8974 - -  suse.de-snapper: OK
> <4.5> 2014-08-11 15:30:01 Telcontar su - - -  (to root) root on (null)
> <10.3> 2014-08-11 15:30:01 Telcontar su - - -  pam_systemd(su-l:session): pam_putenv: delete non-existent entry; XDG_RUNTIME_DIR
> <0.4> 2014-08-11 15:30:02 Telcontar kernel - - - [73996.256010] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:30:32 Telcontar kernel - - - [74026.336013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:31:03 Telcontar kernel - - - [74056.416012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:31:33 Telcontar kernel - - - [74086.496011] XFS (sdd5): xfs_log_force: error 5 returned.
> <4.5> 2014-08-11 15:31:59 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
> <4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  GLib-GObject: invalid unclassed pointer in cast to 'GkmObject'
> <4.3> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: gkm_object_expose_full: assertion 'GKM_IS_OBJECT (self)' failed
> <4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
> <4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
> <4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  Gkm: couldn't create temporary file for: /home/cer/.gnome2/keyrings/login.keyring: Input/output error
> <4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - -  couldn't create login keyring: An error occurred on the device
> <10.3> 2014-08-11 15:32:00 Telcontar unix2_chkpwd - - -  gkr-pam: the password for the login keyring was invalid.
> <0.4> 2014-08-11 15:32:03 Telcontar kernel - - - [74116.576018] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:32:33 Telcontar kernel - - - [74146.656011] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:33:01 Telcontar systemd 1 - -  Starting Session 566 of user news.
> <0.4> 2014-08-11 15:33:03 Telcontar kernel - - - [74176.736068] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:33:33 Telcontar kernel - - - [74206.816012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:34:03 Telcontar kernel - - - [74236.896017] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:34:33 Telcontar kernel - - - [74266.976014] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:35:01 Telcontar systemd 1 - -  Starting Session 567 of user news.
> <0.4> 2014-08-11 15:35:03 Telcontar kernel - - - [74297.056012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:35:33 Telcontar kernel - - - [74327.136015] XFS (sdd5): xfs_log_force: error 5 returned.
> <1.6> 2014-08-11 15:35:56 Telcontar run-crons 8974 - -  leafnode: OK
> <3.6> 2014-08-11 15:35:56 Telcontar systemd 1 - -  Reloading System Logging Service.
> <3.6> 2014-08-11 15:35:57 Telcontar systemd 1 - -  Reloaded System Logging Service.
> <5.6> 2014-08-11 15:35:57 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
> <3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - -  Reloading System Logging Service.
> <3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - -  Reloaded System Logging Service.
> <0.4> 2014-08-11 15:36:03 Telcontar kernel - - - [74357.216013] XFS (sdd5): xfs_log_force: error 5 returned.
> <1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - -  logrotate: OK
> <3.2> 2014-08-11 15:36:06 Telcontar mdadm 9290 - -  DegradedArray event detected on md device /dev/md0
> <1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - -  mdadm: OK
> <4.5> 2014-08-11 15:36:06 Telcontar su - - -  (to root) root on (null)
> <1.4> 2014-08-11 15:36:25 Telcontar run-crons 8974 - -  mlocate.cron returned 143
> <1.6> 2014-08-11 15:36:25 Telcontar run-crons 8974 - -  packagekit-background.cron: OK
> <1.6> 2014-08-11 15:36:26 Telcontar run-crons 8974 - -  suse-clean_catman: OK
> <0.4> 2014-08-11 15:36:33 Telcontar kernel - - - [74387.296018] XFS (sdd5): xfs_log_force: error 5 returned.
> <1.6> 2014-08-11 15:36:41 Telcontar run-crons 8974 - -  suse-do_mandb: OK
> <1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - -  suse-texlive: OK
> <1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - -  suse.cron-sa-update: OK
> <1.6> 2014-08-11 15:36:58 Telcontar run-crons 8974 - -  suse.de-backup-rc.config: OK
> <0.4> 2014-08-11 15:37:04 Telcontar kernel - - - [74417.376010] XFS (sdd5): xfs_log_force: error 5 returned.
> <1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-backup-rpmdb: OK
> <0.4> 2014-08-11 15:37:34 Telcontar kernel - - - [74447.456013] XFS (sdd5): xfs_log_force: error 5 returned.
> <1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-check-battery: OK
> <1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-cron-local: OK
> <1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-faxcron: OK
> <1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - -  suse.de-snapper: OK
> <3.6> 2014-08-11 15:38:01 Telcontar systemd 1 - -  Starting Session 568 of user news.
> <0.4> 2014-08-11 15:38:04 Telcontar kernel - - - [74477.536013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:38:34 Telcontar kernel - - - [74507.616019] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:39:04 Telcontar kernel - - - [74537.696013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:39:34 Telcontar kernel - - - [74567.776014] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:40:01 Telcontar systemd 1 - -  Starting Session 569 of user cer.
> <0.4> 2014-08-11 15:40:04 Telcontar kernel - - - [74597.856013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:40:34 Telcontar kernel - - - [74627.936021] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:41:04 Telcontar kernel - - - [74658.016012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:41:34 Telcontar kernel - - - [74688.096019] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:42:04 Telcontar kernel - - - [74718.176018] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:42:34 Telcontar kernel - - - [74748.256017] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:43:01 Telcontar systemd 1 - -  Starting Session 570 of user news.
> <0.4> 2014-08-11 15:43:04 Telcontar kernel - - - [74778.336012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:43:35 Telcontar kernel - - - [74808.416013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:44:05 Telcontar kernel - - - [74838.496014] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:44:35 Telcontar kernel - - - [74868.576013] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - -  Starting Session 571 of user root.
> <3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - -  Starting Session 572 of user news.
> <0.4> 2014-08-11 15:45:05 Telcontar kernel - - - [74898.656019] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:45:35 Telcontar kernel - - - [74928.736017] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:46:05 Telcontar kernel - - - [74958.816015] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:46:35 Telcontar kernel - - - [74988.896026] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:47:05 Telcontar kernel - - - [75018.976014] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:47:35 Telcontar kernel - - - [75049.056013] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:48:01 Telcontar systemd 1 - -  Starting Session 573 of user news.
> <0.4> 2014-08-11 15:48:05 Telcontar kernel - - - [75079.136016] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:48:35 Telcontar kernel - - - [75109.216014] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:49:05 Telcontar kernel - - - [75139.296014] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:49:36 Telcontar kernel - - - [75169.376013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:50:06 Telcontar kernel - - - [75199.456012] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:50:36 Telcontar kernel - - - [75229.536011] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.4> 2014-08-11 15:51:06 Telcontar kernel - - - [75259.616013] XFS (sdd5): xfs_log_force: error 5 returned.
> <0.6> 2014-08-11 15:51:09 Telcontar kernel - - - [75262.721354] xfce4-session[4520]: segfault at 8 ip 00000000004164dc sp 00007fffdc291dc0 error 4 in xfce4-session[400000+2b00
> <4.6> 2014-08-11 15:51:18 Telcontar systemd-logind 1021 - -  Removed session 8.
> <10.5> 2014-08-11 15:51:18 Telcontar polkitd 4314 - -  Unregistered Authentication Agent for unix-session:10 (system bus name :1.69, object path /org/gnome/PolicyKit1/Authenti
> <0.7> 2014-08-11 15:51:28 Telcontar kernel - - - [75282.132776] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
> <3.6> 2014-08-11 15:51:29 Telcontar acpid - - -  1 client rule loaded
> <4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - -  Removed session 10.
> <0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.176020] usb 1-6: new high-speed USB device number 4 using ehci-pci
> <3.6> 2014-08-11 15:51:30 Telcontar systemd 1 - -  Starting Session 574 of user lightdm.
> <4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - -  New session 574 of user lightdm.
> <4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - -  Linked /tmp/.X11-unix/X0 to /run/user/127/X11-display.
> <0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291822] usb 1-6: New USB device found, idVendor=8564, idProduct=1000
> <0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291825] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> <0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291828] usb 1-6: Product: Mass Storage Device
> <0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291829] usb 1-6: Manufacturer: JetFlash
> <0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291831] usb 1-6: SerialNumber: 346YLQ4L0G5H8S2F
> <1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - -  checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:1a.7/usb1/1-6"
> <1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - -  bus: 1, device: 4 was not an MTP device
> <0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667399] usb-storage 1-6:1.0: USB Mass Storage device detected
> <0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667502] scsi12 : usb-storage 1-6:1.0
> <0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667606] usbcore: registered new interface driver usb-storage
> <0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794904] scsi 12:0:0:0: Direct-Access     JetFlash Transcend 4GB    1100 PQ: 0 ANSI: 4
> <0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794976] scsi 12:0:0:0: alua: supports implicit and explicit TPGS
> <0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796262] scsi 12:0:0:0: alua: No target port descriptors found
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796265] scsi 12:0:0:0: alua: not attached
> <0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796396] sd 12:0:0:0: Attached scsi generic sg6 type 0
> <0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796888] sd 12:0:0:0: [sdf] 7913472 512-byte logical blocks: (4.05 GB/3.77 GiB)
> <0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797634] sd 12:0:0:0: [sdf] Write Protect is off
> <0.7> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797637] sd 12:0:0:0: [sdf] Mode Sense: 43 00 00 00
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798386] sd 12:0:0:0: [sdf] No Caching mode page found
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798388] sd 12:0:0:0: [sdf] Assuming drive cache: write through
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801508] sd 12:0:0:0: [sdf] No Caching mode page found
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801511] sd 12:0:0:0: [sdf] Assuming drive cache: write through
> <0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.802147]  sdf: sdf1 sdf2 sdf3
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805642] sd 12:0:0:0: [sdf] No Caching mode page found
> <0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805645] sd 12:0:0:0: [sdf] Assuming drive cache: write through
> <0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805648] sd 12:0:0:0: [sdf] Attached SCSI removable disk
> <0.4> 2014-08-11 15:51:36 Telcontar kernel - - - [75289.696019] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - -  Starting Getty on tty2...
> <3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - -  Started Getty on tty2.
> <3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - -  Starting Getty on tty3...
> <3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - -  Started Getty on tty3.
> <3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - -  Starting Getty on tty6...
> <3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - -  Started Getty on tty6.
> <3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - -  Starting Getty on tty5...
> <3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - -  Started Getty on tty5.
> <3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - -  Starting Getty on tty4...
> <3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - -  Started Getty on tty4.
> <0.4> 2014-08-11 15:52:06 Telcontar kernel - - - [75319.776023] XFS (sdd5): xfs_log_force: error 5 returned.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Unmounting /data/raid...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Unmounting /data/cripta...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping /sys/devices/virtual/block/dm-0.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  message repeated 5 times: [ Stopping /sys/devices/virtual/block/dm-0.]
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Session 574 of user lightdm.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped Session 574 of user lightdm.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Session 7 of user root.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped Session 7 of user root.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping user-0.slice.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Removed slice user-0.slice.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Stop Read-Ahead Data Collection 10s After Completed Startup.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped Stop Read-Ahead Data Collection 10s After Completed Startup.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping User Manager for 1000...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping User Manager for 9...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping User Manager for 127...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping CUPS Printing Service...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping ifup managed network interface eth1...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping ifup managed network interface eth0...
> <3.6> 2014-08-11 15:17:44 Telcontar systemd 4377 - -  message repeated 14 times: [ Time has been changed]
> <3.3> 2014-08-11 15:52:08 Telcontar systemd 4377 - -  Failed to enqueue exit.target job: Unit exit.target failed to load: Input/output error.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Graphical Interface.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped target Graphical Interface.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: X Display Manager...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping helloworld.service...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped helloworld.service.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping Multi-User System.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopped target Multi-User System.
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: virus scanner daemon...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: Start the hddtemp daemon...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: mdadmd daemon monitoring MD devices...
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: This services starts and stops the USB Arbitrator....
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: Supports the direct execution of binary formats....
> <3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - -  Stopping LSB: irqbalance daemon providing irq balancing on MP-machines...
> <3.6> 2014-08-11 15:52:09 Telcontar systemd 1 - -  Stopping LSB: Set up analog joysticks...
> <0.4> 2014-08-11 15:52:10 Telcontar kernel - - - [75323.547122] nfsd: last server has exited, flushing export cache
> <5.6> 2014-08-11 15:36:02 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
> <5.6> 2014-08-11 15:52:11 Telcontar rsyslogd - - -  [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] exiting on signal 15.
> 2014-08-11 15:52:12+02:00 - Halting the system now  =========================================== uptime:  15:52pm  up 1 day 20:54,  1 user,  load average: 5.94, 2.47, 1.22
> 
> 
> 
> >I am interested in the metadata dump.
> 
> Ok, sure, no problem. I'm working on that, but I need to have lunch first ;-)
> 
> 
> >Also, someone hit back-to-back duplicate block allocation
> >XFS_WANT_CORRUPTED_GOTO bugs, so you may want to take a metadata dump before
> >and after the xfs_repair in case you hit it again soon.
> 
> I already have a metadata dump, and I have not attempted a repair yet (I'm
> doing a full dd copy of the partition, and it is 400 Gigs). I will take
> another metadump after the repair, and I can upload both to Google Drive.
> 
> But first I need sustenance :-)
> 
> (At least this time I do not have any pressing thing to do on the
> computer...)
> 
> - -- Cheers
>        Carlos E. R.
> 
>        (from 13.1 x86_64 "Bottle" (Minas Tirith))
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iF0EAREIAAYFAlPo4p4ACgkQja8UbcUWM1wGAADxAVuTUPkxG+LO29VzehJ8cSPV
> uItG/Puu2KbqUeCyXwD/cgu/+F7vhEeU9WEbNP5eifhmyu0T3ByDMtuKp55Rj7A=
> =CgSx
> -----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 14:58     ` Carlos E. R.
@ 2014-08-11 17:05       ` Carlos E. R.
  2014-08-11 21:31         ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 17:05 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4704 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



On 2014-08-11 at 16:58 +0200, Carlos E. R. wrote:

> Ok, will do.
>
> I will attempt to repair the partition, create a backup of it with xfsdump,
> then reboot and see what happens (without the reformat cycle).
>
> At this instant I'm doing a full dd of the partition, just in case it
> becomes useful.

linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -V
xfs_repair version 3.1.11

It is a live system, so I can't update it. If I boot from the main 
system, which has a more recent xfs_repair, systemd will attempt to mount 
and automatically repair the filesystem, and we will not get any logs.

linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -v /dev/sdd5
Phase 1 - find and verify superblock...
         - block cache size set to 753952 entries
Phase 2 - using internal log
         - zero log...
zero_log: head block 65662 tail block 65607
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 #


linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # mount -v /dev/sdd5 mnt/
mount: /dev/sdd5 mounted on /run/media/linux/d_storage/xfs_disaster_home/20140811/mnt.
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # umount mnt
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 #

dmesg:

[10266.034290] XFS (sdd5): Mounting Filesystem
[10266.073739] XFS (sdd5): Starting recovery (logdev: internal)
[10266.690325] XFS (sdd5): Ending recovery (logdev: internal)
linux@linux:~>

dmesg --ctime

[Mon Aug 11 16:47:12 2014] XFS (sdd5): Mounting Filesystem
[Mon Aug 11 16:47:12 2014] XFS (sdd5): Starting recovery (logdev: internal)
[Mon Aug 11 16:47:12 2014] XFS (sdd5): Ending recovery (logdev: internal)
linux@linux:~>



linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -v /dev/sdd5
Phase 1 - find and verify superblock...
         - block cache size set to 753952 entries
Phase 2 - using internal log
         - zero log...
zero_log: head block 65700 tail block 65700
         - scan filesystem freespace and inode maps...
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
agf_freeblks 27745492, counted 27745496 in ag 1
sb_fdblocks 115565042, counted 115565046
         - found root inode chunk
Phase 3 - for each AG...
         - scan and clear agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 2
         - agno = 1
         - agno = 3
         - agno = 0
Phase 5 - rebuild AG headers and trees...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - reset superblock...
Phase 6 - check inode connectivity...
         - resetting contents of realtime bitmap and summary inodes
         - traversing filesystem ...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

         XFS_REPAIR Summary    Mon Aug 11 16:48:08 2014

Phase		Start		End		Duration
Phase 1:	08/11 16:47:49	08/11 16:47:49 
Phase 2:	08/11 16:47:49	08/11 16:47:52	3 seconds
Phase 3:	08/11 16:47:52	08/11 16:48:07	15 seconds
Phase 4:	08/11 16:48:07	08/11 16:48:07 
Phase 5:	08/11 16:48:07	08/11 16:48:07 
Phase 6:	08/11 16:48:07	08/11 16:48:07 
Phase 7:	08/11 16:48:07	08/11 16:48:07

Total run time: 18 seconds
done
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 #


I don't understand everything it says, but apparently it does not detect 
any problem.

dmesg doesn't have any more entries.
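
For reference, phase 2 of the run above does print two counter lines where the on-disk value differs from what xfs_repair counted. The following is a minimal Python sketch for pulling those lines out and showing the size of the difference. The function name and the regular expression are illustrative assumptions, not part of xfsprogs, and the pattern only assumes the line format shown in this transcript.

```python
import re

# Lines copied from the xfs_repair run above.
REPAIR_OUTPUT = """\
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
agf_freeblks 27745492, counted 27745496 in ag 1
sb_fdblocks 115565042, counted 115565046
"""

# Hypothetical pattern: "<counter> <on-disk>, counted <actual>[ in ag <n>]"
COUNTER_RE = re.compile(r"^(\w+) (\d+), counted (\d+)(?: in ag (\d+))?$")


def counter_mismatches(text):
    """Return (counter, on_disk, counted, delta, ag) for each mismatch line."""
    found = []
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            name, on_disk, counted, ag = m.groups()
            on_disk, counted = int(on_disk), int(counted)
            found.append((name, on_disk, counted, counted - on_disk,
                          None if ag is None else int(ag)))
    return found


for name, on_disk, counted, delta, ag in counter_mismatches(REPAIR_OUTPUT):
    where = "superblock" if ag is None else f"AG {ag}"
    print(f"{name} ({where}): on disk {on_disk}, counted {counted}, delta {delta:+d}")
```

Both counters differ from the counted value by the same 4 blocks; the sketch only reports that difference and says nothing about its cause.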


Now I'm going to create an xfsdump of it and reboot, without rebuilding. 
Then I'll upload the metadata to Google Drive.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPo98kACgkQja8UbcUWM1y78wD+MznL/4Ht53XAOw+CN/4ThhqQ
P4cN85akyVugU+T6zusA/313Z3PHezJe2oUTx1dFpQpV8Lf+LSgtlHVZ0M4xL+sz
=/iid
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 16:14       ` Brian Foster
@ 2014-08-11 17:08         ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 17:08 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1058 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Content-ID: <alpine.LSU.2.11.1408111908330.7326@minas-tirith.valinor>


On 2014-08-11 at 12:14 -0400, Brian Foster wrote:
> On Mon, Aug 11, 2014 at 05:34:46PM +0200, Carlos E. R. wrote:


> This reminds me that it might be interesting to tune the eofblocks
> scanner to be more aggressive and see if that helps reproduce. This
> thread that's running here normally runs every 5 minutes by default, but
> it can be tuned to run at a user-defined interval via the following
> /proc file:
>
> # cat /proc/sys/fs/xfs/speculative_prealloc_lifetime
> 300
>
> I wonder if setting it to 30s or so ('echo 30 > /proc/...') and running
> some hibernation cycles would help...

Ok, I can try that.
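
The tuning step quoted above can be sketched in Python as follows. This is a minimal illustration, assuming the /proc path given in the quoted message and that the value is an interval in seconds; the helper names are made up for this example, and writing to the real file requires root.

```python
from pathlib import Path

# Path taken from the quoted message above.
PREALLOC_LIFETIME = Path("/proc/sys/fs/xfs/speculative_prealloc_lifetime")


def get_lifetime(path=PREALLOC_LIFETIME):
    """Return the current eofblocks scan interval in seconds."""
    return int(Path(path).read_text().strip())


def set_lifetime(seconds, path=PREALLOC_LIFETIME):
    """Write a new interval and return the previous one so it can be restored."""
    if seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    previous = get_lifetime(path)
    Path(path).write_text(f"{seconds}\n")
    return previous
```

A test run could call `old = set_lifetime(30)` before the hibernation cycles and `set_lifetime(old)` afterwards to put the default back.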

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPo+K0ACgkQja8UbcUWM1zxVgD+LFcJ3OBRJhn9LVrwFxf98H1u
n7ubtexF8XXKWcUdjv0A/Ail1kzWxIgQkevunUQ/3UnbthtWqnyniHP4qKVUWv8m
=AauB
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 15:34     ` Carlos E. R.
  2014-08-11 16:14       ` Brian Foster
@ 2014-08-11 21:27       ` Mark Tinguely
  2014-08-11 21:50         ` Carlos E. R.
  1 sibling, 1 reply; 56+ messages in thread
From: Mark Tinguely @ 2014-08-11 21:27 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On 08/11/14 10:34, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
>
>
> On 2014-08-11 at 09:57 -0500, Mark Tinguely wrote:
>> On 08/11/14 09:23, Carlos E. R. wrote:
>
>> Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?
>
> This time?
> Did not look at the log yet. Let me see...
>
> Here is the full log of the event. It starts prior to hibernating, all
> things nominal. And ends on shutdown (had to hit reset button, despite
> what log says). If you want to see entries prior to that, since boot, I
> can do that.
>

...

So XFS did a forced shutdown after the machine came back from 
hibernation. After replaying the log, there were no errors in xfs_repair.

We should have quiesced the metadata/log before freezing XFS. Were there 
a lot of items in the log?

--Mark.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 17:05       ` Carlos E. R.
@ 2014-08-11 21:31         ` Carlos E. R.
       [not found]           ` <53E938CC.4010103@sgi.com>
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 21:31 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1345 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



On 2014-08-11 at 19:05 +0200, Carlos E. R. wrote:

> linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -V
> xfs_repair version 3.1.11
>
> It is a live system, so I can't update it. If I boot from the main

...

> Now I'm going to create an xfsdump of it and reboot, without rebuilding.
> Then I'll upload the metadata to Google Drive.


I have just booted the main system in text mode, logged in as root. Look:

Telcontar:/data/storage_d/xfs_disaster_home/20140811 # time xfs_metadump -g -w /dev/sdc5 tgtfile_20140811_obfus_after_repair_bis
Copied 231552 of 231552 inodes (3 of 4 AGs)
xfs_metadump: invalid dqblk inode number (-1)
Copying log

real    0m20.044s
user    0m1.527s
sys     0m1.174s
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #
Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_metadump -V
xfs_metadump version 3.2.1


And that was after running xfs_repair 3.2.1, which found nothing...


Does that give any ideas?


- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPpNloACgkQja8UbcUWM1xa/wD/XzXCXoTni1LL13iBgD7XzTrv
1D6PLaMyIYsLiE9K0PYA/j7sXcWkvZV27fpfIdlU4ECyid6iULLdlQN4oSE56O2C
=rZOy
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 21:27       ` Mark Tinguely
@ 2014-08-11 21:50         ` Carlos E. R.
  2014-08-11 21:56           ` Mark Tinguely
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 21:50 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 828 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



On 2014-08-11 at 16:27 -0500, Mark Tinguely wrote:
> On 08/11/14 10:34, Carlos E. R. wrote:

>
> So XFS did a forced shutdown after the machine came back from hibernation. 
> After replaying the log, there were no errors in xfs_repair.
>
> We should have quiesced the metadata/log before freezing XFS. Were there a lot 
> of items in the log?

Sorry, what log? The /var/log/messages file? I posted it in full, from 
before the hibernation to powerdown.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPpOqUACgkQja8UbcUWM1xpewD/XaCmyL60x0lqs8PuoA9xfSTn
8CNxtpVX78L3O/1RsdYA/3U9CIA0uOCI9Lk4t0KO5xiLLjCZi/+AlvUtAbCfCQCB
=VEcc
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 21:50         ` Carlos E. R.
@ 2014-08-11 21:56           ` Mark Tinguely
  2014-08-11 22:36             ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Mark Tinguely @ 2014-08-11 21:56 UTC (permalink / raw)
  To: xfs

On 08/11/14 16:50, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
>
>
> On 2014-08-11 at 16:27 -0500, Mark Tinguely wrote:
>> On 08/11/14 10:34, Carlos E. R. wrote:
>
>>
>> So XFS did a forced shutdown after the machine came back from
>> hibernation. After replaying the log, there were no errors in xfs_repair.
>>
>> We should have quiesced the metadata/log before freezing XFS. Were
>> there a lot of items in the log?
>
> Sorry, what log? The /var/log/messages file? I posted it in full, from
> before the hibernation to powerdown.
>
> - -- Cheers
> Carlos E. R.

Sorry, I was referring to the XFS log.

If you took a metadata dump before mounting or running xfs_repair, you 
can display the XFS log with xfs_logprint.

--Mark.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
       [not found]           ` <53E938CC.4010103@sgi.com>
@ 2014-08-11 22:01             ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 22:01 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1582 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Content-ID: <alpine.LSU.2.11.1408112355480.17839@minas-tirith.valinor>


On 2014-08-11 at 16:42 -0500, Mark Tinguely wrote:
> On 08/11/14 16:31, Carlos E. R. wrote:


>> Does that give any ideas?


> Which version of Linux?

Telcontar:~ # cat /etc/os-release
NAME=openSUSE
VERSION="13.1 (Bottle)"
VERSION_ID="13.1"
PRETTY_NAME="openSUSE 13.1 (Bottle) (x86_64)"
ID=opensuse
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:opensuse:13.1"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://opensuse.org/"
ID_LIKE="suse"
Telcontar:~ #
Telcontar:~ # uname -a
Linux Telcontar 3.11.10-17-desktop #1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f) x86_64 x86_64 x86_64 GNU/Linux
Telcontar:~ # rpm -q xfsprogs
xfsprogs-3.2.1-40.1.x86_64



> Did you get a metadata dump before the xfs_repair?

Yes, sure. I said so in another post. I was in the process of starting up 
the machine when I noticed this error:

xfs_metadump: invalid dqblk inode number (-1)


As this is the first time I have seen that error, I'm wondering whether to 
go ahead with mounting and using the filesystem, as explained in my other 
posts today, or to wait for different instructions from you.

Meanwhile, I'll try to upload the metadata files using another machine.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPpPTcACgkQja8UbcUWM1wm3wD+KTezrh+/SRAFFIVGCeJcHkgE
MeJ5PMzceqDlUDOb4Q0A/jO8O5TkHCS1oDqI3zfdIExTUiel1GAcbMDNkk501Cf+
=6qE9
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 21:56           ` Mark Tinguely
@ 2014-08-11 22:36             ` Carlos E. R.
  2014-08-12  0:17               ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-11 22:36 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 7010 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Content-ID: <alpine.LSU.2.11.1408120019370.17839@minas-tirith.valinor>


On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:
> On 08/11/14 16:50, Carlos E. R. wrote:

>>> We should have quiesced the metadata/log before freezing XFS. Were
>>> there a lot of items in the log?
>> 
>> Sorry, what log? The /var/log/messages file? I posted it in full, from
>> before the hibernation to powerdown.
>> 
>
> Sorry, I was referring to the XFS log.
>
> If you took a metadata dump before mounting or running xfs_repair, you can 
> display the XFS log with xfs_logprint.

Ah! Ok :-)



Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_logprint -f tgtfile_20140811
xfs_logprint:
     data device: 0xffffffffffffffff
     log device: 0xffffffffffffffff daddr: 0 length: 820476

cycle: 3        version: 2              lsn: 3,65730    tail_lsn: 3,65561
length of Log Record: 1024      prev offset: 65667              num ops: 10
uuid: 3a35756d-1b63-4b9b-9b3a-c12c8951b678   format: little endian linux
h_size: 32768
- ----------------------------------------------------------------------------
Oper (0): tid: b80486ac  len: 0  clientid: TRANS  flags: START
- ----------------------------------------------------------------------------
Oper (1): tid: b80486ac  len: 16  clientid: TRANS  flags: none
TRAN:    type: CHECKPOINT       tid: b80486ac       num_items: 7
- ----------------------------------------------------------------------------
Oper (2): tid: b80486ac  len: 56  clientid: TRANS  flags: none
INODE: #regs: 3   ino: 0x20fcbc83  flags: 0x5   dsize: 96
         blkno: 264281664  len: 16  boff: 768
Oper (3): tid: b80486ac  len: 96  clientid: TRANS  flags: none
INODE CORE
magic 0x494e mode 0100644 version 2 format 2
nlink 1 uid 1000 gid 100
atime 0x53b9d2fd mtime 0x53c05b34 ctime 0x53c05b34
size 0x90f8 nblocks 0xa extsize 0x0 nextents 0x6
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0xd2610167
Oper (4): tid: b80486ac  len: 96  clientid: TRANS  flags: none
EXTENTS inode data
- ----------------------------------------------------------------------------
Oper (5): tid: b80486ac  len: 56  clientid: TRANS  flags: none
INODE: #regs: 2   ino: 0x6048329c  flags: 0x1   dsize: 0
         blkno: 770365760  len: 16  boff: 7168
Oper (6): tid: b80486ac  len: 96  clientid: TRANS  flags: none
INODE CORE
magic 0x494e mode 0100600 version 2 format 2
nlink 1 uid 1000 gid 100
atime 0x53c05b25 mtime 0x53c05b34 ctime 0x53c05b34
size 0x0 nblocks 0x0 extsize 0x0 nextents 0x0
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x69e7b261
- ----------------------------------------------------------------------------
Oper (7): tid: b80486ac  len: 56  clientid: TRANS  flags: none
INODE: #regs: 2   ino: 0x600814ef  flags: 0x1   dsize: 0
         blkno: 768264816  len: 16  boff: 3840
Oper (8): tid: b80486ac  len: 96  clientid: TRANS  flags: none
INODE CORE
magic 0x494e mode 0100600 version 2 format 2
nlink 1 uid 1000 gid 100
atime 0x53b6ef00 mtime 0x53c05b34 ctime 0x53c05b34
size 0x1 nblocks 0x1 extsize 0x0 nextents 0x1
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x97ccc0ee
- ----------------------------------------------------------------------------
Oper (9): tid: b80486ac  len: 0  clientid: TRANS  flags: COMMIT

============================================================================
cycle: 3        version: 2              lsn: 3,65733    tail_lsn: 3,65561
length of Log Record: 32256     prev offset: 65730              num ops: 176
uuid: 3a35756d-1b63-4b9b-9b3a-c12c8951b678   format: little endian linux
h_size: 32768
**********************************************************************
* ERROR: data block=379316                                            *
**********************************************************************
Bad data in log
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #



But I have no idea what any of that means.
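
For readers in the same position, part of the printout can be decoded mechanically. The following is a minimal sketch, assuming that the atime/mtime/ctime values in an INODE CORE record are seconds since the Unix epoch printed in hexadecimal and that size is a byte count. The helper names parse_fields and as_utc are made up for this illustration and are not part of xfsprogs.

```python
import re
from datetime import datetime, timezone

# Two lines of one INODE CORE record, copied from the xfs_logprint output above.
SAMPLE = """\
atime 0x53c05b25 mtime 0x53c05b34 ctime 0x53c05b34
size 0x0 nblocks 0x0 extsize 0x0 nextents 0x0
"""

def parse_fields(text):
    """Return {name: int} for every 'name 0x<hex>' pair found in the text."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+) (0x[0-9a-fA-F]+)", text)}

def as_utc(seconds):
    """Interpret a value as Unix epoch seconds (an assumption, see above)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

fields = parse_fields(SAMPLE)
for key in ("atime", "mtime", "ctime"):
    print(key, as_utc(fields[key]).isoformat())
print("size in bytes:", fields["size"])
```

Under that assumption, the record describes an empty file whose timestamps fall in July 2014.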


Note that the metadata was obtained using tools version 3.1.11, but the
printout above was made using tools version 3.2.1, in case that has any
relevance.



And, same operation on the metadata obtained after running repairs:



Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_logprint -f tgtfile_20140811_after_repair
xfs_logprint:
     data device: 0xffffffffffffffff
     log device: 0xffffffffffffffff daddr: 0 length: 820428

Log inconsistent or not a log (last==0, first!=1)
xfs_logprint: after 7 zeroed blocks
**********************************************************************
* ERROR: found data after zeroed blocks block=13                     *
**********************************************************************
Bad log - data after zeroed blocks
Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_logprint -f tgtfile_20140811_after_repair_bis
xfs_logprint:
     data device: 0xffffffffffffffff
     log device: 0xffffffffffffffff daddr: 0 length: 820428

Log inconsistent or not a log (last==0, first!=1)
xfs_logprint: after 7 zeroed blocks
**********************************************************************
* ERROR: found data after zeroed blocks block=13                     *
**********************************************************************
Bad log - data after zeroed blocks
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #




Telcontar:/data/storage_d/xfs_disaster_home/20140811 # file tgtfile_20140811*
tgtfile_20140811:                        XFS filesystem metadump image
tgtfile_20140811_after_repair:           XFS filesystem metadump image
tgtfile_20140811_after_repair_bis:       XFS filesystem metadump image
tgtfile_20140811_obfus:                  XFS filesystem metadump image
tgtfile_20140811_obfus_after_repair:     XFS filesystem metadump image
tgtfile_20140811_obfus_after_repair_bis: XFS filesystem metadump image
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #


tgtfile_20140811  is the metadata obtained before any mount or
    repairs, using tools 3.1.11.

tgtfile_20140811_after_repair  is the metadata obtained after
    mount and repair, using tools 3.1.11.

tgtfile_20140811_after_repair_bis is the metadata obtained after
    mount and repair, using tools 3.2.1



I will now attempt to upload the three obfuscated files. Sizes are quite 
different, after compression:

Telcontar:/data/storage_d/xfs_disaster_home/20140811/tmp # ls -lh
total 51M
  26M Aug 11 16:21 tgtfile_20140811_obfus.xz
  13M Aug 11 18:54 tgtfile_20140811_obfus_after_repair.xz
  13M Aug 11 23:16 tgtfile_20140811_obfus_after_repair_bis.xz

but all of them are about 401M before compression. The upload will take a
long time; my ADSL upload is 0.3M/s at most.


- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPpRXYACgkQja8UbcUWM1zsXwD/aGq1sLIqPi6U7nOTeB66B5CO
dTpXM/WZtk2gJl7JhvwBAIOeh+TkHN1+rdOQj3z80KG17IuOHpu/wrrPlZ+YqMPR
=3JHj
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 22:36             ` Carlos E. R.
@ 2014-08-12  0:17               ` Carlos E. R.
  2014-08-12 16:51                 ` Brian Foster
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-12  0:17 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 869 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Content-ID: <alpine.LSU.2.11.1408120142170.21410@minas-tirith.valinor>


On 2014-08-12 at 00:36 +0200, Carlos E. R. wrote:
> On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:

> but all of them are about 401M before compression. The upload will take
> long, my ADSL upload is 0.3M/s at most.


I have shared a view-only folder on Google Drive containing the three
files. Both Brian Foster and Mark Tinguely should have received a link by
mail from me. If anybody else wants access, just tell me.

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlPpXQYACgkQja8UbcUWM1wQ9gEAl1WI24UDArdlWHh3J2ih3AV3
nMTwDRqTrT0Rk2BJOB8A/1BOzzn3/IX16sPCsYoqGEyXNHcNXWBHENShlyWzJGUr
=W+BG
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12  0:17               ` Carlos E. R.
@ 2014-08-12 16:51                 ` Brian Foster
  2014-08-12 21:17                   ` Carlos E. R.
  2014-08-12 21:27                   ` Eric Sandeen
  0 siblings, 2 replies; 56+ messages in thread
From: Brian Foster @ 2014-08-12 16:51 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> Content-ID: <alpine.LSU.2.11.1408120142170.21410@minas-tirith.valinor>
> 
> 
> On 2014-08-12 at 00:36 +0200, Carlos E. R. wrote:
> >On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:
> 
> >but all of them are about 401M before compression. The upload will take
> >long, my ADSL upload is 0.3M/s at most.
> 
> 
> I have shared (view) on google drive a folder with the three files. Both
> Brian Foster and Mark Tinguely should have got a link on the mail from me.
> If somebody else wants access, just tell me.
> 

I see the same thing from repair that was in your repair output:

block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2

If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.

Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up, and it seems to be what leads to the shutdown.
E.g., if I mount and use the fs, I can hit an assert or failure just by
attempting to allocate the rest of the space in the fs. If that is the
state of the fs on disk, it's only a matter of time before we explode
due to allocating and freeing that range of space or possibly attempting
to allocate that space twice.
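
The duplicate check described above can be sketched in a few lines. This is only an illustration, assuming the free-space records have already been extracted into (startblock, blockcount) pairs. It is not how xfs_repair is implemented, and every record other than [12608397,10] is a made-up neighbour.

```python
from collections import Counter

def duplicate_records(records):
    """Return the (startblock, blockcount) pairs that occur more than once."""
    counts = Counter(records)
    return sorted(rec for rec, n in counts.items() if n > 1)

# Only the two (12608397, 10) records come from the message above (indices
# 272 and 273); the surrounding entries are hypothetical.
cntbt_after_replay = [
    (500, 9),
    (12608397, 10),
    (12608397, 10),
    (7000, 11),
]
print(duplicate_records(cntbt_after_replay))
```

A free extent should appear exactly once in each free-space btree, so any pair reported here marks space that could be handed out twice.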

Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
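
The block numbers quoted above can be cross-checked against each other. The sketch below assumes a geometry that is not stated anywhere in the thread: 4096-byte filesystem blocks and 32,000,000 blocks per allocation group. Those values were picked only because they make fsb 0x2000781 (the cntbt block) land on daddr 0xf427c08. The function names are illustrative.

```python
# Assumed geometry, NOT stated in the thread: 4096-byte filesystem blocks and
# 32,000,000 blocks per allocation group. These values were chosen because
# they make the numbers quoted above agree with each other.
BLOCK_SIZE = 4096
SECTOR_SIZE = 512
AG_BLOCKS = 32_000_000
# Number of bits needed to address a block inside one AG (25 here).
AG_BLK_LOG = (AG_BLOCKS - 1).bit_length()

def fsb_to_ag(fsb):
    """Split a filesystem block number into (AG number, block within AG)."""
    return fsb >> AG_BLK_LOG, fsb & ((1 << AG_BLK_LOG) - 1)

def fsb_to_daddr(fsb):
    """Convert a filesystem block number to a 512-byte sector address."""
    agno, agbno = fsb_to_ag(fsb)
    return (agno * AG_BLOCKS + agbno) * (BLOCK_SIZE // SECTOR_SIZE)

for name, fsb in (("bnobt", 0x200aa55), ("cntbt", 0x2000781)):
    agno, agbno = fsb_to_ag(fsb)
    print(f"{name}: fsb {fsb:#x} -> AG {agno}, agbno {agbno}, "
          f"daddr {fsb_to_daddr(fsb):#x}")
```

With that geometry both btree blocks fall in AG 1, the same group that repair complains about in "block (1,12608397-12608397)".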

Given the failure occurs on freeing an extent via the xfs_eofblocks
scanner, perhaps this extent was initially allocated as speculative
preallocation and the eofblocks scanner is where we happen to first
identify the corrupted cntbt. What is strange is that, as mentioned
previously, the space appears to be free if I zero the log, so that
means it was probably free before the freeze. It seems highly unlikely
for a file to gain preallocation, be written out and then get trimmed by
the scanner all on wake-from-hibernate.

Carlos,

How long after hibernate does the shutdown/crash typically occur? Do you
basically wake-up and within a few seconds the filesystem crashes, or is
it some time (minutes) later?

If the former, I wonder if it's possible that the scanner returns to
life pointing to a stale or freed incore inode and does something bogus
based on that.

Brian

> - -- Cheers
>        Carlos E. R.
> 
>        (from 13.1 x86_64 "Bottle" (Minas Tirith))
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iF4EAREIAAYFAlPpXQYACgkQja8UbcUWM1wQ9gEAl1WI24UDArdlWHh3J2ih3AV3
> nMTwDRqTrT0Rk2BJOB8A/1BOzzn3/IX16sPCsYoqGEyXNHcNXWBHENShlyWzJGUr
> =W+BG
> -----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 16:51                 ` Brian Foster
@ 2014-08-12 21:17                   ` Carlos E. R.
  2014-08-13 12:04                     ` Brian Foster
  2014-08-12 21:27                   ` Eric Sandeen
  1 sibling, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-08-12 21:17 UTC (permalink / raw)
  To: XFS mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 10139 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Tuesday, 2014-08-12 at 12:51 -0400, Brian Foster wrote:
> On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:

> I see the same thing from repair that was in your repair output:
>
> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2

Is it possible to find out what file uses that block?
I have a non-obfuscated copy of the metadata. Knowing the file, we can 
know what application is involved - and that might help, or perhaps not.


> If I take a look at the btrees as is, I see "235:[12608397,10]" included
> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
> 0x2000781). If I skip the mount, zero the log and repair, everything
> seems Ok. I can allocate the remainder of available space and rm -rf
> everything in the fs without an error.
>
> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
> the cntbt, which is clearly a duplicate entry. This is what repair
> detects and cleans up and seems to lead to the shutdown. E.g., if I
> mount and use the fs, I can hit an assert or failure just by attempting
> to allocate the rest of the space in the fs. If that is the state of the
> fs on disk, it's only a matter of time we explode due to allocating and
> freeing that range of space or possibly attempting to allocate that
> space twice.

I'm not sure if I follow you.

The sequence of events here is:

  a) hibernate
  b) thaw
  c) immediately, in-memory corruption found and kernel error message.
     Filesystem is switched to read only.
     System is unstable, has to be halted or rebooted.
     Umount is impossible.

  d) (¬) Reboot
  e) Mount (¬), manual umount, xfs_repair (¬), mount
     (metadata snapshots taken at the points marked with ¬)


This is the point I'm at now. Are you saying that the filesystem can
explode at any time now? I have not written any files, beyond what the
desktop does automatically.



What I have not done this time (at your request) is:

  f) backup, format, restore.




> Mark mentioned that he didn't see the superblock item in the log with
> regard to the freeze. I don't see that either... which perhaps suggests
> that this all happens during the wake-from-hibernate sequence..? My
> understanding is that we should freeze on hibernate, thus force
> everything out to the log, write an unmount record and then dirty the
> log with a superblock transaction. Therefore, that should be the only
> item in the log post-freeze. Here, we have various items in the log
> including several logged buffers that correspond to the cntbt block that
> ends up corrupted (daddr 0xf427c08).
>
> Given the failure occurs on freeing an extent via the xfs_eofblocks
> scanner, perhaps this extent was initially allocated as speculative
> preallocation and the eofblocks scanner is where we happen to first
> identify the corrupted cntbt. What is strange is that, as mentioned
> previously, the space appears to be free if I zero the log, so that
> means it was probably free before the freeze. It seems highly unlikely
> for a file to gain preallocation, be written out and then get trimmed by
> the scanner all on wake-from-hibernate.


Well, I understand little of that, but if you do, and can do whatever 
modifications need to be done to the code, that's fine with me :-)



> Carlos,
>
> How long after hibernate does the shutdown/crash typically occur? Do you
> basically wake-up and within a few seconds the filesystem crashes, or is
> it some time (minutes) later?

Instantly during the wake-up (thaw), according to the log.

I'm typically not present when it happens: my routine is to switch on the 
computer, then go make coffee/tea, and then return and start using the 
machine. It takes a minute or two to wake up from hibernation, and then 
the machine is sluggish for a minute or two more while processes start 
doing things and claiming chunks from swap, mail is fetched, etc.

And instead of starting work, I find the machine in a bad state.


Look, here is an excerpt from the last event (the full log is in another
post from yesterday), taken from another log file with finer-grained
timestamps:


<30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - -  Shutting down network time protocol daemon (NTPD)..done
<30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - -  Stopped LSB: Network time protocol daemon (ntpd).
<28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - -  Hibernating (95)...
<7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
<4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
<4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
<6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)


The "Hibernating (95)" line is written by a script of mine,
"/etc/pm/sleep.d/95cosas", whose main purpose is to write that line to
the log.

Then the machine wakes up, hours later, although the timestamps here do
not show it (the time jump appears a few lines above instead):


<6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] CPU3 is up
<6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
<6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
<7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset

...


<6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
<6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
<4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
<7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
<28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - -  The canary thread is apparently starving. Taking action.
<30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - -  Demoting known real-time threads.
<29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - -  Demoted 3 threads.
<20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - -  imap: Warning: Time jumped forwards 33996 seconds
<20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - -  imap: Warning: Time jumped forwards 35660 seconds
<22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - -  imap(cer): Disconnected for inactivity in=237010 out=9273919
<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
<1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]


...


<5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
<1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
<1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - -  Time has been changed
<30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - -  1 client rule loaded
<28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - -  Thawing (95)...
<29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - -  Starting LSB: Network time protocol daemon (ntpd)...
<4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid


As you see, the corruption is detected instantly after waking up, before 
pm-utils scripts have a chance to run.
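
The delay can be read straight from the kernel's own timestamps, which are not affected by the wall-clock jump. This is a minimal sketch using two lines from the excerpt above (the XFS message is shortened); the function name kernel_seconds is illustrative.

```python
import re

# Two lines copied from the excerpt above (the XFS message text is shortened).
LINES = [
    "<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - "
    "[73231.518736] Restarting tasks ... done.",
    "<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - "
    "[73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO",
]

def kernel_seconds(line):
    """Return the kernel's own timestamp, the number inside [ ], as a float."""
    match = re.search(r"\[\s*(\d+\.\d+)\]", line)
    if match is None:
        raise ValueError("no kernel timestamp in line")
    return float(match.group(1))

delay = kernel_seconds(LINES[1]) - kernel_seconds(LINES[0])
print(f"XFS error logged {delay:.3f} s after tasks were restarted")
```

By this measure the error is logged roughly four seconds after "Restarting tasks ... done", and several seconds before the "Thawing (95)" line from pm-utils.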




> If the former, I wonder if it's possible that the scanner returns to
> life pointing to a stale or freed incore inode and does something bogus
> based on that.


Well, as I said, that's above my understanding ;-)



- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlPqhHwACgkQtTMYHG2NR9WmrwCglBRRHEMgU9mCEHkU9iHqYehX
+1AAn2oUn8/M3Rfb7mLWapLqYxDfvHNv
=9Yft
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 16:51                 ` Brian Foster
  2014-08-12 21:17                   ` Carlos E. R.
@ 2014-08-12 21:27                   ` Eric Sandeen
  2014-08-12 21:57                     ` Dave Chinner
  2014-08-12 21:59                     ` Brian Foster
  1 sibling, 2 replies; 56+ messages in thread
From: Eric Sandeen @ 2014-08-12 21:27 UTC (permalink / raw)
  To: Brian Foster, Carlos E. R.; +Cc: XFS mailing list

On 8/12/14, 9:51 AM, Brian Foster wrote:
> On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> Content-ID: <alpine.LSU.2.11.1408120142170.21410@minas-tirith.valinor>
> 
> 
> On 2014-08-12 at 00:36 +0200, Carlos E. R. wrote:
>>>> On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:
> 
>>>> but all of them are about 401M before compression. The upload will take
>>>> long, my ADSL upload is 0.3M/s at most.
> 
> 
> I have shared (view) on google drive a folder with the three files. Both
> Brian Foster and Mark Tinguely should have got a link on the mail from me.
> If somebody else wants access, just tell me.
> 
> 
>> I see the same thing from repair that was in your repair output:
> 
>> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
> 
>> If I take a look at the btrees as is, I see "235:[12608397,10]" included
>> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
>> 0x2000781). If I skip the mount, zero the log and repair, everything
>> seems Ok. I can allocate the remainder of available space and rm -rf
>> everything in the fs without an error.
> 
>> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
>> the cntbt, which is clearly a duplicate entry. This is what repair
>> detects and cleans up and seems to lead to the shutdown. E.g., if I
>> mount and use the fs, I can hit an assert or failure just by attempting
>> to allocate the rest of the space in the fs. If that is the state of the
>> fs on disk, it's only a matter of time we explode due to allocating and
>> freeing that range of space or possibly attempting to allocate that
>> space twice.
> 
>> Mark mentioned that he didn't see the superblock item in the log with
>> regard to the freeze. I don't see that either... which perhaps suggests
>> that this all happens during the wake-from-hibernate sequence..? My
>> understanding is that we should freeze on hibernate, thus force
>> everything out to the log, write an unmount record and then dirty the
>> log with a superblock transaction. Therefore, that should be the only
>> item in the log post-freeze. Here, we have various items in the log
>> including several logged buffers that correspond to the cntbt block that
>> ends up corrupted (daddr 0xf427c08).

What freeze?  Look at hibernate(); there is nothing but a sync:

/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
...
        printk(KERN_INFO "PM: Syncing filesystems ... ");
        sys_sync();
        printk("done.\n");

        error = freeze_processes();
        if (error)
                goto Exit;


AFAIK there is no freeze call involved.

-Eric


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 21:27                   ` Eric Sandeen
@ 2014-08-12 21:57                     ` Dave Chinner
  2014-08-12 21:59                     ` Brian Foster
  1 sibling, 0 replies; 56+ messages in thread
From: Dave Chinner @ 2014-08-12 21:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Brian Foster, Carlos E. R., XFS mailing list

On Tue, Aug 12, 2014 at 02:27:58PM -0700, Eric Sandeen wrote:
> On 8/12/14, 9:51 AM, Brian Foster wrote:
> > On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> > Content-ID: <alpine.LSU.2.11.1408120142170.21410@minas-tirith.valinor>
> > 
> > 
> > On 2014-08-12 at 00:36 +0200, Carlos E. R. wrote:
> >>>> On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:
> > 
> >>>> but all of them are about 401M before compression. The upload will take
> >>>> long, my ADSL upload is 0.3M/s at most.
> > 
> > 
> > I have shared (view) on google drive a folder with the three files. Both
> > Brian Foster and Mark Tinguely should have got a link on the mail from me.
> > If somebody else wants access, just tell me.
> > 
> > 
> >> I see the same thing from repair that was in your repair output:
> > 
> >> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
> > 
> >> If I take a look at the btrees as is, I see "235:[12608397,10]" included
> >> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
> >> 0x2000781). If I skip the mount, zero the log and repair, everything
> >> seems Ok. I can allocate the remainder of available space and rm -rf
> >> everything in the fs without an error.
> > 
> >> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
> >> the cntbt, which is clearly a duplicate entry. This is what repair
> >> detects and cleans up and seems to lead to the shutdown. E.g., if I
> >> mount and use the fs, I can hit an assert or failure just by attempting
> >> to allocate the rest of the space in the fs. If that is the state of the
> >> fs on disk, it's only a matter of time we explode due to allocating and
> >> freeing that range of space or possibly attempting to allocate that
> >> space twice.
> > 
> >> Mark mentioned that he didn't see the superblock item in the log with
> >> regard to the freeze. I don't see that either... which perhaps suggests
> >> that this all happens during the wake-from-hibernate sequence..? My
> >> understanding is that we should freeze on hibernate, thus force
> >> everything out to the log, write an unmount record and then dirty the
> >> log with a superblock transaction. Therefore, that should be the only
> >> item in the log post-freeze. Here, we have various items in the log
> >> including several logged buffers that correspond to the cntbt block that
> >> ends up corrupted (daddr 0xf427c08).
> 
> What freeze?  look at hibernate(), nothing but a sync:
> 
> /**
>  * hibernate - Carry out system hibernation, including saving the image.
>  */
> int hibernate(void)
> {
> ...
>         printk(KERN_INFO "PM: Syncing filesystems ... ");
>         sys_sync();
>         printk("done.\n");
> 
>         error = freeze_processes();
>         if (error)
>                 goto Exit;
> 
> 
> AFAIK there is no freeze call involved.

Yes, that's a problem I've been pointing out for years. TuxOnIce
freezes the filesystems, but the kernel hibernation maintainers have
steadfastly refused to even acknowledge that it is necessary.

As it is, I'm pretty sure this is being caused by the XFS workqueues
not being frozen appropriately, i.e. WQ_FREEZABLE needs to be added
to various workqueue definitions so that work gets halted when
kernel threads get halted.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 21:27                   ` Eric Sandeen
  2014-08-12 21:57                     ` Dave Chinner
@ 2014-08-12 21:59                     ` Brian Foster
  2014-08-12 22:21                       ` Eric Sandeen
  1 sibling, 1 reply; 56+ messages in thread
From: Brian Foster @ 2014-08-12 21:59 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Carlos E. R., XFS mailing list

On Tue, Aug 12, 2014 at 02:27:58PM -0700, Eric Sandeen wrote:
> On 8/12/14, 9:51 AM, Brian Foster wrote:
> > On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> > Content-ID: <alpine.LSU.2.11.1408120142170.21410@minas-tirith.valinor>
> > 
> > 
> > On 2014-08-12 at 00:36 +0200, Carlos E. R. wrote:
> >>>> On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:
> > 
> >>>> but all of them are about 401M before compression. The upload will take
> >>>> long, my ADSL upload is 0.3M/s at most.
> > 
> > 
> > I have shared (view) on google drive a folder with the three files. Both
> > Brian Foster and Mark Tinguely should have got a link on the mail from me.
> > If somebody else wants access, just tell me.
> > 
> > 
> >> I see the same thing from repair that was in your repair output:
> > 
> >> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
> > 
> >> If I take a look at the btrees as is, I see "235:[12608397,10]" included
> >> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
> >> 0x2000781). If I skip the mount, zero the log and repair, everything
> >> seems Ok. I can allocate the remainder of available space and rm -rf
> >> everything in the fs without an error.
> > 
> >> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
> >> the cntbt, which is clearly a duplicate entry. This is what repair
> >> detects and cleans up and seems to lead to the shutdown. E.g., if I
> >> mount and use the fs, I can hit an assert or failure just by attempting
> >> to allocate the rest of the space in the fs. If that is the state of the
> >> fs on disk, it's only a matter of time we explode due to allocating and
> >> freeing that range of space or possibly attempting to allocate that
> >> space twice.
> > 
> >> Mark mentioned that he didn't see the superblock item in the log with
> >> regard to the freeze. I don't see that either... which perhaps suggests
> >> that this all happens during the wake-from-hibernate sequence..? My
> >> understanding is that we should freeze on hibernate, thus force
> >> everything out to the log, write an unmount record and then dirty the
> >> log with a superblock transaction. Therefore, that should be the only
> >> item in the log post-freeze. Here, we have various items in the log
> >> including several logged buffers that correspond to the cntbt block that
> >> ends up corrupted (daddr 0xf427c08).
> 
> What freeze?  look at hibernate(), nothing but a sync:
> 
> /**
>  * hibernate - Carry out system hibernation, including saving the image.
>  */
> int hibernate(void)
> {
> ...
>         printk(KERN_INFO "PM: Syncing filesystems ... ");
>         sys_sync();
>         printk("done.\n");
> 
>         error = freeze_processes();
>         if (error)
>                 goto Exit;
> 
> 
> AFAIK there is no freeze call involved.
> 

Eep, not sure why I was thinking there was a freeze there. It appears
not. I guess that explains why the log contains what it does. Thanks for
pointing that out...

Brian

> -Eric
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 21:59                     ` Brian Foster
@ 2014-08-12 22:21                       ` Eric Sandeen
  2014-08-12 23:16                         ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Eric Sandeen @ 2014-08-12 22:21 UTC (permalink / raw)
  To: Brian Foster; +Cc: Carlos E. R., XFS mailing list

On 8/12/14, 2:59 PM, Brian Foster wrote:
> On Tue, Aug 12, 2014 at 02:27:58PM -0700, Eric Sandeen wrote:
>> On 8/12/14, 9:51 AM, Brian Foster wrote:
>>> On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
>>>
>>>
>>> El 2014-08-12 a las 00:36 +0200, Carlos E. R. escribió:
>>>>>> El 2014-08-11 a las 16:56 -0500, Mark Tinguely escribió:
>>>
>>>>>> but all of them are about 401M before compression. The upload will take
>>>>>> long, my ADSL upload is 0.3M/s at most.
>>>
>>>
>>> I have shared (view) on google drive a folder with the three files. Both
>>> Brian Foster and Mark Tinguely should have got a link on the mail from me.
>>> If somebody else wants access, just tell me.
>>>
>>>
>>>> I see the same thing from repair that was in your repair output:
>>>
>>>> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
>>>
>>>> If I take a look at the btrees as is, I see "235:[12608397,10]" included
>>>> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
>>>> 0x2000781). If I skip the mount, zero the log and repair, everything
>>>> seems Ok. I can allocate the remainder of available space and rm -rf
>>>> everything in the fs without an error.
>>>
>>>> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
>>>> the cntbt, which is clearly a duplicate entry. This is what repair
>>>> detects and cleans up and seems to lead to the shutdown. E.g., if I
>>>> mount and use the fs, I can hit an assert or failure just by attempting
>>>> to allocate the rest of the space in the fs. If that is the state of the
>>>> fs on disk, it's only a matter of time we explode due to allocating and
>>>> freeing that range of space or possibly attempting to allocate that
>>>> space twice.
>>>
>>>> Mark mentioned that he didn't see the superblock item in the log with
>>>> regard to the freeze. I don't see that either... which perhaps suggests
>>>> that this all happens during the wake-from-hibernate sequence..? My
>>>> understanding is that we should freeze on hibernate, thus force
>>>> everything out to the log, write an unmount record and then dirty the
>>>> log with a superblock transaction. Therefore, that should be the only
>>>> item in the log post-freeze. Here, we have various items in the log
>>>> including several logged buffers that correspond to the cntbt block that
>>>> ends up corrupted (daddr 0xf427c08).
>>
>> What freeze?  look at hibernate(), nothing but a sync:
>>
>> /**
>>  * hibernate - Carry out system hibernation, including saving the image.
>>  */
>> int hibernate(void)
>> {
>> ...
>>         printk(KERN_INFO "PM: Syncing filesystems ... ");
>>         sys_sync();
>>         printk("done.\n");
>>
>>         error = freeze_processes();
>>         if (error)
>>                 goto Exit;
>>
>>
>> AFAIK there is no freeze call involved.
>>
> 
> Eep, not sure why I was thinking there was a freeze there.

because it seems so logical.  :)

> It appears
> not. I guess that explains why the log contains what it does. Thanks for
> pointing that out...

but as I was saying on IRC, I think in theory it's not necessary; the fs state
on disk + fs state in memory (saved to disk during hibernate) needs to be
consistent, and it's conceivable that this could be done without freeze
(or even sync for that matter).

A freeze sure sounds nice though, to be sure the fs really is consistent
on disk, in case resume fails.

The thing I was wondering about is what makes sure disk caches are flushed
before disks lose power when hibernate completes.  (I'm just handwaving
here, though...)

Anyway, Dave's mention of making threads freezable makes the most sense.
Documentation/power/freezing-of-tasks.txt
makes it pretty clear that any thread which might change fs state
needs to be freezable:

> We therefore freeze tasks that might
> cause the on-disk filesystems' data and metadata to be modified after the
> hibernation image has been created and before the system is finally powered off.
> The majority of these are user space processes, but if any of the kernel threads
> may cause something like this to happen, they have to be freezable.

jbd/jbd2 explicitly handle this freezing in the kjournald/kjournald2 threads.

-Eric



> Brian
> 
>> -Eric
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 22:21                       ` Eric Sandeen
@ 2014-08-12 23:16                         ` Dave Chinner
  2014-08-13  0:07                           ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2014-08-12 23:16 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Brian Foster, Carlos E. R., XFS mailing list

On Tue, Aug 12, 2014 at 03:21:58PM -0700, Eric Sandeen wrote:
> On 8/12/14, 2:59 PM, Brian Foster wrote:
> > On Tue, Aug 12, 2014 at 02:27:58PM -0700, Eric Sandeen wrote:
> >> On 8/12/14, 9:51 AM, Brian Foster wrote:
> >>> On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> >>>
> >>>
> >>> El 2014-08-12 a las 00:36 +0200, Carlos E. R. escribió:
> >>>>>> El 2014-08-11 a las 16:56 -0500, Mark Tinguely escribió:
> >>>
> >>>>>> but all of them are about 401M before compression. The upload will take
> >>>>>> long, my ADSL upload is 0.3M/s at most.
> >>>
> >>>
> >>> I have shared (view) on google drive a folder with the three files. Both
> >>> Brian Foster and Mark Tinguely should have got a link on the mail from me.
> >>> If somebody else wants access, just tell me.
> >>>
> >>>
> >>>> I see the same thing from repair that was in your repair output:
> >>>
> >>>> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
> >>>
> >>>> If I take a look at the btrees as is, I see "235:[12608397,10]" included
> >>>> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
> >>>> 0x2000781). If I skip the mount, zero the log and repair, everything
> >>>> seems Ok. I can allocate the remainder of available space and rm -rf
> >>>> everything in the fs without an error.
> >>>
> >>>> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
> >>>> the cntbt, which is clearly a duplicate entry. This is what repair
> >>>> detects and cleans up and seems to lead to the shutdown. E.g., if I
> >>>> mount and use the fs, I can hit an assert or failure just by attempting
> >>>> to allocate the rest of the space in the fs. If that is the state of the
> >>>> fs on disk, it's only a matter of time we explode due to allocating and
> >>>> freeing that range of space or possibly attempting to allocate that
> >>>> space twice.
> >>>
> >>>> Mark mentioned that he didn't see the superblock item in the log with
> >>>> regard to the freeze. I don't see that either... which perhaps suggests
> >>>> that this all happens during the wake-from-hibernate sequence..? My
> >>>> understanding is that we should freeze on hibernate, thus force
> >>>> everything out to the log, write an unmount record and then dirty the
> >>>> log with a superblock transaction. Therefore, that should be the only
> >>>> item in the log post-freeze. Here, we have various items in the log
> >>>> including several logged buffers that correspond to the cntbt block that
> >>>> ends up corrupted (daddr 0xf427c08).
> >>
> >> What freeze?  look at hibernate(), nothing but a sync:
> >>
> >> /**
> >>  * hibernate - Carry out system hibernation, including saving the image.
> >>  */
> >> int hibernate(void)
> >> {
> >> ...
> >>         printk(KERN_INFO "PM: Syncing filesystems ... ");
> >>         sys_sync();
> >>         printk("done.\n");
> >>
> >>         error = freeze_processes();
> >>         if (error)
> >>                 goto Exit;
> >>
> >>
> >> AFAIK there is no freeze call involved.
> >>
> > 
> > Eep, not sure why I was thinking there was a freeze there.
> 
> because it seems so logical.  :)
> 
> > It appears
> > not. I guess that explains why the log contains what it does. Thanks for
> > pointing that out...
> 
> but as I was saying on IRC, I think in theory it's not necessary; the fs state
> on disk + fs state in memory (saved to disk during hibernate) needs to be
> consistent, and it's conceivable that this could be done without freeze
> (or even sync for that matter).

Well, the sync is necessary for hibernate - it needs to shrink the
amount of memory that is saved to disk to as small as possible. If
your memory is full of dirty page cache, why would you save that to
the hibernate image, only to have to load it back off, then write it
to the filesystem after resume? Why wouldn't you write it straight
to disk before hibernation, then remove it from memory so you've
then got free memory to allocate the hibernation image that gets
written to disk?

> A freeze sure sounds nice though, to be sure the fs really is consistent
> on disk, in case resume fails.
> 
> The thing I was wondering about is what makes sure disk caches are flushed
> before disks lose power when hibernate completes.  (I'm just handwaving
> here, though...)

That usually happens in the driver power-down sequence.

> Anyway, Dave's mention of making threads freezable makes the most sense.
> Documentation/power/freezing-of-tasks.txt
> makes it pretty clear that any thread which might change fs state
> needs to be freezable:
> 
> > We therefore freeze tasks that might
> > cause the on-disk filesystems' data and metadata to be modified after the
> > hibernation image has been created and before the system is finally powered off.
> > The majority of these are user space processes, but if any of the kernel threads
> > may cause something like this to happen, they have to be freezable.
> 
> jbd/jbd2 explicitly handle this freezing in the kjournald/kjournald2 threads.

As we do for the xfsaild kernel thread. We used to use kernel
threads for functionality that we now use workqueues for - the
xfssyncd and the xfsbufd  - and those kernel threads used to also
freeze like the xfsaild does. We lost that when moving to
workqueues.

The stupid part about all this is we actually stop periodic
workqueue processing for workqueues that can modify state when the
filesystem freezes. i.e. if the hibernation code froze the
filesystem we wouldn't need to mark workqueues as freezable because
XFS already manages everything in the manner that hibernation
requires....
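
A toy model of the failure mode being discussed, as a minimal Python
sketch. It is not XFS code: the function name, the single-list "free space
index" and the "pending_trim" state are all hypothetical, and the real
sequence in this thread (log replay, btree updates) is more involved. It
only shows why a background worker that keeps running after the
hibernation image is taken can leave the restored in-memory state out of
step with what is on disk.

```python
import copy


def hibernate_cycle(worker_freezable):
    """Simulate one hibernate/resume cycle; return the on-disk free list."""
    # Hypothetical, hugely simplified state: an on-disk free-space index
    # and an in-memory list of extents still waiting to be trimmed.
    disk_free = [(100, 10)]
    memory = {"pending_trim": [(12608397, 10)]}

    def worker(mem, disk):
        # Background work: free every pending extent and record it on disk.
        while mem["pending_trim"]:
            disk.append(mem["pending_trim"].pop())

    # The hibernation image (a copy of memory) is created here.
    image = copy.deepcopy(memory)

    if not worker_freezable:
        # A worker that is not frozen may still run between image
        # creation and power-off, changing the disk behind the image.
        worker(memory, disk_free)

    # Resume: memory comes back from the image, so it is stale with
    # respect to anything written after the image was taken.
    memory = image
    worker(memory, disk_free)
    return disk_free
```

In this model, hibernate_cycle(False) records the extent (12608397, 10)
twice, which has the same shape as the duplicate cntbt record, while
hibernate_cycle(True) records it once. Freezing the filesystem before the
image is taken would have the same effect as the freezable flag here,
since the worker would not run.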

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 23:16                         ` Dave Chinner
@ 2014-08-13  0:07                           ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-08-13  0:07 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Wednesday, 2014-08-13 at 09:16 +1000, Dave Chinner wrote:

...

> Well, the sync is necessary for hibernate - it needs to shrink the
> amount of memory that is saved to disk to as small as possible. If
> your memory is full of dirty page cache, why would you save that to
> the hibernate image, only to have to load it back off, then write it
> to the filesystem after resume? Why wouldn't you write it straight
> to disk before hibernation, then remove it from memory so you've
> then got free memory to allocate the hibernation image that gets
> written to disk?

You can see that this happens by looking at the output of "free" before
and after hibernation. Even issuing the command after getting the desktop
back, I can see a big difference (the amount of buffers and cache).
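
A small sketch for making that comparison repeatable, assuming a Linux
/proc/meminfo with the usual "Buffers" and "Cached" fields. The function
names are made up for illustration. Take one reading before hibernating
and one after resume, then compare the two numbers.

```python
def parse_meminfo(text):
    """Return {field: value} for the lines of a /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Values are reported in kB for most fields; keep the raw number.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_kb(text):
    """Buffers plus page cache, in kB, from a meminfo-style text."""
    info = parse_meminfo(text)
    return info.get("Buffers", 0) + info.get("Cached", 0)


def read_cache_kb(path="/proc/meminfo"):
    """Read the current buffers + cache figure from the running system."""
    with open(path) as f:
        return cache_kb(f.read())
```

Calling read_cache_kb() just before hibernating and again once the
desktop is back gives two figures whose difference is the buffers and
cache that were dropped rather than saved in the image.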

- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlPqrFcACgkQtTMYHG2NR9VfIgCgj3AXArngfCdoK/bGDsHNNWWU
pgoAnRJK7GMHRbO9KCV2TKYnlSYWMolT
=MtMr
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-12 21:17                   ` Carlos E. R.
@ 2014-08-13 12:04                     ` Brian Foster
  2014-08-13 13:29                       ` Mark Tinguely
  2014-08-13 21:04                       ` Dave Chinner
  0 siblings, 2 replies; 56+ messages in thread
From: Brian Foster @ 2014-08-13 12:04 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Tue, Aug 12, 2014 at 11:17:36PM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> On Tuesday, 2014-08-12 at 12:51 -0400, Brian Foster wrote:
> >On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> 
> >I see the same thing from repair that was in your repair output:
> >
> >block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
> 
> Is it possible to find out what file uses that block?
> I have a non-obfuscated copy of the metadata. Knowing the file, we can know
> what application is involved - and that might help, or perhaps not.
> 

I don't see how given the current situation. The space appears to be
free initially, so zeroing the log contents on repair puts the fs in a
state where the space is not allocated to any particular file. Perhaps
there is some incremental state created by the log that can provide this
information (e.g., space is free, space is preallocated, extent is
converted, eofblocks are trimmed all in a single checkpoint), but that
could be difficult to trace back since iirc the btree had grown as well.

> 
> >If I take a look at the btrees as is, I see "235:[12608397,10]" included
> >in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
> >0x2000781). If I skip the mount, zero the log and repair, everything
> >seems Ok. I can allocate the remainder of available space and rm -rf
> >everything in the fs without an error.
> >
> >Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
> >the cntbt, which is clearly a duplicate entry. This is what repair
> >detects and cleans up and seems to lead to the shutdown. E.g., if I
> >mount and use the fs, I can hit an assert or failure just by attempting
> >to allocate the rest of the space in the fs. If that is the state of the
> >fs on disk, it's only a matter of time we explode due to allocating and
> >freeing that range of space or possibly attempting to allocate that
> >space twice.
> 
> I'm not sure if I follow you.
> 
> The sequence of events here is:
> 
>  a) hibernate
>  b) thaw
>  c) immediately, in memory corruption found and kernel error message.
>     Filesystem is switched to read only.
>     System is unstable, has to be halted or rebooted.
>     Umount is impossible.
> 

Ok, so the crash is fairly immediate after the wake (also according to
the log output below).
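
To put a number on "fairly immediate", here is a minimal Python sketch
that subtracts the kernel timestamps of two lines from the log quoted
below: "Restarting tasks ... done." at [73231.518736] and the XFS internal
error at [73235.439809]. The two sample strings are abbreviated copies of
those lines, and the helper name is made up.

```python
import re
from decimal import Decimal

# Kernel log timestamps look like "[73231.518736]" (seconds since boot).
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_stamp(line):
    """Return the kernel timestamp of a log line, or None if it has none."""
    m = STAMP.search(line)
    return Decimal(m.group(1)) if m else None


restart = "kernel - - - [73231.518736] Restarting tasks ... done."
error = "kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO"

# Decimal keeps the subtraction exact to the microsecond.
delta = kernel_stamp(error) - kernel_stamp(restart)
print(delta)  # prints 3.921073
```

So the error is logged a little under four seconds after tasks are
restarted.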

>  d) (¬) Reboot
>  e) Mount (¬), manual umount, xfs_repair (¬), mount
>     (photos of metadata taken at the appropriate points (marked with ¬))
> 
> 
> This is the point I'm at now. Are you saying that the filesystem can explode at
> any time now? I have not written any files, beyond what the desktop does
> automatically.
> 

No, the filesystem has been fixed by repair. I'm just saying that
somehow the fs creates a duplicate free space record in one of the free
space trees. That particular condition means it's only a matter of time
before some block allocation operation trips up on that inconsistent
state and shuts down the fs. You happen to hit it immediately due to
that space being involved with speculative preallocation.
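
As an illustration of the condition itself, a minimal Python sketch that
looks for duplicate records in a dump written in the
"index:[startblock,blockcount]" notation quoted earlier in this thread.
The helper name is made up, and this is only a text check on such a dump,
not what xfs_repair does internally.

```python
import re
from collections import Counter

# Matches records such as "272:[12608397,10]".
RECORD = re.compile(r"(\d+):\[(\d+),(\d+)\]")


def duplicate_records(dump):
    """Return the (startblock, blockcount) pairs that occur more than once."""
    pairs = [(int(start), int(count)) for _, start, count in RECORD.findall(dump)]
    return [pair for pair, n in Counter(pairs).items() if n > 1]
```

For the post-replay cntbt records "272:[12608397,10] 273:[12608397,10]"
this returns [(12608397, 10)]; for the single pre-replay record it returns
an empty list.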

The current theory is that this is probably due to XFS workqueues not
being freezable, and therefore can make changes on disk after the dump
image is created. This seems logical to me, but I'd still like to see
some kind of verification of the potential fix if possible. I can repeat
some vm hibernate testing with that in mind. Alternatively, would you
have the ability to test a patch? Have you been able to reproduce this
again since the most recent instance?

Brian

> 
> 
> What I have not done (on your request), this time, is:
> 
>  f) backup, format, restore.
> 
> 
> 
> 
> >Mark mentioned that he didn't see the superblock item in the log with
> >regard to the freeze. I don't see that either... which perhaps suggests
> >that this all happens during the wake-from-hibernate sequence..? My
> >understanding is that we should freeze on hibernate, thus force
> >everything out to the log, write an unmount record and then dirty the
> >log with a superblock transaction. Therefore, that should be the only
> >item in the log post-freeze. Here, we have various items in the log
> >including several logged buffers that correspond to the cntbt block that
> >ends up corrupted (daddr 0xf427c08).
> >
> >Given the failure occurs on freeing an extent via the xfs_eofblocks
> >scanner, perhaps this extent was initially allocated as speculative
> >preallocation and the eofblocks scanner is where we happen to first
> >identify the corrupted cntbt. What is strange is that, as mentioned
> >previously, the space appears to be free if I zero the log, so that
> >means it was probably free before the freeze. It seems highly unlikely
> >for a file to gain preallocation, be written out and then get trimmed by
> >the scanner all on wake-from-hibernate.
> 
> 
> Well, I understand little of that, but if you do, and can do whatever
> modifications need to be done to the code, that's fine with me :-)
> 
> 
> 
> >Carlos,
> >
> >How long after hibernate does the shutdown/crash typically occur? Do you
> >basically wake-up and within a few seconds the filesystem crashes, or is
> >it some time (minutes) later?
> 
> Instantly during the wake-up (thaw), according to the log.
> 
> I'm typically not present when it happens: my routine is switch on the
> computer, then go make coffee/tea, and then return and start using the
> machine. It takes a minute or two to wake up from hibernation, and then the
> machine is sluggish for a minute or two more while processes start doing
> things and claiming chunks from swap, mail is fetched, etc.
> 
> And instead of starting work, I find the machine in a bad state.
> 
> 
> Look, an excerpt from the last event (the full log is in another post
> yesterday), but taken from another log file with finer grained timestamps:
> 
> 
> <30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - -  Shutting down network time protocol daemon (NTPD)..done
> <30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - -  Stopped LSB: Network time protocol daemon (ntpd).
> <28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - -  Hibernating (95)...
> <7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
> <7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
> <7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
> <4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
> <4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
> <6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)
> 
> 
> The "Hibernating (95)" is written by a script of mine in
> "/etc/pm/sleep.d/95cosas" whose main purpose is to write to the log that
> line.
> 
> Then the machine wakes up, hours later - despite the timestamp not saying so
> (the time jump is written instead lines above):
> 
> 
> <6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] CPU3 is up
> <6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
> <6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
> <7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
> <4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset
> 
> ...
> 
> 
> <6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
> <6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
> <4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
> <4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
> <7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
> <28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - -  The canary thread is apparently starving. Taking action.
> <30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - -  Demoting known real-time threads.
> <29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
> <29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
> <29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
> <29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - -  Demoted 3 threads.
> <20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - -  imap: Warning: Time jumped forwards 33996 seconds
> <20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - -  imap: Warning: Time jumped forwards 35660 seconds
> <22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - -  imap(cer): Disconnected for inactivity in=237010 out=9273919
> <1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
> <1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]
> 
> 
> ...
> 
> 
> <5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
> <1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
> <1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
> <30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - -  Time has been changed
> <30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - -  1 client rule loaded
> <28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - -  Thawing (95)...
> <29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
> <30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - -  Starting LSB: Network time protocol daemon (ntpd)...
> <4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
> <29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
> 
> 
> As you see, the corruption is detected instantly after waking up, before
> pm-utils scripts have a chance to run.
> 
> 
> 
> 
> >If the former, I wonder if it's possible that the scanner returns to
> >life pointing to a stale or freed incore inode and does something bogus
> >based on that.
> 
> 
> Well, as I said, that's above my understanding ;-)
> 
> 
> 
> - -- Cheers,
>        Carlos E. R.
>        (from 13.1 x86_64 "Bottle" at Telcontar)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iEYEARECAAYFAlPqhHwACgkQtTMYHG2NR9WmrwCglBRRHEMgU9mCEHkU9iHqYehX
> +1AAn2oUn8/M3Rfb7mLWapLqYxDfvHNv
> =9Yft
> -----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-13 12:04                     ` Brian Foster
@ 2014-08-13 13:29                       ` Mark Tinguely
  2014-08-13 21:04                       ` Dave Chinner
  1 sibling, 0 replies; 56+ messages in thread
From: Mark Tinguely @ 2014-08-13 13:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: Carlos E. R., XFS mailing list

On 08/13/14 07:04, Brian Foster wrote:
> On Tue, Aug 12, 2014 at 11:17:36PM +0200, Carlos E. R. wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>>
>> On Tuesday, 2014-08-12 at 12:51 -0400, Brian Foster wrote:
>>> On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
>>
>>> I see the same thing from repair that was in your repair output:
>>>
>>> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
>>
>> Is it possible to find out what file uses that block?
>> I have a non-obfuscated copy of the metadata. Knowing the file, we can know
>> what application is involved - and that might help, or perhaps not.
>>
>
> I don't see how given the current situation. The space appears to be
> free initially, so zeroing the log contents on repair puts the fs in a
> state where the space is not allocated to any particular file. Perhaps
> there is some incremental state created by the log that can provide this
> information (e.g., space is free, space is preallocated, extent is
> converted, eofblocks are trimmed all in a single checkpoint), but that
> could be difficult to trace back since iirc the btree had grown as well.
>
>>
>>> If I take a look at the btrees as is, I see "235:[12608397,10]" included
>>> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
>>> 0x2000781). If I skip the mount, zero the log and repair, everything
>>> seems Ok. I can allocate the remainder of available space and rm -rf
>>> everything in the fs without an error.
>>>
>>> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
>>> the cntbt, which is clearly a duplicate entry. This is what repair
>>> detects and cleans up and seems to lead to the shutdown. E.g., if I
>>> mount and use the fs, I can hit an assert or failure just by attempting
>>> to allocate the rest of the space in the fs. If that is the state of the
>>> fs on disk, it's only a matter of time we explode due to allocating and
>>> freeing that range of space or possibly attempting to allocate that
>>> space twice.
>>
>> I'm not sure if I follow you.
>>
>> The sequence of events here is:
>>
>>   a) hibernate
>>   b) thaw
>>   c) immediately, in memory corruption found and kernel error message.
>>      Filesystem is switched to read only.
>>      System is unstable, has to be halted or rebooted.
>>      Umount is impossible.
>>
>
> Ok, so the crash is fairly immediate after the wake (also according to
> the log output below).
>
>>   d) (¬) Reboot
>>   e) Mount (¬), manual umount, xfs_repair (¬), mount
>>      (photos of metadata taken at the appropriate points (marked with ¬))
>>
>>
>> This is the point I'm at now. Are you saying that the filesystem can explode at
>> any time now? I have not written any files, beyond what the desktop does
>> automatically.
>>
>
> No, the filesystem has been fixed by repair. I'm just saying that
> somehow the fs creates a duplicate free space record in one of the free
> space trees. That particular condition means it's only a matter of time
> before some block allocation operation trips up on that inconsistent
> state and shuts down the fs. You happen to hit it immediately due to
> that space being involved with speculative preallocation.
>
> The current theory is that this is probably due to XFS workqueues not
> being freezable, and therefore can make changes on disk after the dump
> image is created. This seems logical to me, but I'd still like to see
> some kind of verification of the potential fix if possible. I can repeat
> some vm hibernate testing with that in mind. Alternatively, would you
> have the ability to test a patch? Have you been able to reproduce this
> again since the most recent instance?
>
> Brian

I am still digging through the xfs log:
  I do not see anything in that extent range 46162829-46162839 being
  freed in the log. (or anything close to it).

  Late in the log, there is a write (op 27 of tid e9f15120) of a big
  portion of the AG1 cnt btree we are interested in. So we know that it
  is good at that point.

  The next two writes (op 66 of tid 6ed362ea and op 25 of tid
  6281c8b) that write entry "8d63c000  a000000" to that block are the
  beginning of the 16 byte log write. Depending on the offset, it is
  possible that one of these writes could insert a duplicate entry.

I will chase it further and see where and why this duplicate happens 
from a log perspective.
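
For anyone following the numbers, a minimal Python sketch of how the
addresses in this thread relate to each other. The three constants are
assumptions inferred from the values quoted here, not read from the
superblock: an AG block log of 25 (since 46162829 - 12608397 == 2**25),
32000000 blocks per AG, and 8 sectors per block (4 KiB blocks, 512-byte
sectors), the last two chosen because they reproduce daddr 0xf427c08 from
fsb 0x2000781. It also assumes the log dump prints 32-bit words in
little-endian host order while the record on disk is big-endian. The
function names are made up.

```python
import struct

AGBLKLOG = 25            # assumption: inferred from the quoted block numbers
AGBLOCKS = 32_000_000    # assumption: inferred from daddr 0xf427c08
SECTORS_PER_BLOCK = 8    # assumption: 4 KiB blocks, 512-byte sectors


def fsb_to_ag(fsb):
    """Split a filesystem block number into (AG number, AG block)."""
    return fsb >> AGBLKLOG, fsb & ((1 << AGBLKLOG) - 1)


def ag_to_fsb(agno, agbno):
    """Combine an AG number and AG block into a filesystem block number."""
    return (agno << AGBLKLOG) | agbno


def fsb_to_daddr(fsb):
    """Convert a filesystem block number to a 512-byte sector address."""
    agno, agbno = fsb_to_ag(fsb)
    return (agno * AGBLOCKS + agbno) * SECTORS_PER_BLOCK


def decode_record(word0, word1):
    """Reinterpret two words dumped in little-endian order as the
    big-endian (startblock, blockcount) pair of a free space record."""
    raw = struct.pack("<II", word0, word1)
    return struct.unpack(">II", raw)
```

Under those assumptions, ag_to_fsb(1, 12608397) gives 46162829 (the start
of the extent range above), fsb_to_daddr(0x2000781) gives 0xf427c08 (the
cntbt block), and decode_record(0x8d63c000, 0x0a000000) gives
(12608397, 10), i.e. the same [12608397,10] record that shows up twice
after log replay.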

--Mark.

>>
>>
>> What I have not done (on your request), this time, is:
>>
>>   f) backup, format, restore.
>>
>>
>>
>>
>>> Mark mentioned that he didn't see the superblock item in the log with
>>> regard to the freeze. I don't see that either... which perhaps suggests
>>> that this all happens during the wake-from-hibernate sequence..? My
>>> understanding is that we should freeze on hibernate, thus force
>>> everything out to the log, write an unmount record and then dirty the
>>> log with a superblock transaction. Therefore, that should be the only
>>> item in the log post-freeze. Here, we have various items in the log
>>> including several logged buffers that correspond to the cntbt block that
>>> ends up corrupted (daddr 0xf427c08).
>>>
>>> Given the failure occurs on freeing an extent via the xfs_eofblocks
>>> scanner, perhaps this extent was initially allocated as speculative
>>> preallocation and the eofblocks scanner is where we happen to first
>>> identify the corrupted cntbt. What is strange is that, as mentioned
>>> previously, the space appears to be free if I zero the log, so that
>>> means it was probably free before the freeze. It seems highly unlikely
>>> for a file to gain preallocation, be written out and then get trimmed by
>>> the scanner all on wake-from-hibernate.
>>
>>
>> Well, I understand little of that, but if you do, and can do whatever
>> modifications need to be done to the code, that's fine with me :-)
>>
>>
>>
>>> Carlos,
>>>
>>> How long after hibernate does the shutdown/crash typically occur? Do you
>>> basically wake-up and within a few seconds the filesystem crashes, or is
>>> it some time (minutes) later?
>>
>> Instantly during the wake-up (thaw), according to the log.
>>
>> I'm typically not present when it happens: my routine is to switch on the
>> computer, then go make coffee/tea, and then return and start using the
>> machine. It takes a minute or two to wake up from hibernation, and then the
>> machine is sluggish for a minute or two more while processes start doing
>> things and claiming chunks from swap, mail is fetched, etc.
>>
>> And instead of starting work, I find the machine in a bad state.
>>
>>
>> Look, an excerpt from the last event (the full log is in another post
>> yesterday), but taken from another log file with finer-grained timestamps:
>>
>>
>> <30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - -  Shutting down network time protocol daemon (NTPD)..done
>> <30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - -  Stopped LSB: Network time protocol daemon (ntpd).
>> <28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - -  Hibernating (95)...
>> <7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
>> <7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
>> <7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
>> <4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
>> <4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> <6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)
>>
>>
>> The "Hibernating (95)" line is written by a script of mine,
>> "/etc/pm/sleep.d/95cosas", whose main purpose is to write that line to
>> the log.
>>
>> Then the machine wakes up, hours later, despite the timestamp not saying so
>> (the time jump appears instead a few lines above):
>>
>>
>> <6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] CPU3 is up
>> <6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
>> <6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
>> <7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
>> <4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset
>>
>> ...
>>
>>
>> <6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
>> <6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
>> <4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
>> <4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
>> <7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
>> <28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - -  The canary thread is apparently starving. Taking action.
>> <30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - -  Demoting known real-time threads.
>> <29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
>> <29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
>> <29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - -  Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
>> <29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - -  Demoted 3 threads.
>> <20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - -  imap: Warning: Time jumped forwards 33996 seconds
>> <20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - -  imap: Warning: Time jumped forwards 35660 seconds
>> <22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - -  imap(cer): Disconnected for inactivity in=237010 out=9273919
>> <1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c39fe9
>> <1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]
>>
>>
>> ...
>>
>>
>> <5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa0c4c3d8
>> <1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected.  Shutting down filesystem
>> <1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
>> <30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - -  Time has been changed
>> <30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - -  1 client rule loaded
>> <28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - -  Thawing (95)...
>> <29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - -  [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
>> <30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - -  Starting LSB: Network time protocol daemon (ntpd)...
>> <4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
>> <29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - -  [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
>>
>>
>> As you see, the corruption is detected instantly after waking up, before
>> pm-utils scripts have a chance to run.
>>
>>
>>
>>
>>> If the former, I wonder if it's possible that the scanner returns to
>>> life pointing to a stale or freed incore inode and does something bogus
>>> based on that.
>>
>>
>> Well, as I said, that's above my understanding ;-)
>>
>>
>>
>> - -- Cheers,
>>         Carlos E. R.
>>         (from 13.1 x86_64 "Bottle" at Telcontar)
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v2.0.22 (GNU/Linux)
>>
>> iEYEARECAAYFAlPqhHwACgkQtTMYHG2NR9WmrwCglBRRHEMgU9mCEHkU9iHqYehX
>> +1AAn2oUn8/M3Rfb7mLWapLqYxDfvHNv
>> =9Yft
>> -----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-13 12:04                     ` Brian Foster
  2014-08-13 13:29                       ` Mark Tinguely
@ 2014-08-13 21:04                       ` Dave Chinner
  1 sibling, 0 replies; 56+ messages in thread
From: Dave Chinner @ 2014-08-13 21:04 UTC (permalink / raw)
  To: Brian Foster; +Cc: Carlos E. R., XFS mailing list

On Wed, Aug 13, 2014 at 08:04:51AM -0400, Brian Foster wrote:
> On Tue, Aug 12, 2014 at 11:17:36PM +0200, Carlos E. R. wrote:
> > This is the point I'm at now. Are you saying that the filesystem can explode at
> > any time now? I have not written any files, beyond what the desktop does
> > automatically.
> > 
> 
> No, the filesystem has been fixed by repair. I'm just saying that
> somehow the fs creates a duplicate free space record in one of the free
> space trees.

Simple answer: the block is being freed twice. It is freed once from a
workqueue during the hibernate process, after the relevant memory has
been snapshotted (because the workqueue was not frozen), and again
after thaw, when the memory image is restored to RAM, the workqueue is
started up again, and it runs the same work a second time.
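
To make the sequence concrete, here is a toy model in Python. It is only
a sketch of the ordering problem, not XFS code: the names ToyDisk,
run_pending and hibernate_cycle, and the extent (100, 8), are invented
for illustration. The disk keeps its state across hibernation, while
the in-memory work list is rolled back to the snapshot on thaw.

```python
import copy


class ToyDisk:
    """On-disk free-space records. Unlike RAM, this is not rolled back on thaw."""

    def __init__(self):
        self.free_extents = set()

    def free(self, extent):
        # A second record for the same extent is the "duplicate entry" case.
        if extent in self.free_extents:
            raise RuntimeError(f"duplicate free-space record for extent {extent}")
        self.free_extents.add(extent)


def run_pending(memory, disk):
    """Drain the queued work, freeing each extent on disk."""
    while memory["pending_frees"]:
        disk.free(memory["pending_frees"].pop(0))


def hibernate_cycle(freezable):
    disk = ToyDisk()
    memory = {"pending_frees": [(100, 8)]}  # hypothetical (start block, length)

    snapshot = copy.deepcopy(memory)  # hibernation image of RAM is taken here

    if not freezable:
        # The worker was not frozen, so it still runs after the snapshot.
        run_pending(memory, disk)

    memory = snapshot  # thaw: RAM is restored from the image
    try:
        run_pending(memory, disk)  # the restored work item runs (again)
    except RuntimeError as err:
        return f"corruption: {err}"
    return "ok"


if __name__ == "__main__":
    print(hibernate_cycle(freezable=False))
    print(hibernate_cycle(freezable=True))
```

With freezable=False the same extent is freed before and after the
restore and the duplicate is detected; with freezable=True the work
only runs once, after thaw.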

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
  2014-08-11 14:44   ` Brian Foster
  2014-08-11 14:57   ` Mark Tinguely
@ 2014-09-30 22:27   ` Carlos E. R.
  2014-10-01  0:45     ` Dave Chinner
  2 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-09-30 22:27 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



> Subject: Subject : Happened again,
>     20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs
>      reformatting to correct issue.
> Date: Mon, 11 Aug 2014 16:23:01 +0200 (CEST)

> Happened again, I'm in the middle of recovery procedures, and using my laptop
> to post.

And again.

Well, after I recover the system I'm going to migrate my /home to a small 
ext4 partition, leaving Documents and the big directories in the current 
xfs home, renamed to something else. I hope this way to avoid the crashes 
and corruption on restore from hibernation.

SUSE and openSUSE (12 and 13.2) are switching, by default on new 
installs, to btrfs for the root filesystem and xfs for home. I hope that 
this will cause more crashes on laptops on return from hibernation, 
producing more reports, and finally solving the root cause of this.

Or perhaps it does not happen on 13.2, and I'm the only one left having 
the issue...

Anyway, as I'm using 13.1, and will be for some time, the solution for me 
is to move the core of /home to ext4, at least temporarily.



If someone wants me to get some data from the partition, I'm willing. I 
made a full 'dd' image of it before attempting to mount or repair it 
(still pending), so this will be possible for some days or more. Time 
permitting. :-)

- -- 
Cheers
        Carlos E. R.

        (from 13.1 x86_64 "Bottle" (Minas Tirith))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iF4EAREIAAYFAlQrLlIACgkQja8UbcUWM1z5nAD8DSoMv0LXUZQSbDcuVpmpH96g
zSnNcVKFRuZdBp6WY6gA/j9LzOmEy0a5MD5h++wnkKLk2z9RsnuJVc6PYW4RLyG8
=rZgI
-----END PGP SIGNATURE-----


* Re: Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-09-30 22:27   ` Happened again, 20140930 " Carlos E. R.
@ 2014-10-01  0:45     ` Dave Chinner
  2014-10-01  2:48       ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2014-10-01  0:45 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Wed, Oct 01, 2014 at 12:27:20AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> 
> 
> >Subject: Subject : Happened again,
> >    20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs
> >     reformatting to correct issue.
> >Date: Mon, 11 Aug 2014 16:23:01 +0200 (CEST)
> 
> >Happened again, I'm in the middle of recovery procedures, and using my laptop
> >to post.
> 
> And again.

We've already got the fix in our upstream repository:

8018ec0 xfs: mark all internal workqueues as freezable

It's currently in linux-next, queued for the next merge window. You
should probably talk to your distro about getting it backported.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-10-01  0:45     ` Dave Chinner
@ 2014-10-01  2:48       ` Carlos E. R.
  2014-10-01  3:04         ` Eric Sandeen
  2014-10-02 11:32         ` Jan Kara
  0 siblings, 2 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-10-01  2:48 UTC (permalink / raw)
  To: XFS mailing list

On 2014-10-01 02:45, Dave Chinner wrote:
> On Wed, Oct 01, 2014 at 12:27:20AM +0200, Carlos E. R. wrote:

> We've already got the fix in our upstream repository:
>
> 8018ec0 xfs: mark all internal workqueues as freezable
>
> It's currently in linux-next, queued for the next merge window. You
> should probably talk to your distro about getting it backported.

Wow, thanks. I'll tell them.

Question: where does it apply, kernel, xfsprogs? They will know, but I don't, and I need that info 
for the bugzilla "component" field ;-)

-- 
Cheers / Saludos,

		Carlos E. R.
		(from 13.1 x86_64 "Bottle" (Elessar))


* Re: Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-10-01  2:48       ` Carlos E. R.
@ 2014-10-01  3:04         ` Eric Sandeen
  2014-10-02 11:32         ` Jan Kara
  1 sibling, 0 replies; 56+ messages in thread
From: Eric Sandeen @ 2014-10-01  3:04 UTC (permalink / raw)
  To: Carlos E. R., XFS mailing list

On 9/30/14 9:48 PM, Carlos E. R. wrote:
> On 2014-10-01 02:45, Dave Chinner wrote:
>> On Wed, Oct 01, 2014 at 12:27:20AM +0200, Carlos E. R. wrote:
> 
>> We've already got the fix in our upstream repository:
>>
>> 8018ec0 xfs: mark all internal workqueues as freezable
>>
>> It's currently in linux-next, queued for the next merge window. You
>> should probably talk to your distro about getting it backported.
> 
> Wow, thanks. I'll tell them.
> 
> Question: where does it apply, kernel, xfsprogs? They will know, but I don't, and I need that info for the bugzilla "component" field ;-)
> 

kernel

-Eric


* Re: Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-10-01  2:48       ` Carlos E. R.
  2014-10-01  3:04         ` Eric Sandeen
@ 2014-10-02 11:32         ` Jan Kara
  2014-10-02 11:46           ` Carlos E. R.
  1 sibling, 1 reply; 56+ messages in thread
From: Jan Kara @ 2014-10-02 11:32 UTC (permalink / raw)
  To: Carlos E. R.; +Cc: XFS mailing list

On Wed 01-10-14 04:48:42, Carlos E. R. wrote:
> On 2014-10-01 02:45, Dave Chinner wrote:
> >On Wed, Oct 01, 2014 at 12:27:20AM +0200, Carlos E. R. wrote:
> 
> >We've already got the fix in our upstream repository:
> >
> >8018ec0 xfs: mark all internal workqueues as freezable
> >
> >It's currently in linux-next, queued for the next merge window. You
> >should probably talk to your distro about getting it backported.
> 
> Wow, thanks. I'll tell them.
> 
> Question: where does it apply, kernel, xfsprogs? They will know, but
> I don't, and I need that info for the bugzilla "component" field ;-)
  Feel free to assign that bug to me (jack@suse.com in bugzilla) and I'll
take care of it.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-10-02 11:32         ` Jan Kara
@ 2014-10-02 11:46           ` Carlos E. R.
  2014-10-05 14:28             ` Carlos E. R.
  0 siblings, 1 reply; 56+ messages in thread
From: Carlos E. R. @ 2014-10-02 11:46 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2014-10-02 13:32, Jan Kara wrote:
> On Wed 01-10-14 04:48:42, Carlos E. R. wrote:


>> Question: where does it apply, kernel, xfsprogs? They will know,
>> but I don't, and I need that info for the bugzilla "component"
>> field ;-)
> Feel free to assign that bug to me (jack at suse.com in bugzilla)
> and I'll take care of it.

Sure, thanks.
I'm busy with other things right now, will do as soon as I can.
Yesterday I was restoring the machine.

- -- 
Cheers / Saludos,

		Carlos E. R.
		(from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlQtOv8ACgkQtTMYHG2NR9Ux/QCdExQvegQEierd/9RiFwvjQ+6z
wBcAmgOdaaeWY3iZaTo4HogjuxqUSsMg
=8lRj
-----END PGP SIGNATURE-----


* Re: Happened again, 20140930 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
  2014-10-02 11:46           ` Carlos E. R.
@ 2014-10-05 14:28             ` Carlos E. R.
  0 siblings, 0 replies; 56+ messages in thread
From: Carlos E. R. @ 2014-10-05 14:28 UTC (permalink / raw)
  To: XFS mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Thursday, 2014-10-02 at 13:46 +0200, Carlos E. R. wrote:
> On 2014-10-02 13:32, Jan Kara wrote:

>> Feel free to assign that bug to me (jack at suse.com in bugzilla)
>> and I'll take care of it.
>
> Sure, thanks.
> I'm busy with other things right now, will do as soon as I can.
> Yesterday I was restoring the machine.

Done!

Bugzilla – Bug 899785

at bugzilla.opensuse.org

Thanks all!

- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlQxVXsACgkQtTMYHG2NR9UrFwCfTGyKIbUE61zU/ypyyCp1f9IR
US0AnRaXbFjR35nW44sCUgYJF2JZq4Hn
=dkv8
-----END PGP SIGNATURE-----


end of thread (newest message: 2014-10-05 14:28 UTC)

Thread overview: 56+ messages
2014-07-02  9:57 Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue Carlos E. R.
2014-07-02 12:04 ` Brian Foster
2014-07-02 13:07   ` Mark Tinguely
2014-07-03  2:54     ` Carlos E. R.
2014-07-03  3:00   ` Carlos E. R.
2014-07-03  9:43     ` Dave Chinner
2014-07-03 17:40       ` Brian Foster
2014-07-03 23:34       ` Carlos E. R.
2014-07-04  0:04         ` Dave Chinner
2014-07-04  1:29           ` Carlos E. R.
2014-07-04  1:40             ` Dave Chinner
2014-07-04  2:42               ` Carlos E. R.
2014-07-04  3:12                 ` Carlos E. R.
2014-07-04 12:40               ` Brian Foster
2014-07-04 13:36                 ` Carlos E. R.
2014-07-03 17:39     ` Brian Foster
2014-07-04 21:32       ` Carlos E. R.
2014-07-05 12:28         ` Brian Foster
2014-07-12  0:30           ` Carlos E. R.
2014-07-12  1:30             ` Carlos E. R.
2014-07-12  1:45               ` Carlos E. R.
2014-07-12 14:26                 ` Brian Foster
2014-07-12 14:19             ` Brian Foster
2014-08-11 14:23 ` Subject : Happened again, 20140811 -- " Carlos E. R.
2014-08-11 14:44   ` Brian Foster
2014-08-11 14:58     ` Carlos E. R.
2014-08-11 17:05       ` Carlos E. R.
2014-08-11 21:31         ` Carlos E. R.
     [not found]           ` <53E938CC.4010103@sgi.com>
2014-08-11 22:01             ` Carlos E. R.
2014-08-11 14:57   ` Mark Tinguely
2014-08-11 15:34     ` Carlos E. R.
2014-08-11 16:14       ` Brian Foster
2014-08-11 17:08         ` Carlos E. R.
2014-08-11 21:27       ` Mark Tinguely
2014-08-11 21:50         ` Carlos E. R.
2014-08-11 21:56           ` Mark Tinguely
2014-08-11 22:36             ` Carlos E. R.
2014-08-12  0:17               ` Carlos E. R.
2014-08-12 16:51                 ` Brian Foster
2014-08-12 21:17                   ` Carlos E. R.
2014-08-13 12:04                     ` Brian Foster
2014-08-13 13:29                       ` Mark Tinguely
2014-08-13 21:04                       ` Dave Chinner
2014-08-12 21:27                   ` Eric Sandeen
2014-08-12 21:57                     ` Dave Chinner
2014-08-12 21:59                     ` Brian Foster
2014-08-12 22:21                       ` Eric Sandeen
2014-08-12 23:16                         ` Dave Chinner
2014-08-13  0:07                           ` Carlos E. R.
2014-09-30 22:27   ` Happened again, 20140930 " Carlos E. R.
2014-10-01  0:45     ` Dave Chinner
2014-10-01  2:48       ` Carlos E. R.
2014-10-01  3:04         ` Eric Sandeen
2014-10-02 11:32         ` Jan Kara
2014-10-02 11:46           ` Carlos E. R.
2014-10-05 14:28             ` Carlos E. R.
