From: Dheeraj Sangamkar <dheerajrs@gmail.com>
To: linux-xfs@vger.kernel.org
Subject: XFS reports in-memory corruption and unmounts filesystem
Date: Mon, 16 Apr 2018 12:13:53 -0700
Message-ID: <CAG+d3v+rMvN=N4adMLC+mhY8FgWrdTior1bToVzb4YBK89UjqQ@mail.gmail.com>

Hello,

I have a few Linux boxes where I see XFS error messages when the
filesystem becomes full.
I saw quite a few reports of this kind of crash, but none that had
exactly the same backtrace as the one I found, so here it is.

The kernel log:

Jan 9 20:09:33 linux-box kernel: 1,1871,248971320,-;XFS (dm-17):
Internal error xfs_trans_cancel at line 1005 of file
/build/src/linux-4.9.51/fs/xfs/xfs_trans.c. Caller
xfs_create+0x44d/0x6c0 [xfs]
Jan 9 20:09:33 linux-box kernel: 4,1872,248985454,-;CPU: 11 PID: 27044
Comm: xxxxxx Tainted: G O 4.9.0-4-amd64 #1 Debian 4.9.51-1+ntap1
Jan 9 20:09:33 linux-box kernel: 4,1873,248994971,-;Hardware name: ..........
Jan 9 20:09:33 linux-box kernel: 4,1874,249005526,-; 0000000000000000
ffffffff99729974 ffff95c11afaae80 0000000000000001
Jan 9 20:09:33 linux-box kernel: 4,1875,249012916,-; ffffffffc0a041ed
ffff95c15b407800 ffff95c1c0949000 00000000ffffffe4
Jan 9 20:09:33 linux-box kernel: 4,1876,249020305,-; ffffffffc09f70fd
0000000000000001 ffffb23f2279bbf0 0000000000000000
Jan 9 20:09:33 linux-box kernel: 4,1877,249027694,-;Call Trace:
Jan 9 20:09:33 linux-box kernel: 4,1878,249030129,-;
[<ffffffff99729974>] ? dump_stack+0x5c/0x78
Jan 9 20:09:33 linux-box kernel: 4,1879,249035474,-;
[<ffffffffc0a041ed>] ? xfs_trans_cancel+0xad/0xd0 [xfs]
Jan 9 20:09:33 linux-box kernel: 4,1880,249041843,-;
[<ffffffffc09f70fd>] ? xfs_create+0x44d/0x6c0 [xfs]
Jan 9 20:09:33 linux-box kernel: 4,1881,249047823,-;
[<ffffffff99660000>] ? load_elf_binary+0x12c0/0x1640
Jan 9 20:09:33 linux-box kernel: 4,1882,249053930,-;
[<ffffffffc09f41ec>] ? xfs_generic_create+0x23c/0x2e0 [xfs]
Jan 9 20:09:33 linux-box kernel: 4,1883,249060597,-;
[<ffffffff99612888>] ? path_openat+0x1338/0x1440
Jan 9 20:09:33 linux-box kernel: 4,1884,249066314,-;
[<ffffffff994f6264>] ? futex_wake+0x94/0x170
Jan 9 20:09:33 linux-box kernel: 4,1885,249071682,-;
[<ffffffff99613c51>] ? do_filp_open+0x91/0x100
Jan 9 20:09:33 linux-box kernel: 4,1886,249077224,-;
[<ffffffff995fedba>] ? __check_object_size+0xfa/0x1d8
Jan 9 20:09:33 linux-box kernel: 4,1887,249083370,-;
[<ffffffff9960162e>] ? do_sys_open+0x12e/0x210
Jan 9 20:09:33 linux-box kernel: 4,1888,249088914,-;
[<ffffffff99a085bb>] ? system_call_fast_compare_end+0xc/0x9b
Jan 9 20:09:33 linux-box kernel: 5,1889,249095715,-;XFS (dm-17):
xfs_do_force_shutdown(0x8) called from line 1006 of file
/build/src/linux-4.9.51/fs/xfs/xfs_trans.c. Return address =
0xffffffffc0a04206
Jan 9 20:09:33 linux-box kernel: 1,1890,249110179,-;XFS (dm-17):
Corruption of in-memory data detected. Shutting down filesystem
Jan 9 20:09:33 linux-box kernel: 1,1891,249118348,-;XFS (dm-17):
Please umount the filesystem and rectify the problem(s)
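
In case it helps anyone compare this backtrace against the other
reports, here is a minimal Python sketch that pulls the symbol names
out of a call trace like the one above. The regular expression and the
name extract_frames are my own illustration, not part of any XFS or
kernel tool. The sample holds only three of the frames from the log,
and labelling frames without a module tag as "kernel" is an assumption
made for readability.

```python
import re

# Matches frames such as:
#   [<ffffffffc0a041ed>] ? xfs_trans_cancel+0xad/0xd0 [xfs]
# The trailing "[module]" tag is optional.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{16}>\]\s+\?\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(\w+)\])?"
)


def extract_frames(log_text):
    """Return a list of (symbol, module) pairs, in the order they appear."""
    return [
        (match.group(1), match.group(2) or "kernel")
        for match in FRAME_RE.finditer(log_text)
    ]


# Three frames copied from the kernel log above.
sample = """
[<ffffffffc0a041ed>] ? xfs_trans_cancel+0xad/0xd0 [xfs]
[<ffffffffc09f70fd>] ? xfs_create+0x44d/0x6c0 [xfs]
[<ffffffff99612888>] ? path_openat+0x1338/0x1440
"""

frames = extract_frames(sample)
for symbol, module in frames:
    print(f"{module}: {symbol}")
```

Because the pattern is applied to the whole text rather than line by
line, it also copes with the way the log lines are wrapped here.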

Running xfs_repair in no-modify mode (-n) on the rangedb device gives
the following:

root@another-linux-box:/ # xfs_repair -n /dev/sdk
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
sb_icount 4710720, counted 4711168
sb_ifree 560, counted 0
sb_fdblocks 95850, counted 8321
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
root@another-linux-box:/
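
The only discrepancies in the output above are the three sb_* counter
lines in Phase 2. To compare them across several boxes, a small Python
sketch along these lines could be used. The helper name
counter_mismatches and the output format are hypothetical; the sample
contains just the three counter lines from the run above.

```python
import re

# Matches lines such as "sb_icount 4710720, counted 4711168".
COUNTER_RE = re.compile(r"^(sb_\w+) (\d+), counted (\d+)$", re.MULTILINE)


def counter_mismatches(repair_output):
    """Map each reported counter to (recorded, counted, counted - recorded)."""
    result = {}
    for name, recorded, counted in COUNTER_RE.findall(repair_output):
        recorded, counted = int(recorded), int(counted)
        result[name] = (recorded, counted, counted - recorded)
    return result


# The three counter lines from the xfs_repair -n run above.
sample = """\
sb_icount 4710720, counted 4711168
sb_ifree 560, counted 0
sb_fdblocks 95850, counted 8321
"""

mismatches = counter_mismatches(sample)
for name, (recorded, counted, delta) in mismatches.items():
    print(f"{name}: superblock={recorded} counted={counted} delta={delta:+d}")
```

For this run the differences come out as +448 inodes allocated, -560
free inodes, and -87529 free data blocks relative to the values
recorded in the superblock.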

Remounting the volume makes the content accessible for a while.
Eventually, however, some file lookup fails with ENOENT and the
filesystem is unmounted again.

I am not able to reproduce the problem at will.

Is this a new problem, or one that has already been fixed?
Was the corruption only in memory or on disk as well?
Why did xfs_repair not detect the corruption?

-Dheeraj

Protection of our environment is our responsibility.
