* Possible XFS bug encountered in 3.14.0-rc3+
@ 2014-03-12 20:14 Mears, Morgan
  2014-03-12 21:43 ` Mark Tinguely
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Mears, Morgan @ 2014-03-12 20:14 UTC (permalink / raw)
  To: xfs

Hi,

Please CC me on any responses; I don't subscribe to this list.

I ran into a possible XFS bug while doing some Oracle benchmarking.  My test
system is running a 3.14.0-rc3+ kernel built from the for-next branch of
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
on 2014-02-19 (last commit 1342f11e713792e53e4b7aa21167fe9caca81c4a).

The XFS instance in question is 200 GB and should have all default
parameters (mkfs.xfs /dev/mapper/<my_lun_partition>).  It contains Oracle
binaries and trace files.  At the time the issue occurred I had been
running Oracle with SQL*NET server tracing enabled.  The affected XFS
had filled up 100% with trace files several times; I was periodically
executing rm -f * in the trace file directory, which would reduce the
file system occupancy from 100% to 3%.  I had an Oracle load generating
tool running, so new log files were being created with some frequency.
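
In outline, the cleanup pattern was roughly the following (a sketch
only; the trace directory path and interval are illustrative, not the
real ones):

  while :; do
      # periodically purge all trace files; each pass drops fs
      # occupancy from ~100% back to ~3%
      rm -f /u01/app/oracle/diag/rdbms/orcl/orcl/trace/*
      sleep 300
  done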

The issue occurred during one of my rm -f * executions; afterwards the
file system would only produce errors.  Here is the traceback:

[1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa04c4905
[1552067.297203] CPU: 13 PID: 699 Comm: rm Not tainted 3.14.0-rc3+ #1
[1552067.297206] Hardware name: FUJITSU PRIMERGY RX300 S7/D2939-A1, BIOS V4.6.5.3 R1.19.0 for D2939-A1x 12/06/2012
[1552067.297210]  0000000000069ff9 ffff8817740e1b88 ffffffff815f1eb5 0000000000000001
[1552067.297222]  ffff8817740e1ba0 ffffffffa04aac7b ffffffffa04c4905 ffff8817740e1c38
[1552067.297229]  ffffffffa04c3399 ffff882022dae000 ffff8810247d2d00 ffff8810239c4840
[1552067.297236] Call Trace:
[1552067.297248]  [<ffffffff815f1eb5>] dump_stack+0x45/0x56
[1552067.297311]  [<ffffffffa04aac7b>] xfs_error_report+0x3b/0x40 [xfs]
[1552067.297344]  [<ffffffffa04c4905>] ? xfs_free_extent+0xc5/0xf0 [xfs]
[1552067.297373]  [<ffffffffa04c3399>] xfs_free_ag_extent+0x1e9/0x710 [xfs]
[1552067.297401]  [<ffffffffa04c4905>] xfs_free_extent+0xc5/0xf0 [xfs]
[1552067.297425]  [<ffffffffa04a4b0f>] xfs_bmap_finish+0x13f/0x190 [xfs]
[1552067.297461]  [<ffffffffa04f281d>] xfs_itruncate_extents+0x16d/0x2a0 [xfs]
[1552067.297503]  [<ffffffffa04f29dd>] xfs_inactive_truncate+0x8d/0x120 [xfs]
[1552067.297534]  [<ffffffffa04f3188>] xfs_inactive+0x138/0x160 [xfs]
[1552067.297562]  [<ffffffffa04bbed0>] xfs_fs_evict_inode+0x80/0xc0 [xfs]
[1552067.297570]  [<ffffffff811dc0f3>] evict+0xa3/0x1a0
[1552067.297575]  [<ffffffff811dc925>] iput+0xf5/0x180
[1552067.297582]  [<ffffffff811cf4fe>] do_unlinkat+0x18e/0x2a0
[1552067.297590]  [<ffffffff811c6ba5>] ? SYSC_newfstatat+0x25/0x30
[1552067.297596]  [<ffffffff811d28eb>] SyS_unlinkat+0x1b/0x40
[1552067.297602]  [<ffffffff816024a9>] system_call_fastpath+0x16/0x1b
[1552067.297610] XFS (dm-7): xfs_do_force_shutdown(0x8) called from line 138 of file fs/xfs/xfs_bmap_util.c.  Return address = 0xffffffffa04a4b48
[1552067.298378] XFS (dm-7): Corruption of in-memory data detected.  Shutting down filesystem
[1552067.298385] XFS (dm-7): Please umount the filesystem and rectify the problem(s)

I was unable to umount, even with -f; it failed with EBUSY, and I
couldn't unbusy the fs as it was unresponsive (and happens to contain
the Oracle management tools necessary to close all open descriptors).
Accordingly I rebooted.
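
(Normally I would hunt down the holders of a busy mount with fuser or
lsof before resorting to a reboot, e.g., with an illustrative mount
point:

  fuser -vm /mnt/oratrace      # processes using the mount
  lsof +f -- /mnt/oratrace     # their open files on it

but with the filesystem shut down these can block or return nothing,
hence the reboot.)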

I captured a post-reboot xfs_metadump before attempting any other operations
on the partition.  It is here:

https://dl.dropboxusercontent.com/u/31522929/xfs-double-free-xfs_metadump-before-repair.gz
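
For anyone who wants to poke at the dump: restore it to a sparse
image and open that read-only with xfs_db, roughly as follows (local
file names are whatever you use):

  gunzip xfs-double-free-xfs_metadump-before-repair.gz
  xfs_mdrestore xfs-double-free-xfs_metadump-before-repair fs.img
  xfs_db -r fs.img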

Attempting to mount in order to replay the log resulted in mount hanging,
and another traceback, as follows:

[  921.672867] XFS (dm-9): Mounting Filesystem
[  921.707326] XFS (dm-9): Starting recovery (logdev: internal)
[  921.721596] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa035d905
[  921.721607] CPU: 3 PID: 8569 Comm: mount Not tainted 3.14.0-rc3+ #1
[  921.721610] Hardware name: FUJITSU PRIMERGY RX300 S7/D2939-A1, BIOS V4.6.5.3 R1.19.0 for D2939-A1x 12/06/2012
[  921.721614]  0000000000069ff9 ffff88102698fb30 ffffffff815f1eb5 0000000000000001
[  921.721625]  ffff88102698fb48 ffffffffa0343c7b ffffffffa035d905 ffff88102698fbe0
[  921.721632]  ffffffffa035c399 ffff881021928800 ffff881017847900 ffff881029304f00
[  921.721639] Call Trace:
[  921.721650]  [<ffffffff815f1eb5>] dump_stack+0x45/0x56
[  921.721713]  [<ffffffffa0343c7b>] xfs_error_report+0x3b/0x40 [xfs]
[  921.721746]  [<ffffffffa035d905>] ? xfs_free_extent+0xc5/0xf0 [xfs]
[  921.721820]  [<ffffffffa035c399>] xfs_free_ag_extent+0x1e9/0x710 [xfs]
[  921.721877]  [<ffffffffa035d905>] xfs_free_extent+0xc5/0xf0 [xfs]
[  921.721912]  [<ffffffffa0390f50>] xlog_recover_process_efi+0x170/0x1b0 [xfs]
[  921.721948]  [<ffffffffa03a0434>] ? xfs_trans_ail_cursor_init+0x24/0x30 [xfs]
[  921.721981]  [<ffffffffa0392e84>] xlog_recover_process_efis.isra.27+0x64/0xb0 [xfs]
[  921.722023]  [<ffffffffa0396041>] xlog_recover_finish+0x21/0xb0 [xfs]
[  921.722055]  [<ffffffffa039aa34>] xfs_log_mount_finish+0x34/0x50 [xfs]
[  921.722082]  [<ffffffffa035327d>] xfs_mountfs+0x47d/0x710 [xfs]
[  921.722109]  [<ffffffffa03565c2>] xfs_fs_fill_super+0x2b2/0x330 [xfs]
[  921.722121]  [<ffffffff811c4db4>] mount_bdev+0x194/0x1d0
[  921.722161]  [<ffffffffa0356310>] ? xfs_parseargs+0xbd0/0xbd0 [xfs]
[  921.722190]  [<ffffffffa0354605>] xfs_fs_mount+0x15/0x20 [xfs]
[  921.722200]  [<ffffffff811c57d9>] mount_fs+0x39/0x1b0
[  921.722208]  [<ffffffff81172090>] ? __alloc_percpu+0x10/0x20
[  921.722219]  [<ffffffff811dfe67>] vfs_kern_mount+0x67/0x100
[  921.722226]  [<ffffffff811e24be>] do_mount+0x23e/0xad0
[  921.722233]  [<ffffffff8116c2fb>] ? strndup_user+0x4b/0xf0
[  921.722240]  [<ffffffff811e3073>] SyS_mount+0x83/0xc0
[  921.722248]  [<ffffffff816024a9>] system_call_fastpath+0x16/0x1b
[  921.722262] XFS (dm-9): Failed to recover EFIs
[  921.722266] XFS (dm-9): log mount finish failed

I ran xfs_repair -n; the output is here:

https://dl.dropboxusercontent.com/u/31522929/xfs_repair-n-output

xfs_repair with no args failed because there was still content in the log.

xfs_repair -L succeeded; the output is here:

https://dl.dropboxusercontent.com/u/31522929/xfs_repair-L-output
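
For reference, the repair sequence was, with an illustrative device
name:

  xfs_repair -n /dev/mapper/mylun1   # dry run; report problems only
  xfs_repair /dev/mapper/mylun1      # refused: dirty log
  xfs_repair -L /dev/mapper/mylun1   # zero the log, then repair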

Afterwards I was able to mount the file system and start Oracle from it.

Regards,
Morgan Mears

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-12 20:14 Possible XFS bug encountered in 3.14.0-rc3+ Mears, Morgan
@ 2014-03-12 21:43 ` Mark Tinguely
  2014-03-12 23:06   ` Dave Chinner
  2014-03-12 23:02 ` Dave Chinner
  2014-03-24 21:36 ` Mark Tinguely
  2 siblings, 1 reply; 11+ messages in thread
From: Mark Tinguely @ 2014-03-12 21:43 UTC (permalink / raw)
  To: Mears, Morgan; +Cc: Jie Liu, xfs

On 03/12/14 15:14, Mears, Morgan wrote:
> Hi,
>
> Please CC me on any responses; I don't subscribe to this list.
>
> I ran into a possible XFS bug while doing some Oracle benchmarking.  My test
> system is running a 3.14.0-rc3+ kernel built from the for-next branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
> on 2014-02-19 (last commit 1342f11e713792e53e4b7aa21167fe9caca81c4a).
>
> The XFS instance in question is 200 GB and should have all default
> parameters (mkfs.xfs /dev/mapper/<my_lun_partition>).  It contains Oracle
> binaries and trace files.  At the time the issue occurred I had been
> running Oracle with SQL*NET server tracing enabled.  The affected XFS
> had filled up 100% with trace files several times; I was periodically
> executing rm -f * in the trace file directory, which would reduce the
> file system occupancy from 100% to 3%.  I had an Oracle load generating
> tool running, so new log files were being created with some frequency.
>
> The issue occurred during one of my rm -f * executions; afterwards the
> file system would only produce errors.  Here is the traceback:
>
> [1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa04c4905
> ....

Partial free extent. Lots of unused sequential blocks listed in 
sequential inodes. Are you sure there are no errors in the device mapper 
layer?


> I was unable to umount, even with -f; it failed with EBUSY, and I
> couldn't unbusy the fs as it was unresponsive (and happens to contain
> the Oracle management tools necessary to close all open descriptors).
> Accordingly I rebooted.

You are the second person in 2-3 weeks to hit this unmount issue.
Unmatched EFI in the AIL keeps the unmount from completing.
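
If someone hits this again, the stuck tasks should be visible from a
blocked-task dump; illustrative commands, run as root, assuming sysrq
is enabled and the kernel has CONFIG_STACKTRACE:

  echo w > /proc/sysrq-trigger          # log blocked tasks to dmesg
  cat /proc/$(pgrep -x umount)/stack    # stack of the hung umount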

Jeff are you still looking at this?

Here and in xlog_recover_process_efi(), the EFD will never happen
because of the xfs_free_extent() error. In these cases the EFI has to
be manually removed from the AIL. Christoph's proposed EFI/EFD
recovery changes will not remove the need to take the EFI off the AIL
in recovery, because at that point he does put the EFI on the AIL.

Case 2 is a bit trickier. The EFI has a matching EFD; the EFI makes
it to the AIL, but the EFD is still on the CIL when the forced
shutdown happens. The EFD will not remove the EFI in this case. For
EFI/EFD pairs that are on the CIL at a forced shutdown, we cannot
check whether the EFI is on the AIL from the iop_{committed | unlock}
routines, because the check would be a use-after-free on the EFI.

The EFD being aborted on a bad xlog buffer write is another path, but
that is pretty much the same as case #2.

> I captured a post-reboot xfs_metadump before attempting any other operations
> on the partition.  It is here:
>
> https://dl.dropboxusercontent.com/u/31522929/xfs-double-free-xfs_metadump-before-repair.gz
>
> Attempting to mount in order to replay the log resulted in mount hanging,
> and another traceback, as follows:
>
> [  921.672867] XFS (dm-9): Mounting Filesystem
> [  921.707326] XFS (dm-9): Starting recovery (logdev: internal)
> [  921.721596] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa035d905
> ....
> [  921.722262] XFS (dm-9): Failed to recover EFIs
> [  921.722266] XFS (dm-9): log mount finish failed

The unmount after this failed recovery could also hang; see case #1 above.

>
> I ran xfs_repair -n; the output is here:
>
> https://dl.dropboxusercontent.com/u/31522929/xfs_repair-n-output
>
> xfs_repair with no args failed because there was still content in the log.
>
> xfs_repair -L succeeded; the output is here:
>
> https://dl.dropboxusercontent.com/u/31522929/xfs_repair-L-output
>
> Afterwards I was able to mount the file system and start Oracle from it.
>

Interesting.

> Regards,
> Morgan Mears

--Mark.

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-12 20:14 Possible XFS bug encountered in 3.14.0-rc3+ Mears, Morgan
  2014-03-12 21:43 ` Mark Tinguely
@ 2014-03-12 23:02 ` Dave Chinner
  2014-03-13 14:47   ` Mears, Morgan
  2014-03-24 21:36 ` Mark Tinguely
  2 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-03-12 23:02 UTC (permalink / raw)
  To: Mears, Morgan; +Cc: xfs

On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
> ....
> 
> [1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa04c4905

So, freeing a range that is already partially free. The problem
appears to be in AG 15, according to the repair output.

> https://dl.dropboxusercontent.com/u/31522929/xfs-double-free-xfs_metadump-before-repair.gz

AGF 15 is full:

xfs_db> agf 15
xfs_db> p
magicnum = 0x58414746
versionnum = 1
seqno = 15
length = 3276783
bnoroot = 1
cntroot = 2
bnolevel = 1
cntlevel = 1
flfirst = 0
fllast = 3
flcount = 4
freeblks = 1
longest = 1
btreeblks = 0
uuid = 00000000-0000-0000-0000-000000000000
lsn = 0
crc = 0

And the one free block (other than the minimum 4 on the AGFL) is:

xfs_db> p
magic = 0x41425442
level = 0
numrecs = 1
leftsib = null
rightsib = null
recs[1] = [startblock,blockcount] 1:[3119876,1]

But:

data fork in ino 940862056 claims dup extent, off - 11, start - 58836692, cnt 1
correcting nextents for inode 940862056
bad data fork in inode 940862056
would have cleared inode 940862056

the block number here is in AG 14, which has much more free space:

xfs_db> p
magicnum = 0x58414746
versionnum = 1
seqno = 14
length = 3276783
bnoroot = 1
cntroot = 2
bnolevel = 1
cntlevel = 1
flfirst = 42
fllast = 45
flcount = 4
freeblks = 2092022
longest = 2078090
btreeblks = 0

which is in 2 extents:

xfs_db> a bnoroot
xfs_db> p
magic = 0x41425442
level = 0
numrecs = 2
leftsib = null
rightsib = null
recs[1-2] = [startblock,blockcount] 1:[102466,13932] 2:[116476,2078090]
xfs_db> convert agno 14 agbno 102466 fsb
0x3819042 (58822722)
xfs_db> convert agno 14 agbno 116476 fsb
0x381c6fc (58836732)

and so 58836692 is just short of the second free space (it is 40
blocks before 58836732). Looking at all the other dup extent claims,
they are mostly adjacent to the left edge of these two free spaces.
No surprise - that's the way allocation occurs.

So, we've got a state where the allocation btree contains a
corruption, so a shutdown occurs. The log has captured that
corruption when it was made, so log recovery reintroduces that
corruption. And so when the extent is freed after log recovery, the
corruption is tripped over again.
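
As an aside for anyone following along: records like the BUF/EFI/EFD
entries quoted below live in the log, and can be dumped with
xfs_logprint in its transactional view against the restored image
(reusing the illustrative name from the restore step):

  xfs_logprint -t fs.img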

There are two checkpoints in the log, both very small. The last
modification to AGF 14 is in there:

BUF: cnt:2 total:2 a:0x669350 len:24 a:0x6693d0 len:128 
        BUF:  #regs:2   start blkno:0x15dff891   len:1   bmap size:1   flags:0x2800
        AGF Buffer: (XAGF)
            ver:1  seq#:14  len:3276783  
            root BNO:1  CNT:2
            level BNO:1  CNT:1
            1st:42  last:45  cnt:4  freeblks:2092020  longest:2078090

As is the freespace btree buffer modification:

BUF: cnt:2 total:2 a:0x669460 len:24 a:0x6694e0 len:128 
        BUF:  #regs:2   start blkno:0x15dff898   len:8   bmap size:1   flags:0x2000
        BUF DATA
 0 42544241  4000000 ffffffff ffffffff cb620100  1000000 fc7c0100  2000000 
   magic = BTBA
   level = 0
   numrecs = 4
   leftsib = NULLFSBLOCK
   rightsib = NULLFSBLOCK
   rec[0] = 0x162cb, 1 (90827,1)
   rec[1] = 0x17cfc, 2 (97532,1)
 8 47900100 67360000 fcc60100 8ab51f00 71ff3100  5000000 71ff3100  5000000 
   rec[2] = 0x19047, 0x3667 (102471,13927)
   rec[3] = 0x1c6fc, 0x1fb58a (116476,2078090)

so, from the pre-recovery case above, we've got two new freespaces
in rec[0-1], rec[2] has 5 blocks removed from the left edge, and
rec[3] is unchanged.

Confirming the ABTC buffer contains the same extents:

BUF: cnt:2 total:2 a:0x669570 len:24 a:0x6695f0 len:128 
        BUF:  #regs:2   start blkno:0x15dff8a0   len:8   bmap size:1   flags:0x2000
        BUF DATA
 0 43544241  4000000 ffffffff ffffffff cb620100  1000000 fc7c0100  2000000 
   magic = CTBA
   level = 0
   numrecs = 4
   leftsib = NULLFSBLOCK
   rightsib = NULLFSBLOCK
   rec[0] = 0x162cb, 1 (90827,1)
   rec[1] = 0x17cfc, 2 (97532,1)
 8 47900100 67360000 fcc60100 8ab51f00 71ff3100  5000000 71ff3100  5000000 
   rec[2] = 0x19047, 0x3667 (102471,13927)
   rec[3] = 0x1c6fc, 0x1fb58a (116476,2078090)
10  8ca0100  8000000 71ff3100  5000000 71ff3100  5000000 71ff3100  5000000 
18 71ff3100  5000000 71ff3100  5000000 71ff3100  5000000 71ff3100  5000000 

It does. So the btrees contain consistent information, and so it's
unlikely that we have a free space btree corruption in the log. So
let's look at what was freed:

The EFI/EFDs in the log are:

EFI: cnt:1 total:1 a:0x668670 len:32 
        EFI:  #regs:1    num_extents:1  id:0xffff881496024af0
        (s: 0x38162cb, l: 1) 
EFD: cnt:1 total:1 a:0x6684d0 len:32 
        EFD:  #regs: 1    num_extents: 1  id: 0xffff881496024af0
....
EFI: cnt:1 total:1 a:0x667da0 len:32 
        EFI:  #regs:1    num_extents:1  id:0xffff8814960244b0
        (s: 0x3817cfc, l: 1) 
EFD: cnt:1 total:1 a:0x65fbd0 len:32 
        EFD:  #regs: 1    num_extents: 1  id: 0xffff8814960244b0
....
EFI: cnt:1 total:1 a:0x669250 len:32 
        EFI:  #regs:1    num_extents:1  id:0xffff881496024000
        (s: 0x3817cfd, l: 1) 
EFD: cnt:1 total:1 a:0x6692d0 len:32 
        EFD:  #regs: 1    num_extents: 1  id: 0xffff881496024000

These are all in ag#14, and have agbnos of 0x162cb, 0x17cfc and
0x17cfd. This corresponds exactly to the two new records in the AG
btree. The one remaining EFI doesn't have an EFD in the log, so this
is what is being replayed:

EFI: cnt:1 total:1 a:0x669f40 len:32 
        EFI:  #regs:1    num_extents:1  id:0xffff881496024640
	(s: 0x3869ff9, l: 1) 

Which is at agbno 0x69ff9 (434169). That's off to the right of the
start of the last freespace range in rec[3]. rec[3] starts at
0x1c6fc and, with length 0x1fb58a, ends at 0x217c86. That means this
EFI clearly lands inside that freespace range - it's not an edge
case, it's deep in the interior of the freespace range.

The inode logged just before the EFI - likely its owner:

INO: cnt:2 total:2 a:0x668040 len:56 a:0x667fb0 len:96 
        INODE: #regs:2   ino:0x3862d698  flags:0x1   dsize:0
        CORE inode:
            magic:IN  mode:0x81b0  ver:2  format:2  onlink:0
            uid:1001  gid:1001  nlink:0 projid:0
            atime:1394495104  mtime:1394495104  ctime:1394554526
            flushiter:1
            size:0x0  nblks:0x0  exsize:0  nextents:0  anextents:0
            forkoff:13  dmevmask:0x0  dmstate:0  flags:0x0  gen:-121998876

This is an unlinked inode that has had all its blocks removed. Yup -
it's on the unlinked list:

agi unlinked bucket 24 is 6477464 in ag 14 (inode=946001560)
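
That can be cross-checked against the restored image with xfs_db:

  xfs_db -r -c 'agi 14' -c 'p unlinked' fs.img

which prints the whole 64-bucket unlinked array for that AG.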

So, prior to recovery, what did it contain? It had 287 bytes of
data, and a single extent:

u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,59154425,1,0]

xfs_db> convert fsb 59154425 agno
0xe (14)
xfs_db> convert fsb 59154425 agbno
0x69ff9 (434169)

Ok, so the corruption, whatever it was, happened a long time ago,
and it's only when removing the file that it was tripped over.
There's nothing more I can really get from this - the root cause of
the corruption is long gone.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-12 21:43 ` Mark Tinguely
@ 2014-03-12 23:06   ` Dave Chinner
  2014-03-13 15:03     ` Mark Tinguely
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-03-12 23:06 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Mears, Morgan, Jie Liu, xfs

On Wed, Mar 12, 2014 at 04:43:26PM -0500, Mark Tinguely wrote:
> On 03/12/14 15:14, Mears, Morgan wrote:
> >I was unable to umount, even with -f; it failed with EBUSY, and I
> >couldn't unbusy the fs as it was unresponsive (and happens to contain
> >the Oracle management tools necessary to close all open descriptors).
> >Accordingly I rebooted.
> 
> You are the second person in 2-3 weeks to hit this unmount issue.
> Unmatched EFI in the AIL keeps the unmount from completing.
> 
> Jeff are you still looking at this?

I'd say the answer is no. Last time you pointed out this problem I
asked you to provide patches to fix it. Can you please provide
patches to fix this, Mark?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* RE: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-12 23:02 ` Dave Chinner
@ 2014-03-13 14:47   ` Mears, Morgan
  2014-03-13 15:31     ` Ben Myers
  2014-03-13 22:58     ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Mears, Morgan @ 2014-03-13 14:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Mar 12, 2014 at 07:03:14PM -0400, Dave Chinner wrote:
> On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
>> ....
>
> ....
>
> Ok, so the corruption, whatever it was, happened a long time ago,
> and it's only when removing the file that it was tripped over.
> There's nothing more I can really get from this - the root cause of
> the corruption is long gone.

Thanks Dave.

Upon restarting my testing I immediately hit this error again (or a very
similar one in any case).  I suspect that the corruption you've noted was
not properly repaired by xfs_repair.

I captured all the same data as before, as well as an xfs_metadump from
after the xfs_repair.  If you're interested, it's all in this tarball:

https://dl.dropboxusercontent.com/u/31522929/xfs-unlink-internal-error-2013-03-13-1.tar.gz

Regards,
Morgan

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-12 23:06   ` Dave Chinner
@ 2014-03-13 15:03     ` Mark Tinguely
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Tinguely @ 2014-03-13 15:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Mears, Morgan, Jie Liu, xfs

On 03/12/14 18:06, Dave Chinner wrote:
> On Wed, Mar 12, 2014 at 04:43:26PM -0500, Mark Tinguely wrote:
>> On 03/12/14 15:14, Mears, Morgan wrote:
>>> I was unable to umount, even with -f; it failed with EBUSY, and I
>>> couldn't unbusy the fs as it was unresponsive (and happens to contain
>>> the Oracle management tools necessary to close all open descriptors).
>>> Accordingly I rebooted.
>>
>> You are the second person in 2-3 weeks to hit this unmount issue.
>> Unmatched EFI in the AIL keeps the unmount from completing.
>>
>> Jeff are you still looking at this?
>
> I'd say the answer is no. Last time you pointed out this problem I
> asked you to provide patches to fix it. Can you please provide
> patches to fix this, Mark?

Ah, you wanted me to fix the cil_push error issue that leaks ctx and 
does not wake up the waiters. This is only a side issue to that.

We can easily patch the xfs_bmap_finish() and
xlog_recover_process_efis() code; I have written and tested that
patch. But it does not cover the case of a CIL push error, nor the
case where xfs_bmap_finish() succeeds and the EFI is in the AIL but
the EFD is discarded.

The most correct thing to do is to clear the EFI from the AIL in the
abort paths of xfs_efd_item_committed() and xfs_efd_item_unlock(), but
those will be called many times and doing it there would be overkill.

A less correct but easier approach would be to clear the EFIs from
the AIL once in xfs_unmountfs(), after the last force of the log and
before the xfs_ail_push_all_sync(). Since the EFIs would be removed
very late, we wouldn't have to special-case the removal in
xfs_bmap_finish() and xlog_recover_process_efis(). This is why I was
waiting to see what Jeff wanted to do.

If I hear no strong objection, I intend to clear the EFI from the
AIL in each situation: the abort cases of the EFD iop routines, in
xfs_bmap_finish(), and in xlog_recover_process_efis().

--Mark.

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-13 14:47   ` Mears, Morgan
@ 2014-03-13 15:31     ` Ben Myers
  2014-03-13 16:56       ` Mears, Morgan
  2014-03-13 22:58     ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Myers @ 2014-03-13 15:31 UTC (permalink / raw)
  To: Mears, Morgan; +Cc: xfs

On Thu, Mar 13, 2014 at 02:47:58PM +0000, Mears, Morgan wrote:
> On Wed, Mar 12, 2014 at 07:03:14PM -0400, Dave Chinner wrote:
> > On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
> >> ....
> >
> > ....
> 
> Thanks Dave.
> 
> Upon restarting my testing I immediately hit this error again (or a very
> similar one in any case).  I suspect that the corruption you've noted was
> not properly repaired by xfs_repair.

There are some kinds of corruption that xfs_repair was finding but not
repairing until commit ea4a8de1e1.  I suggest you upgrade if you don't have
this commit.  How long ago did you make this filesystem?
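
If you build xfsprogs from source, one way to check whether a tree
has the fix (a sketch; run from wherever your checkout lives):

  git merge-base --is-ancestor ea4a8de1e1 HEAD \
      && echo 'fix present' || echo 'fix missing'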

Thanks,
	Ben

* RE: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-13 15:31     ` Ben Myers
@ 2014-03-13 16:56       ` Mears, Morgan
  0 siblings, 0 replies; 11+ messages in thread
From: Mears, Morgan @ 2014-03-13 16:56 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Thu, Mar 13, 2014 at 11:32:22PM -0400, Ben Myers wrote:
> On Thu, Mar 13, 2014 at 02:47:58PM +0000, Mears, Morgan wrote:
>> On Wed, Mar 12, 2014 at 07:03:14PM -0400, Dave Chinner wrote:
>> > On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
>> >> ....
>> >
>> > ....
>> 
>> Thanks Dave.
>> 
>> Upon restarting my testing I immediately hit this error again (or a very
>> similar one in any case).  I suspect that the corruption you've noted was
>> not properly repaired by xfs_repair.
> 
> There are some kinds of corruption that xfs_repair was finding but not
> repairing until commit ea4a8de1e1.  I suggest you upgrade if you don't have
> this commit.  How long ago did you make this filesystem?
> 
> Thanks,
> 	Ben

I'm using the xfs_repair that came with RHEL 7.0 post-beta snapshot 4; -V
says "xfs_repair version 3.2.0-alpha2".  I don't think I have access to the
commit at which it was built.  I can certainly try the latest version,
though.

I made this file system on February 17th, 2014.  I happen to remember
because I was disappointed to be working on Presidents' Day :)

--Morgan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-13 14:47   ` Mears, Morgan
  2014-03-13 15:31     ` Ben Myers
@ 2014-03-13 22:58     ` Dave Chinner
  2014-03-14 14:22       ` Mears, Morgan
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-03-13 22:58 UTC (permalink / raw)
  To: Mears, Morgan; +Cc: xfs

On Thu, Mar 13, 2014 at 02:47:58PM +0000, Mears, Morgan wrote:
> On Wed, Mar 12, 2014 at 07:03:14PM -0400, Dave Chinner wrote:
> > On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
> >> Hi,
> >> 
> >> Please CC me on any responses; I don't subscribe to this list.
> >> 
> >> I ran into a possible XFS bug while doing some Oracle benchmarking.  My test
> >> system is running a 3.14.0-rc3+ kernel built from the for-next branch of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
> >> on 2014-02-19 (last commit 1342f11e713792e53e4b7aa21167fe9caca81c4a).
> >> 
> >> The XFS instance in question is 200 GB and should have all default
> >> parameters (mkfs.xfs /dev/mapper/<my_lun_partition>).  It contains Oracle
> >> binaries and trace files.  At the time the issue occurred I had been
> >> running Oracle with SQL*NET server tracing enabled.  The affected XFS
> >> had filled up 100% with trace files several times; I was periodically
> >> executing rm -f * in the trace file directory, which would reduce the
> >> file system occupancy from 100% to 3%.  I had an Oracle load generating
> >> tool running, so new log files were being created with some frequency.
> >> 
> >> The issue occurred during one of my rm -f * executions; afterwards the
> >> file system would only produce errors.  Here is the traceback:
> >> 
> >> [1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa04c4905
> > 
> > So, freeing a range that is already partially free. The problem
> > appears to be in AG 15, according to the repair output.
> > 
> >> https://dl.dropboxusercontent.com/u/31522929/xfs-double-free-xfs_metadump-before-repair.gz
> > 
> > AGF 15 is full:
....
> > on the unlinked list:
> > 
> > agi unlinked bucket 24 is 6477464 in ag 14 (inode=946001560)
> > 
> > So, prior to recovery, what did it contain? It's got 287 bytes of
> > data, and a single extent:
> > 
> > u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,59154425,1,0]
> > 
> > xfs_db> convert fsb 59154425 agno
> > 0xe (14)
> > xfs_db> convert fsb 59154425 agbno
> > 0x69ff9 (434169)
> > 
> > Ok, so the corruption, whatever it was, happened a long time ago,
> > and it's only when removing the file that it was tripped over.
> > There's nothing more I can really get from this - the root cause of
> > the corruption is long gone.
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> 
> Thanks Dave.
> 
> Upon restarting my testing I immediately hit this error again (or a very
> similar one in any case).  I suspect that the corruption you've noted was
> not properly repaired by xfs_repair.

What happens if you run xfs_repair twice in a row?

> I captured all the same data as before, as well as an xfs_metadump from
> after the xfs_repair.  If you're interested, it's all in this tarball:
> 
> https://dl.dropboxusercontent.com/u/31522929/xfs-unlink-internal-error-2013-03-13-1.tar.gz

Ok, that's triggered the right side btree checks, not the left side
like the previous one. It's probably AG 14 again.

EFI:

EFI: cnt:1 total:1 a:0x19c77b0 len:48 
        EFI:  #regs:1    num_extents:2  id:0xffff880f8011c640
	(s: 0x3816a3b, l: 112) (s: 0x3817cb1, l: 1920) 

So, two extents being freed:

xfs_db> convert fsb 0x3816a3b agno
0xe (14)
xfs_db> convert fsb 0x3816a3b agbno
0x16a3b (92731)
xfs_db> convert fsb 0x3817cb1 agbno
0x17cb1 (97457)
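
For reference, the conversion xfs_db is doing here is shift-and-mask
arithmetic on the filesystem block number: fsb = (agno << sb_agblklog) |
agbno. A minimal Python sketch; sb_agblklog = 22 is an assumption
back-derived from the outputs above, not read from the superblock:

    # Sketch of "xfs_db> convert fsb ... agno/agbno".
    AGBLKLOG = 22    # assumed; take sb_agblklog from a real superblock

    def fsb_to_ag(fsbno):
        return fsbno >> AGBLKLOG, fsbno & ((1 << AGBLKLOG) - 1)

    for fsb in (0x3816a3b, 0x3817cb1, 59154425):
        agno, agbno = fsb_to_ag(fsb)
        print(f"fsb {fsb:#x}: agno {agno:#x} agbno {agbno:#x} ({agbno})")
    # -> agno 0xe / agbno 0x16a3b (92731), 0x17cb1 (97457), and
    #    0x69ff9 (434169) for the EFI from the first shutdown.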

Surrounding free space regions:

66:[92551,180] 67:[92856,2]	-> used space range [92731,125]
...
172:[97415,42] 173:[97622,4]	-> used space range [97457,65]

So the first extent is good. The second, however, aligns correctly
to the free space region to the left, but massively overruns the
used space region which is only 165 blocks long. So it's a similar
problem here - both the free space trees are internally consistent,
the inode BMBT is internally consistent, but the space that they
track is not consistent.
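
That overrun can be checked mechanically: the gap between two adjacent
free-space records is the longest run of in-use blocks an EFI may free
there. A small sketch with the record values copied from the dump above:

    # Sketch: check a freed extent against its neighbouring free records.
    def check(extent, left_free, right_free):
        start, length = extent
        gap_start = left_free[0] + left_free[1]  # first block past left rec
        gap_end = right_free[0]                  # first block of right rec
        ok = gap_start <= start and start + length <= gap_end
        msg = "ok" if ok else f"overruns by {start + length - gap_end}"
        print(f"free [{start},{length}] vs used gap "
              f"[{gap_start},{gap_end - gap_start}]: {msg}")

    check((92731, 112), (92551, 180), (92856, 2))   # fits the 125-block gap
    check((97457, 1920), (97415, 42), (97622, 4))   # overruns 165 by 1755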

After repair:

63:[92551,292] 64:[92856,2]	-> correctly accounted
....
169:[97415,49] 170:[97468,56] 171:[97528,168]
	-> used space [97464,2], [97524,4]

But that's a very different freespace map around the second extent
in the EFI. It's most definitely not a contiguous range of 1920
blocks now that repair has made sure the inode has no extents and
the range is correctly accounted for, which indicates that the
length in the EFI is suspect.  Maybe it should only be 7 blocks?
(The repaired free record [97415,49] ends at agbno 97464 and the
extent starts at 97457, leaving room for exactly 97464 - 97457 = 7
blocks.)

Did you run the filesystem out of space again before this happened?
If you don't hit enospc, does removing files trigger this
corruption?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-13 22:58     ` Dave Chinner
@ 2014-03-14 14:22       ` Mears, Morgan
  0 siblings, 0 replies; 11+ messages in thread
From: Mears, Morgan @ 2014-03-14 14:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Mar 13, 2014 at 06:59:22PM -0400, Dave Chinner wrote:
> On Thu, Mar 13, 2014 at 02:47:58PM +0000, Mears, Morgan wrote:
>> On Wed, Mar 12, 2014 at 07:03:14PM -0400, Dave Chinner wrote:
>> > On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
>> >> Hi,
>> >> 
>> >> Please CC me on any responses; I don't subscribe to this list.
>> >> 
>> >> I ran into a possible XFS bug while doing some Oracle benchmarking.  My test
>> >> system is running a 3.14.0-rc3+ kernel built from the for-next branch of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
>> >> on 2014-02-19 (last commit 1342f11e713792e53e4b7aa21167fe9caca81c4a).
>> >> 
>> >> The XFS instance in question is 200 GB and should have all default
>> >> parameters (mkfs.xfs /dev/mapper/<my_lun_partition>).  It contains Oracle
>> >> binaries and trace files.  At the time the issue occurred I had been
>> >> running Oracle with SQL*NET server tracing enabled.  The affected XFS
>> >> had filled up 100% with trace files several times; I was periodically
>> >> executing rm -f * in the trace file directory, which would reduce the
>> >> file system occupancy from 100% to 3%.  I had an Oracle load generating
>> >> tool running, so new log files were being created with some frequency.
>> >> 
>> >> The issue occurred during one of my rm -f * executions; afterwards the
>> >> file system would only produce errors.  Here is the traceback:
>> >> 
>> >> [1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa04c4905
>> > 
>> > So, freeing a range that is already partially free. The problem
>> > appears to be in AG 15, according to the repair output.
>> > 
>> >> https://dl.dropboxusercontent.com/u/31522929/xfs-double-free-xfs_metadump-before-repair.gz
>> > 
>> > AGF 15 is full:
>....
>> > on the unlinked list:
>> > 
>> > agi unlinked bucket 24 is 6477464 in ag 14 (inode=946001560)
>> > 
>> > So, prior to recovery, what did it contain? It's got 287 bytes of
>> > data, and a single extent:
>> > 
>> > u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,59154425,1,0]
>> > 
>> > xfs_db> convert fsb 59154425 agno
>> > 0xe (14)
>> > xfs_db> convert fsb 59154425 agbno
>> > 0x69ff9 (434169)
>> > 
>> > Ok, so the corruption, whatever it was, happened a long time ago,
>> > and it's only when removing the file that it was tripped over.
>> > There's nothing more I can really get from this - the root cause of
>> > the corruption is long gone.
>> > 
>> > Cheers,
>> > 
>> > Dave.
>> > -- 
>> > Dave Chinner
>> > david@fromorbit.com
>> 
>> Thanks Dave.
>> 
>> Upon restarting my testing I immediately hit this error again (or a very
>> similar one in any case).  I suspect that the corruption you've noted was
>> not properly repaired by xfs_repair.
> 
> What happens if you run xfs_repair twice in a row?

Don't know; I didn't try.  The filesystem seems to be clean now, after two
xfs_repairs with an intervening recurrence of the issue during another
rm -f *.  It certainly seems possible that the first xfs_repair only
partially fixed the corruption, and the second occurrence of the issue
wasn't a real reproduction but fallout from an incomplete fix.

Unfortunately I didn't think to snapshot the LUN before fixing it up, so
I can't go back and try.  If I can reproduce the issue I will run
xfs_repair until it doesn't find anything to fix.
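
For what it's worth, that loop is easy to script. A sketch, leaning on
xfs_repair's documented no-modify mode (-n exits non-zero when corruption
is detected); the device path is a placeholder:

    # Sketch: re-run xfs_repair until a no-modify check pass comes back
    # clean.  /dev/mapper/mylun is a hypothetical device path.
    import subprocess

    DEV = "/dev/mapper/mylun"

    while subprocess.run(["xfs_repair", "-n", DEV]).returncode != 0:
        subprocess.run(["xfs_repair", DEV], check=True)
    print("xfs_repair -n reports clean")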

>> I captured all the same data as before, as well as an xfs_metadump from
>> after the xfs_repair.  If you're interested, it's all in this tarball:
>> 
>> https://dl.dropboxusercontent.com/u/31522929/xfs-unlink-internal-error-2013-03-13-1.tar.gz
> 
> Ok, that's triggered the right side btree checks, not the left side
> like the previous one. It's probably AG 14 again.
> 
> EFI:
> 
> EFI: cnt:1 total:1 a:0x19c77b0 len:48 
>         EFI:  #regs:1    num_extents:2  id:0xffff880f8011c640
> 	(s: 0x3816a3b, l: 112) (s: 0x3817cb1, l: 1920) 
> 
> So, two extents being freed:
> 
> xfs_db> convert fsb 0x3816a3b agno
> 0xe (14)
> xfs_db> convert fsb 0x3816a3b agbno
> 0x16a3b (92731)
> xfs_db> convert fsb 0x3817cb1 agbno
> 0x17cb1 (97457)
> 
> Surrounding free space regions:
> 
> 66:[92551,180] 67:[92856,2]	-> used space range [92731,125]
> ...
> 172:[97415,42] 173:[97622,4]	-> used space range [97457,65]
> 
> So the first extent is good. The second, however, aligns correctly
> to the free space region to the left, but massively overruns the
> used space region which is only 165 blocks long. So it's a similar
> problem here - both the free space trees are internally consistent,
> the inode BMBT is internally consistent, but the space that they
> track is not consistent.
> 
> After repair:
> 
> 63:[92551,292] 64:[92856,2]	-> correctly accounted
> ....
> 169:[97415,49] 170:[97468,56] 171:[97528,168]
> 	-> used space [97464,2], [97524,4]
> 
> But that's a very different freespace map around the second extent
> in the EFI. It's most definitely not a contiguous range of 1920
> blocks now that repair has made sure the inode has no extents and
> the range is correctly accounted for, which indicates that the
> length in the EFI is suspect.  Maybe it should only be 7 blocks?
> (The repaired free record [97415,49] ends at agbno 97464 and the
> extent starts at 97457, leaving room for exactly 97464 - 97457 = 7
> blocks.)
> 
> Did you run the filesystem out of space again before this happened?
> If you don't hit enospc, does removing files trigger this
> corruption?

I did not run the filesystem out of space again.  I'd actually deactivated
most of the Oracle tracing and started another benchmark run, then noticed
that the filesystem was still about 40% full (presumably my last rm -f *
only got that far before encountering the error), so I did another
rm -f * in the trace file directory and hit the issue again.

As time is available, I will see if I can reproduce the issue in the manner
I produced it initially and report back if so.  I believe xfs_repair is now
running clean; latest output is here if you want to confirm:

https://dl.dropboxusercontent.com/u/31522929/xfs_repair-latest-output-2014-03-13-2

Also I've upgraded to the latest xfs_repair built on main from
git://oss.sgi.com/xfs/cmds/xfsprogs; git log shows commit ea4a8de1e1, which
Ben recommended (though xfs_repair -V still shows version 3.2.0-alpha2).

Regards,
Morgan

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible XFS bug encountered in 3.14.0-rc3+
  2014-03-12 20:14 Possible XFS bug encountered in 3.14.0-rc3+ Mears, Morgan
  2014-03-12 21:43 ` Mark Tinguely
  2014-03-12 23:02 ` Dave Chinner
@ 2014-03-24 21:36 ` Mark Tinguely
  2 siblings, 0 replies; 11+ messages in thread
From: Mark Tinguely @ 2014-03-24 21:36 UTC (permalink / raw)
  To: Mears, Morgan; +Cc: xfs

On 03/12/14 15:14, Mears, Morgan wrote:
> Hi,
>
> Please CC me on any responses; I don't subscribe to this list.
>
> I ran into a possible XFS bug while doing some Oracle benchmarking.  My test
> system is running a 3.14.0-rc3+ kernel built from the for-next branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
> on 2014-02-19 (last commit 1342f11e713792e53e4b7aa21167fe9caca81c4a).
>
> The XFS instance in question is 200 GB and should have all default
> parameters (mkfs.xfs /dev/mapper/<my_lun_partition>).  It contains Oracle
> binaries and trace files.  At the time the issue occurred I had been
> running Oracle with SQL*NET server tracing enabled.  The affected XFS
> had filled up 100% with trace files several times; I was periodically
> executing rm -f * in the trace file directory, which would reduce the
> file system occupancy from 100% to 3%.  I had an Oracle load generating
> tool running, so new log files were being created with some frequency.
>
> The issue occurred during one of my rm -f * executions; afterwards the
> file system would only produce errors.  Here is the traceback:
>
> [1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa04c4905
> [1552067.297203] CPU: 13 PID: 699 Comm: rm Not tainted 3.14.0-rc3+ #1
> [1552067.297206] Hardware name: FUJITSU PRIMERGY RX300 S7/D2939-A1, BIOS V4.6.5.3 R1.19.0 for D2939-A1x 12/06/2012
> [1552067.297210]  0000000000069ff9 ffff8817740e1b88 ffffffff815f1eb5 0000000000000001
> [1552067.297222]  ffff8817740e1ba0 ffffffffa04aac7b ffffffffa04c4905 ffff8817740e1c38
> [1552067.297229]  ffffffffa04c3399 ffff882022dae000 ffff8810247d2d00 ffff8810239c4840
> [1552067.297236] Call Trace:
> [1552067.297248]  [<ffffffff815f1eb5>] dump_stack+0x45/0x56
> [1552067.297311]  [<ffffffffa04aac7b>] xfs_error_report+0x3b/0x40 [xfs]
> [1552067.297344]  [<ffffffffa04c4905>] ? xfs_free_extent+0xc5/0xf0 [xfs]
> [1552067.297373]  [<ffffffffa04c3399>] xfs_free_ag_extent+0x1e9/0x710 [xfs]
> [1552067.297401]  [<ffffffffa04c4905>] xfs_free_extent+0xc5/0xf0 [xfs]
> [1552067.297425]  [<ffffffffa04a4b0f>] xfs_bmap_finish+0x13f/0x190 [xfs]
> [1552067.297461]  [<ffffffffa04f281d>] xfs_itruncate_extents+0x16d/0x2a0 [xfs]
> [1552067.297503]  [<ffffffffa04f29dd>] xfs_inactive_truncate+0x8d/0x120 [xfs]
> [1552067.297534]  [<ffffffffa04f3188>] xfs_inactive+0x138/0x160 [xfs]
> [1552067.297562]  [<ffffffffa04bbed0>] xfs_fs_evict_inode+0x80/0xc0 [xfs]
> [1552067.297570]  [<ffffffff811dc0f3>] evict+0xa3/0x1a0
> [1552067.297575]  [<ffffffff811dc925>] iput+0xf5/0x180
> [1552067.297582]  [<ffffffff811cf4fe>] do_unlinkat+0x18e/0x2a0
> [1552067.297590]  [<ffffffff811c6ba5>] ? SYSC_newfstatat+0x25/0x30
> [1552067.297596]  [<ffffffff811d28eb>] SyS_unlinkat+0x1b/0x40
> [1552067.297602]  [<ffffffff816024a9>] system_call_fastpath+0x16/0x1b
> [1552067.297610] XFS (dm-7): xfs_do_force_shutdown(0x8) called from line 138 of file fs/xfs/xfs_bmap_util.c.  Return address = 0xffffffffa04a4b48
> [1552067.298378] XFS (dm-7): Corruption of in-memory data detected.  Shutting down filesystem
> [1552067.298385] XFS (dm-7): Please umount the filesystem and rectify the problem(s)


This is very interesting. From your first occurrence of the problem, there
are 3 groups of duplicate allocated blocks in AG 14. Removing both of the
duplicates is what triggers the XFS_WANT_CORRUPTED_GOTO.

In the first group, inode 940954751 maps fsb 58817713 for a length of 1920
and most of these blocks are allocated elsewhere in small lengths.

In the second group, inode 940954759 maps fsb 58822053 for a length of 39,
and most of these blocks are allocated elsewhere.

In the third group there are smaller (1, 2, 3, 10) blocks of overlaps.
The last 2 blocks of this group are allocated to inode 941385832 and are
also listed as being free in the cntbr/bnobt at the same time.

To make things more interesting, there are several cases where the first
inode of an inode chunk has a single block mapped and that block is a
duplicate of another active inode chunk block. For example, inode 941083520
maps fsb 58817724, but that block is also the inode chunk for the inodes
starting at 941083584.
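
The inode-number arithmetic above can be checked the same way as the fsb
conversions: ino = (agno << (agblklog + inopblog)) | agino, and the chunk's
agbno is agino >> inopblog. A sketch; agblklog = 22 and inopblog = 4
(sixteen 256-byte inodes per 4k block) are assumptions consistent with the
numbers in this thread, not read from the superblock:

    # Sketch: decode an XFS inode number into (agno, agbno, offset).
    AGBLKLOG, INOPBLOG = 22, 4    # assumed geometry, see lead-in

    def ino_to_ag(ino):
        agino = ino & ((1 << (AGBLKLOG + INOPBLOG)) - 1)
        return (ino >> (AGBLKLOG + INOPBLOG),
                agino >> INOPBLOG, agino & ((1 << INOPBLOG) - 1))

    print(ino_to_ag(941083520))         # -> (14, 97464, 0)
    print(ino_to_ag(941083584))         # -> (14, 97468, 0)
    print(58817724 - (14 << AGBLKLOG))  # -> 97468: the mapped fsb lands
                                        #    right on that second chunk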

The interesting duplicate found earlier is the user data block, fsb
58836692 in inode 941386494, which is also directory block 11 in inode
940862056. The user data block was written last, so the directory block is
now garbage.

Beyond that, I don't know why we are duplicate mapping.
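
Conceptually, the cross-check that exposes these groups is a
block-to-owner map over every space user (inode extents, inode chunks,
directory blocks), flagging any fsb claimed more than once - roughly what
xfs_repair's duplicate detection has to do. A toy sketch, using a couple
of the owners named above (ranges illustrative only):

    # Toy sketch: flag filesystem blocks claimed by more than one owner.
    from collections import defaultdict

    claims = [
        ("inode 940954751 data", 58817713, 1920),
        ("inode chunk @ 941083584", 58817724, 1),    # inside the range above
        ("inode 941386494 data", 58836692, 1),
        ("inode 940862056 dir block 11", 58836692, 1),
    ]

    owners = defaultdict(list)
    for who, start, length in claims:
        for fsb in range(start, start + length):
            owners[fsb].append(who)

    for fsb, who in sorted(owners.items()):
        if len(who) > 1:
            print(f"fsb {fsb}: claimed by {who}")
    # -> fsb 58817724 and fsb 58836692 each report two owners.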

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-03-24 21:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-12 20:14 Possible XFS bug encountered in 3.14.0-rc3+ Mears, Morgan
2014-03-12 21:43 ` Mark Tinguely
2014-03-12 23:06   ` Dave Chinner
2014-03-13 15:03     ` Mark Tinguely
2014-03-12 23:02 ` Dave Chinner
2014-03-13 14:47   ` Mears, Morgan
2014-03-13 15:31     ` Ben Myers
2014-03-13 16:56       ` Mears, Morgan
2014-03-13 22:58     ` Dave Chinner
2014-03-14 14:22       ` Mears, Morgan
2014-03-24 21:36 ` Mark Tinguely
