* xfs corruption
@ 2015-09-03 11:09 Danny Shavit
  2015-09-03 13:22 ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: Danny Shavit @ 2015-09-03 11:09 UTC (permalink / raw)
  To: xfs; +Cc: Alex Lyakas


[-- Attachment #1.1: Type: text/plain, Size: 632 bytes --]

Hi Dave,

We have a couple more xfs corruption cases that we would like to share:

1. This is an interesting one, since xfs reported corruption but when
running xfs_repair, no error was found.
Attached is the kernel log section regarding the corruption (6458).
Does xfs_repair explicitly read data from the disk? If so, it might be a
memory corruption. Are you familiar with such cases?

2. xfs corruption occurred suddenly with no apparent external event.
Attached are the xfs_repair and kernel logs.
The xfs metadump can be found at:
https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz
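
For reference, the dump was produced with xfs_metadump before any repair was
attempted, roughly along these lines (device name as in the attached logs):

# xfs_metadump /dev/dm-82 82.metadump
# gzip 82.metadump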




-- 
Thanks,
Danny Shavit
ZadaraStorage

[-- Attachment #1.2: Type: text/html, Size: 1015 bytes --]

[-- Attachment #2: 6458-kernel.log --]
[-- Type: application/octet-stream, Size: 2688 bytes --]

The XFS volumes then entered a corrupted state:

Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.307743] XFS (dm-39): Internal error xfs_allocbt_verify at line 330 of file /mnt/share/builds/14.11--3.8.13-030813-generic/2015-04-29_10-45-42--14.11-1601-124/src/zadara-btrfs/fs/xfs/xfs_alloc_btree.c.  Caller 0xffffffffa064e9ce
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.307743]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314446] Pid: 25231, comm: kworker/0:0H Tainted: GF       W  O 3.8.13-030813-generic #201305111843
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314449] Call Trace:
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314487]  [<ffffffffa0631baf>] xfs_error_report+0x3f/0x50 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314502]  [<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314514]  [<ffffffffa0631c1e>] xfs_corruption_error+0x5e/0x90 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314528]  [<ffffffffa064e862>] xfs_allocbt_verify+0x92/0x1e0 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314540]  [<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314547]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314551]  [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314566]  [<ffffffffa064e9ce>] xfs_allocbt_read_verify+0xe/0x10 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315251]  [<ffffffffa062f48f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315255]  [<ffffffff81078b81>] process_one_work+0x141/0x490
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315257]  [<ffffffff81079b48>] worker_thread+0x168/0x400
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315259]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315262]  [<ffffffff8107f050>] kthread+0xc0/0xd0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315265]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315270]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315273]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315275] XFS (dm-39): Corruption detected. Unmount and run xfs_repair
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.316706] XFS (dm-39): metadata I/O error: block 0x41a6eff8 ("xfs_trans_read_buf_map") error 117 numblks 8

[-- Attachment #3: 6442-82-xfs_repair.log --]
[-- Type: application/octet-stream, Size: 6009 bytes --]

root@vsa-00000110-vc-0:~# xfs_repair /dev/dm-82
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
root@vsa-00000110-vc-0:~# xfs_repair -L /dev/dm-82
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
agi unlinked bucket 1 is 12580353 in ag 3 (inode=213906945)
sb_icount 1226496, counted 1227776
sb_ifree 292180, counted 297082
sb_fdblocks 31182739, counted 55158044
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
7f8d22a2c700: Badness in key lookup (length)
bp=(bno 84932992, len 16384 bytes) key=(bno 84932992, len 8192 bytes)
        - agno = 3
bad magic # 0xeabb123a in inode 213906945 (data fork) bmbt block 13369242
bad data fork in inode 213906945
cleared inode 213906945
clearing forw/back pointers in block 0 for attributes in inode 213906953
bad attribute leaf magic # 0xbc6c for dir ino 213906953
problem with attribute contents in inode 213906953
clearing inode 213906953 attributes
correcting nblocks for inode 213906953, was 66 - counted 65
clearing forw/back pointers in block 0 for attributes in inode 213906954
bad attribute leaf magic # 0xde72 for dir ino 213906954
problem with attribute contents in inode 213906954
clearing inode 213906954 attributes
correcting nblocks for inode 213906954, was 2 - counted 1
clearing forw/back pointers in block 0 for attributes in inode 213906960
bad attribute leaf magic # 0xd0eb for dir ino 213906960
problem with attribute contents in inode 213906960
clearing inode 213906960 attributes
correcting nblocks for inode 213906960, was 4 - counted 3
clearing forw/back pointers in block 0 for attributes in inode 213906961
bad attribute leaf magic # 0xb876 for dir ino 213906961
problem with attribute contents in inode 213906961
clearing inode 213906961 attributes
correcting nblocks for inode 213906961, was 5 - counted 4
        - agno = 4
        - agno = 5
clearing forw/back pointers in block 0 for attributes in inode 347235105
bad attribute leaf magic # 0xb033 for dir ino 347235105
problem with attribute contents in inode 347235105
clearing inode 347235105 attributes
correcting nblocks for inode 347235105, was 9 - counted 8
clearing forw/back pointers in block 0 for attributes in inode 347235106
bad attribute leaf magic # 0xe13 for dir ino 347235106
problem with attribute contents in inode 347235106
clearing inode 347235106 attributes
correcting nblocks for inode 347235106, was 9 - counted 8
        - agno = 6
        - agno = 7
clearing forw/back pointers in block 0 for attributes in inode 478759702
bad attribute leaf magic # 0xa065 for dir ino 478759702
problem with attribute contents in inode 478759702
clearing inode 478759702 attributes
correcting nblocks for inode 478759702, was 1561 - counted 1560
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad magic # 0x58465342 in inode 213906953 (data fork) bmbt block 0
bad data fork in inode 213906953
cleared inode 213906953
bad attribute format 1 in inode 213906954, resetting value
bad attribute format 1 in inode 213906960, resetting value
bad attribute format 1 in inode 213906961, resetting value
        - agno = 4
        - agno = 5
bad attribute format 1 in inode 347235105, resetting value
bad attribute format 1 in inode 347235106, resetting value
        - agno = 6
        - agno = 7
bad magic # 0x58465342 in inode 478759702 (data fork) bmbt block 0
bad data fork in inode 478759702
cleared inode 478759702
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
7f8d24478740: Badness in key lookup (length)
bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes)
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
entry "3247.png" in directory inode 201326924 points to free inode 213906953
bad hash table for directory inode 201326924 (no data entry): rebuilding
rebuilding directory inode 201326924
entry "0251050.NWB" in directory inode 469762366 points to free inode 478759702
rebuilding directory inode 469762366
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
root@vsa-00000110-vc-0:~# echo $?
0
root@vsa-00000110-vc-0:~# crm_mon
Connection to the CIB terminated
Reconnecting...root@vsa-00000110-vc-0:~# less /var/log/kern.log
root@vsa-00000110-vc-0:~#

[-- Attachment #4: dm-82-kernel.log --]
[-- Type: application/octet-stream, Size: 2549 bytes --]

Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353] ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af  ...:_D...*......
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.686568] XFS (dm-82): Internal error xfs_bmbt_verify at line 747 of file /mnt/share/builds/14.11--3.8.13-030813-generic/2015-06-17_03-30-37--14.11-1601-129/src/zadara-btrfs/fs/xfs/xfs_bmap_btree.c.  Caller 0xffffffffa07779ee
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.686568] 
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689393] Pid: 17063, comm: kworker/0:1H Tainted: GF       W  O 3.8.13-030813-generic #201305111843
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689395] Call Trace:
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689443]  [<ffffffffa0746baf>] xfs_error_report+0x3f/0x50 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689491]  [<ffffffffa07779ee>] ? xfs_bmbt_read_verify+0xe/0x10 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689503]  [<ffffffffa0746c1e>] xfs_corruption_error+0x5e/0x90 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689517]  [<ffffffffa0777867>] xfs_bmbt_verify+0x77/0x1e0 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689535]  [<ffffffffa07779ee>] ? xfs_bmbt_read_verify+0xe/0x10 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689548]  [<ffffffffa07779ee>] xfs_bmbt_read_verify+0xe/0x10 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689558]  [<ffffffffa074448f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689564]  [<ffffffff81078b81>] process_one_work+0x141/0x490
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689566]  [<ffffffff81079b48>] worker_thread+0x168/0x400
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689569]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689571]  [<ffffffff8107f050>] kthread+0xc0/0xd0
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689574]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689579]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689582]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.689584] XFS (dm-82): Corruption detected. Unmount and run xfs_repair
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.690508] XFS (dm-82): metadata I/O error: block 0x50ffb50 ("xfs_trans_read_buf_map") error 117 numblks 8

[-- Attachment #5: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-03 11:09 xfs corruption Danny Shavit
@ 2015-09-03 13:22 ` Eric Sandeen
  2015-09-03 14:26   ` Danny Shavit
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sandeen @ 2015-09-03 13:22 UTC (permalink / raw)
  To: Danny Shavit, xfs; +Cc: Alex Lyakas

On 9/3/15 6:09 AM, Danny Shavit wrote:
> Hi Dave,
> 
> We couple of more xfs corruption that we would like to share:

On the same box as the one that seemed to be experiencing some
bit-flips in your earlier email?

As a general note: You are not providing enough information for
us to effectively help you.

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Kernel version?  xfsprogs version?  At a bare minimum...

Your dmesg snippets are edited.  You've provided what you feel is
important, omitting the parts that may actually be important or
informational.

You haven't described the sequence of events that led to these issues.

You haven't made clear what these attachments are; which repair log goes
with which kernel event?

Etc...

> 1. This is an interesting one, since xfs reported corruption but when
> running xfs_repair, no error was found. Attached is the kernel log
> section regarding the corruption (6458). Does xfs_repair explicitly
> read data from the disk? In such case it might be a memory
> corruption. Are you familiar with such cases?

Yes, xfs_repair opens the block device O_DIRECT.
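
If you want to confirm that on your systems, a check along these lines
(device name is just an example; -n keeps repair read-only) should show the
device being opened with O_DIRECT:

# strace -f -e trace=open xfs_repair -n /dev/dm-39 2>&1 | grep O_DIRECT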

Your 6458-kernel.log shows a failure in xfs_allocbt_verify(), right
after the allocation btree is read from disk, i.e. this is an in-kernel
metadata consistency check that is failing.

It also shows:

kworker/0:1H Tainted: GF       W 

So it's tainted:

  2: 'F' if any module was force loaded by "insmod -f", ' ' if all
     modules were loaded normally.

 10: 'W' if a warning has previously been issued by the kernel.
     (Though some warnings may set more specific taint flags.)
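
For reference, the cumulative taint mask of the running kernel can also be
read directly and decoded against the list above:

# cat /proc/sys/kernel/tainted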

You force-loaded a module?  And previous warnings were emitted (though we
can't see them in your edited dmesg).  
All bets are off.  If you had included the full dmesg, we might know 
more about what's going on, at least.

> 2. xfs corruption occurred suddenly with no apparent external event.
>  Attached are xfs_repair and kernel logs are. Xfs dump can be found
> in: https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz

Your 6442-82-xfs_repair.log is from an xfs_repair -L, so of course it
is finding corruption, and the output is more or less meaningless
from a triage POV.  Repair said:

> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.

Why did you run it with -L? Did mount fail? If so how?

dm-82-kernel.log also shows a failing verifier, this time xfs_bmbt_verify,
when reading metadata from disk.

You've truncated other parts, though:

Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353] ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af  ...:_D...*
......

so there's not a ton to go on, just hints that there is more information
that's not provided.


-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-03 13:22 ` Eric Sandeen
@ 2015-09-03 14:26   ` Danny Shavit
  2015-09-03 14:55     ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: Danny Shavit @ 2015-09-03 14:26 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Alex Lyakas, xfs


[-- Attachment #1.1: Type: text/plain, Size: 3967 bytes --]

Hi Eric,

Thanks for the prompt response.
Sorry for the missing parts, I was wrongly assuming that everybody knows
our environment :-)

More information:
uname -a:  Linux vsa-00000142 3.8.13-030813-generic #201305111843 SMP Sat
May 11 22:44:40 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
xfs_repair version 3.1.7

We are using a modified xfs. Mainly, we added some reporting features and
changed the discard operation to be aligned with the chunk sizes used in our
systems.
The modified code resides at
https://github.com/zadarastorage/zadara-xfs-pushback.

We were in a hurry at the time we ran xfs_repair with -L. That was not so
smart...
Anyway, the metadump was taken before running xfs_repair.
We will use the original xfs metadata to run xfs_repair after a mount and
get back with the results.
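
Roughly, the plan is along these lines (paths are placeholders and this is a
sketch rather than the exact commands; the mount step is what replays the
log):

# gunzip 82.metadump.gz
# xfs_mdrestore 82.metadump 82.img
# mount -o loop 82.img /mnt/tmp
# umount /mnt/tmp
# xfs_repair 82.img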

Regards,
Danny




On Thu, Sep 3, 2015 at 4:22 PM, Eric Sandeen <sandeen@sandeen.net> wrote:

> On 9/3/15 6:09 AM, Danny Shavit wrote:
> > Hi Dave,
> >
> > We couple of more xfs corruption that we would like to share:
>
> On the same box as the one that seemed to be experiencing some
> bit-flips in your earlier email?
>
> As a general note: You are not providing enough information for
> us to effectively help you.
>
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> Kernel version?  xfsprogs version?  At a bare minimum...
>
> Your dmesg snippets are edited.  You've provided what you feel is
> important, omitting the parts that may actually be important or
> informational.
>
> You haven't described the sequence of events that led to these issues.
>
> You haven't made clear what these attachments are; which repair log goes
> with which kernel event?
>
> Etc...
>
> > 1. This is an interesting one, since xfs reported corruption but when
> > running xfs_repair, no error was found. Attached is the kernel log
> > section regarding the corruption (6458). Does xfs_repair explicitly
> > read data from the disk? In such case it might be a memory
> > corruption. Are you familiar with such cases?
>
> Yes, xfs_repair opens the block device O_DIRECT.
>
> your 6485-kernel.log shows a failure in xfs_allocbt_verify(), right
> after the allocation btree is read from disk.  i.e. this is an in-kernel
> metadata consistency check that is failing.
>
> It also shows:
>
> kworker/0:1H Tainted: GF       W
>
> So it's tainted:
>
>   2: 'F' if any module was force loaded by "insmod -f", ' ' if all
>      modules were loaded normally.
>
>  10: 'W' if a warning has previously been issued by the kernel.
>      (Though some warnings may set more specific taint flags.)
>
> You force-loaded a module?  And previous warnings were emitted (though we
> can't see them in your edited dmesg).
> All bets are off.  If you had included the full dmesg, we might know
> more about what's going on, at least.
>
> > 2. xfs corruption occurred suddenly with no apparent external event.
> >  Attached are xfs_repair and kernel logs are. Xfs dump can be found
> > in: https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz
>
> Your 6442-82-xfs_repair.log is from an xfs_repair -L, so of course it
> is finding corruption, and the output is more or less meaningless
> from a triage POV.  Repair said:
>
> > Note that destroying the log may cause corruption -- please attempt a
> mount
> > of the filesystem before doing this.
>
> Why did you run it with -L? Did mount fail? If so how?
>
> dm-82-kernel.log also shows a failing verifier, this time xfs_bmbt_verify,
> when reading metadata from disk.
>
> You've truncated other parts, though:
>
> Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353]
> ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af
> ...:_D...*
> ......
>
> so there's not a ton to go on, just hints that there is more information
> that's not provided.
>
>
> -Eric
>



-- 
Regards,
Danny

[-- Attachment #1.2: Type: text/html, Size: 5726 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-03 14:26   ` Danny Shavit
@ 2015-09-03 14:55     ` Eric Sandeen
  2015-09-03 16:14       ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sandeen @ 2015-09-03 14:55 UTC (permalink / raw)
  To: Danny Shavit; +Cc: Alex Lyakas, xfs

On 9/3/15 9:26 AM, Danny Shavit wrote:
> Hi Eric,
> 
> Thanks for the prompt response. Sorry for the missing parts, I was
> wrongly assuming that everybody knows our environment :-)

Maybe some do, but my brain is too small for that.  ;)

> More information: uname -a:  Linux vsa-00000142 3.8.13-030813-generic
> #201305111843 SMP Sat May 11 22:44:40 UTC 2013 x86_64 x86_64 x86_64
> GNU/Linux xfs_repair version 3.1.7
> 
> We are using modified xfs. Mainly, added some reporting features and
> changed discard operation to be aligned with chunk sizes used in our
> systems. The modified code resides at https://github.com/zadarastorage/zadara-xfs-pushback.

Interesting, thanks for the pointer.  I guess at this point I have to
ask, do you see these same problems without your modifications?

I'd really encourage Zadara to work on submitting some of these upstream,
if they are of general interest.  It'll get more review, more testing,
and will reduce your maintenance burden.  Obviously some of it may not
be desired upstream, but if you've solved a general problem, it'd be
very good to propose a patch for inclusion.

> We were in a hurry at the time we run xfs_repair with -L. Was not so
> smart... Any way, the xfs_dump was taken before running xfs_repair. 
> We will use the original xfs meta data to run xfs_repair after mount
> and get back with the results.

Ok, from the metadump I see that log replay fails due to the corruption:

[ 7708.169145] XFS (loop0): Mounting V4 Filesystem
[ 7708.178379] XFS (loop0): Starting recovery (logdev: internal)
[ 7708.185369] XFS (loop0): Metadata corruption detected at xfs_bmbt_read_verify+0x7e/0xc0 [xfs], block 0x50ffb50
[ 7708.195344] XFS (loop0): Unmount and run xfs_repair
[ 7708.200214] XFS (loop0): First 64 bytes of corrupted metadata buffer:
[ 7708.206638] ffff8802e5b9d000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af  ...:_D...*......
[ 7708.215312] ffff8802e5b9d010: f6 b0 39 2d 08 54 7a ec 37 1b 94 b0 c2 37 23 1f  ..9-.Tz.7....7#.
[ 7708.223986] ffff8802e5b9d020: 54 62 b5 fd ff 63 95 01 4b 23 fc 5d 8b d4 7b 78  Tb...c..K#.]..{x
[ 7708.232662] ffff8802e5b9d030: 94 e6 fa cc e2 87 3d fe ab df b8 e9 e5 9b e5 da  ......=.........
[ 7708.241341] XFS (loop0): metadata I/O error: block 0x50ffb50 ("xfs_trans_read_buf_map") error 117 numblks 8
[ 7708.251058] XFS (loop0): xfs_do_force_shutdown(0x1) called from line 315 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xffffffffa036c41a
[ 7708.263721] XFS (loop0): I/O Error Detected. Shutting down filesystem
[ 7708.270144] XFS (loop0): Please umount the filesystem and rectify the problem(s)
[ 7708.277533] XFS (loop0): Ending recovery (logdev: internal)
[ 7708.283095] SELinux: (dev loop0, type xfs) getxattr errno 5
[ 7708.288664] XFS (loop0): xfs_log_force: error -5 returned.
[ 7708.294136] XFS (loop0): Unmounting Filesystem


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-03 14:55     ` Eric Sandeen
@ 2015-09-03 16:14       ` Eric Sandeen
  2015-09-06 10:19         ` Alex Lyakas
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sandeen @ 2015-09-03 16:14 UTC (permalink / raw)
  To: Danny Shavit; +Cc: Alex Lyakas, xfs

On 9/3/15 9:55 AM, Eric Sandeen wrote:
> On 9/3/15 9:26 AM, Danny Shavit wrote:

...

>> We are using modified xfs. Mainly, added some reporting features and
>> changed discard operation to be aligned with chunk sizes used in our
>> systems. The modified code resides at https://github.com/zadarastorage/zadara-xfs-pushback.
> 
> Interesting, thanks for the pointer.  I guess at this point I have to
> ask, do you see these same problems without your modifications?

Have you ever mounted this filesystem on non-zadara kernels?

looking at
https://github.com/zadarastorage/zadara-xfs-pushback/commit/094df949fd080ede546bb7518405ab873a444823

you've changed the disk format w/o adding a feature flag,
which is pretty dangerous.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-03 16:14       ` Eric Sandeen
@ 2015-09-06 10:19         ` Alex Lyakas
  2015-09-06 21:56           ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Lyakas @ 2015-09-06 10:19 UTC (permalink / raw)
  To: Danny Shavit, Eric Sandeen; +Cc: xfs

Hi Eric,
Thank you for your comments.

Yes, we made the ACL limit change, being fully aware that this breaks 
compatibility with the mainline kernel and future mainline kernels. We mount 
our XFS filesystems with our kernel only. We are also aware that this change 
needs to be carefully forward-ported when we move to a newer kernel.

I have an additional question regarding the latest XFS corruption report:
kernel: [3507105.314446] Pid: 25231, comm: kworker/0:0H Tainted: GF       W 
O 3.8.13-030813-generic #201305111843
kernel: [3507105.314449] Call Trace:
kernel: [3507105.314487]  [<ffffffffa0631baf>] xfs_error_report+0x3f/0x50 
[xfs]
kernel: [3507105.314502]  [<ffffffffa064e9ce>] ? 
xfs_allocbt_read_verify+0xe/0x10 [xfs]
kernel: [3507105.314514]  [<ffffffffa0631c1e>] 
xfs_corruption_error+0x5e/0x90 [xfs]
kernel: [3507105.314528]  [<ffffffffa064e862>] xfs_allocbt_verify+0x92/0x1e0 
[xfs]
kernel: [3507105.314540]  [<ffffffffa064e9ce>] ? 
xfs_allocbt_read_verify+0xe/0x10 [xfs]
kernel: [3507105.314547]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
kernel: [3507105.314551]  [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0
kernel: [3507105.314566]  [<ffffffffa064e9ce>] 
xfs_allocbt_read_verify+0xe/0x10 [xfs]
kernel: [3507105.315251]  [<ffffffffa062f48f>] xfs_buf_iodone_work+0x3f/0xa0 
[xfs]
kernel: [3507105.315255]  [<ffffffff81078b81>] process_one_work+0x141/0x490
kernel: [3507105.315257]  [<ffffffff81079b48>] worker_thread+0x168/0x400
kernel: [3507105.315259]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120
kernel: [3507105.315262]  [<ffffffff8107f050>] kthread+0xc0/0xd0
kernel: [3507105.315265]  [<ffffffff8107ef90>] ? 
flush_kthread_worker+0xb0/0xb0
kernel: [3507105.315270]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
kernel: [3507105.315273]  [<ffffffff8107ef90>] ? 
flush_kthread_worker+0xb0/0xb0
kernel: [3507105.315275] XFS (dm-39): Corruption detected. Unmount and run 
xfs_repair
kernel: [3507105.316706] XFS (dm-39): metadata I/O error: block 0x41a6eff8 
("xfs_trans_read_buf_map") error 117 numblks 8

From looking at the XFS code, it appears that XFS read a metadata block from
disk, and discovered that it was corrupted. At this point, the system was
rebooted, and after reboot we prevented this particular XFS from mounting.
Then we ran xfs_metadump and xfs_repair. The latter found absolutely no
issues, and XFS was able to successfully mount and continue operation.

Can you think of a way to explain this?
Can you confirm that the above trace really means that XFS was reading its 
metadata from disk?
From the XFS code, I see that XFS does not use the Linux page cache for its
metadata (unlike btrfs, for example). Is my understanding correct?
(Otherwise, I could assume that somebody wrongly touched a page in the
page cache and messed up its in-memory content.)

Thanks,
Alex.





-----Original Message----- 
From: Eric Sandeen
Sent: 03 September, 2015 6:14 PM
To: Danny Shavit
Cc: Alex Lyakas ; xfs@oss.sgi.com
Subject: Re: xfs corruption

On 9/3/15 9:55 AM, Eric Sandeen wrote:
> On 9/3/15 9:26 AM, Danny Shavit wrote:

...

>> We are using modified xfs. Mainly, added some reporting features and
>> changed discard operation to be aligned with chunk sizes used in our
>> systems. The modified code resides at https://github.com/zadarastorage/zadara-xfs-pushback.
>
> Interesting, thanks for the pointer.  I guess at this point I have to
> ask, do you see these same problems without your modifications?

Have you ever mounted this filesystem on non-zadara kernels?

looking at
https://github.com/zadarastorage/zadara-xfs-pushback/commit/094df949fd080ede546bb7518405ab873a444823

you've changed the disk format w/o adding a feature flag,
which is pretty dangerous.

-Eric 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-06 10:19         ` Alex Lyakas
@ 2015-09-06 21:56           ` Eric Sandeen
  2015-09-07  8:30             ` Alex Lyakas
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sandeen @ 2015-09-06 21:56 UTC (permalink / raw)
  To: Alex Lyakas, Danny Shavit; +Cc: xfs

On 9/6/15 5:19 AM, Alex Lyakas wrote:
> Hi Eric,
> Thank you for your comments.
> 
> Yes, we made the ACL limit change, being fully aware that this breaks
> compatibility with the mainline kernel and future mainline kernels.
> We mount our XFS filesystems with our kernel only. We are also aware
> that this change needs to be carefully forward-ported, when we move
> to a newer kernel.

Ok, sorry for the lecture...  ;)  I did want to make sure it
hadn't been mounted on an unmodified kernel, though.

> I have an additional question regarding the latest XFS corruption report:
> kernel: [3507105.314446] Pid: 25231, comm: kworker/0:0H Tainted: GF       W O 3.8.13-030813-generic #201305111843
> kernel: [3507105.314449] Call Trace:
> kernel: [3507105.314487]  [<ffffffffa0631baf>] xfs_error_report+0x3f/0x50 [xfs]
> kernel: [3507105.314502]  [<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
> kernel: [3507105.314514]  [<ffffffffa0631c1e>] xfs_corruption_error+0x5e/0x90 [xfs]
> kernel: [3507105.314528]  [<ffffffffa064e862>] xfs_allocbt_verify+0x92/0x1e0 [xfs]
> kernel: [3507105.314540]  [<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
> kernel: [3507105.314547]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
> kernel: [3507105.314551]  [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0
> kernel: [3507105.314566]  [<ffffffffa064e9ce>] xfs_allocbt_read_verify+0xe/0x10 [xfs]
> kernel: [3507105.315251]  [<ffffffffa062f48f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]
> kernel: [3507105.315255]  [<ffffffff81078b81>] process_one_work+0x141/0x490
> kernel: [3507105.315257]  [<ffffffff81079b48>] worker_thread+0x168/0x400
> kernel: [3507105.315259]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120
> kernel: [3507105.315262]  [<ffffffff8107f050>] kthread+0xc0/0xd0
> kernel: [3507105.315265]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
> kernel: [3507105.315270]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
> kernel: [3507105.315273]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
> kernel: [3507105.315275] XFS (dm-39): Corruption detected. Unmount and run xfs_repair
> kernel: [3507105.316706] XFS (dm-39): metadata I/O error: block 0x41a6eff8 ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> From looking at XFS code, it appears that XFS read metadata block
> from disk, and discovered that it was corrupted.

Yes.  Unfortunately the verifier didn't say what it thinks is wrong.

I'd have to look to see for sure, but I think that on your kernel version,
if you turn up the xfs error level sysctl, you should get a hexdump of the
first 64 bytes of the buffer when this happens, and that would hopefully
tell us enough to know what was wrong, and -

> At this point, the
> system was rebooted, and after reboot we prevented this particular
> XFS from mounting. Then we ran xfs-metadump and xfs-repair. The
> latter found absolutely no issues, and XFS was able to successfully
> mount and continue operation.

- and why repair found no issue

With the buffer dump, and then from that hopefully knowing what the verifier
didn't like, we could then check your repair version and be sure it is
performing the same checks as the verifier.
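
Something like the following should do it (the knob is the xfs error level
sysctl; the useful maximum may differ on your kernel):

# sysctl fs.xfs.error_level=11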

-Eric

> Can you think of a way to explain this?
> Can you confirm that the above trace really means that XFS was reading its metadata from disk?
> From XFS code, I see that XFS does not use Linux page cache for its
> metadata (unlike btrfs, for example). Is my understanding correct?
> (Otherwise, I could assume that somebody wrongly touched a page in
> the page-cache and messed up its in-memory content).
> 
> Thanks,
> Alex.
> 
> 
> 
> 
> 
> -----Original Message----- From: Eric Sandeen
> Sent: 03 September, 2015 6:14 PM
> To: Danny Shavit
> Cc: Alex Lyakas ; xfs@oss.sgi.com
> Subject: Re: xfs corruption
> 
> On 9/3/15 9:55 AM, Eric Sandeen wrote:
>> On 9/3/15 9:26 AM, Danny Shavit wrote:
> 
> ...
> 
>>> We are using modified xfs. Mainly, added some reporting features and
>>> changed discard operation to be aligned with chunk sizes used in our
>>> systems. The modified code resides at https://github.com/zadarastorage/zadara-xfs-pushback.
>>
>> Interesting, thanks for the pointer.  I guess at this point I have to
>> ask, do you see these same problems without your modifications?
> 
> Have you ever mounted this filesystem on non-zadara kernels?
> 
> looking at
> https://github.com/zadarastorage/zadara-xfs-pushback/commit/094df949fd080ede546bb7518405ab873a444823
> 
> you've changed the disk format w/o adding a feature flag,
> which is pretty dangerous.
> 
> -Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: xfs corruption
  2015-09-06 21:56           ` Eric Sandeen
@ 2015-09-07  8:30             ` Alex Lyakas
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Lyakas @ 2015-09-07  8:30 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Danny Shavit, xfs

Hi Eric,

This is what the verifier said; sorry for not posting it fully:
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.306317] ffff88000617d000: 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.307743] XFS (dm-39): 
Internal error xfs_allocbt_verify at line 330 of file 
/mnt/share/builds/14.11--3.8.13-030813-generic/2015-04-29_10-45-42--14.11-1601-124/src/zadara-btrfs/fs/xfs/xfs_alloc_btree.c. 
Caller 0xffffffffa064e9ce
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.307743]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314446] Pid: 25231, comm: 
kworker/0:0H Tainted: GF       W  O 3.8.13-030813-generic #201305111843
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314449] Call Trace:
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314487] 
[<ffffffffa0631baf>] xfs_error_report+0x3f/0x50 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314502] 
[<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314514] 
[<ffffffffa0631c1e>] xfs_corruption_error+0x5e/0x90 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314528] 
[<ffffffffa064e862>] xfs_allocbt_verify+0x92/0x1e0 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314540] 
[<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314547] 
[<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314551] 
[<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.314566] 
[<ffffffffa064e9ce>] xfs_allocbt_read_verify+0xe/0x10 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315251] 
[<ffffffffa062f48f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315255] 
[<ffffffff81078b81>] process_one_work+0x141/0x490
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315257] 
[<ffffffff81079b48>] worker_thread+0x168/0x400
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315259] 
[<ffffffff810799e0>] ? manage_workers+0x120/0x120
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315262] 
[<ffffffff8107f050>] kthread+0xc0/0xd0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315265] 
[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315270] 
[<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315273] 
[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.315275] XFS (dm-39): 
Corruption detected. Unmount and run xfs_repair
Aug 27 01:01:34 vsa-0000014e-vc-0 kernel: [3507105.316706] XFS (dm-39): 
metadata I/O error: block 0x41a6eff8 ("xfs_trans_read_buf_map") error 117 
numblks 8

The verifier function is [1]; line 330 is where it calls
XFS_CORRUPTION_ERROR.

xfs_repair version:
root@vsa-0000003f-vc-0:~# xfs_repair -V
xfs_repair version 3.1.7

xfsprogs is the stock version that ships with the Ubuntu 12.04 distribution
(we didn't mess with that ;).

Thanks for your help,
Alex.


[1]
static void
xfs_allocbt_verify(
    struct xfs_buf        *bp)
{
    struct xfs_mount    *mp = bp->b_target->bt_mount;
    struct xfs_btree_block    *block = XFS_BUF_TO_BLOCK(bp);
    struct xfs_perag    *pag = bp->b_pag;
    unsigned int        level;
    int            sblock_ok; /* block passes checks */

    /*
     * magic number and level verification
     *
     * During growfs operations, we can't verify the exact level as the
     * perag is not fully initialised and hence not attached to the buffer.
     * In this case, check against the maximum tree depth.
     */
    level = be16_to_cpu(block->bb_level);
    switch (block->bb_magic) {
    case cpu_to_be32(XFS_ABTB_MAGIC):
        if (pag)
            sblock_ok = level < pag->pagf_levels[XFS_BTNUM_BNOi];
        else
            sblock_ok = level < mp->m_ag_maxlevels;
        break;
    case cpu_to_be32(XFS_ABTC_MAGIC):
        if (pag)
            sblock_ok = level < pag->pagf_levels[XFS_BTNUM_CNTi];
        else
            sblock_ok = level < mp->m_ag_maxlevels;
        break;
    default:
        sblock_ok = 0;
        break;
    }

    /* numrecs verification */
    sblock_ok = sblock_ok &&
        be16_to_cpu(block->bb_numrecs) <= mp->m_alloc_mxr[level != 0];

    /* sibling pointer verification */
    sblock_ok = sblock_ok &&
        (block->bb_u.s.bb_leftsib == cpu_to_be32(NULLAGBLOCK) ||
         be32_to_cpu(block->bb_u.s.bb_leftsib) < mp->m_sb.sb_agblocks) &&
        block->bb_u.s.bb_leftsib &&
        (block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
         be32_to_cpu(block->bb_u.s.bb_rightsib) < mp->m_sb.sb_agblocks) &&
        block->bb_u.s.bb_rightsib;

    if (!sblock_ok) {
        trace_xfs_btree_corrupt(bp, _RET_IP_);
        XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, block);
        xfs_buf_ioerror(bp, EFSCORRUPTED);
    }
}

-----Original Message----- 
From: Eric Sandeen
Sent: 06 September, 2015 11:56 PM
To: Alex Lyakas ; Danny Shavit
Cc: xfs@oss.sgi.com
Subject: Re: xfs corruption

On 9/6/15 5:19 AM, Alex Lyakas wrote:
> Hi Eric,
> Thank you for your comments.
>
> Yes, we made the ACL limit change, being fully aware that this breaks
> compatibility with the mainline kernel and future mainline kernels.
> We mount our XFS filesystems with our kernel only. We are also aware
> that this change needs to be carefully forward-ported, when we move
> to a newer kernel.

Ok, sorry for the lecture...  ;)  I did want to make sure it
hadn't been mounted on an unmodified kernel, though.

> I have an additional question regarding the latest XFS corruption report:
> kernel: [3507105.314446] Pid: 25231, comm: kworker/0:0H Tainted: GF 
> W O 3.8.13-030813-generic #201305111843
> kernel: [3507105.314449] Call Trace:
> kernel: [3507105.314487]  [<ffffffffa0631baf>] xfs_error_report+0x3f/0x50 
> [xfs]
> kernel: [3507105.314502]  [<ffffffffa064e9ce>] ? 
> xfs_allocbt_read_verify+0xe/0x10 [xfs]
> kernel: [3507105.314514]  [<ffffffffa0631c1e>] 
> xfs_corruption_error+0x5e/0x90 [xfs]
> kernel: [3507105.314528]  [<ffffffffa064e862>] 
> xfs_allocbt_verify+0x92/0x1e0 [xfs]
> kernel: [3507105.314540]  [<ffffffffa064e9ce>] ? 
> xfs_allocbt_read_verify+0xe/0x10 [xfs]
> kernel: [3507105.314547]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
> kernel: [3507105.314551]  [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0
> kernel: [3507105.314566]  [<ffffffffa064e9ce>] 
> xfs_allocbt_read_verify+0xe/0x10 [xfs]
> kernel: [3507105.315251]  [<ffffffffa062f48f>] 
> xfs_buf_iodone_work+0x3f/0xa0 [xfs]
> kernel: [3507105.315255]  [<ffffffff81078b81>] 
> process_one_work+0x141/0x490
> kernel: [3507105.315257]  [<ffffffff81079b48>] worker_thread+0x168/0x400
> kernel: [3507105.315259]  [<ffffffff810799e0>] ? 
> manage_workers+0x120/0x120
> kernel: [3507105.315262]  [<ffffffff8107f050>] kthread+0xc0/0xd0
> kernel: [3507105.315265]  [<ffffffff8107ef90>] ? 
> flush_kthread_worker+0xb0/0xb0
> kernel: [3507105.315270]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
> kernel: [3507105.315273]  [<ffffffff8107ef90>] ? 
> flush_kthread_worker+0xb0/0xb0
> kernel: [3507105.315275] XFS (dm-39): Corruption detected. Unmount and run 
> xfs_repair
> kernel: [3507105.316706] XFS (dm-39): metadata I/O error: block 0x41a6eff8 
> ("xfs_trans_read_buf_map") error 117 numblks 8
>
> From looking at XFS code, it appears that XFS read metadata block
> from disk, and discovered that it was corrupted.

Yes.  Unfortunately the verifier didn't say what it thinks is wrong.

I'd have to look to see for sure, but I think that on your kernel version,
if you turn up the xfs error level sysctl, you should get a hexdump of the
first 64 bytes of the buffer when this happens, and that would hopefully
tell us enough to know what was wrong, and -

> At this point, the
> system was rebooted, and after reboot we prevented this particular
> XFS from mounting. Then we ran xfs-metadump and xfs-repair. The
> latter found absolutely no issues, and XFS was able to successfully
> mount and continue operation.

- and why repair found no issue

With the buffer dump, and then from that hopefully knowing what the verifier
didn't like, we could then check your repair version and be sure it is
performing the same checks as the verifier

-Eric

> Can you think of a way to explain this?
> Can you confirm that the above trace really means that XFS was reading its 
> metadata from disk?
> From XFS code, I see that XFS does not use Linux page cache for its
> metadata (unlike btrfs, for example). Is my understanding correct?
> (Otherwise, I could assume that somebody wrongly touched a page in
> the page-cache and messed up its in-memory content).
>
> Thanks,
> Alex.
>
>
>
>
>
> -----Original Message----- From: Eric Sandeen
> Sent: 03 September, 2015 6:14 PM
> To: Danny Shavit
> Cc: Alex Lyakas ; xfs@oss.sgi.com
> Subject: Re: xfs corruption
>
> On 9/3/15 9:55 AM, Eric Sandeen wrote:
>> On 9/3/15 9:26 AM, Danny Shavit wrote:
>
> ...
>
>>> We are using modified xfs. Mainly, added some reporting features and
>>> changed discard operation to be aligned with chunk sizes used in our
>>> systems. The modified code resides at https://github.com/zadarastorage/zadara-xfs-pushback.
>>
>> Interesting, thanks for the pointer.  I guess at this point I have to
>> ask, do you see these same problems without your modifications?
>
> Have you ever mounted this filesystem on non-zadara kernels?
>
> looking at
> https://github.com/zadarastorage/zadara-xfs-pushback/commit/094df949fd080ede546bb7518405ab873a444823
>
> you've changed the disk format w/o adding a feature flag,
> which is pretty dangerous.
>
> -Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS Corruption
  2016-02-24  6:12 XFS Corruption fangchen sun
@ 2016-02-24 22:23 ` Eric Sandeen
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Sandeen @ 2016-02-24 22:23 UTC (permalink / raw)
  To: xfs

On 2/24/16 12:12 AM, fangchen sun wrote:
> Dear all:
> 
> I have a ceph object storage cluster, and choose XFS as the underlying file system.
> I recently ran into a problem that sometimes the function "setxattr()"  failed, 
> I can only umount the disk and repair it with "xfs_repair".
> 
> os: centos 6.5
> kernel version: 2.6.32

Problems with distribution kernels generally need to be reported and handled through
your distribution.  And when you do, providing the full unedited dmesg - which
should include an actual description of the error from xfs, rather than just
the backtrace below - and the results of the xfs_repair, will be necessary.
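
For example, something along these lines (device name taken from the quoted
log; adjust to your setup) would capture both:

# dmesg > dmesg-full.txt
# umount /dev/sdi1
# xfs_repair /dev/sdi1 2>&1 | tee xfs_repair.log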

Thanks,
-Eric
 
> the log for dmesg command:
> [41796028.532225] Pid: 1438740, comm: ceph-osd Not tainted 2.6.32-925.431.23.3.letv.el6.x86_64 #1
> [41796028.532227] Call Trace:
> [41796028.532255]  [<ffffffffa01e1e5f>] ? xfs_error_report+0x3f/0x50 [xfs]
> [41796028.532276]  [<ffffffffa01d506a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
> [41796028.532296]  [<ffffffffa01e1ece>] ? xfs_corruption_error+0x5e/0x90 [xfs]
> [41796028.532316]  [<ffffffffa01d4f4c>] ? xfs_da_do_buf+0x6cc/0x770 [xfs]
> [41796028.532335]  [<ffffffffa01d506a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
> [41796028.532359]  [<ffffffffa0206fc7>] ? kmem_zone_alloc+0x77/0xf0 [xfs]
> [41796028.532380]  [<ffffffffa01d506a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
> [41796028.532399]  [<ffffffffa01bc481>] ? xfs_attr_leaf_addname+0x61/0x3d0 [xfs]
> [41796028.532426]  [<ffffffffa01bc481>] ? xfs_attr_leaf_addname+0x61/0x3d0 [xfs]
> [41796028.532455]  [<ffffffffa01ff187>] ? xfs_trans_add_item+0x57/0x70 [xfs]
> [41796028.532476]  [<ffffffffa01cc208>] ? xfs_bmbt_get_all+0x18/0x20 [xfs]
> [41796028.532495]  [<ffffffffa01bcbb4>] ? xfs_attr_set_int+0x3c4/0x510 [xfs]
> [41796028.532517]  [<ffffffffa01d4f5b>] ? xfs_da_do_buf+0x6db/0x770 [xfs]
> [41796028.532536]  [<ffffffffa01bcd81>] ? xfs_attr_set+0x81/0x90 [xfs]
> [41796028.532560]  [<ffffffffa0216cc3>] ? __xfs_xattr_set+0x43/0x60 [xfs]
> [41796028.532584]  [<ffffffffa0216d31>] ? xfs_xattr_user_set+0x11/0x20 [xfs]
> [41796028.532592]  [<ffffffff811aee92>] ? generic_setxattr+0xa2/0xb0
> [41796028.532596]  [<ffffffff811b134e>] ? __vfs_setxattr_noperm+0x4e/0x160
> [41796028.532600]  [<ffffffff81196b77>] ? inode_permission+0xa7/0x100
> [41796028.532604]  [<ffffffff811b151c>] ? vfs_setxattr+0xbc/0xc0
> [41796028.532607]  [<ffffffff811b15f0>] ? setxattr+0xd0/0x150
> [41796028.532612]  [<ffffffff8105af80>] ? __dequeue_entity+0x30/0x50
> [41796028.532617]  [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
> [41796028.532621]  [<ffffffff8118aec0>] ? __sb_start_write+0x80/0x120
> [41796028.532626]  [<ffffffff8152912e>] ? thread_return+0x4e/0x760
> [41796028.532630]  [<ffffffff811b171d>] ? sys_fsetxattr+0xad/0xd0
> [41796028.532633]  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> [41796028.532636] XFS (sdi1): Corruption detected. Unmount and run xfs_repair
> 
> Any comments will be much appreciated!
> 
> Best Regards!
> sunspot
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* XFS Corruption
@ 2016-02-24  6:12 fangchen sun
  2016-02-24 22:23 ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: fangchen sun @ 2016-02-24  6:12 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 2534 bytes --]

Dear all:

I have a ceph object storage cluster, and chose XFS as the underlying file
system.
I recently ran into a problem where the function "setxattr()" sometimes
fails, and I can only unmount the disk and repair it with "xfs_repair".

os: centos 6.5
kernel version: 2.6.32

the log for dmesg command:
[41796028.532225] Pid: 1438740, comm: ceph-osd Not tainted
2.6.32-925.431.23.3.letv.el6.x86_64 #1
[41796028.532227] Call Trace:
[41796028.532255]  [<ffffffffa01e1e5f>] ? xfs_error_report+0x3f/0x50 [xfs]
[41796028.532276]  [<ffffffffa01d506a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[41796028.532296]  [<ffffffffa01e1ece>] ? xfs_corruption_error+0x5e/0x90
[xfs]
[41796028.532316]  [<ffffffffa01d4f4c>] ? xfs_da_do_buf+0x6cc/0x770 [xfs]
[41796028.532335]  [<ffffffffa01d506a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[41796028.532359]  [<ffffffffa0206fc7>] ? kmem_zone_alloc+0x77/0xf0 [xfs]
[41796028.532380]  [<ffffffffa01d506a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[41796028.532399]  [<ffffffffa01bc481>] ? xfs_attr_leaf_addname+0x61/0x3d0
[xfs]
[41796028.532426]  [<ffffffffa01bc481>] ? xfs_attr_leaf_addname+0x61/0x3d0
[xfs]
[41796028.532455]  [<ffffffffa01ff187>] ? xfs_trans_add_item+0x57/0x70 [xfs]
[41796028.532476]  [<ffffffffa01cc208>] ? xfs_bmbt_get_all+0x18/0x20 [xfs]
[41796028.532495]  [<ffffffffa01bcbb4>] ? xfs_attr_set_int+0x3c4/0x510 [xfs]
[41796028.532517]  [<ffffffffa01d4f5b>] ? xfs_da_do_buf+0x6db/0x770 [xfs]
[41796028.532536]  [<ffffffffa01bcd81>] ? xfs_attr_set+0x81/0x90 [xfs]
[41796028.532560]  [<ffffffffa0216cc3>] ? __xfs_xattr_set+0x43/0x60 [xfs]
[41796028.532584]  [<ffffffffa0216d31>] ? xfs_xattr_user_set+0x11/0x20 [xfs]
[41796028.532592]  [<ffffffff811aee92>] ? generic_setxattr+0xa2/0xb0
[41796028.532596]  [<ffffffff811b134e>] ? __vfs_setxattr_noperm+0x4e/0x160
[41796028.532600]  [<ffffffff81196b77>] ? inode_permission+0xa7/0x100
[41796028.532604]  [<ffffffff811b151c>] ? vfs_setxattr+0xbc/0xc0
[41796028.532607]  [<ffffffff811b15f0>] ? setxattr+0xd0/0x150
[41796028.532612]  [<ffffffff8105af80>] ? __dequeue_entity+0x30/0x50
[41796028.532617]  [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
[41796028.532621]  [<ffffffff8118aec0>] ? __sb_start_write+0x80/0x120
[41796028.532626]  [<ffffffff8152912e>] ? thread_return+0x4e/0x760
[41796028.532630]  [<ffffffff811b171d>] ? sys_fsetxattr+0xad/0xd0
[41796028.532633]  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
[41796028.532636] XFS (sdi1): Corruption detected. Unmount and run
xfs_repair

Any comments will be much appreciated!

Best Regards!
sunspot

[-- Attachment #1.2: Type: text/html, Size: 3620 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-23  9:57         ` Alex Lyakas
@ 2014-12-23 20:36           ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2014-12-23 20:36 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: Brian Foster, Eric Sandeen, xfs

On Tue, Dec 23, 2014 at 11:57:13AM +0200, Alex Lyakas wrote:
> Hi Dave,
> 
> On Tue, Dec 23, 2014 at 2:39 AM, Dave Chinner <david@fromorbit.com> wrote:
> > commit 40194ecc6d78327d98e66de3213db96ca0a31e6f
> > Author: Ben Myers <bpm@sgi.com>
> > Date:   Fri Dec 6 12:30:11 2013 -0800
> >
> >     xfs: reinstate the ilock in xfs_readdir
> >
> >     Although it was removed in commit 051e7cd44ab8, ilock needs to be taken in
> >     xfs_readdir because we might have to read the extent list in from disk.  This
> >     keeps other threads from reading from or writing to the extent list while it i
> >     being read in and is still in a transitional state.
> >
> >     This has been associated with "Access to block zero" messages on directories
> >     with large numbers of extents resulting from excessive filesytem fragmentation
> >     as well as extent list corruption.  Unfortunately no test case at this point.
> >
> >     Signed-off-by: Ben Myers <bpm@sgi.com>
> >     Reviewed-by: Dave Chinner <dchinner@redhat.com>
> >
> > Seems to match the behaviour being seen.
> >
> > Alex, what type of inode is the one that is reporting the "access to
> > block zero" errors?
> I have just searched the relevant file system for this inode, but such
> inode was not found:(
> # find /export/XXX -mount -inum 1946454529
> did not find anything. Perhaps it got deleted since the incident.

It probably got cleared by xfs_repair because it was corrupt....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-23  0:39       ` Dave Chinner
@ 2014-12-23  9:57         ` Alex Lyakas
  2014-12-23 20:36           ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Lyakas @ 2014-12-23  9:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, Eric Sandeen, xfs

Hi Dave,

On Tue, Dec 23, 2014 at 2:39 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Dec 22, 2014 at 09:42:12AM -0500, Brian Foster wrote:
>> On Mon, Dec 22, 2014 at 10:08:18AM +1100, Dave Chinner wrote:
>> > On Sun, Dec 21, 2014 at 12:13:45PM -0600, Eric Sandeen wrote:
>> > > On 12/21/14 5:42 AM, Alex Lyakas wrote:
>> > > > Greetings,
>> > > > we encountered XFS corruption:
>> > >
>> > > > kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....
>> > >
>> > > There should have been 64 bytes of hexdump, not just the single line above, no?
>> >
>> > Yeah, really need the whole dmesg, because we've got readahead in
>> > the picture here so the number of times the corruption error is seen
>> > is actually important....
>> >
>> > >
>> > > > [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
>> > > > [813114.622928] PGD 0
>> > > > [813114.622928] Oops: 0000 [#1] SMP
>> > > > [813114.622928] CPU 2
>> > > > [813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
>> > > > [813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
>> > > > [813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
>> > > > [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5
>> >
>> > RCX implies gotp->br_startblock was not overwritten by the
>> > extent search. i.e. we've called xfs_bmap_search_multi_extents()
>> > but no extent was actually found.
>> >
>> > > > We analyzed several suspects, but all of them fall on disk addresses
>> > > > not near the corrupted disk address. I realize that running somewhat
>> > > > outdated kernel + our changes within XFSs, points back at us, but
>> > > > this is first time we see XFS corruption after about a year of this
>> > > > code being exercised. So posting here, just in case this is a known
>> > > > issue.
>> > >
>> > > well, xfs should _never_ oops, even if it encounters corruption.  So hopefully
>> > > we can work backwards from the trace above to what went wrong here.
>> > >
>> > > offhand, in xfs_bmap_search_multi_extents():
>> > >
>> > >         ep = xfs_iext_bno_to_ext(ifp, bno, &lastx);
>> > >         if (lastx > 0) {
>> > >                 xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp);
>> > >         }
>> > >         if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) {
>> > >                 xfs_bmbt_get_all(ep, gotp);
>> > >                 *eofp = 0;
>> > >
>> > > xfs_iext_bno_to_ext() can return NULL with lastx set to 0:
>> > >
>> > >         nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>> > >         if (nextents == 0) {
>> > >                 *idxp = 0;
>> > >                 return NULL;
>> > >         }
>> > >
>> > > (where idxp is the &lastx we sent in)
>> >
>> > > and if we do that, it sure seems like the "if lastx < ...." test will wind up
>> > > sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref.
>> >
>> > No, it shouldn't, because for lastx to be set to 0 that way,
>> > ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) must be zero.
>> > Therefore, this:
>> >
>> >     if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)))
>> >
>> > evaluates as:
>> >
>> >     if (0 < 0)
>> >
>> > which is not true, so we fall into the else case:
>> >
>> >     } else {
>> >                 if (lastx > 0) {
>> >                         *gotp = *prevp;
>> >                 }
>> >                 *eofp = 1;
>> >                 ep = NULL;
>> >         }
>> >         *lastxp = lastx;
>> >         return ep;
>> >
>> > Which basically overwrites *eofp and *lastxp, neither of which are
>> > NULL.
>> >
>> > However, the stack trace clearly shows we've just called
>> > xfs_bmap_search_multi_extents() - the "?" before the function name
>> > means it found the symbol in the stack, but not in the direct line
>> > of the frame pointers the current function stack points to.
>> >
>> > That makes me doubt the accuracy of the stack trace, because the
>> > only caller of xfs_bmap_search_multi_extents() is
>> > xfs_bmap_search_extents() and xfs_bmap_search_extents does not call
>> > xfs_bmbt_get_all() directly like the stack trace would lead us to
>> > believe. Hence I don't think we can trust the stack trace to be
>> > pointing us at the correct caller of xfs_bmbt_get_all(), which
>> > makes it really hard to isolate the cause...
>> >
>>
>> What seems strange to me here is why are we searching through extents
>> when the bmbt is presumed to be corrupt? I suppose we don't know for
>> sure whether the backtrace that panics is on the same inode, but the
>> fact that the panic is linked with the corruption errors suggests this
>> is likely.
>>
>> Digging through the current tot code to see how that might occur, I
>> noticed an XFS_ILOCK_EXCL assert in xfs_iread_extents() that doesn't
>> exist in 3.18.3. It looks like part of some fixes Christoph made a while
>> back, ending with the following commit in the commit log (see some of
>> the immediately prior commits as well):
>>
>> eef334e5776c xfs: assert that we hold the ilock for extent map access
>>
>> ... which suggests some paths were reading in inode extents without the
>> proper locking. That would appear to be problematic in its own right
>> given how XFS_IFEXTENTS is used. If that is the case, I wonder if
>> hitting that problem in combination with a bmbt that happens to be
>> corrupted is causing us to go off the rails? Just a theory... and
>> another reason it would be really nice to have a metadump. ;)
>
> commit 40194ecc6d78327d98e66de3213db96ca0a31e6f
> Author: Ben Myers <bpm@sgi.com>
> Date:   Fri Dec 6 12:30:11 2013 -0800
>
>     xfs: reinstate the ilock in xfs_readdir
>
>     Although it was removed in commit 051e7cd44ab8, ilock needs to be taken in
>     xfs_readdir because we might have to read the extent list in from disk.  This
>     keeps other threads from reading from or writing to the extent list while it is
>     being read in and is still in a transitional state.
>
>     This has been associated with "Access to block zero" messages on directories
>     with large numbers of extents resulting from excessive filesystem fragmentation
>     as well as extent list corruption.  Unfortunately no test case at this point.
>
>     Signed-off-by: Ben Myers <bpm@sgi.com>
>     Reviewed-by: Dave Chinner <dchinner@redhat.com>
>
> Seems to match the behaviour being seen.
>
> Alex, what type of inode is the one that is reporting the "access to
> block zero" errors?
I have just searched the relevant file system for this inode, but no such
inode was found :(
# find /export/XXX -mount -inum 1946454529
did not find anything. Perhaps it got deleted since the incident.
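
For what it's worth, the next step I can think of is querying the inode
directly with xfs_db, which does not depend on a directory entry still
existing -- a rough sketch, assuming the filesystem is on /dev/dm-72
(the device path is my guess):

# xfs_db -r -c "inode 1946454529" -c "print core.mode" /dev/dm-72
# xfs_db -r -c "inode 1946454529" -c "print core.format" /dev/dm-72

core.mode should show whether it was a directory or a regular file, and
core.format whether the data fork was in extents or btree format. Of
course, if xfs_repair already cleared the inode there may be little left
to see.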

Thanks again,
Alex.


>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-22 14:42     ` Brian Foster
@ 2014-12-23  0:39       ` Dave Chinner
  2014-12-23  9:57         ` Alex Lyakas
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2014-12-23  0:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: Alex Lyakas, Eric Sandeen, xfs

On Mon, Dec 22, 2014 at 09:42:12AM -0500, Brian Foster wrote:
> On Mon, Dec 22, 2014 at 10:08:18AM +1100, Dave Chinner wrote:
> > On Sun, Dec 21, 2014 at 12:13:45PM -0600, Eric Sandeen wrote:
> > > On 12/21/14 5:42 AM, Alex Lyakas wrote:
> > > > Greetings,
> > > > we encountered XFS corruption:
> > > 
> > > > kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....    
> > > 
> > > There should have been 64 bytes of hexdump, not just the single line above, no?
> > 
> > Yeah, really need the whole dmesg, because we've got readahead in
> > the picture here so the number of times the corruption error is seen
> > is actually important....
> > 
> > > 
> > > > [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> > > > [813114.622928] PGD 0
> > > > [813114.622928] Oops: 0000 [#1] SMP
> > > > [813114.622928] CPU 2
> > > > [813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
> > > > [813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> > > > [813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
> > > > [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5
> > 
> > RCX implies gotp->br_startblock was not overwritten by the
> > extent search. i.e. we've called xfs_bmap_search_multi_extents()
> > but no extent was actually found.
> > 
> > > > We analyzed several suspects, but all of them fall on disk addresses
> > > > not near the corrupted disk address. I realize that running a somewhat
> > > > outdated kernel + our changes within XFS points back at us, but
> > > > this is the first time we have seen XFS corruption after about a year of this
> > > > code being exercised. So posting here, just in case this is a known
> > > > issue.
> > > 
> > > well, xfs should _never_ oops, even if it encounters corruption.  So hopefully
> > > we can work backwards from the trace above to what went wrong here.
> > > 
> > > offhand, in xfs_bmap_search_multi_extents():
> > > 
> > >         ep = xfs_iext_bno_to_ext(ifp, bno, &lastx);
> > >         if (lastx > 0) {
> > >                 xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp);
> > >         }
> > >         if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) {
> > >                 xfs_bmbt_get_all(ep, gotp);
> > >                 *eofp = 0;
> > > 
> > > xfs_iext_bno_to_ext() can return NULL with lastx set to 0:
> > > 
> > >         nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
> > >         if (nextents == 0) {
> > >                 *idxp = 0;
> > >                 return NULL;
> > >         }
> > > 
> > > (where idxp is the &lastx we sent in)
> > 
> > > and if we do that, it sure seems like the "if lastx < ...." test will wind up
> > > sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref.
> > 
> > No, it shouldn't, because for lastx to be set to 0 that way,
> > ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) must be zero.
> > Therefore, this:
> > 
> > 	if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)))
> > 
> > evaluates as:
> > 
> > 	if (0 < 0)
> > 
> > which is not true, so we fall into the else case:
> > 
> > 	} else {
> >                 if (lastx > 0) {
> >                         *gotp = *prevp;
> >                 }
> >                 *eofp = 1;
> >                 ep = NULL;
> >         }
> >         *lastxp = lastx;
> >         return ep;
> > 
> > Which basically overwrites *eofp and *lastxp, neither of which are
> > NULL.
> > 
> > However, the stack trace clearly shows we've just called
> > xfs_bmap_search_multi_extents() - the "?" before the function name
> > means it found the symbol in the stack, but not in the direct line
> > of the frame pointers the current function stack points to.
> > 
> > That makes me doubt the accuracy of the stack trace, because the
> > only caller of xfs_bmap_search_multi_extents() is
> > xfs_bmap_search_extents() and xfs_bmap_search_extents does not call
> > xfs_bmbt_get_all() directly like the stack trace would lead us to
> > believe. Hence I don't think we can trust the stack trace to be
> > pointing us at the correct caller of xfs_bmbt_get_all(), which
> > makes it really hard to isolate the cause...
> > 
> 
> What seems strange to me here is why are we searching through extents
> when the bmbt is presumed to be corrupt? I suppose we don't know for
> sure whether the backtrace that panics is on the same inode, but the
> fact that the panic is linked with the corruption errors suggests this
> is likely.
> 
> Digging through the current tot code to see how that might occur, I
> noticed an XFS_ILOCK_EXCL assert in xfs_iread_extents() that doesn't
> exist in 3.18.3. It looks like part of some fixes Christoph made a while
> back, ending with the following commit in the commit log (see some of
> the immediately prior commits as well):
> 
> eef334e5776c xfs: assert that we hold the ilock for extent map access
> 
> ... which suggests some paths were reading in inode extents without the
> proper locking. That would appear to be problematic in its own right
> given how XFS_IFEXTENTS is used. If that is the case, I wonder if
> hitting that problem in combination with a bmbt that happens to be
> corrupted is causing us to go off the rails? Just a theory... and
> another reason it would be really nice to have a metadump. ;)

commit 40194ecc6d78327d98e66de3213db96ca0a31e6f
Author: Ben Myers <bpm@sgi.com>
Date:   Fri Dec 6 12:30:11 2013 -0800

    xfs: reinstate the ilock in xfs_readdir
    
    Although it was removed in commit 051e7cd44ab8, ilock needs to be taken in
    xfs_readdir because we might have to read the extent list in from disk.  This
    keeps other threads from reading from or writing to the extent list while it is
    being read in and is still in a transitional state.
    
    This has been associated with "Access to block zero" messages on directories
    with large numbers of extents resulting from excessive filesystem fragmentation
    as well as extent list corruption.  Unfortunately no test case at this point.
    
    Signed-off-by: Ben Myers <bpm@sgi.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Seems to match the behaviour being seen.

Alex, what type of inode is the one that is reporting the "access to
block zero" errors?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-21 23:08   ` Dave Chinner
  2014-12-22 10:09     ` Alex Lyakas
@ 2014-12-22 14:42     ` Brian Foster
  2014-12-23  0:39       ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-12-22 14:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Alex Lyakas, Eric Sandeen, xfs

On Mon, Dec 22, 2014 at 10:08:18AM +1100, Dave Chinner wrote:
> On Sun, Dec 21, 2014 at 12:13:45PM -0600, Eric Sandeen wrote:
> > On 12/21/14 5:42 AM, Alex Lyakas wrote:
> > > Greetings,
> > > we encountered XFS corruption:
> > 
> > > kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....    
> > 
> > There should have been 64 bytes of hexdump, not just the single line above, no?
> 
> Yeah, really need the whole dmesg, because we've got readahead in
> the picture here so the number of times the corruption error is seen
> is actually important....
> 
> > 
> > > [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> > > [813114.622928] PGD 0
> > > [813114.622928] Oops: 0000 [#1] SMP
> > > [813114.622928] CPU 2
> > > [813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
> > > [813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> > > [813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
> > > [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5
> 
> RCX implies gotp->br_startblock was not overwritten by the
> extent search. i.e. we've called xfs_bmap_search_multi_extents()
> but no extent was actually found.
> 
> > > We analyzed several suspects, but all of them fall on disk addresses
> > > not near the corrupted disk address. I realize that running a somewhat
> > > outdated kernel + our changes within XFS points back at us, but
> > > this is the first time we have seen XFS corruption after about a year of this
> > > code being exercised. So posting here, just in case this is a known
> > > issue.
> > 
> > well, xfs should _never_ oops, even if it encounters corruption.  So hopefully
> > we can work backwards from the trace above to what went wrong here.
> > 
> > offhand, in xfs_bmap_search_multi_extents():
> > 
> >         ep = xfs_iext_bno_to_ext(ifp, bno, &lastx);
> >         if (lastx > 0) {
> >                 xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp);
> >         }
> >         if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) {
> >                 xfs_bmbt_get_all(ep, gotp);
> >                 *eofp = 0;
> > 
> > xfs_iext_bno_to_ext() can return NULL with lastx set to 0:
> > 
> >         nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
> >         if (nextents == 0) {
> >                 *idxp = 0;
> >                 return NULL;
> >         }
> > 
> > (where idxp is the &lastx we sent in)
> 
> > and if we do that, it sure seems like the "if lastx < ...." test will wind up
> > sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref.
> 
> No, it shouldn't, because for lastx to be set to 0 that way,
> ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) must be zero.
> Therefore, this:
> 
> 	if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)))
> 
> evaluates as:
> 
> 	if (0 < 0)
> 
> which is not true, so we fall into the else case:
> 
> 	} else {
>                 if (lastx > 0) {
>                         *gotp = *prevp;
>                 }
>                 *eofp = 1;
>                 ep = NULL;
>         }
>         *lastxp = lastx;
>         return ep;
> 
> Which basically overwrites *eofp and *lastxp, neither of which are
> NULL.
> 
> However, the stack trace clearly shows we've just called
> xfs_bmap_search_multi_extents() - the "?" before the function name
> means it found the symbol in the stack, but not in the direct line
> of the frame pointers the current function stack points to.
> 
> That makes me doubt the accuracy of the stack trace, because the
> only caller of xfs_bmap_search_multi_extents() is
> xfs_bmap_search_extents() and xfs_bmap_search_extents does not call
> xfs_bmbt_get_all() directly like the stack trace would lead us to
> believe. Hence I don't think we can trust the stack trace to be
> pointing us at the correct caller of xfs_bmbt_get_all(), which
> makes it really hard to isolate the cause...
> 

What seems strange to me here is why are we searching through extents
when the bmbt is presumed to be corrupt? I suppose we don't know for
sure whether the backtrace that panics is on the same inode, but the
fact that the panic is linked with the corruption errors suggests this
is likely.

Digging through the current tot code to see how that might occur, I
noticed an XFS_ILOCK_EXCL assert in xfs_iread_extents() that doesn't
exist in 3.18.3. It looks like part of some fixes Christoph made a while
back, ending with the following commit in the commit log (see some of
the immediately prior commits as well):

eef334e5776c xfs: assert that we hold the ilock for extent map access

... which suggests some paths were reading in inode extents without the
proper locking. That would appear to be problematic in its own right
given how XFS_IFEXTENTS is used. If that is the case, I wonder if
hitting that problem in combination with a bmbt that happens to be
corrupted is causing us to go off the rails? Just a theory... and
another reason it would be really nice to have a metadump. ;)
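
If one does turn up, it can be restored to a sparse file and poked at
offline, without needing the original volume -- roughly, with
placeholder file names:

# xfs_mdrestore corrupted.metadump corrupted.img
# xfs_repair -n -f corrupted.img

Since xfs_repair -n only reports what it would fix, that would at least
tell us whether the damage is really sitting on disk or is something the
running kernel manufactured.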

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-21 23:08   ` Dave Chinner
@ 2014-12-22 10:09     ` Alex Lyakas
  2014-12-22 14:42     ` Brian Foster
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Lyakas @ 2014-12-22 10:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, xfs

Hi Eric, Dave,

Thank you for looking at this.

On Mon, Dec 22, 2014 at 1:08 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Dec 21, 2014 at 12:13:45PM -0600, Eric Sandeen wrote:
>> On 12/21/14 5:42 AM, Alex Lyakas wrote:
>> > Greetings,
>> > we encountered XFS corruption:
>>
>> > kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....
>>
>> There should have been 64 bytes of hexdump, not just the single line above, no?
>
> Yeah, really need the whole dmesg, because we've got readahead in
> the picture here so the number of times the corruption error is seen
> is actually important....
>

I uploaded the full dump, captured by our kmsg dumper here:
https://drive.google.com/file/d/0ByBy89zr3kJNUkRfRG9TMWVnVkU/view?usp=sharing

As far as I see, all the corruption warnings are the same, and they
all print only one line of hex dump. There are some additional
warnings, like:
[812756.915765] XFS (dm-72): Access to block zero in inode 1946454529
start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 964
[812756.915765]
[812756.915772] XFS (dm-72): Access to block zero in inode 1946454529
start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 964
[812756.915772]
[812756.915815] XFS (dm-72): Access to block zero in inode 1946454529
start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 964

Two more log files (one prior to the crash and one from another VM
that took over after the crash). All corruption reports are the same.
https://drive.google.com/file/d/0ByBy89zr3kJNSHRCaUxDQnBEZHc/view?usp=sharing
https://drive.google.com/file/d/0ByBy89zr3kJNYk1hRTRaVDE4ZzA/view?usp=sharing

Unfortunately, I did not capture the output of xfs_repair. I also have
not captured the metadump. So I realize we do not have much to work
on.

Thanks!
Alex.


>>
>> > [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
>> > [813114.622928] PGD 0
>> > [813114.622928] Oops: 0000 [#1] SMP
>> > [813114.622928] CPU 2
>> > [813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
>> > [813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
>> > [813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
>> > [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5
>
> RCX implies gotp->br_startblock was not overwritten by the
> extent search. i.e. we've called xfs_bmap_search_multi_extents()
> but no extent was actually found.
>
>> > We analyzed several suspects, but all of them fall on disk addresses
>> > not near the corrupted disk address. I realize that running a somewhat
>> > outdated kernel + our changes within XFS points back at us, but
>> > this is the first time we have seen XFS corruption after about a year of this
>> > code being exercised. So posting here, just in case this is a known
>> > issue.
>>
>> well, xfs should _never_ oops, even if it encounters corruption.  So hopefully
>> we can work backwards from the trace above to what went wrong here.
>>
>> offhand, in xfs_bmap_search_multi_extents():
>>
>>         ep = xfs_iext_bno_to_ext(ifp, bno, &lastx);
>>         if (lastx > 0) {
>>                 xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp);
>>         }
>>         if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) {
>>                 xfs_bmbt_get_all(ep, gotp);
>>                 *eofp = 0;
>>
>> xfs_iext_bno_to_ext() can return NULL with lastx set to 0:
>>
>>         nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>>         if (nextents == 0) {
>>                 *idxp = 0;
>>                 return NULL;
>>         }
>>
>> (where idxp is the &lastx we sent in)
>
>> and if we do that, it sure seems like the "if lastx < ...." test will wind up
>> sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref.
>
> No, it shouldn't, because for lastx to be set to 0 that way,
> ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) must be zero.
> Therefore, this:
>
>         if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)))
>
> evaluates as:
>
>         if (0 < 0)
>
> which is not true, so we fall into the else case:
>
>         } else {
>                 if (lastx > 0) {
>                         *gotp = *prevp;
>                 }
>                 *eofp = 1;
>                 ep = NULL;
>         }
>         *lastxp = lastx;
>         return ep;
>
> Which basically overwrites *eofp and *lastxp, neither of which are
> NULL.
>
> However, the stack trace clearly shows we've just called
> xfs_bmap_search_multi_extents() - the "?" before the function name
> means it found the symbol in the stack, but not in the direct line
> of the frame pointers the current function stack points to.
>
> That makes me doubt the accuracy of the stack trace, because the
> only caller of xfs_bmap_search_multi_extents() is
> xfs_bmap_search_extents() and xfs_bmap_search_extents does not call
> xfs_bmbt_get_all() directly like the stack trace would lead us to
> believe. Hence I don't think we can trust the stack trace to be
> pointing us at the correct caller of xfs_bmbt_get_all(), which
> makes it really hard to isolate the cause...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-21 18:13 ` Eric Sandeen
@ 2014-12-21 23:08   ` Dave Chinner
  2014-12-22 10:09     ` Alex Lyakas
  2014-12-22 14:42     ` Brian Foster
  0 siblings, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2014-12-21 23:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Alex Lyakas, xfs

On Sun, Dec 21, 2014 at 12:13:45PM -0600, Eric Sandeen wrote:
> On 12/21/14 5:42 AM, Alex Lyakas wrote:
> > Greetings,
> > we encountered XFS corruption:
> 
> > kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....    
> 
> There should have been 64 bytes of hexdump, not just the single line above, no?

Yeah, really need the whole dmesg, because we've got readahead in
the picture here so the number of times the corruption error is seen
is actually important....

> 
> > [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> > [813114.622928] PGD 0
> > [813114.622928] Oops: 0000 [#1] SMP
> > [813114.622928] CPU 2
> > [813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
> > [813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> > [813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
> > [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5

RCX implies gotp->br_startblock was not overwritten by the
extent search. i.e. we've called xfs_bmap_search_multi_extents()
but no extent was actually found.

> > We analyzed several suspects, but all of them fall on disk addresses
> > not near the corrupted disk address. I realize that running a somewhat
> > outdated kernel + our changes within XFS points back at us, but
> > this is the first time we have seen XFS corruption after about a year of this
> > code being exercised. So posting here, just in case this is a known
> > issue.
> 
> well, xfs should _never_ oops, even if it encounters corruption.  So hopefully
> we can work backwards from the trace above to what went wrong here.
> 
> offhand, in xfs_bmap_search_multi_extents():
> 
>         ep = xfs_iext_bno_to_ext(ifp, bno, &lastx);
>         if (lastx > 0) {
>                 xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp);
>         }
>         if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) {
>                 xfs_bmbt_get_all(ep, gotp);
>                 *eofp = 0;
> 
> xfs_iext_bno_to_ext() can return NULL with lastx set to 0:
> 
>         nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>         if (nextents == 0) {
>                 *idxp = 0;
>                 return NULL;
>         }
> 
> (where idxp is the &lastx we sent in)

> and if we do that, it sure seems like the "if lastx < ...." test will wind up
> sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref.

No, it shouldn't, because for lastx to be set to 0 that way,
ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) must be zero.
Therefore, this:

	if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)))

evaluates as:

	if (0 < 0)

which is not true, so we fall into the else case:

	} else {
                if (lastx > 0) {
                        *gotp = *prevp;
                }
                *eofp = 1;
                ep = NULL;
        }
        *lastxp = lastx;
        return ep;

Which basically overwrites *eofp and *lastxp, neither of which are
NULL.

However, the stack trace clearly shows we've just called
xfs_bmap_search_multi_extents() - the "?" before the function name
means it found the symbol in the stack, but not in the direct line
of the frame pointers the current function stack points to.

That makes me doubt the accuracy of the stack trace, because the
only caller of xfs_bmap_search_multi_extents() is
xfs_bmap_search_extents() and xfs_bmap_search_extents does not call
xfs_bmbt_get_all() directly like the stack trace would lead us to
believe. Hence I don't think we can trust the stack trace to be
pointing us at the correct caller of xfs_bmbt_get_all(), which
makes it really hard to isolate the cause...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: XFS corruption
  2014-12-21 11:42 XFS corruption Alex Lyakas
@ 2014-12-21 18:13 ` Eric Sandeen
  2014-12-21 23:08   ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sandeen @ 2014-12-21 18:13 UTC (permalink / raw)
  To: Alex Lyakas, xfs

On 12/21/14 5:42 AM, Alex Lyakas wrote:
> Greetings,
> we encountered XFS corruption:

> kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....    

There should have been 64 bytes of hexdump, not just the single line above, no?

> kernel: [774772.854820] XFS (dm-72): Internal error xfs_bmbt_verify at line 747 of file /mnt/share/builds/14.09--3.8.13-030813-generic/2014-11-30_15-47-58--14.09-1419-28/src/zadara-btrfs/fs/xfs/xfs_bmap_btree.c.  Caller 0xffffffffa077b6be

so, btree corruption

> kernel: [774772.854820]                                                                                        
> kernel: [774772.860766] Pid: 14643, comm: kworker/0:0H Tainted: GF       W  O 3.8.13-030813-generic #20130511184
> kernel: [774772.860771] Call Trace:                                                                            
> kernel: [774772.860909]  [<ffffffffa074abaf>] xfs_error_report+0x3f/0x50 [xfs]                                 
> kernel: [774772.860961]  [<ffffffffa077b6be>] ? xfs_bmbt_read_verify+0xe/0x10 [xfs]                            
> kernel: [774772.860985]  [<ffffffffa074ac1e>] xfs_corruption_error+0x5e/0x90 [xfs]                             
> kernel: [774772.861014]  [<ffffffffa077b537>] xfs_bmbt_verify+0x77/0x1e0 [xfs]                                 
> kernel: [774772.861047]  [<ffffffffa077b6be>] ? xfs_bmbt_read_verify+0xe/0x10 [xfs]                            
> kernel: [774772.861077]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0                                        
> kernel: [774772.861129]  [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0                                      
> kernel: [774772.861145]  [<ffffffffa077b6be>] xfs_bmbt_read_verify+0xe/0x10 [xfs]                              
> kernel: [774772.861157]  [<ffffffffa074848f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]                              
> kernel: [774772.861161]  [<ffffffff81078b81>] process_one_work+0x141/0x490                                     
> kernel: [774772.861164]  [<ffffffff81079b48>] worker_thread+0x168/0x400                                        
> kernel: [774772.861166]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120                                     
> kernel: [774772.861170]  [<ffffffff8107f050>] kthread+0xc0/0xd0                                                
> kernel: [774772.861172]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0                                 
> kernel: [774772.861193]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0                                          
> kernel: [774772.861199]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0                                 
> kernel: [774772.861318] XFS (dm-72): Corruption detected. Unmount and run xfs_repair                           
> kernel: [774772.863449] XFS (dm-72): metadata I/O error: block 0x2434e3e8 ("xfs_trans_read_buf_map") error 117 numblks 8
>  
> All the corruption reports were for the same block 0x2434e3e8, which according to the code is simply disk address (xfs_daddr_t) 607445992. So there was only one block corrupted.
>  
> Some time later, XFS crashed with:
> [813114.622928] NULL pointer dereference[813114.622928]  at 0000000000000008

ok that's worse.  ;)

> [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> [813114.622928] PGD 0
> [813114.622928] Oops: 0000 [#1] SMP
> [813114.622928] CPU 2
> [813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
> [813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> [813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
> [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5
> [813114.622928] RDX: ffff88010a193898 RSI: ffff88010a193898 RDI: 0000000000000000
> [813114.622928] RBP: ffff88010a1937f8 R08: ffff88010a193898 R09: ffff88010a1938b8
> [813114.622928] R10: ffffea0005de0940 R11: 0000000000004d0e R12: ffff88010a1938dc
> [813114.622928] R13: ffff88010a1938e0 R14: ffff88010a193898 R15: ffff88010a1938b8
> [813114.622928] FS:  00007eff2dc7e700(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
> [813114.622928] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [813114.622928] CR2: 0000000000000008 CR3: 0000000109574000 CR4: 00000000001406e0
> [813114.622928] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [813114.622928] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [813114.622928] Process smbd (pid: 31120, threadinfo ffff88010a192000, task ffff88011687ae80)
> [813114.622928] Stack:
> [813114.622928]  ffff88010a1937f8 ffffffffa076f85a ffffffffffffffff 0000000000000000
> [813114.622928]  ffffffff816ec509 000000000a193830 ffffffff816ed31d ffff88010a193898
> [813114.622928]  ffff880180fa9c00 0000000000000000 ffff88010a1938dc ffff88010a1938e0
> [813114.622928] Call Trace:
> [813114.622928]  [<ffffffffa076f85a>] ? xfs_bmap_search_multi_extents+0xaa/0x110 [xfs]
> [813114.622928]  [<ffffffff816ec509>] ? schedule+0x29/0x70
> [813114.622928]  [<ffffffff816ed31d>] ? rwsem_down_failed_common+0xcd/0x170
> [813114.622928]  [<ffffffffa076f92e>] xfs_bmap_search_extents+0x6e/0xf0 [xfs]
> [813114.622928]  [<ffffffffa0778d6c>] xfs_bmapi_read+0xfc/0x2f0 [xfs]
> [813114.622928]  [<ffffffffa0792a49>] ? xfs_ilock_map_shared+0x49/0x60 [xfs]
> [813114.622928]  [<ffffffffa07459a8>] __xfs_get_blocks+0xe8/0x550 [xfs]
> [813114.622928]  [<ffffffff8135d8c4>] ? call_rwsem_down_read_failed+0x14/0x30
> [813114.622928]  [<ffffffffa0745e41>] xfs_get_blocks+0x11/0x20 [xfs]
> [813114.622928]  [<ffffffff811d05b7>] block_read_full_page+0x127/0x360
> [813114.622928]  [<ffffffffa0745e30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
> [813114.622928]  [<ffffffff811d9b0f>] do_mpage_readpage+0x35f/0x550
> [813114.622928]  [<ffffffff816f1025>] ? do_async_page_fault+0x35/0x90
> [813114.622928]  [<ffffffff816edd48>] ? async_page_fault+0x28/0x30
> [813114.622928]  [<ffffffff811d9d4f>] mpage_readpage+0x4f/0x70
> [813114.622928]  [<ffffffffa0745e30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
> [813114.622928]  [<ffffffff81134da8>] ? file_read_actor+0x68/0x160
> [813114.622928]  [<ffffffff81134e04>] ? file_read_actor+0xc4/0x160
> [813114.622928]  [<ffffffff81354bfe>] ? radix_tree_lookup_slot+0xe/0x10
> [813114.622928]  [<ffffffffa07451b8>] xfs_vm_readpage+0x18/0x20 [xfs]
> [813114.622928]  [<ffffffff811364ad>] do_generic_file_read.constprop.31+0x10d/0x440
> [813114.622928]  [<ffffffff811374d1>] generic_file_aio_read+0xe1/0x220
> [813114.622928]  [<ffffffffa074fb98>] xfs_file_aio_read+0x1c8/0x330 [xfs]
> [813114.622928]  [<ffffffff8119ad93>] do_sync_read+0xa3/0xe0
> [813114.622928]  [<ffffffff8119b4d0>] vfs_read+0xb0/0x180
> [813114.622928]  [<ffffffff8119b77a>] sys_pread64+0x9a/0xa0
> [813114.622928]  [<ffffffff816f629d>] system_call_fastpath+0x1a/0x1f
> [813114.622928] Code: d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f2 <48> 8b 77 08 48 8b 3f 48 89 e5 e8 48 f8 ff ff 5d c3 66 0f 1f 44
> [813114.622928] RIP  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
> [813114.622928]  RSP <ffff88010a193798>
> [813114.622928] CR2: 0000000000000008
> [813114.721138] ---[ end trace cce2a358d4050d3d ]---
>  
> We are running XFS based on kernel 3.8.13, with our changes for
> large-block discard in
> https://github.com/zadarastorage/zadara-xfs-pushback.

hmmm... so a custom kernel, that makes it trickier.

> We analyzed several suspects, but all of them fall on disk addresses
> not near the corrupted disk address. I realize that running a somewhat
> outdated kernel + our changes within XFS points back at us, but
> this is the first time we have seen XFS corruption after about a year of this
> code being exercised. So posting here, just in case this is a known
> issue.

well, xfs should _never_ oops, even if it encounters corruption.  So hopefully
we can work backwards from the trace above to what went wrong here.

offhand, in xfs_bmap_search_multi_extents():

        ep = xfs_iext_bno_to_ext(ifp, bno, &lastx);
        if (lastx > 0) {
                xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp);
        }
        if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) {
                xfs_bmbt_get_all(ep, gotp);
                *eofp = 0;

xfs_iext_bno_to_ext() can return NULL with lastx set to 0:

        nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
        if (nextents == 0) {
                *idxp = 0;
                return NULL;
        }

(where idxp is the &lastx we sent in)

and if we do that, it sure seems like the "if lastx < ...." test will wind up
sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref.

> I must point out that xfs_repair was able to fix this, which was
> awesome!

do you have the xfs_repair output?

If you ever hit something like this again, capturing a metadump prior to repair,
if possible, would be great, so we might have a better reproducer.
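
Roughly, with the filesystem unmounted and using the dm-72 name from
your logs (substitute the real device path):

# xfs_metadump -g /dev/dm-72 /tmp/dm-72.metadump
# xfs_repair /dev/dm-72 2>&1 | tee /tmp/dm-72.repair.log

xfs_metadump copies only metadata and obfuscates file names by default,
so the result should be small enough to share.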

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* XFS corruption
@ 2014-12-21 11:42 Alex Lyakas
  2014-12-21 18:13 ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Lyakas @ 2014-12-21 11:42 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 7724 bytes --]

Greetings,
we encountered XFS corruption:

kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f  .....././..j....     
kernel: [774772.854820] XFS (dm-72): Internal error xfs_bmbt_verify at line 747 of file /mnt/share/builds/14.09--3.8.13-030813-generic/2014-11-30_15-47-58--14.09-1419-28/src/zadara-btrfs/fs/xfs/xfs_bmap_btree.c.  Caller 0xffffffffa077b6be
kernel: [774772.854820]                                                                                         
kernel: [774772.860766] Pid: 14643, comm: kworker/0:0H Tainted: GF       W  O 3.8.13-030813-generic #20130511184
kernel: [774772.860771] Call Trace:                                                                             
kernel: [774772.860909]  [<ffffffffa074abaf>] xfs_error_report+0x3f/0x50 [xfs]                                  
kernel: [774772.860961]  [<ffffffffa077b6be>] ? xfs_bmbt_read_verify+0xe/0x10 [xfs]                             
kernel: [774772.860985]  [<ffffffffa074ac1e>] xfs_corruption_error+0x5e/0x90 [xfs]                              
kernel: [774772.861014]  [<ffffffffa077b537>] xfs_bmbt_verify+0x77/0x1e0 [xfs]                                  
kernel: [774772.861047]  [<ffffffffa077b6be>] ? xfs_bmbt_read_verify+0xe/0x10 [xfs]                             
kernel: [774772.861077]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0                                         
kernel: [774772.861129]  [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0                                       
kernel: [774772.861145]  [<ffffffffa077b6be>] xfs_bmbt_read_verify+0xe/0x10 [xfs]                               
kernel: [774772.861157]  [<ffffffffa074848f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]                               
kernel: [774772.861161]  [<ffffffff81078b81>] process_one_work+0x141/0x490                                      
kernel: [774772.861164]  [<ffffffff81079b48>] worker_thread+0x168/0x400                                         
kernel: [774772.861166]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120                                      
kernel: [774772.861170]  [<ffffffff8107f050>] kthread+0xc0/0xd0                                                 
kernel: [774772.861172]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0                                  
kernel: [774772.861193]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0                                           
kernel: [774772.861199]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0                                  
kernel: [774772.861318] XFS (dm-72): Corruption detected. Unmount and run xfs_repair                            
kernel: [774772.863449] XFS (dm-72): metadata I/O error: block 0x2434e3e8 ("xfs_trans_read_buf_map") error 117 numblks 8

All the corruption reports were for the same block 0x2434e3e8, which according to the code is simply disk address (xfs_daddr_t) 607445992. So there was only one block corrupted.
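
If we hit this again we will try to decode that block with xfs_db before
running repair -- a rough sketch, assuming the volume is /dev/dm-72:

# xfs_db -r -c "convert daddr 607445992 fsblock" /dev/dm-72
# xfs_db -r -c "fsblock <value printed above>" -c "type bmapbtd" -c "print" /dev/dm-72

The first command maps the 512-byte disk address to a filesystem block
number; the second dumps that block interpreted as a bmap btree block,
which should make the fields the verifier disliked visible.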

Some time later, XFS crashed with:
[813114.622928] NULL pointer dereference[813114.622928]  at 0000000000000008
[813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
[813114.622928] PGD 0 
[813114.622928] Oops: 0000 [#1] SMP 
[813114.622928] CPU 2 
[813114.622928] Pid: 31120, comm: smbd Tainted: GF       W  O 3.8.13-030813-generic #201305111843 Bochs Bochs
[813114.622928] RIP: 0010:[<ffffffffa077bad9>]  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
[813114.622928] RSP: 0018:ffff88010a193798  EFLAGS: 00010297
[813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5
[813114.622928] RDX: ffff88010a193898 RSI: ffff88010a193898 RDI: 0000000000000000
[813114.622928] RBP: ffff88010a1937f8 R08: ffff88010a193898 R09: ffff88010a1938b8
[813114.622928] R10: ffffea0005de0940 R11: 0000000000004d0e R12: ffff88010a1938dc
[813114.622928] R13: ffff88010a1938e0 R14: ffff88010a193898 R15: ffff88010a1938b8
[813114.622928] FS:  00007eff2dc7e700(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
[813114.622928] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[813114.622928] CR2: 0000000000000008 CR3: 0000000109574000 CR4: 00000000001406e0
[813114.622928] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[813114.622928] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[813114.622928] Process smbd (pid: 31120, threadinfo ffff88010a192000, task ffff88011687ae80)
[813114.622928] Stack:
[813114.622928]  ffff88010a1937f8 ffffffffa076f85a ffffffffffffffff 0000000000000000
[813114.622928]  ffffffff816ec509 000000000a193830 ffffffff816ed31d ffff88010a193898
[813114.622928]  ffff880180fa9c00 0000000000000000 ffff88010a1938dc ffff88010a1938e0
[813114.622928] Call Trace:
[813114.622928]  [<ffffffffa076f85a>] ? xfs_bmap_search_multi_extents+0xaa/0x110 [xfs]
[813114.622928]  [<ffffffff816ec509>] ? schedule+0x29/0x70
[813114.622928]  [<ffffffff816ed31d>] ? rwsem_down_failed_common+0xcd/0x170
[813114.622928]  [<ffffffffa076f92e>] xfs_bmap_search_extents+0x6e/0xf0 [xfs]
[813114.622928]  [<ffffffffa0778d6c>] xfs_bmapi_read+0xfc/0x2f0 [xfs]
[813114.622928]  [<ffffffffa0792a49>] ? xfs_ilock_map_shared+0x49/0x60 [xfs]
[813114.622928]  [<ffffffffa07459a8>] __xfs_get_blocks+0xe8/0x550 [xfs]
[813114.622928]  [<ffffffff8135d8c4>] ? call_rwsem_down_read_failed+0x14/0x30
[813114.622928]  [<ffffffffa0745e41>] xfs_get_blocks+0x11/0x20 [xfs]
[813114.622928]  [<ffffffff811d05b7>] block_read_full_page+0x127/0x360
[813114.622928]  [<ffffffffa0745e30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
[813114.622928]  [<ffffffff811d9b0f>] do_mpage_readpage+0x35f/0x550
[813114.622928]  [<ffffffff816f1025>] ? do_async_page_fault+0x35/0x90
[813114.622928]  [<ffffffff816edd48>] ? async_page_fault+0x28/0x30
[813114.622928]  [<ffffffff811d9d4f>] mpage_readpage+0x4f/0x70
[813114.622928]  [<ffffffffa0745e30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
[813114.622928]  [<ffffffff81134da8>] ? file_read_actor+0x68/0x160
[813114.622928]  [<ffffffff81134e04>] ? file_read_actor+0xc4/0x160
[813114.622928]  [<ffffffff81354bfe>] ? radix_tree_lookup_slot+0xe/0x10
[813114.622928]  [<ffffffffa07451b8>] xfs_vm_readpage+0x18/0x20 [xfs]
[813114.622928]  [<ffffffff811364ad>] do_generic_file_read.constprop.31+0x10d/0x440
[813114.622928]  [<ffffffff811374d1>] generic_file_aio_read+0xe1/0x220
[813114.622928]  [<ffffffffa074fb98>] xfs_file_aio_read+0x1c8/0x330 [xfs]
[813114.622928]  [<ffffffff8119ad93>] do_sync_read+0xa3/0xe0
[813114.622928]  [<ffffffff8119b4d0>] vfs_read+0xb0/0x180
[813114.622928]  [<ffffffff8119b77a>] sys_pread64+0x9a/0xa0
[813114.622928]  [<ffffffff816f629d>] system_call_fastpath+0x1a/0x1f
[813114.622928] Code: d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f2 <48> 8b 77 08 48 8b 3f 48 89 e5 e8 48 f8 ff ff 5d c3 66 0f 1f 44 
[813114.622928] RIP  [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs]
[813114.622928]  RSP <ffff88010a193798>
[813114.622928] CR2: 0000000000000008
[813114.721138] ---[ end trace cce2a358d4050d3d ]---

We are running XFS based on kernel 3.8.13, with our changes for large-block discard in https://github.com/zadarastorage/zadara-xfs-pushback.

We analyzed several suspects, but all of them fall on disk addresses not near the corrupted disk address. I realize that running a somewhat outdated kernel + our changes within XFS points back at us, but this is the first time we have seen XFS corruption after about a year of this code being exercised. So posting here, just in case this is a known issue.

I must point out that xfs_repair was able to fix this, which was awesome!

Thanks,
Alex.


[-- Attachment #1.2: Type: text/html, Size: 13688 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-02-24 22:24 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-03 11:09 xfs corruption Danny Shavit
2015-09-03 13:22 ` Eric Sandeen
2015-09-03 14:26   ` Danny Shavit
2015-09-03 14:55     ` Eric Sandeen
2015-09-03 16:14       ` Eric Sandeen
2015-09-06 10:19         ` Alex Lyakas
2015-09-06 21:56           ` Eric Sandeen
2015-09-07  8:30             ` Alex Lyakas
  -- strict thread matches above, loose matches on Subject: below --
2016-02-24  6:12 XFS Corruption fangchen sun
2016-02-24 22:23 ` Eric Sandeen
2014-12-21 11:42 XFS corruption Alex Lyakas
2014-12-21 18:13 ` Eric Sandeen
2014-12-21 23:08   ` Dave Chinner
2014-12-22 10:09     ` Alex Lyakas
2014-12-22 14:42     ` Brian Foster
2014-12-23  0:39       ` Dave Chinner
2014-12-23  9:57         ` Alex Lyakas
2014-12-23 20:36           ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.