* Read corruption on ARM
@ 2013-02-26 21:58 Jason Detring
  2013-02-26 22:33 ` Eric Sandeen
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jason Detring @ 2013-02-26 21:58 UTC (permalink / raw)
  To: xfs

Hello list,

I'm seeing filesystem read corruption on my NAS box.

My machine is an ARMv5 unit; this guy here:
   <http://buffalo.nas-central.org/wiki/Category:LSPro>
The hard disk is a Seagate 2TB ST32000644NS enterprise drive on the
SoC's SATA controller.
The unit is on a UPS and almost never sees unclean stops.

# xfs_info /dev/sda4
meta-data=/dev/sda4              isize=256    agcount=4, agsize=121469473 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=485877892, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=237245, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

This is a "from zero" clean installation since the original HDD was lost,
so the original factory firmware is gone.  It runs Slackware ARM (-current) now.
The majority of the disk, 1.9T, is an unmanaged XFS mass storage partition.
The file system was created mid-2010 by then-current tools and kernels.
The remainder is boot, OS, /home, and scratch on ext3.
Mass storage is always mounted ro,noatime on system startup,
then remounted rw,noatime when I am ready to start performing operations.
Write caching is disabled on the HDD as part of OS startup,
usually after ro mount but before rw.
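For reference, that startup sequence might look roughly like this (the device
names and mount point here are assumptions for illustration, not taken from
the actual unit's scripts):

```shell
# Sketch of the described boot-time mount sequence; /dev/sda4 and
# /mnt/storage are placeholder names.
mount -o ro,noatime /dev/sda4 /mnt/storage    # read-only at startup
hdparm -W0 /dev/sda                           # disable drive write caching
mount -o remount,rw,noatime /mnt/storage      # read-write when ready to work
```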

I am currently running an unpatched, vanilla 3.7.9 kernel, though this
corruption has been going on for over a year across many quarterly
kernel releases.
I had been working around it, but it's just now become irritating enough for
me to look into it.  The other unresolved ARM report from about a month ago
was enough to prod me into action. :-)


The error seems to be triggered on some directory or file lookups, but not all.
So, some files and directories can be opened in regular userspace or via NFS,
but others are inaccessible.  This is not one or two files; it is often
1/4 to 1/3 of the entire file system.
Each misread item triggers a backtrace in the kernel log similar to this:

[  465.441259] c6a59000: 58 46 53 42 00 00 10 00 00 00 00 00 1c f5 e8
84  XFSB............
[  465.449461] XFS (sda4): Internal error xfs_da_do_buf(2) at line
2192 of file fs/xfs/xfs_da_btree.c.  Caller 0xbf05de4c
[  465.449461]
[  465.461982] [<c001f0f4>] (unwind_backtrace+0x0/0x12c) from
[<bf029ff0>] (xfs_corruption_error+0x58/0x74 [xfs])
[  465.462606] [<bf029ff0>] (xfs_corruption_error+0x58/0x74 [xfs])
from [<bf0588fc>] (xfs_da_read_buf+0x134/0x1b0 [xfs])
[  465.463384] [<bf0588fc>] (xfs_da_read_buf+0x134/0x1b0 [xfs]) from
[<bf05de4c>] (xfs_dir2_leaf_readbuf+0x3a4/0x5f4 [xfs])
[  465.464230] [<bf05de4c>] (xfs_dir2_leaf_readbuf+0x3a4/0x5f4 [xfs])
from [<bf05e574>] (xfs_dir2_leaf_getdents+0xfc/0x3cc [xfs])
[  465.465016] [<bf05e574>] (xfs_dir2_leaf_getdents+0xfc/0x3cc [xfs])
from [<bf05aaec>] (xfs_readdir+0xc4/0xd0 [xfs])
[  465.465641] [<bf05aaec>] (xfs_readdir+0xc4/0xd0 [xfs]) from
[<bf02ac08>] (xfs_file_readdir+0x44/0x54 [xfs])
[  465.465919] [<bf02ac08>] (xfs_file_readdir+0x44/0x54 [xfs]) from
[<c00c9644>] (vfs_readdir+0x7c/0xac)
[  465.465979] [<c00c9644>] (vfs_readdir+0x7c/0xac) from [<c00c9810>]
(sys_getdents64+0x64/0xcc)
[  465.466035] [<c00c9810>] (sys_getdents64+0x64/0xcc) from
[<c0019080>] (ret_fast_syscall+0x0/0x2c)
[  465.466066] XFS (sda4): Corruption detected. Unmount and run xfs_repair

I've run xfs_repair offline on the hardware itself, but the tool never
finds problems.
Removing the disk from the NAS and mounting it in a desktop always
shows a clean, readable filesystem.


This also seems to impact the Raspberry Pi.  Below is a 256 MB test-case
filesystem.
The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
populated by kernel 3.6.9.
This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
   <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
The problem appears to be tied to the filesystem, not the media,
since both an external USB reader and a loopback-mounted image on the
unit's main SD media show the same backtrace.  The loopback image was
captured on other hardware, then copied onto the RPi via network.
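The loopback reproduction is roughly this (the image file name and mount
point are placeholders; "find" is the trigger, as noted later in the thread):

```shell
# Hypothetical reproduction steps on the ARM unit.
mount -o loop,ro image.xfs /mnt/test
find /mnt/test > /dev/null    # directory traversal hits the bad lookups
dmesg | tail                  # shows the xfs_da_do_buf(2) internal error
umount /mnt/test
```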

# xfs_info /dev/sdb1
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=15413 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=61651, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=1200, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[   90.638514] XFS (sdb1): Mounting Filesystem
[   92.154824] XFS (sdb1): Ending clean mount
[   99.010151] db027000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0
d3  XFSB............
[   99.018213] XFS (sdb1): Internal error xfs_da_do_buf(2) at line
2192 of file fs/xfs/xfs_da_btree.c.  Caller 0xbf1448e4
[   99.018213]
[   99.030528] Backtrace:
[   99.030605] [<c001c1f8>] (dump_backtrace+0x0/0x10c) from
[<c0381244>] (dump_stack+0x18/0x1c)
[   99.030653]  r6:bf171e38 r5:bf171e38 r4:bf171dd4 r3:dce6ac40
[   99.030998] [<c038122c>] (dump_stack+0x0/0x1c) from [<bf1105f0>]
(xfs_error_report+0x5c/0x68 [xfs])
[   99.031329] [<bf110594>] (xfs_error_report+0x0/0x68 [xfs]) from
[<bf110658>] (xfs_corruption_error+0x5c/0x78 [xfs])
[   99.031346]  r5:00000001 r4:c1abf800
[   99.031784] [<bf1105fc>] (xfs_corruption_error+0x0/0x78 [xfs]) from
[<bf13fa58>] (xfs_da_read_buf+0x160/0x194 [xfs])
[   99.031800]  r6:58465342 r5:dcdd9d80 r4:00000075
[   99.032311] [<bf13f8f8>] (xfs_da_read_buf+0x0/0x194 [xfs]) from
[<bf1448e4>] (xfs_dir2_leaf_readbuf+0x22c/0x628 [xfs])
[   99.032822] [<bf1446b8>] (xfs_dir2_leaf_readbuf+0x0/0x628 [xfs])
from [<bf1451ac>] (xfs_dir2_leaf_getdents+0x134/0x3d4 [xfs])
[   99.033326] [<bf145078>] (xfs_dir2_leaf_getdents+0x0/0x3d4 [xfs])
from [<bf141a44>] (xfs_readdir+0xdc/0xe4 [xfs])
[   99.033742] [<bf141968>] (xfs_readdir+0x0/0xe4 [xfs]) from
[<bf111398>] (xfs_file_readdir+0x4c/0x5c [xfs])
[   99.033939] [<bf11134c>] (xfs_file_readdir+0x0/0x5c [xfs]) from
[<c00f1874>] (vfs_readdir+0xa0/0xc4)
[   99.033954]  r7:dcdd9f78 r6:c00f158c r5:00000000 r4:dcf8aee0
[   99.034004] [<c00f17d4>] (vfs_readdir+0x0/0xc4) from [<c00f1a50>]
(sys_getdents64+0x68/0xd8)
[   99.034052] [<c00f19e8>] (sys_getdents64+0x0/0xd8) from
[<c0018900>] (ret_fast_syscall+0x0/0x30)
[   99.034066]  r7:000000d9 r6:0068ff58 r5:006882a8 r4:00000000
[   99.034101] XFS (sdb1): Corruption detected. Unmount and run xfs_repair

# xfs_info loop/
meta-data=/dev/loop0             isize=256    agcount=4, agsize=15413 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=61651, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=1200, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[ 1347.630983] XFS (loop0): Mounting Filesystem
[ 1347.745898] XFS (loop0): Ending clean mount
[ 1351.743284] db273000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0
d3  XFSB............
[ 1351.751716] XFS (loop0): Internal error xfs_da_do_buf(2) at line
2192 of file fs/xfs/xfs_da_btree.c.  Caller 0xbf1448e4
[ 1351.751716]
[ 1351.764072] Backtrace:
[ 1351.764148] [<c001c1f8>] (dump_backtrace+0x0/0x10c) from
[<c0381244>] (dump_stack+0x18/0x1c)
[ 1351.764204]  r6:bf171e38 r5:bf171e38 r4:bf171dd4 r3:c189ac40
[ 1351.764552] [<c038122c>] (dump_stack+0x0/0x1c) from [<bf1105f0>]
(xfs_error_report+0x5c/0x68 [xfs])
[ 1351.764924] [<bf110594>] (xfs_error_report+0x0/0x68 [xfs]) from
[<bf110658>] (xfs_corruption_error+0x5c/0x78 [xfs])
[ 1351.764945]  r5:00000001 r4:c1968000
[ 1351.765386] [<bf1105fc>] (xfs_corruption_error+0x0/0x78 [xfs]) from
[<bf13fa58>] (xfs_da_read_buf+0x160/0x194 [xfs])
[ 1351.765403]  r6:58465342 r5:dce25d80 r4:00000075
[ 1351.765920] [<bf13f8f8>] (xfs_da_read_buf+0x0/0x194 [xfs]) from
[<bf1448e4>] (xfs_dir2_leaf_readbuf+0x22c/0x628 [xfs])
[ 1351.766432] [<bf1446b8>] (xfs_dir2_leaf_readbuf+0x0/0x628 [xfs])
from [<bf1451ac>] (xfs_dir2_leaf_getdents+0x134/0x3d4 [xfs])
[ 1351.766942] [<bf145078>] (xfs_dir2_leaf_getdents+0x0/0x3d4 [xfs])
from [<bf141a44>] (xfs_readdir+0xdc/0xe4 [xfs])
[ 1351.767363] [<bf141968>] (xfs_readdir+0x0/0xe4 [xfs]) from
[<bf111398>] (xfs_file_readdir+0x4c/0x5c [xfs])
[ 1351.767557] [<bf11134c>] (xfs_file_readdir+0x0/0x5c [xfs]) from
[<c00f1874>] (vfs_readdir+0xa0/0xc4)
[ 1351.767574]  r7:dce25f78 r6:c00f158c r5:00000000 r4:c18e57e0
[ 1351.767622] [<c00f17d4>] (vfs_readdir+0x0/0xc4) from [<c00f1a50>]
(sys_getdents64+0x68/0xd8)
[ 1351.767670] [<c00f19e8>] (sys_getdents64+0x0/0xd8) from
[<c0018900>] (ret_fast_syscall+0x0/0x30)
[ 1351.767683]  r7:000000d9 r6:00642f58 r5:0063b2a8 r4:00000000
[ 1351.767719] XFS (loop0): Corruption detected. Unmount and run xfs_repair



Here's the kicker:  All this seems to happen only if xfs.ko is
crosscompiled with GCC 4.6 or 4.7.
A module (just the module; the rest of the kernel can be built with
anything) compiled with cross-GCC 4.4.1, 4.5.4, or, curiously, 4.8
(20130224) has no issue at all.
I've kept an old 2009 Sourcery G++ (4.4.1) Lite toolchain around just
for building kernels.
I'd really like to retire it, but I'm a little afraid this is going to
recur in newer compilers.

Is there something in the path lookup routine that is disagreeable to
GCCs targeting ARM?
Any other ideas on what could be happening?

Thanks,
Jason

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Read corruption on ARM
  2013-02-26 21:58 Read corruption on ARM Jason Detring
@ 2013-02-26 22:33 ` Eric Sandeen
  2013-02-26 23:25   ` Jason Detring
  2013-02-26 22:37 ` Eric Sandeen
  2013-02-27  7:19 ` Stefan Ring
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-26 22:33 UTC (permalink / raw)
  To: Jason Detring; +Cc: xfs

On 2/26/13 3:58 PM, Jason Detring wrote:
> Hello list,
> 
> I'm seeing filesystem read corruption on my NAS box.
> 
> My machine is an ARMv5 unit; this guy here:
>    <http://buffalo.nas-central.org/wiki/Category:LSPro>
> The hard disk is a Seagate 2TB ST32000644NS enterprise drive on the
> SoC's SATA controller.
> The unit is on a UPS and almost never sees unclean stops.
> 
> # xfs_info /dev/sda4
> meta-data=/dev/sda4              isize=256    agcount=4, agsize=121469473 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=485877892, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=237245, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> This is a "from zero" clean installation since the original HDD was lost,
> so the original factory firmware is gone.  It runs Slackware ARM (-current) now.
> The majority of the disk, 1.9T, is an unmanaged XFS mass storage partition.
> The file system was created mid-2010 by then-current tools and kernels.
> The remainder is boot, OS, /home, and scratch on ext3.
> Mass storage is always mounted ro,noatime on system startup,
> then remounted rw,noatime when I am ready to start performing operations.
> Write caching is disabled on the HDD as part of OS startup,
> usually after ro mount but before rw.
> 
> I am currently running an unpatched, vanilla 3.7.9 kernel, though this
> corruption has been going on for over a year across many quarterly
> kernel releases.
> I had been working around it, but it's just now become irritating enough for
> me to look into it.  The other unresolved ARM report from about a month ago
> was enough to prod me into action. :-)
> 
> 
> The error seems to be triggered on some directory or file lookups, but not all.
> So, some files and directores can be opened in regular userspace or via NFS,
> but others are inaccessible.  This is not one or two files; it is
> often 1/4 to 1/3 of
> the entire file system.
> Each misread item triggers a backtrace in the kernel log similiar to this:
> 
> [  465.441259] c6a59000: 58 46 53 42 00 00 10 00 00 00 00 00 1c f5 e8
> 84  XFSB............
> [  465.449461] XFS (sda4): Internal error xfs_da_do_buf(2) at line
> 2192 of file fs/xfs/xfs_da_btree.c.  Caller 0xbf05de4c
> [  465.449461]
> [  465.461982] [<c001f0f4>] (unwind_backtrace+0x0/0x12c) from
> [<bf029ff0>] (xfs_corruption_error+0x58/0x74 [xfs])
> [  465.462606] [<bf029ff0>] (xfs_corruption_error+0x58/0x74 [xfs])
> from [<bf0588fc>] (xfs_da_read_buf+0x134/0x1b0 [xfs])
> [  465.463384] [<bf0588fc>] (xfs_da_read_buf+0x134/0x1b0 [xfs]) from
> [<bf05de4c>] (xfs_dir2_leaf_readbuf+0x3a4/0x5f4 [xfs])
> [  465.464230] [<bf05de4c>] (xfs_dir2_leaf_readbuf+0x3a4/0x5f4 [xfs])
> from [<bf05e574>] (xfs_dir2_leaf_getdents+0xfc/0x3cc [xfs])
> [  465.465016] [<bf05e574>] (xfs_dir2_leaf_getdents+0xfc/0x3cc [xfs])
> from [<bf05aaec>] (xfs_readdir+0xc4/0xd0 [xfs])
> [  465.465641] [<bf05aaec>] (xfs_readdir+0xc4/0xd0 [xfs]) from
> [<bf02ac08>] (xfs_file_readdir+0x44/0x54 [xfs])
> [  465.465919] [<bf02ac08>] (xfs_file_readdir+0x44/0x54 [xfs]) from
> [<c00c9644>] (vfs_readdir+0x7c/0xac)
> [  465.465979] [<c00c9644>] (vfs_readdir+0x7c/0xac) from [<c00c9810>]
> (sys_getdents64+0x64/0xcc)
> [  465.466035] [<c00c9810>] (sys_getdents64+0x64/0xcc) from
> [<c0019080>] (ret_fast_syscall+0x0/0x2c)
> [  465.466066] XFS (sda4): Corruption detected. Unmount and run xfs_repair
> 
> I've run xfs_repair offline on the hardware itself, but the tool never
> finds problems.
> Removing the disk from the NAS and mounting it in a desktop always
> shows a clean, readable filesystem.
> 
> 
> This also seems to impact the Raspberry Pi.  Below shows a 256 MB test
> case filesystem.
> The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
> populated by kernel 3.6.9.
> This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
>    <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
> The problem appears to be tied to the filesystem, not the media,
> since both an external USB reader and a loopback-mounted image on the
> unit's main SD media show the same backtrace.  The loopback image was
> captured on other hardware, then copied onto the RPi via network.
> 
> # xfs_info /dev/sdb1
> meta-data=/dev/sdb1              isize=256    agcount=4, agsize=15413 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=61651, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=1200, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> [   90.638514] XFS (sdb1): Mounting Filesystem
> [   92.154824] XFS (sdb1): Ending clean mount
> [   99.010151] db027000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0
> d3  XFSB............
> [   99.018213] XFS (sdb1): Internal error xfs_da_do_buf(2) at line
> 2192 of file fs/xfs/xfs_da_btree.c.  Caller 0xbf1448e4

So this came out of xfs_da_read_buf(), and it thought it was reading
metadata but got something it didn't recognize.

The hex up there shows that it got what looks like xfs superblock
magic.
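Those first four bytes do decode as the superblock magic; a quick sanity
check (note that r6:58465342 in the register dumps below is the same value):

```shell
# 58 46 53 42 is ASCII "XFSB", the XFS superblock magic, which is
# stored big-endian on disk regardless of host byte order.
printf '\x58\x46\x53\x42\n'                                      # XFSB
printf '0x%08x\n' $(( (0x58 << 24) | (0x46 << 16) | (0x53 << 8) | 0x42 ))
```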

> [   99.018213]
> [   99.030528] Backtrace:
> [   99.030605] [<c001c1f8>] (dump_backtrace+0x0/0x10c) from
> [<c0381244>] (dump_stack+0x18/0x1c)
> [   99.030653]  r6:bf171e38 r5:bf171e38 r4:bf171dd4 r3:dce6ac40
> [   99.030998] [<c038122c>] (dump_stack+0x0/0x1c) from [<bf1105f0>]
> (xfs_error_report+0x5c/0x68 [xfs])
> [   99.031329] [<bf110594>] (xfs_error_report+0x0/0x68 [xfs]) from
> [<bf110658>] (xfs_corruption_error+0x5c/0x78 [xfs])
> [   99.031346]  r5:00000001 r4:c1abf800
> [   99.031784] [<bf1105fc>] (xfs_corruption_error+0x0/0x78 [xfs]) from
> [<bf13fa58>] (xfs_da_read_buf+0x160/0x194 [xfs])
> [   99.031800]  r6:58465342 r5:dcdd9d80 r4:00000075
> [   99.032311] [<bf13f8f8>] (xfs_da_read_buf+0x0/0x194 [xfs]) from
> [<bf1448e4>] (xfs_dir2_leaf_readbuf+0x22c/0x628 [xfs])
> [   99.032822] [<bf1446b8>] (xfs_dir2_leaf_readbuf+0x0/0x628 [xfs])

when reading a leaf format directory

> from [<bf1451ac>] (xfs_dir2_leaf_getdents+0x134/0x3d4 [xfs])
> [   99.033326] [<bf145078>] (xfs_dir2_leaf_getdents+0x0/0x3d4 [xfs])
> from [<bf141a44>] (xfs_readdir+0xdc/0xe4 [xfs])
> [   99.033742] [<bf141968>] (xfs_readdir+0x0/0xe4 [xfs]) from
> [<bf111398>] (xfs_file_readdir+0x4c/0x5c [xfs])
> [   99.033939] [<bf11134c>] (xfs_file_readdir+0x0/0x5c [xfs]) from
> [<c00f1874>] (vfs_readdir+0xa0/0xc4)
> [   99.033954]  r7:dcdd9f78 r6:c00f158c r5:00000000 r4:dcf8aee0
> [   99.034004] [<c00f17d4>] (vfs_readdir+0x0/0xc4) from [<c00f1a50>]
> (sys_getdents64+0x68/0xd8)
> [   99.034052] [<c00f19e8>] (sys_getdents64+0x0/0xd8) from
> [<c0018900>] (ret_fast_syscall+0x0/0x30)
> [   99.034066]  r7:000000d9 r6:0068ff58 r5:006882a8 r4:00000000
> [   99.034101] XFS (sdb1): Corruption detected. Unmount and run xfs_repair
> 
> # xfs_info loop/
> meta-data=/dev/loop0             isize=256    agcount=4, agsize=15413 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=61651, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=1200, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> [ 1347.630983] XFS (loop0): Mounting Filesystem
> [ 1347.745898] XFS (loop0): Ending clean mount
> [ 1351.743284] db273000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0
> d3  XFSB............
> [ 1351.751716] XFS (loop0): Internal error xfs_da_do_buf(2) at line
> 2192 of file fs/xfs/xfs_da_btree.c.  Caller 0xbf1448e4
> [ 1351.751716]
> [ 1351.764072] Backtrace:
> [ 1351.764148] [<c001c1f8>] (dump_backtrace+0x0/0x10c) from
> [<c0381244>] (dump_stack+0x18/0x1c)
> [ 1351.764204]  r6:bf171e38 r5:bf171e38 r4:bf171dd4 r3:c189ac40
> [ 1351.764552] [<c038122c>] (dump_stack+0x0/0x1c) from [<bf1105f0>]
> (xfs_error_report+0x5c/0x68 [xfs])
> [ 1351.764924] [<bf110594>] (xfs_error_report+0x0/0x68 [xfs]) from
> [<bf110658>] (xfs_corruption_error+0x5c/0x78 [xfs])
> [ 1351.764945]  r5:00000001 r4:c1968000
> [ 1351.765386] [<bf1105fc>] (xfs_corruption_error+0x0/0x78 [xfs]) from
> [<bf13fa58>] (xfs_da_read_buf+0x160/0x194 [xfs])
> [ 1351.765403]  r6:58465342 r5:dce25d80 r4:00000075
> [ 1351.765920] [<bf13f8f8>] (xfs_da_read_buf+0x0/0x194 [xfs]) from
> [<bf1448e4>] (xfs_dir2_leaf_readbuf+0x22c/0x628 [xfs])
> [ 1351.766432] [<bf1446b8>] (xfs_dir2_leaf_readbuf+0x0/0x628 [xfs])
> from [<bf1451ac>] (xfs_dir2_leaf_getdents+0x134/0x3d4 [xfs])
> [ 1351.766942] [<bf145078>] (xfs_dir2_leaf_getdents+0x0/0x3d4 [xfs])
> from [<bf141a44>] (xfs_readdir+0xdc/0xe4 [xfs])
> [ 1351.767363] [<bf141968>] (xfs_readdir+0x0/0xe4 [xfs]) from
> [<bf111398>] (xfs_file_readdir+0x4c/0x5c [xfs])
> [ 1351.767557] [<bf11134c>] (xfs_file_readdir+0x0/0x5c [xfs]) from
> [<c00f1874>] (vfs_readdir+0xa0/0xc4)
> [ 1351.767574]  r7:dce25f78 r6:c00f158c r5:00000000 r4:c18e57e0
> [ 1351.767622] [<c00f17d4>] (vfs_readdir+0x0/0xc4) from [<c00f1a50>]
> (sys_getdents64+0x68/0xd8)
> [ 1351.767670] [<c00f19e8>] (sys_getdents64+0x0/0xd8) from
> [<c0018900>] (ret_fast_syscall+0x0/0x30)
> [ 1351.767683]  r7:000000d9 r6:00642f58 r5:0063b2a8 r4:00000000
> [ 1351.767719] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> 
> 
> 
> Here's the kicker:  All this seems to happen only if xfs.ko is
> crosscompiled with GCC 4.6 or 4.7.

urk!  That is a kicker.

> A module (just the module, the rest of kernel can be built with
> anything) compiled with
> cross-GCC 4.4.1, 4.5.4, or curiously 4.8 (20130224) has no issue at all.
> I've kept an old 2009 Sourcery G++ (4.4.1) Lite toolchain around just
> for building kernels.
> I'd really like to retire it, but I'm a little afraid this is going to
> recur in newer compilers.

Maybe you can provide an xfs.ko built with each (for the same kernel)
with debug info, and we can compare the disassembly?
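Something along these lines, assuming both module builds are at hand (the
cross-toolchain prefix and paths below are placeholders):

```shell
# Disassemble each cross-built xfs.ko and diff the results to spot
# codegen differences in the directory-lookup paths.
arm-linux-gnueabi-objdump -dr xfs-gcc44/xfs.ko > xfs-gcc44.dis
arm-linux-gnueabi-objdump -dr xfs-gcc47/xfs.ko > xfs-gcc47.dis
diff -u xfs-gcc44.dis xfs-gcc47.dis | less
```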

> Is there something in the path lookup routine that is disagreeable to
> GCCs targeting ARM?

At one point there were some alignment issues going on, but that was
for the old ABI, etc.  I'm not aware of anything right now.

> Any other ideas on what could be happening?

Since you got xfs superblock magic, I wonder if you read block 0
rather than the intended block, due to $SOMETHING going wrong...

Enabling the trace_xfs_da_btree_corrupt tracepoint might yield more
info, can you do that?

I think it's:

# trace-cmd record -e xfs_da_btree_corrupt &
# <do your dir read>
# fg
# ^C (ctrl-c trace-cmd)
# trace-cmd report

We might get more info about the buffer in question that way.
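If trace-cmd isn't available, enabling the event through the raw tracing
filesystem should work too (a sketch, assuming the usual debugfs layout):

```shell
# Enable the xfs_da_btree_corrupt tracepoint directly.
mount -t debugfs none /sys/kernel/debug 2>/dev/null
echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_da_btree_corrupt/enable
# ... perform the failing directory read ...
cat /sys/kernel/debug/tracing/trace
echo 0 > /sys/kernel/debug/tracing/events/xfs/xfs_da_btree_corrupt/enable
```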

-Eric

> Thanks,
> Jason
> 

* Re: Read corruption on ARM
  2013-02-26 21:58 Read corruption on ARM Jason Detring
  2013-02-26 22:33 ` Eric Sandeen
@ 2013-02-26 22:37 ` Eric Sandeen
  2013-02-26 22:51   ` Eric Sandeen
  2013-02-27  7:19 ` Stefan Ring
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-26 22:37 UTC (permalink / raw)
  To: Jason Detring; +Cc: xfs

On 2/26/13 3:58 PM, Jason Detring wrote:
> Hello list,

<snip>

> This also seems to impact the Raspberry Pi.  Below shows a 256 MB test
> case filesystem.
> The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
> populated by kernel 3.6.9.
> This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
>    <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
> The problem appears to be tied to the filesystem, not the media,
> since both an external USB reader and a loopback-mounted image on the
> unit's main SD media show the same backtrace.  The loopback image was
> captured on other hardware, then copied onto the RPi via network.

Missed this; let me fire up my pi and see if I can replicate it.

-Eric


* Re: Read corruption on ARM
  2013-02-26 22:37 ` Eric Sandeen
@ 2013-02-26 22:51   ` Eric Sandeen
  2013-02-26 23:21     ` Jason Detring
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-26 22:51 UTC (permalink / raw)
  To: Jason Detring; +Cc: xfs

On 2/26/13 4:37 PM, Eric Sandeen wrote:
> On 2/26/13 3:58 PM, Jason Detring wrote:
>> Hello list,
> 
> <snip>
> 
>> This also seems to impact the Raspberry Pi.  Below shows a 256 MB test
>> case filesystem.
>> The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
>> populated by kernel 3.6.9.
>> This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
>>    <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
>> The problem appears to be tied to the filesystem, not the media,
>> since both an external USB reader and a loopback-mounted image on the
>> unit's main SD media show the same backtrace.  The loopback image was
>> captured on other hardware, then copied onto the RPi via network.
> 
> Missed this; let me fire up my pi and see if I can replicate it.

Realized that I'll need to cross-compile xfs.ko I guess...

But - do you see this when the *whole* kernel is cross-compiled?
Building the kernel one way and xfs another way, with another gcc,
is probably nothing but trouble.  :)

-Eric

> -Eric
> 

* Re: Read corruption on ARM
  2013-02-26 22:51   ` Eric Sandeen
@ 2013-02-26 23:21     ` Jason Detring
  2013-02-27  2:16       ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Detring @ 2013-02-26 23:21 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On 2/26/13, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 2/26/13 4:37 PM, Eric Sandeen wrote:
>> On 2/26/13 3:58 PM, Jason Detring wrote:
>>> Hello list,
>>
>> <snip>
>>
>>> This also seems to impact the Raspberry Pi.  Below shows a 256 MB test
>>> case filesystem.
>>> The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
>>> populated by kernel 3.6.9.
>>> This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
>>>    <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
>>> The problem appears to be tied to the filesystem, not the media,
>>> since both an external USB reader and a loopback-mounted image on the
>>> unit's main SD media show the same backtrace.  The loopback image was
>>> captured on other hardware, then copied onto the RPi via network.
>>
>> Missed this; let me fire up my pi and see if I can replicate it.
>
> Realized that I'll need to cross-compile xfs.ko I guess...
>
> But - do you see this when the *whole* kernel is cross-compiled?
> Building the kernel one way and xfs another way, with another gcc,
> is probably nothing but trouble.  :)

Yes, I did.  I remember seeing it in months past when those compilers
were freshly released.  I only mixed-and-matched here as a spot check
to be sure the errors were still present.  For any Real Serious
Business, I'll build end-to-end with the same compiler.

I've uploaded my demonstration problem file system here:
  <http://www.splack.org/~jason/projects/xfs-arm-corruption/problemimage.xfs>
This throws a backtrace when "find ." is run on the mountpoint.  The
junk in the file system is just that--filler.  Don't take the kernel
archives as debugging builds.

Jason


* Re: Read corruption on ARM
  2013-02-26 22:33 ` Eric Sandeen
@ 2013-02-26 23:25   ` Jason Detring
       [not found]     ` <512D49E2.40003@sandeen.net>
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Detring @ 2013-02-26 23:25 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On 2/26/13, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 2/26/13 3:58 PM, Jason Detring wrote:
>> Here's the kicker:  All this seems to happen only if xfs.ko is
>> crosscompiled with GCC 4.6 or 4.7.
>
> urk!  That is a kicker.
>
>> A module (just the module, the rest of kernel can be built with
>> anything) compiled with
>> cross-GCC 4.4.1, 4.5.4, or curiously 4.8 (20130224) has no issue at all.
>> I've kept an old 2009 Sourcery G++ (4.4.1) Lite toolchain around just
>> for building kernels.
>> I'd really like to retire it, but I'm a little afraid this is going to
>> recur in newer compilers.
>
> Maybe you can provide an xfs.ko built with each (for the same kernel)
> with debug info, and we can compare the disassembly?

OK, will do this evening when I can get things cleaned up a bit.


> Enabling the trace_xfs_da_btree_corrupt tracepoint might yield more
> info, can you do that?
>
> I think it's:
>
> # trace-cmd record -e xfs_da_btree_corrupt &
> # <do your dir read>
> # fg
> # ^C (ctrl-c trace-cmd)
> # trace-cmd report
>
> We might get more info about the buffer in question that way.

I'll give it a go, but it might take me a while to get back to you.
I'm not familiar with that tool, and it looks like it's not part of my
base install.

> -Eric

Jason


* Re: Read corruption on ARM
  2013-02-26 23:21     ` Jason Detring
@ 2013-02-27  2:16       ` Dave Chinner
  2013-02-27 14:48         ` Eric Sandeen
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-02-27  2:16 UTC (permalink / raw)
  To: Jason Detring; +Cc: Eric Sandeen, xfs

On Tue, Feb 26, 2013 at 05:21:15PM -0600, Jason Detring wrote:
> On 2/26/13, Eric Sandeen <sandeen@sandeen.net> wrote:
> > On 2/26/13 4:37 PM, Eric Sandeen wrote:
> >> On 2/26/13 3:58 PM, Jason Detring wrote:
> >>> Hello list,
> >>
> >> <snip>
> >>
> >>> This also seems to impact the Raspberry Pi.  Below shows a 256 MB test
> >>> case filesystem.
> >>> The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
> >>> populated by kernel 3.6.9.
> >>> This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
> >>>    <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
> >>> The problem appears to be tied to the filesystem, not the media,
> >>> since both an external USB reader and a loopback-mounted image on the
> >>> unit's main SD media show the same backtrace.  The loopback image was
> >>> captured on other hardware, then copied onto the RPi via network.
> >>
> >> Missed this; let me fire up my pi and see if I can replicate it.
> >
> > Realized that I'll need to cross-compile xfs.ko I guess...
> >
> > But - do you see this when the *whole* kernel is cross-compiled?
> > Building the kernel one way and xfs another way, with another gcc,
> > is probably nothing but trouble.  :)
> 
> Yes, I did.  I remember seeing it in months past when those compilers
> were freshly released.  I only mixed-and-matched here as a spot check
> to be sure the errors were still present.  For any Real Serious
> Business, I'll build end-to-end with the same compiler.
> 
> I've uploaded my demonstration problem file system here:
>   <http://www.splack.org/~jason/projects/xfs-arm-corruption/problemimage.xfs>
> This throws a backtrace when "find ." is run on the mountpoint.  The
> junk in the file system is just that--filler.  Don't take the kernel
> archives as debugging builds.

The filesystem image appears to be just fine. xfs_repair on x86_64 does
not complain about it, nor does xfs_check. Mounting and running find
on it on my current 3.8-dev kernel does not cause any problems,
either. And looking directly at the structures on disk I can't see
any obvious problems.

Hence whatever issue is being seen must be to do with the way the
compiled ARM code is interpreting the on-disk structures....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Read corruption on ARM
  2013-02-26 21:58 Read corruption on ARM Jason Detring
  2013-02-26 22:33 ` Eric Sandeen
  2013-02-26 22:37 ` Eric Sandeen
@ 2013-02-27  7:19 ` Stefan Ring
  2013-02-27 14:48   ` Eric Sandeen
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Ring @ 2013-02-27  7:19 UTC (permalink / raw)
  To: Jason Detring; +Cc: xfs

Risking stating the obvious, but there has very recently been an
almost identical thread, also with armv5:
http://oss.sgi.com/pipermail/xfs/2013-January/023805.html


* Re: Read corruption on ARM
  2013-02-27  7:19 ` Stefan Ring
@ 2013-02-27 14:48   ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2013-02-27 14:48 UTC (permalink / raw)
  To: Stefan Ring; +Cc: Jason Detring, xfs

On 2/27/13 1:19 AM, Stefan Ring wrote:
> Risking stating the obvious, but there has very recently been an
> almost identical thread, also with armv5:
> http://oss.sgi.com/pipermail/xfs/2013-January/023805.html

Thanks, I thought this sounded familiar!

-Eric


* Re: Read corruption on ARM
  2013-02-27  2:16       ` Dave Chinner
@ 2013-02-27 14:48         ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2013-02-27 14:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jason Detring, xfs

On 2/26/13 8:16 PM, Dave Chinner wrote:
> On Tue, Feb 26, 2013 at 05:21:15PM -0600, Jason Detring wrote:
>> On 2/26/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>>> On 2/26/13 4:37 PM, Eric Sandeen wrote:
>>>> On 2/26/13 3:58 PM, Jason Detring wrote:
>>>>> Hello list,
>>>>
>>>> <snip>
>>>>
>>>>> This also seems to impact the Raspberry Pi.  Below shows a 256 MB test
>>>>> case filesystem.
>>>>> The filesystem was created on an x86-64 box by mkfs.xfs 3.1.8 and
>>>>> populated by kernel 3.6.9.
>>>>> This failure report is Linux 3.6.11-g89caf39 built by GCC 4.7.2 from
>>>>>    <https://github.com/raspberrypi/linux/commits/rpi-3.6.y>
>>>>> The problem appears to be tied to the filesystem, not the media,
>>>>> since both an external USB reader and a loopback-mounted image on the
>>>>> unit's main SD media show the same backtrace.  The loopback image was
>>>>> captured on other hardware, then copied onto the RPi via network.
>>>>
>>>> Missed this; let me fire up my pi and see if I can replicate it.
>>>
>>> Realized that I'll need to cross-compile xfs.ko I guess...
>>>
>>> But - do you see this when the *whole* kernel is cross-compiled?
>>> Building the kernel one way and xfs another way, with another gcc,
>>> is probably nothing but trouble.  :)
>>
>> Yes, I did.  I remember seeing it in months past when those compilers
>> were freshly released.  I only mixed-and-matched here as a spot check
>> to be sure the errors were still present.  For any Real Serious
>> Business, I'll build end-to-end with the same compiler.
>>
>> I've uploaded my demonstration problem file system here:
>>   <http://www.splack.org/~jason/projects/xfs-arm-corruption/problemimage.xfs>
>> This throws a backtrace when "find ." is run on the mountpoint.  The
>> junk in the file system is just that--filler.  Don't take the kernel
>> archives as debugging builds.
> 
> The filesystem image appears to be just fine. xfs_repair on x86_64 does
> not complain about it, nor does xfs_check. Mounting and running find
> on it on my current 3.8-dev kernel does not cause any problems,
> either. And looking directly at the structures on disk I can't see
> any obvious problems.

And works fine on my arm-compiled xfs.ko on my R-Pi.

> Hence whatever issue is being seen must be to do with the way the
> compiled ARM code is interpreting the on-disk structures....

s/compiled/cross-compiled/

-Eric

> Cheers,
> 
> Dave.
> 


* Re: Read corruption on ARM
       [not found]       ` <CA+AKrqCrphO-eKy0n=70O9hmB3mXttOsKmTdfRnPxgJM3_PAkQ@mail.gmail.com>
@ 2013-02-27 17:00         ` Eric Sandeen
       [not found]           ` <CA+AKrqDq5xCNQo1X=MeRBq54ka0FGJEV5Rn6OzwY7eBfJ+8Wkw@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-27 17:00 UTC (permalink / raw)
  To: Jason Detring; +Cc: xfs

On 2/27/13 10:28 AM, Jason Detring wrote:
>             find-502   [000]   207.983594: xfs_da_btree_corrupt: dev 7:0 bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf

Was this on the same image as you sent earlier?

Ok, so this tells us that it was trying to read sector nr. 0x5a4f8 (369912), or fsblock 46239

What's really on disk there?

$ xfs_db problemimage.xfs
xfs_db> blockget -n
xfs_db> daddr 369912
xfs_db> blockuse
block 49152 (3/0) type sb
xfs_db> type text
xfs_db> p
000:  58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0 d3  XFSB............
...

So it really did have a superblock location that it was reading
at that point - the backup SB in the 3rd allocation group, to be exact.
But it shouldn't have been trying to read a superblock at this point
in the code...

Hm, maybe I should have had you enable all xfs tracepoints to get
more info about where we thought we were on disk when we were doing this.
If you use trace-cmd, you can do "trace-cmd record -e xfs*" IIRC.
For the sysfs route, I think echo 1 > /<blah>/xfs*/enable does something similar.

Can you identify which directory it was that tripped the above error?

(I still think it sounds like a miscompile, but trying to get more clues)

-Eric


* Re: Read corruption on ARM
       [not found]           ` <CA+AKrqDq5xCNQo1X=MeRBq54ka0FGJEV5Rn6OzwY7eBfJ+8Wkw@mail.gmail.com>
@ 2013-02-27 21:10             ` Eric Sandeen
       [not found]               ` <512E89C2.9000302@sandeen.net>
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-27 21:10 UTC (permalink / raw)
  To: Jason Detring; +Cc: xfs

On 2/27/13 12:15 PM, Jason Detring wrote:
> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>> On 2/27/13 10:28 AM, Jason Detring wrote:
>>>             find-502   [000]   207.983594: xfs_da_btree_corrupt: dev 7:0
>>> bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller
>>> xfs_dir2_leaf_readbuf
>>
>> Was this on the same image as you sent earlier?
> 
> Yes, sorry, I should have said that.  I'm now using the demo image
> with the RasPi exclusively for testing.
> 
> 
>> Ok, so this tells us that it was trying to read sector nr. 0x5a4f8 (369912),
>> or fsblock 46239
>>
>> What's really on disk there?
>>
>> $ xfs_db problemimage.xfs
>> xfs_db> blockget -n
>> xfs_db> daddr 369912
>> xfs_db> blockuse
>> block 49152 (3/0) type sb
>> xfs_db> type text
>> xfs_db> p
>> 000:  58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0 d3  XFSB............
>> ...
>>
>> So it really did have a superblock location that it was reading
>> at that point - the backup SB in the 3rd allocation group, to be exact.
>> But it shouldn't have been trying to read a superblock at this point
>> in the code...
>>
>> Hm, maybe I should have had you enable all xfs tracepoints to get
>> more info about where we thought we were on disk when we were doing this.
>> If you used trace-cmd you can do "trace-cmd record -e xfs*" IIRC.
>> You can do similar echo 1 > /<blah>/xfs*/enable I think for the sysfs
>> route.
>>
>> Can you identify which directory it was that tripped the above error?
> 
> # modprobe xfs-O1-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # find /loop -type d -print0 > list.txt
> # umount /loop
> # rmmod xfs
> # modprobe xfs-O2-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # cat list.txt | xargs -0 -P1 -n1 -I{} sh -c '(dir="{}" ; ls "${dir}"
> > /dev/null ; sleep 0.1 ; dmesg | tail -n1 | grep Corruption && echo
> "${dir} is causing problems")'
> ls: reading directory /loop/ruby/1.9.1: Structure needs cleaning
> [35689.975822] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> /loop/ruby/1.9.1 is causing problems
> ...
> 
> OK, I now have a name.  Rebooting to get a clean slate.

Ok, and an inode number:

134 test/ruby/1.9.1

xfs_db> inode 134
xfs_db> p
core.format = 2 (extents)
...
core.aformat = 2 (extents)
...
u.bmx[0-1] = [startoff,startblock,blockcount,extentflag] 0:[0,53675,1,0] 1:[8388608,60304,1,0]

so those are the blocks it should live in.

Or, if you prefer:

# xfs_bmap -vv test/ruby/1.9.1
test/ruby/1.9.1:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          406096..406103    3 (36184..36191)       8

Here's the relevant part of the trace, from the readdir of that inode:

   ls-520   xfs_readdir:          ino 0x86
   ls-520   xfs_perag_get:        agno 3 refcount 2 caller _xfs_buf_find
   ls-520   xfs_perag_put:        agno 3 refcount 1 caller _xfs_buf_find
   ls-520   xfs_buf_init:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_get_map

by here we're already looking for the block which isn't related to the dir.

   ls-520   xfs_perag_get:        agno 3 refcount 2 caller _xfs_buf_find
   ls-520   xfs_buf_get:          bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_read_map
   ls-520   xfs_buf_read:         bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_trans_read_buf_map
   ls-520   xfs_buf_iorequest:    bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
   ls-520   xfs_buf_hold:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   ls-520   xfs_buf_rele:         bno 0x5a4f8 nblks 0x8 hold 2 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   ls-520   xfs_buf_iowait:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
loop0-514   xfs_buf_ioerror:      bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 error 0 flags READ|PAGES caller xfs_buf_bio_end_io
loop0-514   xfs_buf_iodone:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_ioend
   ls-520   xfs_buf_iowait_done:  bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller _xfs_buf_read
   ls-520   xfs_da_btree_corrupt: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf

and here's where we notice that fact, I think.

   ls-520   xfs_buf_unlock:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse
   ls-520   xfs_buf_rele:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse

Not yet sure what's up here.  I'd probably need to get a cross-compiled xfs.ko going on my rpi to do more debugging...

-Eric


* Re: Read corruption on ARM
       [not found]                     ` <CA+AKrqAv7-5gGj_cNBNj=-nChKPzi+_HZmH=z2UABG9pDOmpBg@mail.gmail.com>
@ 2013-02-28  4:38                       ` Eric Sandeen
  2013-02-28  4:50                         ` Eric Sandeen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-28  4:38 UTC (permalink / raw)
  To: Jason Detring, xfs-oss

On 2/27/13 8:57 PM, Jason Detring wrote:
> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>> On 2/27/13 4:56 PM, Jason Detring wrote:
>>> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>>>> Can you send xfs.ko's from both native & cross-compiles?  Need debugging
>>>> info in the binaries (nonstripped)
>>>
>>> I don't have a native build, sorry :-(   I can put one together if
>>> necessary, but it will be quite a while on the Pi.
>>
>> Yah I found that out ;)  You could just make M=fs/xfs though?
> 
> Done.  These are also on the site at
>   <http://www.splack.org/~jason/projects/xfs-arm-corruption/tracetest/3.6.11-g89caf39/>
> The directory containing cross-compiled modules has been renamed
> xfs-modules-cross/ and the new natively built modules are beneath the
> xfs-modules-native/ directory.

re-cc'ing xfs list

So I used pahole to look at all structs, objdump -d to disassemble,
and md5sum'd the results to see what's different.

pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis native/*.pahole

<manual sort>

c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole

so all structures look identical, good - but:

while disassembly of these two modules match:

d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis

do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?

the others differ:

349f3490a49f2ce539c2b058914f64f0  native/xfs-Os-g.ko.dis
91c8e8230774808b538c21a83106a5d7  cross/xfs-Os-g.ko.dis

649338e1b8eeed6a294504fc76a39cb0  native/xfs-O2-g.ko.dis
e52c2a48277326c313bba76aa0b33ab7  cross/xfs-O2-g.ko.dis

The diff of the disassembly of the others is huge, hard to
know where to start just yet.  Need an objdump mode that only
shows function-relative addresses or something to cut down
on the noise.

-Eric

> Slackware ARM (-current) also uses GCC 4.7.2 as its native compiler.
> The test modules built with it at -O2 are failing to read the
> ruby/1.9.1/ directory as well.  I don't know if that's fortunate (my
> homebrew compilers are just as good or bad as the distro's?) or
> unfortunate (I still have the problem and now I am diverging from your
> native RPi results that worked).
> 
> Is there maybe a memory or I/O tunable in the kernel .config that I've
> clobbered?
> 
> Jason
> 


* Re: Read corruption on ARM
  2013-02-28  4:38                       ` Eric Sandeen
@ 2013-02-28  4:50                         ` Eric Sandeen
  2013-02-28  5:27                           ` Eric Sandeen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-28  4:50 UTC (permalink / raw)
  To: Jason Detring, xfs-oss

On 2/27/13 10:38 PM, Eric Sandeen wrote:

...

> re-cc'ing xfs list
> 
> So I used pahole to look at all structs, objdump -d to disassemble,
> and md5sum'd the results to see what's different.
> 
> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis native/*.pahole
> 
> <manual sort>
> 
> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
> 
> so all structures look identical, good - but:
> 
> while disassembly of these two modules match:
> 
> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
> 
> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?
> 
> the others differ:
> 
> 349f3490a49f2ce539c2b058914f64f0  native/xfs-Os-g.ko.dis
> 91c8e8230774808b538c21a83106a5d7  cross/xfs-Os-g.ko.dis
> 
> 649338e1b8eeed6a294504fc76a39cb0  native/xfs-O2-g.ko.dis
> e52c2a48277326c313bba76aa0b33ab7  cross/xfs-O2-g.ko.dis
> 
> The diff of the disassembly of the others is huge, hard to
> know where to start just yet.  Need an objdump mode that only
> shows function-relative addresses or something to cut down
> on the noise.

Could you try the same, to isolate the differences: objdump -d
all of the *.o files for, say, the -O2 build, md5sum & compare,
and see which ones differ?

-Eric



* Re: Read corruption on ARM
  2013-02-28  4:50                         ` Eric Sandeen
@ 2013-02-28  5:27                           ` Eric Sandeen
  2013-02-28 21:38                             ` Jason Detring
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-02-28  5:27 UTC (permalink / raw)
  To: Jason Detring, xfs-oss

On 2/27/13 10:50 PM, Eric Sandeen wrote:
> On 2/27/13 10:38 PM, Eric Sandeen wrote:
> 
> ...
> 
>> re-cc'ing xfs list
>>
>> So I used pahole to look at all structs, objdump -d to disassemble,
>> and md5sum'd the results to see what's different.
>>
>> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis native/*.pahole
>>
>> <manual sort>
>>
>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
>>
>> so all structures look identical, good - but:
>>
>> while disassembly of these two modules match:
>>
>> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
>> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
>>
>> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?
>>
>> the others differ:
>>
>> 349f3490a49f2ce539c2b058914f64f0  native/xfs-Os-g.ko.dis
>> 91c8e8230774808b538c21a83106a5d7  cross/xfs-Os-g.ko.dis
>>
>> 649338e1b8eeed6a294504fc76a39cb0  native/xfs-O2-g.ko.dis
>> e52c2a48277326c313bba76aa0b33ab7  cross/xfs-O2-g.ko.dis
>>
>> The diff of the disassembly of the others is huge, hard to
>> know where to start just yet.  Need an objdump mode that only
>> shows function-relative addresses or something to cut down
>> on the noise.
> 
> Could you try the same, to isolate the differences: objdump -d
> all of the *.o files for, say, the -O2 build, md5sum & compare,
> and see which ones differ?

And one more test.  Every time you hit the error, it causes
a log replay on the next mount since the fs has shut down.

Can you try

# mount; umount; mount; test

so that you start the test from a clean mount, and see if you still hit it?

Maybe save that image off before you do that test just in case it changes
the state.

-Eric


* Re: Read corruption on ARM
  2013-02-28  5:27                           ` Eric Sandeen
@ 2013-02-28 21:38                             ` Jason Detring
  2013-03-01  2:25                               ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Detring @ 2013-02-28 21:38 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 2/27/13 10:50 PM, Eric Sandeen wrote:
>> On 2/27/13 10:38 PM, Eric Sandeen wrote:
>>
>> ...
>>
>>> re-cc'ing xfs list
>>>
>>> So I used pahole to look at all structs, objdump -d to disassemble,
>>> and md5sum'd the results to see what's different.
>>>
>>> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis
>>> native/*.pahole
>>>
>>> <manual sort>
>>>
>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
>>>
>>> so all structures look identical, good - but:
>>>
>>> while disassembly of these two modules match:
>>>
>>> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
>>> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
>>>
>>> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?

No, I didn't.  The problem has only shown itself on the -O2 builds,
both native and cross-compiled.  Lower optimization levels don't show
any of the symptoms.

Perhaps a better comparison would be -O2 builds among working and
non-working compilers?   You'd asked for these before, but I just
finished them today.  The modules, build logs, and fs/xfs/ build trees
are up at
  <http://www.splack.org/~jason/projects/xfs-arm-corruption/3.6.11-g89caf39/>
A quick rundown:
  -cross-gcc4.4:  OK
  -cross-gcc4.5:  OK
  -cross-gcc4.6:  BAD
  -cross-gcc4.7:  BAD
  -cross-gcc4.8:  OK
Some of these don't seem to want to rmmod after they've been inserted.
Argh, reboots.


>>> the others differ:
>>>
>>> 349f3490a49f2ce539c2b058914f64f0  native/xfs-Os-g.ko.dis
>>> 91c8e8230774808b538c21a83106a5d7  cross/xfs-Os-g.ko.dis
>>>
>>> 649338e1b8eeed6a294504fc76a39cb0  native/xfs-O2-g.ko.dis
>>> e52c2a48277326c313bba76aa0b33ab7  cross/xfs-O2-g.ko.dis
>>>
>>> The diff of the disassembly of the others is huge, hard to
>>> know where to start just yet.  Need an objdump mode that only
>>> shows function-relative addresses or something to cut down
>>> on the noise.
>>
>> Could you try the same, to isolate the differences: objdump -d
>> all of the *.o files for, say, the -O2 build, md5sum & compare,
>> and see which ones differ?

Er, uh...  oops! :-)    I'd scrubbed the objects between each test, so
each module had to be regenerated.  So, the intermediate objects won't
match the various xfs-O2-g.ko's you've already downloaded.  Look in
the -cross-gcc4.7 and -native-gcc4.7 subdirectories for new copies.


# pwd
/xfsdebug/tracetest/3.6.11-g89caf39/xfs-modules-native-gcc4.7/xfs-O2-g-obj
# for obj in *.o; do
if [ "$(objdump -d $obj | md5sum)" != "$(cd
../../xfs-modules-cross-gcc4.7/xfs-O2-g-obj/ && objdump -d $obj |
md5sum)" ]; then
echo "obj $obj is different";  fi; done
obj xfs.o is different
obj xfs_attr_leaf.o is different
obj xfs_bmap.o is different
obj xfs_dir2_block.o is different
obj xfs_itable.o is different
obj xfs_log.o is different
obj xfs_log_recover.o is different



> And one more test.  Every time you hit the error, it causes
> a log replay on the next mount since the fs has shut down.
>
> Can you try
>
> # mount; umount; mount; test
>
> so that you start the test from a clean mount, and see if you still hit it?
>
> Maybe save that image off before you do that test just in case it changes
> the state.

I'm not sure on that.  Even in read-write mode, the notice in my
kernel log has always been "Corruption detected.  Unmount and run
xfs_repair".  It's never been a forced filesystem shutdown, just a
stern warning and half-accessible files.  The next mount always seems
to be clean.

[89574.079876] XFS (loop0): Corruption detected. Unmount and run xfs_repair
[89587.269316] XFS (loop0): Mounting Filesystem
[89587.444629] XFS (loop0): Ending clean mount

I usually mount read-only, and the image's md5sum doesn't seem to
change between runs.  I made a copy, then mounted it read-write
a time or two.  The md5sum changed between mounts.  However, I am
still seeing the error when attempting to read the directory.  The
mounted-rw-checked image is up at
  <http://www.splack.org/~jason/projects/xfs-arm-corruption/journalreplaytest/>


Jason


* Re: Read corruption on ARM
  2013-02-28 21:38                             ` Jason Detring
@ 2013-03-01  2:25                               ` Dave Chinner
  2013-03-01  2:53                                 ` Eric Sandeen
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-03-01  2:25 UTC (permalink / raw)
  To: Jason Detring; +Cc: Eric Sandeen, xfs-oss

On Thu, Feb 28, 2013 at 03:38:51PM -0600, Jason Detring wrote:
> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
> > On 2/27/13 10:50 PM, Eric Sandeen wrote:
> >> On 2/27/13 10:38 PM, Eric Sandeen wrote:
> >>
> >> ...
> >>
> >>> re-cc'ing xfs list
> >>>
> >>> So I used pahole to look at all structs, objdump -d to disassemble,
> >>> and md5sum'd the results to see what's different.
> >>>
> >>> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis
> >>> native/*.pahole
> >>>
> >>> <manual sort>
> >>>
> >>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
> >>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
> >>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
> >>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
> >>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
> >>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
> >>>
> >>> so all structures look identical, good - but:
> >>>
> >>> while disassembly of these two modules match:
> >>>
> >>> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
> >>> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
> >>>
> >>> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?
> 
> No, I didn't.  The problem has only shown itself on the -O2 builds,
> both native and cross-compiled.  Lower optimization levels don't show
> any of the symptoms.
> 
> Perhaps a better comparison would be -O2 builds among working and
> non-working compilers?   You'd asked for these before, but I just
> finished them today.  The modules, build logs, and fs/xfs/ build trees
> are up at
>   <http://www.splack.org/~jason/projects/xfs-arm-corruption/3.6.11-g89caf39/>
> A quick rundown:
>   -cross-gcc4.4:  OK
>   -cross-gcc4.5:  OK
>   -cross-gcc4.6:  BAD
>   -cross-gcc4.7:  BAD
>   -cross-gcc4.8:  OK
> Some of these don't seem to want to rmmod after they've been inserted.
>  Argh reboots.

Do we really need to go any further than this to say conclusively
that this is a compiler problem? It's clearly not a problem with the
C code in that some compilers produce working code....

i.e. what steps do we need to take to get -cross-gcc4.[67]
blacklisted when it comes to building ARM kernels?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Read corruption on ARM
  2013-03-01  2:25                               ` Dave Chinner
@ 2013-03-01  2:53                                 ` Eric Sandeen
  2013-03-01  4:54                                   ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2013-03-01  2:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jason Detring, xfs-oss

On 2/28/13 8:25 PM, Dave Chinner wrote:
> On Thu, Feb 28, 2013 at 03:38:51PM -0600, Jason Detring wrote:
>> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>>> On 2/27/13 10:50 PM, Eric Sandeen wrote:
>>>> On 2/27/13 10:38 PM, Eric Sandeen wrote:
>>>>
>>>> ...
>>>>
>>>>> re-cc'ing xfs list
>>>>>
>>>>> So I used pahole to look at all structs, objdump -d to disassemble,
>>>>> and md5sum'd the results to see what's different.
>>>>>
>>>>> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis
>>>>> native/*.pahole
>>>>>
>>>>> <manual sort>
>>>>>
>>>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
>>>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
>>>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
>>>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
>>>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
>>>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
>>>>>
>>>>> so all structures look identical, good - but:
>>>>>
>>>>> while disassembly of these two modules match:
>>>>>
>>>>> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
>>>>> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
>>>>>
>>>>> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?
>>
>> No, I didn't.  The problem has only shown itself on the -O2 builds,
>> both native and cross-compiled.  Lower optimization levels don't show
>> any of the symptoms.
>>
>> Perhaps a better comparison would be -O2 builds among working and
>> non-working compilers?   You'd asked for these before, but I just
>> finished them today.  The modules, build logs, and fs/xfs/ build trees
>> are up at
>>   <http://www.splack.org/~jason/projects/xfs-arm-corruption/3.6.11-g89caf39/>
>> A quick rundown:
>>   -cross-gcc4.4:  OK
>>   -cross-gcc4.5:  OK
>>   -cross-gcc4.6:  BAD
>>   -cross-gcc4.7:  BAD
>>   -cross-gcc4.8:  OK
>> Some of these don't seem to want to rmmod after they've been inserted.
>>  Argh reboots.
> 
> Do we really need to go any further than this to say conclusively
> that this is a compiler problem? It's clearly not a problem with the
> C code in that some compilers produce working code....
> 
> i.e. what steps do we need to take to get -cross-gcc4.[67]
> blacklisted when it comes to building ARM kernels?

Yeah, agreed.  (FWIW, I had misunderstood earlier; it's not a
cross-compile problem, it sounds like any native or cross compile
with 4.6 or 4.7 above a certain optimization level fails).

We could be helpful by tracking down the problem perhaps, but if it
is already fixed, perhaps no reason to do so (unless it was an
accidental fix that might show up again)

I suppose we could do something like :

#if defined(__arm__) && __GNUC__ == 4 && (__GNUC_MINOR__ == 6 || __GNUC_MINOR__ == 7)
#warning gcc-4.[67] is known to miscompile xfs on arm.  A different compiler version is recommended.
#endif

The curious side of me still wants to track down what failed. ;)  Maybe weekend work.

-Eric



> Cheers,
> 
> Dave.
> 


* Re: Read corruption on ARM
  2013-03-01  2:53                                 ` Eric Sandeen
@ 2013-03-01  4:54                                   ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2013-03-01  4:54 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Jason Detring, xfs-oss

On Thu, Feb 28, 2013 at 08:53:15PM -0600, Eric Sandeen wrote:
> On 2/28/13 8:25 PM, Dave Chinner wrote:
> > On Thu, Feb 28, 2013 at 03:38:51PM -0600, Jason Detring wrote:
> >> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
> >>> On 2/27/13 10:50 PM, Eric Sandeen wrote:
> >>>> On 2/27/13 10:38 PM, Eric Sandeen wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>> re-cc'ing xfs list
> >>>>>
> >>>>> So I used pahole to look at all structs, objdump -d to disassemble,
> >>>>> and md5sum'd the results to see what's different.
> >>>>>
> >>>>> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis
> >>>>> native/*.pahole
> >>>>>
> >>>>> <manual sort>
> >>>>>
> >>>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
> >>>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
> >>>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
> >>>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
> >>>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
> >>>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
> >>>>>
> >>>>> so all structures look identical, good - but:
> >>>>>
> >>>>> while disassembly of these two modules match:
> >>>>>
> >>>>> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
> >>>>> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
> >>>>>
> >>>>> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?
> >>
> >> No, I didn't.  The problem has only shown itself on the -O2 builds,
> >> both native and cross-compiled.  Lower optimization levels don't show
> >> any of the symptoms.
> >>
> >> Perhaps a better comparison would be -O2 builds among working and
> >> non-working compilers?  You'd asked for these before, but I just
> >> finished them today.  The modules, build logs, and fs/xfs/ build trees
> >> are up at
> >>   <http://www.splack.org/~jason/projects/xfs-arm-corruption/3.6.11-g89caf39/>
> >> A quick rundown:
> >>   -cross-gcc4.4:  OK
> >>   -cross-gcc4.5:  OK
> >>   -cross-gcc4.6:  BAD
> >>   -cross-gcc4.7:  BAD
> >>   -cross-gcc4.8:  OK
> >> Some of these don't seem to want to rmmod after they've been inserted.
> >>  Argh reboots.
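[The pahole/objdump/md5sum comparison described in the quoted messages can be sketched as a small script. This is a hedged reconstruction, not part of the thread: the `compare_builds` helper and the missing-file guard are additions, and the module paths mirror the `cross/` and `native/` naming from Eric's listing.]

```shell
# Dump struct layouts (pahole) and the instruction stream (objdump -d)
# for each build of xfs.ko, then md5sum the dumps: builds with identical
# struct layouts / code hash identically.
compare_builds() {
    for ko in "$@"; do
        [ -f "$ko" ] || { echo "skip: $ko not found"; continue; }
        pahole "$ko"     > "$ko.pahole"   # struct layouts
        objdump -d "$ko" > "$ko.dis"      # disassembly
    done
    md5sum ./*.pahole ./*.dis 2>/dev/null | sort
}

compare_builds cross/xfs-O1-g.ko cross/xfs-O2-g.ko native/xfs-O2-g.ko
```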
> > 
> > Do we really need to go any further than this to say conclusively
> > that this is a compiler problem? It's clearly not a problem with the
> > C code in that some compilers produce working code....
> > 
> > i.e. what steps do we need to take to get -cross-gcc4.[67]
> > blacklisted when it comes to building ARM kernels?
> 
> Yeah, agreed.  (FWIW, I had misunderstood earlier; it's not a
> cross-compile problem, it sounds like any native or cross compile
> with 4.6 or 4.7 above a certain optimization level fails).
> 
> We could be helpful by tracking down the problem perhaps, but if it
> is already fixed, perhaps no reason to do so (unless it was an
> accidental fix that might show up again)
> 
> I suppose we could do something like :
> 
> #if defined(__arm__) && __GNUC__ == 4 && (__GNUC_MINOR__ == 6 || __GNUC_MINOR__ == 7)
> #warning gcc-4.[67] is known to miscompile xfs on arm.  A different compiler version is recommended.
> #endif
> 
> The curious side of me still wants to track down what failed. ;)  Maybe weekend work.

I wouldn't use a warning - make it break the build immediately.
Maybe you could use BUILD_BUG_ON() for this....

Indeed, I'd even suggest sending a patch to lkml that blacklists
those ARM compiler versions altogether. i.e. if the compiler
miscompiles one kernel module, you can't trust any of the rest of
> the kernel to be correctly compiled, either...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



end of thread, other threads:[~2013-03-01  4:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-26 21:58 Read corruption on ARM Jason Detring
2013-02-26 22:33 ` Eric Sandeen
2013-02-26 23:25   ` Jason Detring
     [not found]     ` <512D49E2.40003@sandeen.net>
     [not found]       ` <CA+AKrqCrphO-eKy0n=70O9hmB3mXttOsKmTdfRnPxgJM3_PAkQ@mail.gmail.com>
2013-02-27 17:00         ` Eric Sandeen
     [not found]           ` <CA+AKrqDq5xCNQo1X=MeRBq54ka0FGJEV5Rn6OzwY7eBfJ+8Wkw@mail.gmail.com>
2013-02-27 21:10             ` Eric Sandeen
     [not found]               ` <512E89C2.9000302@sandeen.net>
     [not found]                 ` <CA+AKrqDaY4cgP+EPLepzUOU2jAOygTuj-0xDtOaGf+O0aRZV_g@mail.gmail.com>
     [not found]                   ` <512E903A.2020405@sandeen.net>
     [not found]                     ` <CA+AKrqAv7-5gGj_cNBNj=-nChKPzi+_HZmH=z2UABG9pDOmpBg@mail.gmail.com>
2013-02-28  4:38                       ` Eric Sandeen
2013-02-28  4:50                         ` Eric Sandeen
2013-02-28  5:27                           ` Eric Sandeen
2013-02-28 21:38                             ` Jason Detring
2013-03-01  2:25                               ` Dave Chinner
2013-03-01  2:53                                 ` Eric Sandeen
2013-03-01  4:54                                   ` Dave Chinner
2013-02-26 22:37 ` Eric Sandeen
2013-02-26 22:51   ` Eric Sandeen
2013-02-26 23:21     ` Jason Detring
2013-02-27  2:16       ` Dave Chinner
2013-02-27 14:48         ` Eric Sandeen
2013-02-27  7:19 ` Stefan Ring
2013-02-27 14:48   ` Eric Sandeen
