linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* linux-next: xfs metadata corruption since 30 March
@ 2020-03-31 21:57 Qian Cai
  2020-03-31 22:13 ` Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Qian Cai @ 2020-03-31 21:57 UTC (permalink / raw)
  To: Darrick J. Wong, Christoph Hellwig; +Cc: linux-xfs, LKML

Ever since two days ago, linux-next starts to trigger xfs metadata corruption
during compilation workloads on both powerpc and arm64,

I suspect it could be one of those commits,

https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/

Especially, those commits that would mark corruption more aggressively?

      [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
      [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
      [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
      [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
      [6fb5aac73310] xfs: check owner of dir3 free blocks
      [a10c21ed5d52] xfs: check owner of dir3 data blocks
      [1b2c1a63b678] xfs: check owner of dir3 blocks
      [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
      [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails


[29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
[29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
[29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
[29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
[29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
[29331.182442][  T665] 00000020: 32 2e 30 2d 6f 72 2d 6c 61 74 65 72 20 2a 2f 0a  2.0-or-later */.
[29331.182467][  T665] 00000030: 2f 2a 0a 20 2a 20 44 65 66 69 6e 69 74 69 6f 6e  /*. * Definition
[29331.182492][  T665] 00000040: 73 20 61 6e 64 20 70 6c 61 74 66 6f 72 6d 20 64  s and platform d
[29331.182517][  T665] 00000050: 61 74 61 20 66 6f 72 20 41 6e 61 6c 6f 67 20 44  ata for Analog D
[29331.182541][  T665] 00000060: 65 76 69 63 65 73 0a 20 2a 20 41 44 50 35 35 32  evices. * ADP552
[29331.182566][  T665] 00000070: 30 2f 41 44 50 35 35 30 31 20 4d 46 44 20 50 4d  0/ADP5501 MFD PM
[29331.182700][ T7490] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0xa9b97900 len 32 error 117
xfs_trans_read_buf at fs/xfs/xfs_trans.h:209
(inlined by) xfs_imap_to_bp at fs/xfs/libxfs/xfs_inode_buf.c:171
[29331.182812][ T7490] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
[29331.345347][ T7490] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 0000000058be213e
[29331.345378][ T7490] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
[29331.345402][ T7490] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
[29331.346474][  T498] dm-2: writeback error on inode 934606, offset 0, sector 961072
[29331.346502][  T498] dm-2: writeback error on inode 934607, offset 0, sector 961080
[29331.346624][  T498] dm-2: writeback error on inode 934608, offset 0, sector 961088
[29331.346683][  T498] dm-2: writeback error on inode 1074331758, offset 0, sector 948449704
[29331.347306][  T498] dm-2: writeback error on inode 1074331762, offset 0, sector 948558816
[29331.349165][  T498] dm-2: writeback error on inode 1074331759, offset 0, sector 948560984
[29331.349227][  T498] dm-2: writeback error on inode 1074331760, offset 0, sector 948562944
[29331.349303][  T498] dm-2: writeback error on inode 1074331761, offset 0, sector 948568000

[ 7762.204313][T124538] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0x712398e0 xfs_inode_buf_verify
[ 7762.204599][T124538] XFS (dm-2): Unmount and run xfs_repair
[ 7762.204625][T124538] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
[ 7762.204654][T124538] 00000000: 77 65 72 70 63 2f 69 6e 63 6c 75 64 65 2f 67 65  werpc/include/ge
[ 7762.204672][T124538] 00000010: 6e 65 72 61 74 65 64 2f 75 61 70 69 2f 61 73 6d  nerated/uapi/asm
[ 7762.204699][T124538] 00000020: 2f 72 65 73 6f 75 72 63 65 2e 68 20 5c 0a 20 69  /resource.h \. i
[ 7762.204727][T124538] 00000030: 6e 63 6c 75 64 65 2f 61 73 6d 2d 67 65 6e 65 72  nclude/asm-gener
[ 7762.204745][T124538] 00000040: 69 63 2f 72 65 73 6f 75 72 63 65 2e 68 20 69 6e  ic/resource.h in
[ 7762.204783][T124538] 00000050: 63 6c 75 64 65 2f 75 61 70 69 2f 61 73 6d 2d 67  clude/uapi/asm-g
[ 7762.204820][T124538] 00000060: 65 6e 65 72 69 63 2f 72 65 73 6f 75 72 63 65 2e  eneric/resource.
[ 7762.204858][T124538] 00000070: 68 20 5c 0a 20 69 6e 63 6c 75 64 65 2f 6c 69 6e  h \. include/lin
[ 7762.205068][ T7510] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0x712398e0 len 32 error 117
[ 7762.205466][ T7510] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
[ 7762.219267][ T7510] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 000000006bce0de3
[ 7762.219291][ T7510] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
[ 7762.219306][ T7510] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)


[ 1032.162278][ T1515] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x244/0x2bc [xfs], xfs_inode block 0xa2b75dc0 xfs_inode_buf_verify
[ 1032.176156][ T1515] XFS (dm-2): Unmount and run xfs_repair
[ 1032.181835][ T1515] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
[ 1032.189140][ T1515] 00000000: 6e 29 20 22 22 20 76 61 6c 75 65 20 22 22 20 73  n) "" value "" s
[ 1032.197988][ T1515] 00000010: 75 62 73 74 72 28 6c 69 6e 65 2c 20 6c 65 6e 20  ubstr(line, len 
[ 1032.206723][ T1515] 00000020: 2b 20 6b 65 79 6c 65 6e 20 2b 20 33 29 0a 20 20  + keylen + 3).  
[ 1032.215675][ T1515] 00000030: 20 20 20 20 6c 65 6e 20 2b 3d 20 6c 65 6e 67 74      len += lengt
[ 1032.224537][ T1515] 00000040: 68 28 76 61 6c 75 65 29 20 2b 20 6c 65 6e 67 74  h(value) + lengt
[ 1032.233388][ T1515] 00000050: 68 28 66 69 65 6c 64 5b 2b 2b 69 5d 29 0a 20 20  h(field[++i]).  
[ 1032.242234][ T1515] 00000060: 20 20 20 20 73 75 62 73 74 65 64 20 3d 20 31 0a      substed = 1.
[ 1032.251077][ T1515] 00000070: 20 20 20 20 7d 20 65 6c 73 65 0a 20 20 20 20 20      } else.     
[ 1032.260792][ T4119] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0xd8/0x18c [xfs]" at daddr 0xa2b75dc0 len 32 error 117
[ 1032.273096][ T4119] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
[ 1032.283283][ T4119] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 00000000d99a2721
[ 1032.296214][ T4119] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
[ 1032.305158][ T4119] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)  

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-03-31 21:57 linux-next: xfs metadata corruption since 30 March Qian Cai
@ 2020-03-31 22:13 ` Dave Chinner
  2020-04-01  2:13   ` Qian Cai
  2020-04-01  4:14 ` Chandan Rajendra
  2020-04-01 12:34 ` Brian Foster
  2 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2020-03-31 22:13 UTC (permalink / raw)
  To: Qian Cai; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML

On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
> during compilation workloads on both powerpc and arm64,

Is this on an existing filesystem, or a new filesystem?

> I suspect it could be one of those commits,
> 
> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
> 
> Especially, those commits that would mark corruption more aggressively?
> 
>       [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>       [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>       [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>       [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>       [6fb5aac73310] xfs: check owner of dir3 free blocks
>       [a10c21ed5d52] xfs: check owner of dir3 data blocks
>       [1b2c1a63b678] xfs: check owner of dir3 blocks
>       [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>       [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails

Doubt it - they only add extra detection code and these:

> [29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
> [29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
> [29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> [29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
> [29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-

Would get caught by the existing  verifiers as they aren't valid
metadata at all.

Basically, you are getting file data where there should be inode
metadata. First thing to do is fix the existing corruptions with
xfs_repair - please post the entire output so we can see what was
corruption and what it fixed.

Then if the problem is still reproducable, I suspect you are going
to have to bisect it. i.e. run test, get corruption, mark bisect
bad, run xfs_repair or mkfs to fix mess, install new kernel, run
test again....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-03-31 22:13 ` Dave Chinner
@ 2020-04-01  2:13   ` Qian Cai
  0 siblings, 0 replies; 11+ messages in thread
From: Qian Cai @ 2020-04-01  2:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML



> On Mar 31, 2020, at 6:13 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
>> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
>> during compilation workloads on both powerpc and arm64,
> 
> Is this on an existing filesystem, or a new filesystem?

New.

> 
>> I suspect it could be one of those commits,
>> 
>> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
>> 
>> Especially, those commits that would mark corruption more aggressively?
>> 
>>      [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>>      [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>>      [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>>      [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>>      [6fb5aac73310] xfs: check owner of dir3 free blocks
>>      [a10c21ed5d52] xfs: check owner of dir3 data blocks
>>      [1b2c1a63b678] xfs: check owner of dir3 blocks
>>      [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>>      [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
> 
> Doubt it - they only add extra detection code and these:
> 
>> [29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
>> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
>> [29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
>> [29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>> [29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
>> [29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
> 
> Would get caught by the existing  verifiers as they aren't valid
> metadata at all.
> 
> Basically, you are getting file data where there should be inode
> metadata. First thing to do is fix the existing corruptions with
> xfs_repair - please post the entire output so we can see what was
> corruption and what it fixed.


# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home 
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793608 tail block 786824
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

# mount /dev/mapper/rhel_hpe--apollo--cn99xx--11-home /home/
# umount /home/
# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home 
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793624 tail block 793624
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Mar 31 22:10:54 2020

Phase		Start		End		Duration
Phase 1:	03/31 22:10:45	03/31 22:10:45	
Phase 2:	03/31 22:10:45	03/31 22:10:45	
Phase 3:	03/31 22:10:45	03/31 22:10:46	1 second
Phase 4:	03/31 22:10:46	03/31 22:10:53	7 seconds
Phase 5:	03/31 22:10:53	03/31 22:10:53	
Phase 6:	03/31 22:10:53	03/31 22:10:53	
Phase 7:	03/31 22:10:53	03/31 22:10:53	

Total run time: 8 seconds
done
> 
> Then if the problem is still reproducable, I suspect you are going
> to have to bisect it. i.e. run test, get corruption, mark bisect
> bad, run xfs_repair or mkfs to fix mess, install new kernel, run
> test again....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-03-31 21:57 linux-next: xfs metadata corruption since 30 March Qian Cai
  2020-03-31 22:13 ` Dave Chinner
@ 2020-04-01  4:14 ` Chandan Rajendra
  2020-04-01  4:15   ` Qian Cai
  2020-04-01 12:34 ` Brian Foster
  2 siblings, 1 reply; 11+ messages in thread
From: Chandan Rajendra @ 2020-04-01  4:14 UTC (permalink / raw)
  To: Qian Cai; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML

On Wednesday, April 1, 2020 3:27 AM Qian Cai wrote: 
> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
> during compilation workloads on both powerpc and arm64,

Can you please provide the filesystem geometry information?
You can get that by executing "xfs_info <mount-point>" command.

-- 
chandan




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-04-01  4:14 ` Chandan Rajendra
@ 2020-04-01  4:15   ` Qian Cai
  2020-04-01  4:45     ` Darrick J. Wong
  0 siblings, 1 reply; 11+ messages in thread
From: Qian Cai @ 2020-04-01  4:15 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML



> On Apr 1, 2020, at 12:14 AM, Chandan Rajendra <chandan@linux.ibm.com> wrote:
> 
> On Wednesday, April 1, 2020 3:27 AM Qian Cai wrote: 
>> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
>> during compilation workloads on both powerpc and arm64,
> 
> Can you please provide the filesystem geometry information?
> You can get that by executing "xfs_info <mount-point>" command.
> 

== arm64 ==
# xfs_info /home/
meta-data=/dev/mapper/rhel_hpe--apollo--cn99xx--11-home isize=512    agcount=4, agsize=113568256 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=454273024, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=221813, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


== powerpc ==
# xfs_info /home/
meta-data=/dev/mapper/rhel_ibm--p9wr--01-home isize=512    agcount=4, agsize=118489856 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=473959424, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=231425, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

== x86 (not yet reproduced)  ==
meta-data=/dev/mapper/rhel_hpe--dl380gen9--01-home isize=512    agcount=16, agsize=3283776 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=52540416, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=25664, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-04-01  4:15   ` Qian Cai
@ 2020-04-01  4:45     ` Darrick J. Wong
  2020-04-01  6:10       ` Chandan Rajendra
  2020-04-01 13:54       ` Qian Cai
  0 siblings, 2 replies; 11+ messages in thread
From: Darrick J. Wong @ 2020-04-01  4:45 UTC (permalink / raw)
  To: Qian Cai; +Cc: Chandan Rajendra, Christoph Hellwig, linux-xfs, LKML

On Wed, Apr 01, 2020 at 12:15:32AM -0400, Qian Cai wrote:
> 
> 
> > On Apr 1, 2020, at 12:14 AM, Chandan Rajendra <chandan@linux.ibm.com> wrote:
> > 
> > On Wednesday, April 1, 2020 3:27 AM Qian Cai wrote: 
> >> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
> >> during compilation workloads on both powerpc and arm64,
> > 
> > Can you please provide the filesystem geometry information?
> > You can get that by executing "xfs_info <mount-point>" command.
> > 

Hmm.   Do the arm/ppc systems have 64k pages?  kconfigs might be a good
starting place.  Also, does the xfs for-next branch exhibit this
problem, or is it just the big -next branch that Stephen Rothwell puts
out?

--D

> == arm64 ==
> # xfs_info /home/
> meta-data=/dev/mapper/rhel_hpe--apollo--cn99xx--11-home isize=512    agcount=4, agsize=113568256 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1
> data     =                       bsize=4096   blocks=454273024, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=221813, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> 
> == powerpc ==
> # xfs_info /home/
> meta-data=/dev/mapper/rhel_ibm--p9wr--01-home isize=512    agcount=4, agsize=118489856 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1
> data     =                       bsize=4096   blocks=473959424, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=231425, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> == x86 (not yet reproduced)  ==
> meta-data=/dev/mapper/rhel_hpe--dl380gen9--01-home isize=512    agcount=16, agsize=3283776 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1
> data     =                       bsize=4096   blocks=52540416, imaxpct=25
>          =                       sunit=64     swidth=64 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=25664, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-04-01  4:45     ` Darrick J. Wong
@ 2020-04-01  6:10       ` Chandan Rajendra
  2020-04-01 13:54       ` Qian Cai
  1 sibling, 0 replies; 11+ messages in thread
From: Chandan Rajendra @ 2020-04-01  6:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Qian Cai, Christoph Hellwig, linux-xfs, LKML

On Wednesday, April 1, 2020 10:15 AM Darrick J. Wong wrote: 
> On Wed, Apr 01, 2020 at 12:15:32AM -0400, Qian Cai wrote:
> > 
> > 
> > > On Apr 1, 2020, at 12:14 AM, Chandan Rajendra <chandan@linux.ibm.com> wrote:
> > > 
> > > On Wednesday, April 1, 2020 3:27 AM Qian Cai wrote: 
> > >> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
> > >> during compilation workloads on both powerpc and arm64,
> > > 
> > > Can you please provide the filesystem geometry information?
> > > You can get that by executing "xfs_info <mount-point>" command.
> > > 
>

I wasn't able to recreate this issue on my P8 kvm guest. Can you provide,
1. The build command line you are using.
2. The number of online CPUs and amount of system memory.
2. As Darrick pointed out, Can you please provide the kconfig used (especially
   the one used for powerpc build).

> Hmm.   Do the arm/ppc systems have 64k pages?  kconfigs might be a good
> starting place.  Also, does the xfs for-next branch exhibit this
> problem, or is it just the big -next branch that Stephen Rothwell puts
> out?
> 
> 
> --D
> 
> > == arm64 ==
> > # xfs_info /home/
> > meta-data=/dev/mapper/rhel_hpe--apollo--cn99xx--11-home isize=512    agcount=4, agsize=113568256 blks
> >          =                       sectsz=4096  attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1
> > data     =                       bsize=4096   blocks=454273024, imaxpct=5
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=221813, version=2
> >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > 
> > == powerpc ==
> > # xfs_info /home/
> > meta-data=/dev/mapper/rhel_ibm--p9wr--01-home isize=512    agcount=4, agsize=118489856 blks
> >          =                       sectsz=4096  attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1
> > data     =                       bsize=4096   blocks=473959424, imaxpct=5
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=231425, version=2
> >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > == x86 (not yet reproduced)  ==
> > meta-data=/dev/mapper/rhel_hpe--dl380gen9--01-home isize=512    agcount=16, agsize=3283776 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1
> > data     =                       bsize=4096   blocks=52540416, imaxpct=25
> >          =                       sunit=64     swidth=64 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=25664, version=2
> >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> 


-- 
chandan




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-03-31 21:57 linux-next: xfs metadata corruption since 30 March Qian Cai
  2020-03-31 22:13 ` Dave Chinner
  2020-04-01  4:14 ` Chandan Rajendra
@ 2020-04-01 12:34 ` Brian Foster
  2020-04-01 16:21   ` Brian Foster
  2 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2020-04-01 12:34 UTC (permalink / raw)
  To: Qian Cai; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML

On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
> during compilation workloads on both powerpc and arm64,
> 
> I suspect it could be one of those commits,
> 
> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
> 
> Especially, those commits that would mark corruption more aggressively?
> 
>       [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>       [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>       [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>       [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>       [6fb5aac73310] xfs: check owner of dir3 free blocks
>       [a10c21ed5d52] xfs: check owner of dir3 data blocks
>       [1b2c1a63b678] xfs: check owner of dir3 blocks
>       [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>       [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
> 
> 
> [29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
> [29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
> [29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> [29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
> [29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
> [29331.182442][  T665] 00000020: 32 2e 30 2d 6f 72 2d 6c 61 74 65 72 20 2a 2f 0a  2.0-or-later */.
> [29331.182467][  T665] 00000030: 2f 2a 0a 20 2a 20 44 65 66 69 6e 69 74 69 6f 6e  /*. * Definition
> [29331.182492][  T665] 00000040: 73 20 61 6e 64 20 70 6c 61 74 66 6f 72 6d 20 64  s and platform d
> [29331.182517][  T665] 00000050: 61 74 61 20 66 6f 72 20 41 6e 61 6c 6f 67 20 44  ata for Analog D
> [29331.182541][  T665] 00000060: 65 76 69 63 65 73 0a 20 2a 20 41 44 50 35 35 32  evices. * ADP552
> [29331.182566][  T665] 00000070: 30 2f 41 44 50 35 35 30 31 20 4d 46 44 20 50 4d  0/ADP5501 MFD PM
> [29331.182700][ T7490] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0xa9b97900 len 32 error 117
> xfs_trans_read_buf at fs/xfs/xfs_trans.h:209
> (inlined by) xfs_imap_to_bp at fs/xfs/libxfs/xfs_inode_buf.c:171
> [29331.182812][ T7490] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> [29331.345347][ T7490] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 0000000058be213e
> [29331.345378][ T7490] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> [29331.345402][ T7490] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)

I've actually been seeing similar corruption errors in the past day or
two but I'd chalked it up to the work and testing I'm doing in my
development branch. I'm also on a system/storage configuration that I
haven't established trust in yet, fwiw. I have recently rebased to
for-next, so I installed that baseline kernel and hit a similar crash
after 20 minutes or so of running fsstress. I'm not sure how consistent
this failure is yet but I'll see if I can back off from there and narrow
it down to a patch...

Brian

> [29331.346474][  T498] dm-2: writeback error on inode 934606, offset 0, sector 961072
> [29331.346502][  T498] dm-2: writeback error on inode 934607, offset 0, sector 961080
> [29331.346624][  T498] dm-2: writeback error on inode 934608, offset 0, sector 961088
> [29331.346683][  T498] dm-2: writeback error on inode 1074331758, offset 0, sector 948449704
> [29331.347306][  T498] dm-2: writeback error on inode 1074331762, offset 0, sector 948558816
> [29331.349165][  T498] dm-2: writeback error on inode 1074331759, offset 0, sector 948560984
> [29331.349227][  T498] dm-2: writeback error on inode 1074331760, offset 0, sector 948562944
> [29331.349303][  T498] dm-2: writeback error on inode 1074331761, offset 0, sector 948568000
> 
> [ 7762.204313][T124538] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0x712398e0 xfs_inode_buf_verify
> [ 7762.204599][T124538] XFS (dm-2): Unmount and run xfs_repair
> [ 7762.204625][T124538] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> [ 7762.204654][T124538] 00000000: 77 65 72 70 63 2f 69 6e 63 6c 75 64 65 2f 67 65  werpc/include/ge
> [ 7762.204672][T124538] 00000010: 6e 65 72 61 74 65 64 2f 75 61 70 69 2f 61 73 6d  nerated/uapi/asm
> [ 7762.204699][T124538] 00000020: 2f 72 65 73 6f 75 72 63 65 2e 68 20 5c 0a 20 69  /resource.h \. i
> [ 7762.204727][T124538] 00000030: 6e 63 6c 75 64 65 2f 61 73 6d 2d 67 65 6e 65 72  nclude/asm-gener
> [ 7762.204745][T124538] 00000040: 69 63 2f 72 65 73 6f 75 72 63 65 2e 68 20 69 6e  ic/resource.h in
> [ 7762.204783][T124538] 00000050: 63 6c 75 64 65 2f 75 61 70 69 2f 61 73 6d 2d 67  clude/uapi/asm-g
> [ 7762.204820][T124538] 00000060: 65 6e 65 72 69 63 2f 72 65 73 6f 75 72 63 65 2e  eneric/resource.
> [ 7762.204858][T124538] 00000070: 68 20 5c 0a 20 69 6e 63 6c 75 64 65 2f 6c 69 6e  h \. include/lin
> [ 7762.205068][ T7510] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0x712398e0 len 32 error 117
> [ 7762.205466][ T7510] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> [ 7762.219267][ T7510] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 000000006bce0de3
> [ 7762.219291][ T7510] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> [ 7762.219306][ T7510] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
> 
> 
> [ 1032.162278][ T1515] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x244/0x2bc [xfs], xfs_inode block 0xa2b75dc0 xfs_inode_buf_verify
> [ 1032.176156][ T1515] XFS (dm-2): Unmount and run xfs_repair
> [ 1032.181835][ T1515] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> [ 1032.189140][ T1515] 00000000: 6e 29 20 22 22 20 76 61 6c 75 65 20 22 22 20 73  n) "" value "" s
> [ 1032.197988][ T1515] 00000010: 75 62 73 74 72 28 6c 69 6e 65 2c 20 6c 65 6e 20  ubstr(line, len 
> [ 1032.206723][ T1515] 00000020: 2b 20 6b 65 79 6c 65 6e 20 2b 20 33 29 0a 20 20  + keylen + 3).  
> [ 1032.215675][ T1515] 00000030: 20 20 20 20 6c 65 6e 20 2b 3d 20 6c 65 6e 67 74      len += lengt
> [ 1032.224537][ T1515] 00000040: 68 28 76 61 6c 75 65 29 20 2b 20 6c 65 6e 67 74  h(value) + lengt
> [ 1032.233388][ T1515] 00000050: 68 28 66 69 65 6c 64 5b 2b 2b 69 5d 29 0a 20 20  h(field[++i]).  
> [ 1032.242234][ T1515] 00000060: 20 20 20 20 73 75 62 73 74 65 64 20 3d 20 31 0a      substed = 1.
> [ 1032.251077][ T1515] 00000070: 20 20 20 20 7d 20 65 6c 73 65 0a 20 20 20 20 20      } else.     
> [ 1032.260792][ T4119] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0xd8/0x18c [xfs]" at daddr 0xa2b75dc0 len 32 error 117
> [ 1032.273096][ T4119] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> [ 1032.283283][ T4119] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 00000000d99a2721
> [ 1032.296214][ T4119] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> [ 1032.305158][ T4119] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-04-01  4:45     ` Darrick J. Wong
  2020-04-01  6:10       ` Chandan Rajendra
@ 2020-04-01 13:54       ` Qian Cai
  1 sibling, 0 replies; 11+ messages in thread
From: Qian Cai @ 2020-04-01 13:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Chandan Rajendra, Christoph Hellwig, linux-xfs, LKML



> On Apr 1, 2020, at 12:45 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> On Wed, Apr 01, 2020 at 12:15:32AM -0400, Qian Cai wrote:
>> 
>> 
>>> On Apr 1, 2020, at 12:14 AM, Chandan Rajendra <chandan@linux.ibm.com> wrote:
>>> 
>>> On Wednesday, April 1, 2020 3:27 AM Qian Cai wrote: 
>>>> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
>>>> during compilation workloads on both powerpc and arm64,
>>> 
>>> Can you please provide the filesystem geometry information?
>>> You can get that by executing "xfs_info <mount-point>" command.
>>> 
> 
> Hmm.   Do the arm/ppc systems have 64k pages?  kconfigs might be a good
> starting place.  Also, does the xfs for-next branch exhibit this

Yes, 64k pages. The configs are in,

https://github.com/cailca/linux-mm

> problem, or is it just the big -next branch that Stephen Rothwell puts
> out?

The later.

> 
> --D
> 
>> == arm64 ==
>> # xfs_info /home/
>> meta-data=/dev/mapper/rhel_hpe--apollo--cn99xx--11-home isize=512    agcount=4, agsize=113568256 blks
>>         =                       sectsz=4096  attr=2, projid32bit=1
>>         =                       crc=1        finobt=1, sparse=1, rmapbt=0
>>         =                       reflink=1
>> data     =                       bsize=4096   blocks=454273024, imaxpct=5
>>         =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> log      =internal log           bsize=4096   blocks=221813, version=2
>>         =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> 
>> 
>> == powerpc ==
>> # xfs_info /home/
>> meta-data=/dev/mapper/rhel_ibm--p9wr--01-home isize=512    agcount=4, agsize=118489856 blks
>>         =                       sectsz=4096  attr=2, projid32bit=1
>>         =                       crc=1        finobt=1, sparse=1, rmapbt=0
>>         =                       reflink=1
>> data     =                       bsize=4096   blocks=473959424, imaxpct=5
>>         =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> log      =internal log           bsize=4096   blocks=231425, version=2
>>         =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> 
>> == x86 (not yet reproduced)  ==
>> meta-data=/dev/mapper/rhel_hpe--dl380gen9--01-home isize=512    agcount=16, agsize=3283776 blks
>>         =                       sectsz=512   attr=2, projid32bit=1
>>         =                       crc=1        finobt=1, sparse=1, rmapbt=0
>>         =                       reflink=1
>> data     =                       bsize=4096   blocks=52540416, imaxpct=25
>>         =                       sunit=64     swidth=64 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> log      =internal log           bsize=4096   blocks=25664, version=2
>>         =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-04-01 12:34 ` Brian Foster
@ 2020-04-01 16:21   ` Brian Foster
  2020-04-01 18:24     ` Qian Cai
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2020-04-01 16:21 UTC (permalink / raw)
  To: Qian Cai; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML

On Wed, Apr 01, 2020 at 08:34:47AM -0400, Brian Foster wrote:
> On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
> > Ever since two days ago, linux-next starts to trigger xfs metadata corruption
> > during compilation workloads on both powerpc and arm64,
> > 
> > I suspect it could be one of those commits,
> > 
> > https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
> > 
> > Especially, those commits that would mark corruption more aggressively?
> > 
> >       [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
> >       [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
> >       [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
> >       [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
> >       [6fb5aac73310] xfs: check owner of dir3 free blocks
> >       [a10c21ed5d52] xfs: check owner of dir3 data blocks
> >       [1b2c1a63b678] xfs: check owner of dir3 blocks
> >       [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
> >       [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
> > 
> > 
> > [29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
> > xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
> > [29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
> > [29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> > [29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
> > [29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
> > [29331.182442][  T665] 00000020: 32 2e 30 2d 6f 72 2d 6c 61 74 65 72 20 2a 2f 0a  2.0-or-later */.
> > [29331.182467][  T665] 00000030: 2f 2a 0a 20 2a 20 44 65 66 69 6e 69 74 69 6f 6e  /*. * Definition
> > [29331.182492][  T665] 00000040: 73 20 61 6e 64 20 70 6c 61 74 66 6f 72 6d 20 64  s and platform d
> > [29331.182517][  T665] 00000050: 61 74 61 20 66 6f 72 20 41 6e 61 6c 6f 67 20 44  ata for Analog D
> > [29331.182541][  T665] 00000060: 65 76 69 63 65 73 0a 20 2a 20 41 44 50 35 35 32  evices. * ADP552
> > [29331.182566][  T665] 00000070: 30 2f 41 44 50 35 35 30 31 20 4d 46 44 20 50 4d  0/ADP5501 MFD PM
> > [29331.182700][ T7490] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0xa9b97900 len 32 error 117
> > xfs_trans_read_buf at fs/xfs/xfs_trans.h:209
> > (inlined by) xfs_imap_to_bp at fs/xfs/libxfs/xfs_inode_buf.c:171
> > [29331.182812][ T7490] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> > [29331.345347][ T7490] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 0000000058be213e
> > [29331.345378][ T7490] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > [29331.345402][ T7490] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
> 
> I've actually been seeing similar corruption errors in the past day or
> two but I'd chalked it up to the work and testing I'm doing in my
> development branch. I'm also on a system/storage configuration that I
> haven't established trust in yet, fwiw. I have recently rebased to
> for-next, so I installed that baseline kernel and hit a similar crash
> after 20 minutes or so of running fsstress. I'm not sure how consistent
> this failure is yet but I'll see if I can back off from there and narrow
> it down to a patch...
> 

A bisect lands at commit 5806165a6663 ("xfs: factor inode lookup from
xfs_ifree_cluster"). The exact sequence of the error is not clear to me,
but from inspection the following seems to fix it:

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5c930863ed5b..a455be84b2d4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2512,7 +2512,7 @@ static struct xfs_inode *
 xfs_ifree_get_one_inode(
 	struct xfs_perag	*pag,
 	struct xfs_inode	*free_ip,
-	int			inum)
+	xfs_ino_t		inum)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
 	struct xfs_inode	*ip;

Qian, care to confirm?

Brian

> Brian
> 
> > [29331.346474][  T498] dm-2: writeback error on inode 934606, offset 0, sector 961072
> > [29331.346502][  T498] dm-2: writeback error on inode 934607, offset 0, sector 961080
> > [29331.346624][  T498] dm-2: writeback error on inode 934608, offset 0, sector 961088
> > [29331.346683][  T498] dm-2: writeback error on inode 1074331758, offset 0, sector 948449704
> > [29331.347306][  T498] dm-2: writeback error on inode 1074331762, offset 0, sector 948558816
> > [29331.349165][  T498] dm-2: writeback error on inode 1074331759, offset 0, sector 948560984
> > [29331.349227][  T498] dm-2: writeback error on inode 1074331760, offset 0, sector 948562944
> > [29331.349303][  T498] dm-2: writeback error on inode 1074331761, offset 0, sector 948568000
> > 
> > [ 7762.204313][T124538] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0x712398e0 xfs_inode_buf_verify
> > [ 7762.204599][T124538] XFS (dm-2): Unmount and run xfs_repair
> > [ 7762.204625][T124538] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> > [ 7762.204654][T124538] 00000000: 77 65 72 70 63 2f 69 6e 63 6c 75 64 65 2f 67 65  werpc/include/ge
> > [ 7762.204672][T124538] 00000010: 6e 65 72 61 74 65 64 2f 75 61 70 69 2f 61 73 6d  nerated/uapi/asm
> > [ 7762.204699][T124538] 00000020: 2f 72 65 73 6f 75 72 63 65 2e 68 20 5c 0a 20 69  /resource.h \. i
> > [ 7762.204727][T124538] 00000030: 6e 63 6c 75 64 65 2f 61 73 6d 2d 67 65 6e 65 72  nclude/asm-gener
> > [ 7762.204745][T124538] 00000040: 69 63 2f 72 65 73 6f 75 72 63 65 2e 68 20 69 6e  ic/resource.h in
> > [ 7762.204783][T124538] 00000050: 63 6c 75 64 65 2f 75 61 70 69 2f 61 73 6d 2d 67  clude/uapi/asm-g
> > [ 7762.204820][T124538] 00000060: 65 6e 65 72 69 63 2f 72 65 73 6f 75 72 63 65 2e  eneric/resource.
> > [ 7762.204858][T124538] 00000070: 68 20 5c 0a 20 69 6e 63 6c 75 64 65 2f 6c 69 6e  h \. include/lin
> > [ 7762.205068][ T7510] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0x712398e0 len 32 error 117
> > [ 7762.205466][ T7510] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> > [ 7762.219267][ T7510] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 000000006bce0de3
> > [ 7762.219291][ T7510] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > [ 7762.219306][ T7510] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
> > 
> > 
> > [ 1032.162278][ T1515] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x244/0x2bc [xfs], xfs_inode block 0xa2b75dc0 xfs_inode_buf_verify
> > [ 1032.176156][ T1515] XFS (dm-2): Unmount and run xfs_repair
> > [ 1032.181835][ T1515] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
> > [ 1032.189140][ T1515] 00000000: 6e 29 20 22 22 20 76 61 6c 75 65 20 22 22 20 73  n) "" value "" s
> > [ 1032.197988][ T1515] 00000010: 75 62 73 74 72 28 6c 69 6e 65 2c 20 6c 65 6e 20  ubstr(line, len 
> > [ 1032.206723][ T1515] 00000020: 2b 20 6b 65 79 6c 65 6e 20 2b 20 33 29 0a 20 20  + keylen + 3).  
> > [ 1032.215675][ T1515] 00000030: 20 20 20 20 6c 65 6e 20 2b 3d 20 6c 65 6e 67 74      len += lengt
> > [ 1032.224537][ T1515] 00000040: 68 28 76 61 6c 75 65 29 20 2b 20 6c 65 6e 67 74  h(value) + lengt
> > [ 1032.233388][ T1515] 00000050: 68 28 66 69 65 6c 64 5b 2b 2b 69 5d 29 0a 20 20  h(field[++i]).  
> > [ 1032.242234][ T1515] 00000060: 20 20 20 20 73 75 62 73 74 65 64 20 3d 20 31 0a      substed = 1.
> > [ 1032.251077][ T1515] 00000070: 20 20 20 20 7d 20 65 6c 73 65 0a 20 20 20 20 20      } else.     
> > [ 1032.260792][ T4119] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0xd8/0x18c [xfs]" at daddr 0xa2b75dc0 len 32 error 117
> > [ 1032.273096][ T4119] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> > [ 1032.283283][ T4119] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 00000000d99a2721
> > [ 1032.296214][ T4119] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > [ 1032.305158][ T4119] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
> > 
> 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: linux-next: xfs metadata corruption since 30 March
  2020-04-01 16:21   ` Brian Foster
@ 2020-04-01 18:24     ` Qian Cai
  0 siblings, 0 replies; 11+ messages in thread
From: Qian Cai @ 2020-04-01 18:24 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, Christoph Hellwig, linux-xfs, LKML



> On Apr 1, 2020, at 12:21 PM, Brian Foster <bfoster@redhat.com> wrote:
> 
> On Wed, Apr 01, 2020 at 08:34:47AM -0400, Brian Foster wrote:
>> On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
>>> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
>>> during compilation workloads on both powerpc and arm64,
>>> 
>>> I suspect it could be one of those commits,
>>> 
>>> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
>>> 
>>> Especially, those commits that would mark corruption more aggressively?
>>> 
>>>      [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>>>      [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>>>      [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>>>      [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>>>      [6fb5aac73310] xfs: check owner of dir3 free blocks
>>>      [a10c21ed5d52] xfs: check owner of dir3 data blocks
>>>      [1b2c1a63b678] xfs: check owner of dir3 blocks
>>>      [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>>>      [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
>>> 
>>> 
>>> [29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
>>> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
>>> [29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
>>> [29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>>> [29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
>>> [29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
>>> [29331.182442][  T665] 00000020: 32 2e 30 2d 6f 72 2d 6c 61 74 65 72 20 2a 2f 0a  2.0-or-later */.
>>> [29331.182467][  T665] 00000030: 2f 2a 0a 20 2a 20 44 65 66 69 6e 69 74 69 6f 6e  /*. * Definition
>>> [29331.182492][  T665] 00000040: 73 20 61 6e 64 20 70 6c 61 74 66 6f 72 6d 20 64  s and platform d
>>> [29331.182517][  T665] 00000050: 61 74 61 20 66 6f 72 20 41 6e 61 6c 6f 67 20 44  ata for Analog D
>>> [29331.182541][  T665] 00000060: 65 76 69 63 65 73 0a 20 2a 20 41 44 50 35 35 32  evices. * ADP552
>>> [29331.182566][  T665] 00000070: 30 2f 41 44 50 35 35 30 31 20 4d 46 44 20 50 4d  0/ADP5501 MFD PM
>>> [29331.182700][ T7490] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0xa9b97900 len 32 error 117
>>> xfs_trans_read_buf at fs/xfs/xfs_trans.h:209
>>> (inlined by) xfs_imap_to_bp at fs/xfs/libxfs/xfs_inode_buf.c:171
>>> [29331.182812][ T7490] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
>>> [29331.345347][ T7490] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 0000000058be213e
>>> [29331.345378][ T7490] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
>>> [29331.345402][ T7490] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
>> 
>> I've actually been seeing similar corruption errors in the past day or
>> two but I'd chalked it up to the work and testing I'm doing in my
>> development branch. I'm also on a system/storage configuration that I
>> haven't established trust in yet, fwiw. I have recently rebased to
>> for-next, so I installed that baseline kernel and hit a similar crash
>> after 20 minutes or so of running fsstress. I'm not sure how consistent
>> this failure is yet but I'll see if I can back off from there and narrow
>> it down to a patch...
>> 
> 
> A bisect lands at commit 5806165a6663 ("xfs: factor inode lookup from
> xfs_ifree_cluster"). The exact sequence of the error is not clear to me,
> but from inspection the following seems to fix it:
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 5c930863ed5b..a455be84b2d4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2512,7 +2512,7 @@ static struct xfs_inode *
> xfs_ifree_get_one_inode(
> 	struct xfs_perag	*pag,
> 	struct xfs_inode	*free_ip,
> -	int			inum)
> +	xfs_ino_t		inum)
> {
> 	struct xfs_mount	*mp = pag->pag_mount;
> 	struct xfs_inode	*ip;
> 
> Qian, care to confirm?

Yes, the fix works well.

> 
> Brian
> 
>> Brian
>> 
>>> [29331.346474][  T498] dm-2: writeback error on inode 934606, offset 0, sector 961072
>>> [29331.346502][  T498] dm-2: writeback error on inode 934607, offset 0, sector 961080
>>> [29331.346624][  T498] dm-2: writeback error on inode 934608, offset 0, sector 961088
>>> [29331.346683][  T498] dm-2: writeback error on inode 1074331758, offset 0, sector 948449704
>>> [29331.347306][  T498] dm-2: writeback error on inode 1074331762, offset 0, sector 948558816
>>> [29331.349165][  T498] dm-2: writeback error on inode 1074331759, offset 0, sector 948560984
>>> [29331.349227][  T498] dm-2: writeback error on inode 1074331760, offset 0, sector 948562944
>>> [29331.349303][  T498] dm-2: writeback error on inode 1074331761, offset 0, sector 948568000
>>> 
>>> [ 7762.204313][T124538] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0x712398e0 xfs_inode_buf_verify
>>> [ 7762.204599][T124538] XFS (dm-2): Unmount and run xfs_repair
>>> [ 7762.204625][T124538] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>>> [ 7762.204654][T124538] 00000000: 77 65 72 70 63 2f 69 6e 63 6c 75 64 65 2f 67 65  werpc/include/ge
>>> [ 7762.204672][T124538] 00000010: 6e 65 72 61 74 65 64 2f 75 61 70 69 2f 61 73 6d  nerated/uapi/asm
>>> [ 7762.204699][T124538] 00000020: 2f 72 65 73 6f 75 72 63 65 2e 68 20 5c 0a 20 69  /resource.h \. i
>>> [ 7762.204727][T124538] 00000030: 6e 63 6c 75 64 65 2f 61 73 6d 2d 67 65 6e 65 72  nclude/asm-gener
>>> [ 7762.204745][T124538] 00000040: 69 63 2f 72 65 73 6f 75 72 63 65 2e 68 20 69 6e  ic/resource.h in
>>> [ 7762.204783][T124538] 00000050: 63 6c 75 64 65 2f 75 61 70 69 2f 61 73 6d 2d 67  clude/uapi/asm-g
>>> [ 7762.204820][T124538] 00000060: 65 6e 65 72 69 63 2f 72 65 73 6f 75 72 63 65 2e  eneric/resource.
>>> [ 7762.204858][T124538] 00000070: 68 20 5c 0a 20 69 6e 63 6c 75 64 65 2f 6c 69 6e  h \. include/lin
>>> [ 7762.205068][ T7510] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x88/0x130 [xfs]" at daddr 0x712398e0 len 32 error 117
>>> [ 7762.205466][ T7510] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
>>> [ 7762.219267][ T7510] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 000000006bce0de3
>>> [ 7762.219291][ T7510] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
>>> [ 7762.219306][ T7510] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)
>>> 
>>> 
>>> [ 1032.162278][ T1515] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x244/0x2bc [xfs], xfs_inode block 0xa2b75dc0 xfs_inode_buf_verify
>>> [ 1032.176156][ T1515] XFS (dm-2): Unmount and run xfs_repair
>>> [ 1032.181835][ T1515] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>>> [ 1032.189140][ T1515] 00000000: 6e 29 20 22 22 20 76 61 6c 75 65 20 22 22 20 73  n) "" value "" s
>>> [ 1032.197988][ T1515] 00000010: 75 62 73 74 72 28 6c 69 6e 65 2c 20 6c 65 6e 20  ubstr(line, len 
>>> [ 1032.206723][ T1515] 00000020: 2b 20 6b 65 79 6c 65 6e 20 2b 20 33 29 0a 20 20  + keylen + 3).  
>>> [ 1032.215675][ T1515] 00000030: 20 20 20 20 6c 65 6e 20 2b 3d 20 6c 65 6e 67 74      len += lengt
>>> [ 1032.224537][ T1515] 00000040: 68 28 76 61 6c 75 65 29 20 2b 20 6c 65 6e 67 74  h(value) + lengt
>>> [ 1032.233388][ T1515] 00000050: 68 28 66 69 65 6c 64 5b 2b 2b 69 5d 29 0a 20 20  h(field[++i]).  
>>> [ 1032.242234][ T1515] 00000060: 20 20 20 20 73 75 62 73 74 65 64 20 3d 20 31 0a      substed = 1.
>>> [ 1032.251077][ T1515] 00000070: 20 20 20 20 7d 20 65 6c 73 65 0a 20 20 20 20 20      } else.     
>>> [ 1032.260792][ T4119] XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0xd8/0x18c [xfs]" at daddr 0xa2b75dc0 len 32 error 117
>>> [ 1032.273096][ T4119] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
>>> [ 1032.283283][ T4119] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3754 of file fs/xfs/xfs_inode.c. Return address = 00000000d99a2721
>>> [ 1032.296214][ T4119] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
>>> [ 1032.305158][ T4119] XFS (dm-2): Please unmount the filesystem and rectify the problem(s)


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-04-01 18:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-31 21:57 linux-next: xfs metadata corruption since 30 March Qian Cai
2020-03-31 22:13 ` Dave Chinner
2020-04-01  2:13   ` Qian Cai
2020-04-01  4:14 ` Chandan Rajendra
2020-04-01  4:15   ` Qian Cai
2020-04-01  4:45     ` Darrick J. Wong
2020-04-01  6:10       ` Chandan Rajendra
2020-04-01 13:54       ` Qian Cai
2020-04-01 12:34 ` Brian Foster
2020-04-01 16:21   ` Brian Foster
2020-04-01 18:24     ` Qian Cai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).