linux-xfs.vger.kernel.org archive mirror
* xfs_repair doesn't handle: br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0 corruption
From: Arkadiusz Miśkiewicz @ 2020-07-15  7:05 UTC
  To: linux-xfs


Hello.

xfs_repair (built from for-next about 2-3 weeks ago) doesn't seem to
handle this kind of corruption. Repair (run a few times) finishes just
fine, but the filesystem ends up with the same trace again.

Metadump is possible but problematic (will be huge).


Jul  9 14:35:51 x kernel: XFS (sdd1): xfs_dabuf_map: bno 8388608 dir: inode 21698340263
Jul  9 14:35:51 x kernel: XFS (sdd1): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
Jul  9 14:35:51 x kernel: XFS (sdd1): Internal error xfs_da_do_buf(1) at line 2557 of file fs/xfs/libxfs/xfs_da_btree.c.  Caller xfs_da_read_buf+0x6a/0x120 [xfs]
Jul  9 14:35:51 x kernel: CPU: 3 PID: 2928 Comm: cp Tainted: G   E     5.0.0-1-03515-g3478588b5136 #10
Jul  9 14:35:51 x kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS 3.0a 02/06/2018
Jul  9 14:35:51 x kernel: Call Trace:
Jul  9 14:35:51 x kernel:  dump_stack+0x5c/0x80
Jul  9 14:35:51 x kernel:  xfs_dabuf_map.constprop.0+0x1dc/0x390 [xfs]
Jul  9 14:35:51 x kernel:  xfs_da_read_buf+0x6a/0x120 [xfs]
Jul  9 14:35:51 x kernel:  xfs_da3_node_read+0x17/0xd0 [xfs]
Jul  9 14:35:51 x kernel:  xfs_da3_node_lookup_int+0x6c/0x370 [xfs]
Jul  9 14:35:51 x kernel:  ? kmem_cache_alloc+0x14e/0x1b0
Jul  9 14:35:51 x kernel:  xfs_dir2_node_lookup+0x4b/0x170 [xfs]
Jul  9 14:35:51 x kernel:  xfs_dir_lookup+0x1b5/0x1c0 [xfs]
Jul  9 14:35:51 x kernel:  xfs_lookup+0x57/0x120 [xfs]
Jul  9 14:35:51 x kernel:  xfs_vn_lookup+0x70/0xa0 [xfs]
Jul  9 14:35:51 x kernel:  __lookup_hash+0x6c/0xa0
Jul  9 14:35:51 x kernel:  ? _cond_resched+0x15/0x30
Jul  9 14:35:51 x kernel:  filename_create+0x91/0x160
Jul  9 14:35:51 x kernel:  do_linkat+0xa5/0x360
Jul  9 14:35:51 x kernel:  __x64_sys_linkat+0x21/0x30
Jul  9 14:35:51 x kernel:  do_syscall_64+0x55/0x100
Jul  9 14:35:51 x kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9


Longer log:
http://ixion.pld-linux.org/~arekm/xfs-10.txt


-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


* Re: xfs_repair doesn't handle: br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0 corruption
From: Brian Foster @ 2020-07-15 11:40 UTC
  To: Arkadiusz Miśkiewicz; +Cc: linux-xfs

On Wed, Jul 15, 2020 at 09:05:47AM +0200, Arkadiusz Miśkiewicz wrote:
> 
> Hello.
> 
> xfs_repair (built from for-next about 2-3 weeks ago) doesn't seem to
> handle this kind of corruption. Repair (run a few times) finishes just
> fine, but the filesystem ends up with the same trace again.
> 

Are you saying that xfs_repair eventually resolves the corruption but it
takes multiple tries, and then the corruption reoccurs at runtime? Or
that xfs_repair doesn't ever resolve the corruption?

Either way, what does xfs_repair report?

> Metadump is possible but problematic (will be huge).
> 

How huge? Will it compress?

> 
> Jul  9 14:35:51 x kernel: XFS (sdd1): xfs_dabuf_map: bno 8388608 dir:
> inode 21698340263
> Jul  9 14:35:51 x kernel: XFS (sdd1): [00] br_startoff 8388608
> br_startblock -2 br_blockcount 1 br_state 0

It looks like we found a hole at the leaf offset of a directory. We'd
expect to find a leaf or node block there depending on the directory
format (which appears to be node format based on the stack below) that
contains hashval lookup information for the dir.
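
The numbers in the log line fit that reading. What follows is a minimal
sketch in Python, not kernel code. It assumes 4 KiB filesystem blocks
(the thread does not state the block size) and uses the constant values
from the kernel's libxfs headers: HOLESTARTBLOCK is -2, DELAYSTARTBLOCK
is -1, and the directory leaf space starts at byte offset 32 GiB. The
helper names are made up for illustration.

```python
# Sketch only: decode the extent record printed by the kernel.
# Assumption: 4 KiB filesystem blocks (not stated in the thread).
BLOCK_SIZE = 4096

# Sentinel startblock values used by the XFS bmap code.
DELAYSTARTBLOCK = -1   # delayed-allocation extent
HOLESTARTBLOCK = -2    # no block mapped at this file offset

# Directory address space: data blocks start at byte 0,
# leaf/node blocks start at byte offset 32 GiB.
XFS_DIR2_SPACE_SIZE = 1 << 35
XFS_DIR2_LEAF_OFFSET = 1 * XFS_DIR2_SPACE_SIZE


def leaf_offset_in_blocks(block_size: int = BLOCK_SIZE) -> int:
    """File offset, in filesystem blocks, where the leaf/node space starts."""
    return XFS_DIR2_LEAF_OFFSET // block_size


def describe_extent(startoff: int, startblock: int) -> str:
    """Hypothetical helper: classify one br_* record from the log."""
    if startblock == HOLESTARTBLOCK:
        kind = "hole"
    elif startblock == DELAYSTARTBLOCK:
        kind = "delalloc"
    else:
        kind = "mapped"
    if startoff == leaf_offset_in_blocks():
        where = "leaf offset"
    else:
        where = "other offset"
    return f"{kind} at {where}"


print(leaf_offset_in_blocks())        # 8388608
print(describe_extent(8388608, -2))   # hole at leaf offset
```

With 4 KiB blocks, 32 GiB / 4 KiB = 8388608, which matches both bno and
br_startoff in the report, and br_startblock -2 marks the mapping as a
hole.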

It's not clear how we'd get into this state. Had this system experienced
any crash/recovery sequences or storage issues before the first
occurrence?

Brian

> Jul  9 14:35:51 x kernel: XFS (sdd1): Internal error xfs_da_do_buf(1) at
> line 2557 of file fs/xfs/libxfs/xfs_da_btree.c.  Caller
> xfs_da_read_buf+0x6a/0x120 [xfs]
> Jul  9 14:35:51 x kernel: CPU: 3 PID: 2928 Comm: cp Tainted: G
>   E     5.0.0-1-03515-g3478588b5136 #10
> Jul  9 14:35:51 x kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS
> 3.0a 02/06/2018
> Jul  9 14:35:51 x kernel: Call Trace:
> Jul  9 14:35:51 x kernel:  dump_stack+0x5c/0x80
> Jul  9 14:35:51 x kernel:  xfs_dabuf_map.constprop.0+0x1dc/0x390 [xfs]
> Jul  9 14:35:51 x kernel:  xfs_da_read_buf+0x6a/0x120 [xfs]
> Jul  9 14:35:51 x kernel:  xfs_da3_node_read+0x17/0xd0 [xfs]
> Jul  9 14:35:51 x kernel:  xfs_da3_node_lookup_int+0x6c/0x370 [xfs]
> Jul  9 14:35:51 x kernel:  ? kmem_cache_alloc+0x14e/0x1b0
> Jul  9 14:35:51 x kernel:  xfs_dir2_node_lookup+0x4b/0x170 [xfs]
> Jul  9 14:35:51 x kernel:  xfs_dir_lookup+0x1b5/0x1c0 [xfs]
> Jul  9 14:35:51 x kernel:  xfs_lookup+0x57/0x120 [xfs]
> Jul  9 14:35:51 x kernel:  xfs_vn_lookup+0x70/0xa0 [xfs]
> Jul  9 14:35:51 x kernel:  __lookup_hash+0x6c/0xa0
> Jul  9 14:35:51 x kernel:  ? _cond_resched+0x15/0x30
> Jul  9 14:35:51 x kernel:  filename_create+0x91/0x160
> Jul  9 14:35:51 x kernel:  do_linkat+0xa5/0x360
> Jul  9 14:35:51 x kernel:  __x64_sys_linkat+0x21/0x30
> Jul  9 14:35:51 x kernel:  do_syscall_64+0x55/0x100
> Jul  9 14:35:51 x kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 
> Longer log:
> http://ixion.pld-linux.org/~arekm/xfs-10.txt
> 
> 
> -- 
> Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
> 



* Re: xfs_repair doesn't handle: br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0 corruption
From: Arkadiusz Miśkiewicz @ 2021-03-04  8:54 UTC
  To: Brian Foster, Arkadiusz Miśkiewicz; +Cc: linux-xfs

On 15.07.2020 at 13:40, Brian Foster wrote:
> On Wed, Jul 15, 2020 at 09:05:47AM +0200, Arkadiusz Miśkiewicz wrote:
>>
>> Hello.
>>
>> xfs_repair (built from for-next about 2-3 weeks ago) doesn't seem to
>> handle this kind of corruption. Repair (run a few times) finishes just
>> fine, but the filesystem ends up with the same trace again.
>>
> 
> Are you saying that xfs_repair eventually resolves the corruption but it
> takes multiple tries, and then the corruption reoccurs at runtime? Or
> that xfs_repair doesn't ever resolve the corruption?
> 
> Either way, what does xfs_repair report?

http://ixion.pld-linux.org/~arekm/xfs/xfs-repair.txt

This is the repair that I ran back in 2020 on the metadumped image
(linked below).


But I also ran repair recently with xfsprogs 5.10.0

http://ixion.pld-linux.org/~arekm/xfs/xfs-repair-sdd1-20210228.txt

on the actual fs, and today it crashed:

[ 3580.278435] XFS (sdd1): xfs_dabuf_map: bno 8388608 dir: inode 36509341678
[ 3580.278436] XFS (sdd1): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
[ 3580.278452] XFS (sdd1): Internal error xfs_da_do_buf(1) at line 2557 of file fs/xfs/libxfs/xfs_da_btree.c.  Caller xfs_da_read_buf+0x7c/0x130 [xfs]

So the 5.10.0 repair doesn't fix it either.
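
Note that the inode in this report (36509341678) is not the one from the
2020 trace (21698340263), so more than one directory seems to be
affected. Below is a minimal sketch in Python for collecting the
directory inodes from a saved kernel log. It assumes each kernel message
sits on a single line (mail wrapping undone, as in raw dmesg output);
the function name is made up for illustration.

```python
import re

# Assumption: every kernel message is on one line (no mail wrapping).
DABUF_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): xfs_dabuf_map: "
    r"bno (?P<bno>\d+) dir: inode (?P<ino>\d+)"
)


def affected_dirs(log_text: str) -> set[tuple[str, int]]:
    """Return the (device, directory inode) pairs named in xfs_dabuf_map reports."""
    return {
        (m.group("dev"), int(m.group("ino")))
        for m in DABUF_RE.finditer(log_text)
    }


sample = (
    "Jul  9 14:35:51 x kernel: XFS (sdd1): xfs_dabuf_map: "
    "bno 8388608 dir: inode 21698340263\n"
    "[ 3580.278435] XFS (sdd1): xfs_dabuf_map: "
    "bno 8388608 dir: inode 36509341678\n"
)
for dev, ino in sorted(affected_dirs(sample)):
    print(dev, ino)
```

Each inode found this way could then be examined read-only, for example
with xfs_db's "inode" and "bmap" commands, to see which directories have
a hole at that offset.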

> 
>> Metadump is possible but problematic (will be huge).
>>
> 
> How huge? Will it compress?

53GB

http://ixion.pld-linux.org/~arekm/xfs/sdd1.metadump.gz


> 
>>
>> Jul  9 14:35:51 x kernel: XFS (sdd1): xfs_dabuf_map: bno 8388608 dir:
>> inode 21698340263
>> Jul  9 14:35:51 x kernel: XFS (sdd1): [00] br_startoff 8388608
>> br_startblock -2 br_blockcount 1 br_state 0
> 
> It looks like we found a hole at the leaf offset of a directory. We'd
> expect to find a leaf or node block there depending on the directory
> format (which appears to be node format based on the stack below) that
> contains hashval lookup information for the dir.
> 
> It's not clear how we'd get into this state. Had this system experienced
> any crash/recovery sequences or storage issues before the first
> occurrence?

Yes, more than once; that's my "famous" server, which has seen a lot of
fs damage.

Anyway, it would be nice if repair could fix such a messed-up startblock,
because the kernel crashes on it so easily (or at least I assume that's
the cause).

> 
> Brian
> 
>> Jul  9 14:35:51 x kernel: XFS (sdd1): Internal error xfs_da_do_buf(1) at
>> line 2557 of file fs/xfs/libxfs/xfs_da_btree.c.  Caller
>> xfs_da_read_buf+0x6a/0x120 [xfs]
>> Jul  9 14:35:51 x kernel: CPU: 3 PID: 2928 Comm: cp Tainted: G
>>   E     5.0.0-1-03515-g3478588b5136 #10
>> Jul  9 14:35:51 x kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS
>> 3.0a 02/06/2018
>> Jul  9 14:35:51 x kernel: Call Trace:
>> Jul  9 14:35:51 x kernel:  dump_stack+0x5c/0x80
>> Jul  9 14:35:51 x kernel:  xfs_dabuf_map.constprop.0+0x1dc/0x390 [xfs]
>> Jul  9 14:35:51 x kernel:  xfs_da_read_buf+0x6a/0x120 [xfs]
>> Jul  9 14:35:51 x kernel:  xfs_da3_node_read+0x17/0xd0 [xfs]
>> Jul  9 14:35:51 x kernel:  xfs_da3_node_lookup_int+0x6c/0x370 [xfs]
>> Jul  9 14:35:51 x kernel:  ? kmem_cache_alloc+0x14e/0x1b0
>> Jul  9 14:35:51 x kernel:  xfs_dir2_node_lookup+0x4b/0x170 [xfs]
>> Jul  9 14:35:51 x kernel:  xfs_dir_lookup+0x1b5/0x1c0 [xfs]
>> Jul  9 14:35:51 x kernel:  xfs_lookup+0x57/0x120 [xfs]
>> Jul  9 14:35:51 x kernel:  xfs_vn_lookup+0x70/0xa0 [xfs]
>> Jul  9 14:35:51 x kernel:  __lookup_hash+0x6c/0xa0
>> Jul  9 14:35:51 x kernel:  ? _cond_resched+0x15/0x30
>> Jul  9 14:35:51 x kernel:  filename_create+0x91/0x160
>> Jul  9 14:35:51 x kernel:  do_linkat+0xa5/0x360
>> Jul  9 14:35:51 x kernel:  __x64_sys_linkat+0x21/0x30
>> Jul  9 14:35:51 x kernel:  do_syscall_64+0x55/0x100
>> Jul  9 14:35:51 x kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>>
>> Longer log:
>> http://ixion.pld-linux.org/~arekm/xfs-10.txt
>>
>>
>> -- 
>> Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
>>
> 

(Resent because vger still blocks my primary maven domain, and most
likely nothing has changed in the postmasters' attitude; I didn't try... :/ )

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

