linux-xfs.vger.kernel.org archive mirror
* Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
       [not found] <186d30f217e645728ad1f34724cbe3e7@jore.no>
@ 2020-02-09  6:19 ` John Jore
  2020-02-10  3:47   ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: John Jore @ 2020-02-09  6:19 UTC (permalink / raw)
  To: linux-xfs


Hi all,

Not sure if this is the appropriate forum to report xfs_repair bugs? If not, please point me in the right direction.

I have a corrupted XFS volume which mounts fine, but xfs_repair is unable to repair it, and the volume eventually shuts down due to metadata corruption when writes are performed.

Originally I used the xfs_repair from CentOS 8.1.1911, but then built the latest xfs_repair from git://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git (as of today, Feb 9th, it reports as version 5.4.0).


Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 16:08:04: scanning agi unlinked lists - 64 of 64 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 45
        - agno = 15
        - agno = 0
        - agno = 30
        - agno = 60
        - agno = 46
        - agno = 16
Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
        - agno = 61
        - agno = 31
        - agno = 47
        - agno = 62
        - agno = 48
        - agno = 49
        - agno = 32
        - agno = 33
        - agno = 17
        - agno = 1
bad magic number 0x0 on inode 18253615584
bad version number 0x0 on inode 18253615584
bad magic number 0x0 on inode 18253615585
bad version number 0x0 on inode 18253615585
bad magic number 0x0 on inode 18253615586 
.....
bad magic number 0x0 on inode 18253615584, resetting magic number
bad version number 0x0 on inode 18253615584, resetting version number
bad magic number 0x0 on inode 18253615585, resetting magic number
bad version number 0x0 on inode 18253615585, resetting version number
bad magic number 0x0 on inode 18253615586, resetting magic number
bad version number 0x0 on inode 18253615586, resetting version number
....
        - agno = 16
        - agno = 17
Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
        - agno = 18
        - agno = 19
...   
Phase 7 - verify and correct link counts...
        - 16:10:41: verify and correct link counts - 64 of 64 allocation groups done
Metadata corruption detected at 0x433385, xfs_inode block 0x17312a3f0/0x2000
libxfs_writebufr: write verifier failed on xfs_inode bno 0x17312a3f0/0x2000
releasing dirty buffer (bulk) to free list!

 

No matter how many times I re-run xfs_repair (I've lost count), with or without -d, it never repairs the volume.
The volume is a ~12TB LV built from 4x 4TB disks in RAID 5 on a 3Ware 9690SA controller.


Any suggestions or additional data I can provide?


John

[-- Attachment #2: output.log --]
[-- Type: application/octet-stream, Size: 14757 bytes --]

Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
        - block cache size set to 556792 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 213781 tail block 213781
        - scan filesystem freespace and inode maps...
        - 16:08:04: scanning filesystem freespace - 64 of 64 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 16:08:04: scanning agi unlinked lists - 64 of 64 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 45
        - agno = 15
        - agno = 0
        - agno = 30
        - agno = 60
        - agno = 46
        - agno = 16
Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
        - agno = 61
        - agno = 31
        - agno = 47
        - agno = 62
        - agno = 48
        - agno = 49
        - agno = 32
        - agno = 33
        - agno = 17
        - agno = 1
bad magic number 0x0 on inode 18253615584
bad version number 0x0 on inode 18253615584
bad magic number 0x0 on inode 18253615585
bad version number 0x0 on inode 18253615585
bad magic number 0x0 on inode 18253615586
bad version number 0x0 on inode 18253615586
bad magic number 0x0 on inode 18253615587
bad version number 0x0 on inode 18253615587
bad magic number 0x0 on inode 18253615588
bad version number 0x0 on inode 18253615588
bad magic number 0x0 on inode 18253615589
bad version number 0x0 on inode 18253615589
bad magic number 0x0 on inode 18253615590
bad version number 0x0 on inode 18253615590
bad magic number 0x0 on inode 18253615591
bad version number 0x0 on inode 18253615591
bad magic number 0x0 on inode 18253615592
bad version number 0x0 on inode 18253615592
bad magic number 0x0 on inode 18253615593
bad version number 0x0 on inode 18253615593
bad magic number 0x0 on inode 18253615594
bad version number 0x0 on inode 18253615594
bad magic number 0x0 on inode 18253615595
bad version number 0x0 on inode 18253615595
bad magic number 0x0 on inode 18253615596
bad version number 0x0 on inode 18253615596
bad magic number 0x0 on inode 18253615597
bad version number 0x0 on inode 18253615597
bad magic number 0x0 on inode 18253615598
bad version number 0x0 on inode 18253615598
bad magic number 0x0 on inode 18253615599
bad version number 0x0 on inode 18253615599
bad magic number 0x0 on inode 18253615600
bad version number 0x0 on inode 18253615600
bad magic number 0x0 on inode 18253615601
bad version number 0x0 on inode 18253615601
bad magic number 0x0 on inode 18253615602
bad version number 0x0 on inode 18253615602
bad magic number 0x0 on inode 18253615603
bad version number 0x0 on inode 18253615603
bad magic number 0x0 on inode 18253615604
bad version number 0x0 on inode 18253615604
bad magic number 0x0 on inode 18253615605
bad version number 0x0 on inode 18253615605
bad magic number 0x0 on inode 18253615606
bad version number 0x0 on inode 18253615606
bad magic number 0x0 on inode 18253615607
bad version number 0x0 on inode 18253615607
bad magic number 0x0 on inode 18253615608
bad version number 0x0 on inode 18253615608
bad magic number 0x0 on inode 18253615609
bad version number 0x0 on inode 18253615609
bad magic number 0x0 on inode 18253615610
bad version number 0x0 on inode 18253615610
bad magic number 0x0 on inode 18253615611
bad version number 0x0 on inode 18253615611
bad magic number 0x0 on inode 18253615612
bad version number 0x0 on inode 18253615612
bad magic number 0x0 on inode 18253615613
bad version number 0x0 on inode 18253615613
bad magic number 0x0 on inode 18253615614
bad version number 0x0 on inode 18253615614
bad magic number 0x0 on inode 18253615615
bad version number 0x0 on inode 18253615615
bad magic number 0x0 on inode 18253615584, resetting magic number
bad version number 0x0 on inode 18253615584, resetting version number
bad magic number 0x0 on inode 18253615585, resetting magic number
bad version number 0x0 on inode 18253615585, resetting version number
bad magic number 0x0 on inode 18253615586, resetting magic number
bad version number 0x0 on inode 18253615586, resetting version number
bad magic number 0x0 on inode 18253615587, resetting magic number
bad version number 0x0 on inode 18253615587, resetting version number
bad magic number 0x0 on inode 18253615588, resetting magic number
bad version number 0x0 on inode 18253615588, resetting version number
bad magic number 0x0 on inode 18253615589, resetting magic number
bad version number 0x0 on inode 18253615589, resetting version number
bad magic number 0x0 on inode 18253615590, resetting magic number
bad version number 0x0 on inode 18253615590, resetting version number
bad magic number 0x0 on inode 18253615591, resetting magic number
bad version number 0x0 on inode 18253615591, resetting version number
bad magic number 0x0 on inode 18253615592, resetting magic number
bad version number 0x0 on inode 18253615592, resetting version number
bad magic number 0x0 on inode 18253615593, resetting magic number
bad version number 0x0 on inode 18253615593, resetting version number
bad magic number 0x0 on inode 18253615594, resetting magic number
bad version number 0x0 on inode 18253615594, resetting version number
bad magic number 0x0 on inode 18253615595, resetting magic number
bad version number 0x0 on inode 18253615595, resetting version number
bad magic number 0x0 on inode 18253615596, resetting magic number
bad version number 0x0 on inode 18253615596, resetting version number
bad magic number 0x0 on inode 18253615597, resetting magic number
bad version number 0x0 on inode 18253615597, resetting version number
bad magic number 0x0 on inode 18253615598, resetting magic number
bad version number 0x0 on inode 18253615598, resetting version number
bad magic number 0x0 on inode 18253615599, resetting magic number
bad version number 0x0 on inode 18253615599, resetting version number
bad magic number 0x0 on inode 18253615600, resetting magic number
bad version number 0x0 on inode 18253615600, resetting version number
bad magic number 0x0 on inode 18253615601, resetting magic number
bad version number 0x0 on inode 18253615601, resetting version number
bad magic number 0x0 on inode 18253615602, resetting magic number
bad version number 0x0 on inode 18253615602, resetting version number
bad magic number 0x0 on inode 18253615603, resetting magic number
bad version number 0x0 on inode 18253615603, resetting version number
bad magic number 0x0 on inode 18253615604, resetting magic number
bad version number 0x0 on inode 18253615604, resetting version number
bad magic number 0x0 on inode 18253615605, resetting magic number
bad version number 0x0 on inode 18253615605, resetting version number
bad magic number 0x0 on inode 18253615606, resetting magic number
bad version number 0x0 on inode 18253615606, resetting version number
bad magic number 0x0 on inode 18253615607, resetting magic number
bad version number 0x0 on inode 18253615607, resetting version number
bad magic number 0x0 on inode 18253615608, resetting magic number
bad version number 0x0 on inode 18253615608, resetting version number
bad magic number 0x0 on inode 18253615609, resetting magic number
bad version number 0x0 on inode 18253615609, resetting version number
bad magic number 0x0 on inode 18253615610, resetting magic number
bad version number 0x0 on inode 18253615610, resetting version number
bad magic number 0x0 on inode 18253615611, resetting magic number
bad version number 0x0 on inode 18253615611, resetting version number
bad magic number 0x0 on inode 18253615612, resetting magic number
bad version number 0x0 on inode 18253615612, resetting version number
bad magic number 0x0 on inode 18253615613, resetting magic number
bad version number 0x0 on inode 18253615613, resetting version number
bad magic number 0x0 on inode 18253615614, resetting magic number
bad version number 0x0 on inode 18253615614, resetting version number
bad magic number 0x0 on inode 18253615615, resetting magic number
bad version number 0x0 on inode 18253615615, resetting version number
        - agno = 34
        - agno = 63
        - agno = 18
        - agno = 19
        - agno = 35
        - agno = 2
        - agno = 50
        - agno = 51
        - agno = 52
        - agno = 36
        - agno = 37
        - agno = 3
        - agno = 53
        - agno = 54
        - agno = 4
        - agno = 55
        - agno = 5
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 6
        - agno = 7
        - agno = 41
        - agno = 42
        - agno = 8
        - agno = 9
        - agno = 56
        - agno = 43
        - agno = 10
        - agno = 44
        - agno = 57
        - agno = 11
        - agno = 58
        - agno = 12
        - agno = 59
        - agno = 13
        - agno = 14
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - 16:10:36: process known inodes and inode discovery - 11776 of 11776 inodes done
        - process newly discovered inodes...
        - 16:10:36: process newly discovered inodes - 64 of 64 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 16:10:37: setting up duplicate extent list - 64 of 64 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - agno = 60
        - agno = 61
        - agno = 62
        - agno = 63
        - 16:10:38: check for inodes claiming duplicate blocks - 11776 of 11776 inodes done
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - agno = 60
        - agno = 61
        - agno = 62
        - agno = 63
        - 16:10:39: rebuild AG headers and trees - 64 of 64 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - agno = 60
        - agno = 61
        - agno = 62
        - agno = 63
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
        - 16:10:41: verify and correct link counts - 64 of 64 allocation groups done
Metadata corruption detected at 0x433385, xfs_inode block 0x17312a3f0/0x2000
libxfs_writebufr: write verifier failed on xfs_inode bno 0x17312a3f0/0x2000
releasing dirty buffer (bulk) to free list!

        XFS_REPAIR Summary    Sun Feb  9 16:11:57 2020

Phase		Start		End		Duration
Phase 1:	02/09 16:07:56	02/09 16:07:57	1 second
Phase 2:	02/09 16:07:57	02/09 16:08:04	7 seconds
Phase 3:	02/09 16:08:04	02/09 16:10:36	2 minutes, 32 seconds
Phase 4:	02/09 16:10:36	02/09 16:10:38	2 seconds
Phase 5:	02/09 16:10:38	02/09 16:10:39	1 second
Phase 6:	02/09 16:10:39	02/09 16:10:41	2 seconds
Phase 7:	02/09 16:10:41	02/09 16:10:41	

Total run time: 2 minutes, 45 seconds
done
Repair of readonly mount complete.  Immediate reboot encouraged.


* Re: Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
  2020-02-09  6:19 ` Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption John Jore
@ 2020-02-10  3:47   ` Eric Sandeen
  2020-02-10  3:49     ` Eric Sandeen
  2020-02-10 14:43     ` Eric Sandeen
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Sandeen @ 2020-02-10  3:47 UTC (permalink / raw)
  To: John Jore, linux-xfs

On 2/9/20 12:19 AM, John Jore wrote:
> Hi all,
> 
> Not sure if this is the appropriate forum to report xfs_repair bugs? If not, please point me in the right direction.

This is the place.

> I have a corrupted XFS volume which mounts fine, but xfs_repair is unable to repair it, and the volume eventually shuts down due to metadata corruption when writes are performed.

What does dmesg say when it shuts down?

> 
> Originally I used the xfs_repair from CentOS 8.1.1911, but then built the latest xfs_repair from git://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git (as of today, Feb 9th, it reports as version 5.4.0).
> 
> 
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 16:08:04: scanning agi unlinked lists - 64 of 64 allocation groups done
>         - process known inodes and perform inode discovery...
>         - agno = 45
>         - agno = 15
>         - agno = 0
>         - agno = 30
>         - agno = 60
>         - agno = 46
>         - agno = 16
> Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
>         - agno = 61
>         - agno = 31
>         - agno = 47
>         - agno = 62
>         - agno = 48
>         - agno = 49
>         - agno = 32
>         - agno = 33
>         - agno = 17
>         - agno = 1
> bad magic number 0x0 on inode 18253615584
> bad version number 0x0 on inode 18253615584
> bad magic number 0x0 on inode 18253615585
> bad version number 0x0 on inode 18253615585
> bad magic number 0x0 on inode 18253615586 
> .....
> bad magic number 0x0 on inode 18253615584, resetting magic number
> bad version number 0x0 on inode 18253615584, resetting version number
> bad magic number 0x0 on inode 18253615585, resetting magic number
> bad version number 0x0 on inode 18253615585, resetting version number
> bad magic number 0x0 on inode 18253615586, resetting magic number
> bad version number 0x0 on inode 18253615586, resetting version number

Looks like a whole chunk of inodes with, at the very least, zeroed magic numbers.

> ....
>         - agno = 16
>         - agno = 17
> Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
>         - agno = 18
>         - agno = 19
> ...   
> Phase 7 - verify and correct link counts...
>         - 16:10:41: verify and correct link counts - 64 of 64 allocation groups done
> Metadata corruption detected at 0x433385, xfs_inode block 0x17312a3f0/0x2000
> libxfs_writebufr: write verifier failed on xfs_inode bno 0x17312a3f0/0x2000

This bit seems problematic: I guess it's unable to write the updated inode buffer
due to some corruption, which is presumably why you keep tripping over the same
corruption each time.

> releasing dirty buffer (bulk) to free list!
> 
>  
> 
> No matter how many times I re-run xfs_repair (I've lost count), with or without -d,

-d is for repairing a filesystem while mounted.  I hope you are not doing that, are you?

> it never repairs the volume.
> The volume is a ~12TB LV built from 4x 4TB disks in RAID 5 on a 3Ware 9690SA controller.

Just to double check, are there any storage errors reported in dmesg?

> Any suggestions or additional data I can provide?

If you are willing to provide an xfs_metadump to me (off-list) I will see if I can
reproduce it from the metadump. 

# xfs_metadump /dev/$WHATEVER metadump.img
# bzip2 metadump.img

-Eric

> 
> John
> 


* Re: Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
  2020-02-10  3:47   ` Eric Sandeen
@ 2020-02-10  3:49     ` Eric Sandeen
       [not found]       ` <60f32c031f4345a2b680fbc8531f7bd3@jore.no>
  2020-02-10 14:43     ` Eric Sandeen
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2020-02-10  3:49 UTC (permalink / raw)
  To: John Jore, linux-xfs

On 2/9/20 9:47 PM, Eric Sandeen wrote:
> On 2/9/20 12:19 AM, John Jore wrote:

...

>> No matter how many times I re-run xfs_repair (I've lost count), with or without -d,
> 
> -d is for repairing a filesystem while mounted.  I hope you are not doing that, are you?

"Repair of readonly mount complete.  Immediate reboot encouraged."

er, maybe you are.  Why?

-Eric


* Re: Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
       [not found]       ` <60f32c031f4345a2b680fbc8531f7bd3@jore.no>
@ 2020-02-10 10:33         ` John Jore
  2020-02-10 14:36         ` Eric Sandeen
  1 sibling, 0 replies; 7+ messages in thread
From: John Jore @ 2020-02-10 10:33 UTC (permalink / raw)
  To: linux-xfs

Hi, and no: that message appears when -d is used even if the volume is not mounted (at least on the current version from git).

The help page states only this for -d; given that the metadata corruption could not be repaired without the option, I gave it a try (there is no mention that it is meant for mounted volumes):
 -d           Repair dangerously.



 John



* Re: Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
       [not found]       ` <60f32c031f4345a2b680fbc8531f7bd3@jore.no>
  2020-02-10 10:33         ` John Jore
@ 2020-02-10 14:36         ` Eric Sandeen
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2020-02-10 14:36 UTC (permalink / raw)
  To: John Jore, linux-xfs

On 2/10/20 4:17 AM, John Jore wrote:
> Hi, and no: that message appears when -d is used even if the volume is not mounted (at least on the current version from git).
> 
> 
> The help page states only this for -d; given that the metadata corruption could not be repaired without the option, I gave it a try (there is no mention that it is meant for mounted volumes):
> 
> -d           Repair dangerously.

Man page:

-d     Repair dangerously. Allow xfs_repair to repair an XFS filesystem mounted read only.

-Eric


* Re: Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
  2020-02-10  3:47   ` Eric Sandeen
  2020-02-10  3:49     ` Eric Sandeen
@ 2020-02-10 14:43     ` Eric Sandeen
  2020-02-10 15:35       ` Eric Sandeen
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2020-02-10 14:43 UTC (permalink / raw)
  To: John Jore, linux-xfs

> John Jore wrote:

<inserting off-list replies to these questions back into the thread>

> On 2/9/20 9:47 PM, Eric Sandeen wrote:
>> On 2/9/20 12:19 AM, John Jore wrote:
>>> Hi all,
>>>
>>> Not sure if this is the appropriate forum to report xfs_repair bugs? If not, please point me in the right direction.
>> 
>> This is the place.
>> 
>>> I have a corrupted XFS volume which mounts fine, but xfs_repair is unable to repair it, and the volume eventually shuts down due to metadata corruption when writes are performed.
>> 
>> What does dmesg say when it shuts down?

> I don't really want to mount the volume and perform writes, as I assume this could cause more corruption. I initially thought the issue was benign since it mounted and all appeared OK; it was only when it went offline that I realized it may not have been a good idea to keep writing to the volume.

You said the filesystem shuts down.  When that happened, what log messages did the kernel
emit?

...

>>>
>>> No matter how many times I re-run xfs_repair (I've lost count), with or without -d,

> -d is for repairing a filesystem while mounted.  I hope you are not doing that, are you?
> 
> 
> Nope. The help page says it's for dangerous repairs. I gave it a go. Multiple times.

From the man page: 

"-d  Repair dangerously. Allow xfs_repair to repair an XFS filesystem mounted read only."
                                                                      ^^^^^^^^^^^^^^^^^
 >> -d is for repairing a filesystem while mounted.  I hope you are not doing that, are you?
>> 
>>> it never repairs the volume.
>>> The volume is a ~12TB LV built from 4x 4TB disks in RAID 5 on a 3Ware 9690SA controller.
>> 
>> Just to double check, are there any storage errors reported in dmesg?
>> 
>>> Any suggestions or additional data I can provide?
>> 
>> If you are willing to provide an xfs_metadump to me (off-list) I will see if I can
>> reproduce it from the metadump. 
>> 
>> # xfs_metadump /dev/$WHATEVER metadump.img
>> # bzip2 metadump.img

Thanks for providing this offline, I'll take a look.

-Eric



* Re: Bug in xfs_repair 5.4.0 / Unable to repair metadata corruption
  2020-02-10 14:43     ` Eric Sandeen
@ 2020-02-10 15:35       ` Eric Sandeen
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2020-02-10 15:35 UTC (permalink / raw)
  To: John Jore, linux-xfs

On 2/10/20 8:43 AM, Eric Sandeen wrote:
>> John Jore wrote:

...

>>>> Any suggestions or additional data I can provide?
>>>
>>> If you are willing to provide an xfs_metadump to me (off-list) I will see if I can
>>> reproduce it from the metadump. 
>>>
>>> # xfs_metadump /dev/$WHATEVER metadump.img
>>> # bzip2 metadump.img
> 
> Thanks for providing this offline, I'll take a look.

Ok, so the problem is that you have a whole chunk of inodes that is nothing but
zeros; you can see this with:

# xfs_db -r -c "fsblock 1140850974" -c "type data" -c "p" /dev/$WHATEVER

where fsblock 1140850974 is the location of the bad section of inodes.
How that happened, nobody knows, but that's the corruption that's being detected.
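As a quick cross-check of my own (not something stated in the thread): if we assume the common geometry of 4 KiB filesystem blocks and 256-byte inodes (16 inodes per block, so the low 4 bits of an XFS inode number are its slot within a block), the range of zeroed inodes in the repair log lines up exactly with the fsblock and buffer size in those messages. The geometry values here are an assumption; the inode numbers and fsblock come from the log above.

```python
# Hypothetical sanity check: map the "bad magic number" inode range from the
# repair log onto the fsblock and buffer length xfs_repair complains about.
# Assumes 4 KiB blocks and 256-byte inodes (16 inodes per block), which is a
# guess about this filesystem's geometry, not a fact from the thread.

INOPBLOG = 4          # log2(inodes per block): 4096-byte block / 256-byte inode
INODE_SIZE = 256      # bytes per inode (assumed)

first_ino = 18253615584   # first "bad magic number" inode in the log
last_ino = 18253615615    # last one before the "resetting" messages

# An XFS inode number is (fsblock << inopblog) | offset-in-block, so shifting
# away the in-block offset bits yields the filesystem block number.
fsblock = first_ino >> INOPBLOG
print(fsblock)            # 1140850974, the block passed to xfs_db below

# 32 consecutive zeroed inodes at 256 bytes each is exactly the 0x2000-byte
# buffer named in "xfs_inode block 0x17312a3f0/0x2000".
span_bytes = (last_ino - first_ino + 1) * INODE_SIZE
print(hex(span_bytes))    # 0x2000
```

That the shift reproduces the fsblock Eric quotes suggests the assumed geometry is the right one for this filesystem.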

So the problem here is that we added a new test to the inode verifiers, which
validates the next_unlinked field of the inode; this got inherited from kernelspace:

commit 2949b46779cf054a7f9067000bbadf35e55b3ce7
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Wed Apr 18 14:46:07 2018 -0500

    xfs: don't accept inode buffers with suspicious unlinked chains
    
    Source kernel commit: 6a96c5650568a2218712d43ec16f3f82296a6c53
    
    When we're verifying inode buffers, sanity-check the unlinked pointer.
    We don't want to run the risk of trying to purge something that's
    obviously broken.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>

so we now have a test for a valid next unlinked field when we write the inodes,
but we never reset it in repair, so the write verifier fails, and the modified
inodes do not get written back to disk.
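To illustrate the failure mode, here is a toy Python model (not the actual xfsprogs C code; the agino bounds and dict "inode" are made up for illustration): the write verifier rejects any next_unlinked value that is neither NULLAGINO nor a plausible AG inode number, so an inode rebuilt from zeros, whose next_unlinked is still 0, fails verification on write-out. The sketched fix is for repair to also reset the field to NULLAGINO.

```python
# Toy model of the next_unlinked write-verifier check described above.
# NULLAGINO and the shape of the check mirror the quoted kernel commit;
# the bounds in agino_is_valid() are invented example values.

NULLAGINO = 0xFFFFFFFF  # "no unlinked successor" sentinel

def agino_is_valid(agino, first_agino=16, last_agino=1 << 27):
    # Stand-in for an xfs_verify_agino()-style bounds check (bounds made up).
    return first_agino <= agino <= last_agino

def write_verifier_ok(dinode):
    # Accept NULLAGINO or a valid-looking agino; reject everything else.
    nu = dinode["next_unlinked"]
    return nu == NULLAGINO or agino_is_valid(nu)

# A zeroed-out inode that repair has given a fresh magic ("IN") and version,
# but whose next_unlinked is still 0: the verifier rejects it, the buffer
# never reaches disk, and the same corruption reappears on every run.
rebuilt = {"magic": 0x494E, "version": 2, "next_unlinked": 0}
print(write_verifier_ok(rebuilt))   # False

# The sketched fix: when repair resets a bad inode, reset next_unlinked too.
rebuilt["next_unlinked"] = NULLAGINO
print(write_verifier_ok(rebuilt))   # True
```

This is why every xfs_repair run reported success in Phase 7 yet left the corruption in place: the repaired buffer was rejected by its own write verifier and discarded.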

I'll send a patch to fix this for review, and cc: you, in a moment.

-Eric






