* filesystem dead, xfs_repair won't help
@ 2017-04-10  9:23 Avi Kivity
  2017-04-10  9:42 ` Avi Kivity
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Avi Kivity @ 2017-04-10  9:23 UTC (permalink / raw)
  To: linux-xfs

Today my kernel complained that in memory metadata is corrupt and
asked that I run xfs_repair.  But xfs_repair doesn't like the
superblock and isn't able to find a secondary superblock.

Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
without issue).

Anything I can do to recover the data?


* Re: filesystem dead, xfs_repair won't help
  2017-04-10  9:23 filesystem dead, xfs_repair won't help Avi Kivity
@ 2017-04-10  9:42 ` Avi Kivity
  2017-04-10 15:35   ` Brian Foster
  2017-04-10  9:43 ` allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help) L A Walsh
  2017-04-10 15:49 ` filesystem dead, xfs_repair won't help Eric Sandeen
  2 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-10  9:42 UTC (permalink / raw)
  To: linux-xfs

On 04/10/2017 12:23 PM, Avi Kivity wrote:
> Today my kernel complained that in memory metadata is corrupt and
> asked that I run xfs_repair.  But xfs_repair doesn't like the
> superblock and isn't able to find a secondary superblock.
>
> Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
> without issue).
>
> Anything I can do to recover the data?


Initial error:

Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Metadata 
CRC error detected at xfs_agfl_read_verify+0xcd/0x100 [xfs], xfs_agfl 
block 0x2cb68e13
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Unmount 
and run xfs_repair
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): First 64 
bytes of corrupted metadata buffer:
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75400: 23 40 
8f 28 5b 50 3a b4 f8 54 1e 31 97 f4 fe ed  #@.([P:..T.1....
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75410: 62 87 
57 51 ee 9d 31 02 ec 2c 10 46 6c 93 db 09  b.WQ..1..,.Fl...
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75420: ae 7a 
ea b3 91 49 7e d3 99 a4 25 49 11 c5 8b be  .z...I~...%I....
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75430: e4 2e 
14 d4 8a f8 5f 98 66 d8 67 72 ec c9 1a d5  ......_.f.gr....
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): metadata 
I/O error: block 0x2cb68e13 ("xfs_trans_read_buf_map") error 74 numblks 1
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): 
xfs_do_force_shutdown(0x8) called from line 236 of file 
fs/xfs/libxfs/xfs_defer.c.  Return address = 0xffffffffc05bdbc6
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): 
Corruption of in-memory data detected.  Shutting down filesystem
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Please 
umount the filesystem and rectify the problem(s)


After restart:

Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Mounting 
V5 Filesystem
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Starting 
recovery (logdev: internal)
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Metadata 
CRC error detected at xfs_agfl_read_verify+0xcd/0x100 [xfs], xfs_agfl 
block 0x2cb68e13
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Unmount 
and run xfs_repair
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): First 64 
bytes of corrupted metadata buffer:
Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a00: 23 40 
8f 28 5b 50 3a b4 f8 54 1e 31 97 f4 fe ed  #@.([P:..T.1....
Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a10: 62 87 
57 51 ee 9d 31 02 ec 2c 10 46 6c 93 db 09  b.WQ..1..,.Fl...
Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a20: ae 7a 
ea b3 91 49 7e d3 99 a4 25 49 11 c5 8b be  .z...I~...%I....
Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a30: e4 2e 
14 d4 8a f8 5f 98 66 d8 67 72 ec c9 1a d5  ......_.f.gr....
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): metadata 
I/O error: block 0x2cb68e13 ("xfs_trans_read_buf_map") error 74 numblks 1
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Internal 
error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller 
xfs_efi_recover+0x18e/0x1c0 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: CPU: 3 PID: 1063 Comm: 
mount Not tainted 4.10.8-200.fc25.x86_64 #1
Apr 10 11:47:58 avi.cloudius-systems.com kernel: Hardware 
name:                  /DH77EB, BIOS EBH7710H.86A.0099.2013.0125.1400 
01/25/2013
Apr 10 11:47:58 avi.cloudius-systems.com kernel: Call Trace:
Apr 10 11:47:58 avi.cloudius-systems.com kernel: dump_stack+0x63/0x86
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xfs_error_report+0x3c/0x40 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel:  ? 
xfs_efi_recover+0x18e/0x1c0 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xfs_trans_cancel+0xb6/0xe0 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xfs_efi_recover+0x18e/0x1c0 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xlog_recover_process_efi+0x2c/0x50 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xlog_recover_process_intents.isra.42+0x122/0x160 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel:  ? 
xfs_reinit_percpu_counters+0x46/0x50 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xlog_recover_finish+0x23/0xb0 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xfs_log_mount_finish+0x29/0x50 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_mountfs+0x6ce/0x930 
[xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
xfs_fs_fill_super+0x3ee/0x570 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: mount_bdev+0x178/0x1b0
Apr 10 11:47:58 avi.cloudius-systems.com kernel:  ? 
xfs_test_remount_options.isra.14+0x60/0x60 [xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_fs_mount+0x15/0x20 
[xfs]
Apr 10 11:47:58 avi.cloudius-systems.com kernel: mount_fs+0x38/0x150
Apr 10 11:47:58 avi.cloudius-systems.com kernel:  ? __alloc_percpu+0x15/0x20
Apr 10 11:47:58 avi.cloudius-systems.com kernel: vfs_kern_mount+0x67/0x130
Apr 10 11:47:58 avi.cloudius-systems.com kernel: do_mount+0x1dd/0xc50
Apr 10 11:47:58 avi.cloudius-systems.com kernel:  ? 
_copy_from_user+0x4e/0x80
Apr 10 11:47:58 avi.cloudius-systems.com kernel:  ? memdup_user+0x4f/0x70
Apr 10 11:47:58 avi.cloudius-systems.com kernel: SyS_mount+0x83/0xd0
Apr 10 11:47:58 avi.cloudius-systems.com kernel: do_syscall_64+0x67/0x180
Apr 10 11:47:58 avi.cloudius-systems.com kernel: 
entry_SYSCALL64_slow_path+0x25/0x25
Apr 10 11:47:58 avi.cloudius-systems.com kernel: RIP: 0033:0x7f5cb9a626fa
Apr 10 11:47:58 avi.cloudius-systems.com kernel: RSP: 
002b:00007ffeffa2c928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Apr 10 11:47:58 avi.cloudius-systems.com kernel: RAX: ffffffffffffffda 
RBX: 000055b59fd6f030 RCX: 00007f5cb9a626fa
Apr 10 11:47:58 avi.cloudius-systems.com kernel: RDX: 000055b59fd6f210 
RSI: 000055b59fd6f250 RDI: 000055b59fd6f230
Apr 10 11:47:58 avi.cloudius-systems.com kernel: RBP: 0000000000000000 
R08: 0000000000000000 R09: 0000000000000012
Apr 10 11:47:58 avi.cloudius-systems.com kernel: R10: 00000000c0ed0000 
R11: 0000000000000246 R12: 000055b59fd6f230
Apr 10 11:47:58 avi.cloudius-systems.com kernel: R13: 000055b59fd6f210 
R14: 0000000000000000 R15: 00000000ffffffff
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): 
xfs_do_force_shutdown(0x8) called from line 984 of file 
fs/xfs/xfs_trans.c.  Return address = 0xffffffffc056324f
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): 
Corruption of in-memory data detected.  Shutting down filesystem
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Please 
umount the filesystem and rectify the problem(s)
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Failed 
to recover intents
Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): log 
mount finish failed



smart (note error at end; there were no kernel I/O errors from the block 
layer):

$ sudo smartctl -a /dev/nvme0n1
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.8-200.fc25.x86_64] 
(local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEKKW512G7
Serial Number:                      BTPY6313086D512F
Firmware Version:                   PSF100C
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Apr 10 12:36:41 2017 IDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
  0 +     9.00W       -        -    0  0  0  0        5       5
  1 +     4.60W       -        -    1  1  1  1       30      30
  2 +     3.80W       -        -    2  2  2  2       30      30
  3 -   0.0700W       -        -    3  3  3  3    10000     300
  4 -   0.0050W       -        -    4  4  4  4     2000   10000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
  0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        27 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    8,854,487 [4.53 TB]
Data Units Written:                 5,652,445 [2.89 TB]
Host Read Commands:                 446,901,662
Host Write Commands:                35,627,742
Controller Busy Time:               633
Power Cycles:                       24
Power On Hours:                     987
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    1
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    11
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
   0          1     1  0x0000  0x0286      -            0     1     -



* allow mounting w/crc-checking disabled?  (was Re: filesystem dead, xfs_repair won't help)
  2017-04-10  9:23 filesystem dead, xfs_repair won't help Avi Kivity
  2017-04-10  9:42 ` Avi Kivity
@ 2017-04-10  9:43 ` L A Walsh
  2017-04-10 16:01   ` Eric Sandeen
  2017-04-10 15:49 ` filesystem dead, xfs_repair won't help Eric Sandeen
  2 siblings, 1 reply; 32+ messages in thread
From: L A Walsh @ 2017-04-10  9:43 UTC (permalink / raw)
  To: linux-xfs

Avi Kivity wrote:
> Today my kernel complained that in memory metadata is corrupt and
> asked that I run xfs_repair.  But xfs_repair doesn't like the
> superblock and isn't able to find a secondary superblock.
>   
Why doesn't xfs have an option to mount with metadata checksumming
disabled so people can recover their data?

Seems like it should be easy to provide, no?

Or rather, if a disk is created with the crc option, is it possible
to later switch it off or mount it with checking disabled?

Yes, I know the mantra is that they should have had backups, but
in practice it seems not the case in a majority of uses outside
of enterprise usage.  It sure seems that disabling a particular file
or directory (if necessary) affected by a bad-crc would be
preferable to losing the whole disk.  That said, how many crc
errors would be likely to make things unreadable or inaccessible?
Given that the default before crc-checking was that the disks
were still usable (often with no error being flagged or noticed),
I'd suspect that the crc-checking is causing many errors to be
flagged that before wouldn't have even been noticed.

Overall I'm wondering if the crc option won't cause more disk-losses
than would occur without the option.  Or, in other words, it seems
that since crc-checking seems to cause the disk to be lost, turning
on crc checking is almost guaranteed to cause a higher incidence of
data loss if it can't be disabled.





* Re: filesystem dead, xfs_repair won't help
  2017-04-10  9:42 ` Avi Kivity
@ 2017-04-10 15:35   ` Brian Foster
  2017-04-11  7:46     ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2017-04-10 15:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Mon, Apr 10, 2017 at 12:42:33PM +0300, Avi Kivity wrote:
> On 04/10/2017 12:23 PM, Avi Kivity wrote:
> > Today my kernel complained that in memory metadata is corrupt and
> > asked that I run xfs_repair.  But xfs_repair doesn't like the
> > superblock and isn't able to find a secondary superblock.
> > 
> > Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
> > without issue).
> > 
> > Anything I can do to recover the data?
> 

Well I can't explain why you have a checksum error, but what do you mean
that xfs_repair doesn't like the superblock? Can you provide the
xfs_repair output?

It seems strange for xfs_repair to not find the superblock of a
filesystem that can otherwise run log recovery up until it encounters
the buffer with a bad crc.

It also might be useful to find out exactly what that error reported by
smartctl means. Are you aware of whether it pre-existed the filesystem
issue or not?

Brian



* Re: filesystem dead, xfs_repair won't help
  2017-04-10  9:23 filesystem dead, xfs_repair won't help Avi Kivity
  2017-04-10  9:42 ` Avi Kivity
  2017-04-10  9:43 ` allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help) L A Walsh
@ 2017-04-10 15:49 ` Eric Sandeen
  2017-04-10 16:23   ` Christoph Hellwig
  2017-04-11  7:48   ` Avi Kivity
  2 siblings, 2 replies; 32+ messages in thread
From: Eric Sandeen @ 2017-04-10 15:49 UTC (permalink / raw)
  To: Avi Kivity, linux-xfs

There is a known firmware bug on Intel 600P drives which
causes corruption with XFS, FWIW.

https://bugzilla.redhat.com/show_bug.cgi?id=1402533

On 4/10/17 4:23 AM, Avi Kivity wrote:
> Today my kernel complained that in memory metadata is corrupt and
> asked that I run xfs_repair.  But xfs_repair doesn't like the
> superblock and isn't able to find a secondary superblock.
> 
> Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
> without issue).
> 
> Anything I can do to recover the data?


* Re: allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help)
  2017-04-10  9:43 ` allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help) L A Walsh
@ 2017-04-10 16:01   ` Eric Sandeen
  2017-04-10 18:05     ` L A Walsh
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Sandeen @ 2017-04-10 16:01 UTC (permalink / raw)
  To: L A Walsh, linux-xfs

On 4/10/17 4:43 AM, L A Walsh wrote:
> Avi Kivity wrote:
>> Today my kernel complained that in memory metadata is corrupt and
>> asked that I run xfs_repair.  But xfs_repair doesn't like the
>> superblock and isn't able to find a secondary superblock.
>>   
> Why doesn't xfs have an option to mount with metadata checksumming
> disabled so people can recover their data?

Because if checksums are bad, your metadata is almost certainly bad,
and with bad metadata, you're not going to be recovering data either.

(and FWIW, CRCs are only the first line of defense: structure verifiers
come after that.  The chance of a CRC being bad and everything else
checking out is extremely small.)

> Seems like it should be easy to provide, no?
> 
> Or rather, if a disk is created with the crc option, is it possible
> to later switch it off or mount it with checking disabled?

It is not possible.

> Yes, I know the mantra is that they should have had backups, but
> in practice it seems not the case in a majority of uses outside
> of enterprise usage.  It sure seems that disabling a particular file
> or directory (if necessary) affected by a bad-crc would be
> preferable to losing the whole disk.  That said, how many crc
> errors would be likely to make things unreadable or inaccessible?

How long is a piece of string?  ;)  Totally depends on the details.

> Given that the default before crc-checking was that the disks
> were still usable (often with no error being flagged or noticed),

Before, we had a lot of ad-hoc checks (or not.)  Many of those checks,
and/or IO errors when trying to read garbage metadata, would also
shut down the filesystem.

Proceeding with mutilated metadata is almost never a good thing.
You'll wander off into garbage and shut down the fs at best, and OOPS
at worst.  (Losing a filesystem is preferable to losing a system!)

> I'd suspect that the crc-checking is causing many errors to be
> flagged that before wouldn't have even been noticed.

Yes, that's precisely the point of CRCs.  :)

> Overall I'm wondering if the crc option won't cause more disk-losses
> than would occur without the option.  Or, in other words, it seems
> that since crc-checking seems to cause the disk to be lost, turning
> on crc checking is almost guaranteed to cause a higher incidence of
> data loss if it can't be disabled.

When CRCs detect metadata corruption, the next step is to run
xfs_repair to salvage what can be salvaged, and retrieve what's left of
your data after that.  Disabling CRCs and proceeding in kernelspace with
known metadata corruption would be a dangerous total crapshoot.
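
For completeness, that usual sequence looks roughly like this (the device
name is only a placeholder, and the dry run should always come first):

  xfs_repair -n /dev/sdX    # no-modify mode: only report what would be fixed
  xfs_repair /dev/sdX       # actual repair
  xfs_repair -L /dev/sdX    # last resort: zero the dirty log first

Note that -L discards whatever was sitting in the log, so it can itself
lose the most recent metadata updates.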

-Eric


* Re: filesystem dead, xfs_repair won't help
  2017-04-10 15:49 ` filesystem dead, xfs_repair won't help Eric Sandeen
@ 2017-04-10 16:23   ` Christoph Hellwig
  2017-04-11  7:48   ` Avi Kivity
  1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-04-10 16:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Avi Kivity, linux-xfs

On Mon, Apr 10, 2017 at 10:49:55AM -0500, Eric Sandeen wrote:
> There is a known firmware bug on Intel 600P drives which
> causes corruption with XFS, FWIW.

Which apparently affects the whole range of controllers, e.g. also the
Pro 6000p and E6000p at least; no idea if there are any more.


* Re: allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help)
  2017-04-10 16:01   ` Eric Sandeen
@ 2017-04-10 18:05     ` L A Walsh
  2017-04-11 12:57       ` Emmanuel Florac
  0 siblings, 1 reply; 32+ messages in thread
From: L A Walsh @ 2017-04-10 18:05 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

Eric Sandeen wrote:
> On 4/10/17 4:43 AM, L A Walsh wrote:
>> Avi Kivity wrote:
>>> Today my kernel complained that in memory metadata is corrupt and
>>> asked that I run xfs_repair.  But xfs_repair doesn't like the
>>> superblock and isn't able to find a secondary superblock.
>>>   
>> Why doesn't xfs have an option to mount with metadata checksumming
>> disabled so people can recover their data?
> 
> Because if checksums are bad, your metadata is almost certainly bad,
> and with bad metadata, you're not going to be recovering data either.
----

	Sorry, but I really don't buy that a 1-bit error
in metadata will automatically cause problems with data recovery.

	If the date on the file shows it was created at some
time with nanoseconds=1 and that gets bumped to 3 (or virtually
any number < an equivalent of 1 second), it will trigger a crc
error.  But I don't care.

> 
> (and FWIW, CRCs are only the first line of defense: structure
> verifiers come after that.  The chance of a CRC being bad and
> everything else checking out is extremely small.)
----	

	If the crc error has caught bit rot, that wouldn't be
true.  Only if the crc error catches a bug in the XFS SW would
that be likely.  Since I was told that it was protecting me
against bit-rot and not against lower stability or quality of XFS
overall, it's more likely that data could be recovered.

	Though, again, this is one of those things like allowing
use of the free-space extent that you could *allow* users to use
at their own risk -- but something, likely, that you won't.  

	This is another case where your logic is flawed.
Permitting mounting w/o enforcement is not a guarantee of
data recovery, BUT the decision of whether or not they can
recover anything useful should be up to the owner of the
computer.  Yet it seems clear you aren't using sound
engineering practice to justify your actions.

	Any bit-rot metadata corruption is unlikely to wipe out
10 terabytes of data.

	I understand your position.  You are claiming the crc option is
detecting errors that were previously undetected.  People have
operated huge filesystems (I'm certain that my 10TB partition is
tiny compared to enterprise usage) for years without experiencing
noticeable problems.  Yet when crc is turned on, suddenly they are
expected to buy into crc detecting corruption so severe that nothing
can be recovered (when such has not been the case since XFS's
inception).  

>> Seems like it should be easy to provide, no?
>>
>> Or rather, if a disk is created with the crc option, is it possible
>> to later switch it off or mount it with checking disabled?
> 
> It is not possible.
-----
	Not possible, eh?  In the SW world?  The only way it would
not be possible is if it were *administratively prohibited*.
Working around detected bugs or flaws isn't known to be "not
possible" by a long shot.  Take ZFS, which, I'm told,
can not only recover corrupted data from other sectors,
but doesn't require shutting down the file system when it
detects a problem.  That certainly doesn't sound
like "impossible".

	If the crc option is only a canary, and not a cipher, then
recovery of most data should be possible.

	Are you saying that the crc option doesn't simply do an integrity
check but is converting what was "plaintext" into some encoded form?
That isn't what it is documented to do.

 
>> Yes, I know the mantra is that they should have had backups, but
>> in practice it seems not the case in a majority of uses outside
>> of enterprise usage.  It sure seems that disabling a particular file
>> or directory (if necessary) affected by a bad-crc would be
>> preferable to losing the whole disk.  That said, how many crc
>> errors would be likely to make things unreadable or inaccessible?
> 
> How long is a piece of string?  ;)  Totally depends on the details.
--
	That depends on whether or not it is a software error
caused by a typo or by 1 or more bit-flips in a given sector.


> 
>> Given that the default before crc-checking was that the disks
>> were still usable (often with no error being flagged or noticed),
> 
> Before, we had a lot of ad-hoc checks (or not.)  Many of those checks,
> and/or IO errors when trying to read garbage metadata, would also
> shut down the filesystem.
---
	But those checks were rarely triggered.  It was often the
case (you claim) that they went undiscovered for some time -- thus
a "need"[sic] for crc to detect a single bit-rot flip in a 100TB
file system and mark the entire file system as bad.

	Sorry, that's bull.  You need to compartmentalize damage
or it's worthless.  Noticing an error in 1 sector shouldn't shut down
or prevent 100TB of other data from being accessed (or usable).



> Proceeding with mutilated metadata is almost never a good thing.
> You'll wander off into garbage and shut down the fs at best, and OOPS
> at worst.  (Losing a filesystem is preferable to losing a system!)
----
	
> 
>> I'd suspect that the crc-checking is causing many errors to be
>> flagged that before wouldn't have even been noticed.
> 
> Yes, that's precisely the point of CRCs.  :)
----

	If they wouldn't have been noticed -- then they wouldn't
have caused problems.  The crc is creating problems where before
there were none -- by definition -- because it catches "many errors...
that before, WOULDN'T HAVE BEEN NOTICED".  That's my point.


>> Overall I'm wondering if the crc option won't cause more
>> disk-losses than would occur without the option.  Or, in other
>> words, it seems that since crc-checking seems to cause the disk
>> to be lost, turning on crc checking is almost guaranteed to cause
>> a higher incidence of data loss if it can't be disabled.
> 
> When CRCs detect metadata corruption, the next step is to run
> xfs_repair to salvage what can be salvaged, and retrieve what's
> left of your data after that.  Disabling CRCs and proceeding in
> kernelspace with known metadata corruption would be a dangerous
> total crapshoot.
---

	Right.. xfs_repair -- like the base-note poster tried,
and it failed.  The crc errors I've seen complaints about are ones
where xfs_repair doesn't work.  At that point, disabling the volume
is not helpful.

	I'm sure it wouldn't be trivial, but creating a separate
file system, "XFS2" from the original XFS sources that responded
to data or metadata corruption by returning empty data where
it was impossible to return anything useful instead of flagging
the disk as "bad", would be a way to allow data recovery to
the extent that it made sense (assuming the original sources
couldn't do the same toggling off a config-flag).

	I'm sure you can out-type me and come up with various
reasons as to why XFS or crc can't auto-correct.  Maybe instead
of a crc, you should be using a well-established check that
allows recovery from multiple data-bit failures.

	Supposedly the 4K block size had more error-resistance and
*recovery* than the 512-byte format.  Certainly, with crc's
on all the metadata, a more robust algorithm could automatically
recover from such errors.

	If it is that fragile, then perhaps you should consider enabling
the independent use of the free-inode, which would certainly
raise performance on mature filesystems.

	I did get that it's been tested on virgin and fresh file
systems and showed no benefit with such, but it would be nice
if such tests were done on 7-10+ year-old filesystems that
"often" exceeded 75% disk space usage -- even going over 
80-90% usage at times for a short period.  It may not be a
normal state, but it does happen.  Certainly it would be
something worthy of testing with real-life data.

:)

*cheers*
-linda



* Re: filesystem dead, xfs_repair won't help
  2017-04-10 15:35   ` Brian Foster
@ 2017-04-11  7:46     ` Avi Kivity
  2017-04-11 11:30       ` Emmanuel Florac
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-11  7:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs



On 04/10/2017 06:35 PM, Brian Foster wrote:
> On Mon, Apr 10, 2017 at 12:42:33PM +0300, Avi Kivity wrote:
>> On 04/10/2017 12:23 PM, Avi Kivity wrote:
>>> Today my kernel complained that in memory metadata is corrupt and
>>> asked that I run xfs_repair.  But xfs_repair doesn't like the
>>> superblock and isn't able to find a secondary superblock.
>>>
>>> Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
>>> without issue).
>>>
>>> Anything I can do to recover the data?
> Well I can't explain why you have a checksum error, but what do you mean
> that xfs_repair doesn't like the superblock? Can you provide the
> xfs_repair output?
>
> It seems strange for xfs_repair to not find the superblock of a
> filesystem that can otherwise run log recovery up until it encounters
> the buffer with a bad crc.

Sorry, should have done it earlier.

$ sudo xfs_repair /dev/nvme0n1
Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks 
with matching geometry !!!

attempting to find secondary superblock...
...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................^C


In a previous run, after returning from lunch, xfs_repair did not find 
the secondary superblock.


The superblock is there though:

$ sudo file -s /dev/nvme0n1
/dev/nvme0n1: SGI XFS filesystem data (blksz 4096, inosz 512, v2 dirs)


I can provide it if it will help.

>
> It also might be useful to find out exactly what that error reported by
> smartctl means. Are you aware of whether it pre-existed the filesystem
> issue or not?

I believe I ran it before and did not notice the error.  I deal with 
many disks, though, so it could have been that I just didn't notice it, 
or that I ran it on a different machine.

>
>> Error Information (NVMe Log 0x01, max 64 entries)
>> Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
>>    0          1     1  0x0000  0x0286      -            0     1     -
>>
>>


If CmdId is the opcode, then it's a flush (matches the fact that LBA=0), 
but I'm guessing it's the tag. 0x0286 is NVME_SC_ACCESS_DENIED, which 
doesn't appear to match, though (if I picked the right enum).
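
If it would help, nvme-cli should be able to dump the raw log entry as
well, something like

  sudo nvme error-log /dev/nvme0

(going from memory on the exact syntax), which may give a little more
detail than smartctl's one-line summary.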





* Re: filesystem dead, xfs_repair won't help
  2017-04-10 15:49 ` filesystem dead, xfs_repair won't help Eric Sandeen
  2017-04-10 16:23   ` Christoph Hellwig
@ 2017-04-11  7:48   ` Avi Kivity
  1 sibling, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2017-04-11  7:48 UTC (permalink / raw)
  To: Eric Sandeen, linux-xfs

That seems to be it.


On 04/10/2017 06:49 PM, Eric Sandeen wrote:
> There is a known firmware bug on Intel 600P drives which
> causes corruption with XFS, FWIW.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1402533
>
> On 4/10/17 4:23 AM, Avi Kivity wrote:
>> Today my kernel complained that in memory metadata is corrupt and
>> asked that I run xfs_repair.  But xfs_repair doesn't like the
>> superblock and isn't able to find a secondary superblock.
>>
>> Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
>> without issue).
>>
>> Anything I can do to recover the data?



* Re: filesystem dead, xfs_repair won't help
  2017-04-11  7:46     ` Avi Kivity
@ 2017-04-11 11:30       ` Emmanuel Florac
  2017-04-11 11:40         ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Emmanuel Florac @ 2017-04-11 11:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Brian Foster, linux-xfs


Le Tue, 11 Apr 2017 10:46:07 +0300
Avi Kivity <avi@scylladb.com> écrivait:

> $ sudo xfs_repair /dev/nvme0n1
> Phase 1 - find and verify superblock...
> couldn't verify primary superblock - not enough secondary superblocks 
> with matching geometry !!!

Which version of xfs_repair is this?

Try to export the FS structure with xfs_metadump, something like

xfs_metadump /dev/nvme0n1 /some/file.dmp

And check the errors it reports, they may be informative. 

In the case where metadump works out fine, you should then try to have a
look at the FS structure using the dump (to avoid wrecking it more than
it already is):

xfs_db -c "sb 0" /some/file.dmp

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: filesystem dead, xfs_repair won't help
  2017-04-11 11:30       ` Emmanuel Florac
@ 2017-04-11 11:40         ` Avi Kivity
  2017-04-11 12:00           ` Emmanuel Florac
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-11 11:40 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Brian Foster, linux-xfs



On 04/11/2017 02:30 PM, Emmanuel Florac wrote:
> Le Tue, 11 Apr 2017 10:46:07 +0300
> Avi Kivity <avi@scylladb.com> écrivait:
>
>> $ sudo xfs_repair /dev/nvme0n1
>> Phase 1 - find and verify superblock...
>> couldn't verify primary superblock - not enough secondary superblocks
>> with matching geometry !!!
> Which version of xfs_repair is this?

xfsprogs-4.9.0-1.fc25.x86_64

> Try to export the FS structure with xfs_metadump, something like
>
> xfs_metadump /dev/nvme0n1 /some/file.dmp
>
> And check the errors it reports, they may be informative.

bad magic number
xfs_metadump: cannot read superblock for ag 1
bad magic number
xfs_metadump: cannot read superblock for ag 2
Metadata CRC error detected at xfs_agfl block 0x1dcf0963/0x200
bad magic number
xfs_metadump: cannot read superblock for ag 3
Metadata CRC error detected at xfs_agfl block 0x2cb68e13/0x200
xfs_metadump: Filesystem log is dirty; image will contain unobfuscated 
metadata in log.
cache_purge: shake on cache 0x55accee162b0 left 3 nodes!?


> In the case where metadump works out fine, you should then try to have a
> look at the FS structure using the dump (to avoid wrecking it more than
> it already is):
>
> xfs_db -c "sb 0" /some/file.dmp
>

$ sudo xfs_db -c "sb 0" /tmp/fs
xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic 
number 0x5846534d)
Use -F to force a read attempt.
$ sudo xfs_db -F -c "sb 0" /tmp/fs
xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic 
number 0x5846534d)
xfs_db: V1 inodes unsupported. Please try an older xfsprogs.




* Re: filesystem dead, xfs_repair won't help
  2017-04-11 11:40         ` Avi Kivity
@ 2017-04-11 12:00           ` Emmanuel Florac
  2017-04-11 12:03             ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Emmanuel Florac @ 2017-04-11 12:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Brian Foster, linux-xfs


Le Tue, 11 Apr 2017 14:40:15 +0300
Avi Kivity <avi@scylladb.com> écrivait:

> $ sudo xfs_db -c "sb 0" /tmp/fs
> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic 
> number 0x5846534d)
> Use -F to force a read attempt.
> $ sudo xfs_db -F -c "sb 0" /tmp/fs
> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic 
> number 0x5846534d)
> xfs_db: V1 inodes unsupported. Please try an older xfsprogs.

Oops, I forgot one important part, sorry, you must restore the
meta_dump to a file first:

xfs_mdrestore /tmp/fs /tmp/fsimage

then run xfs_db on the /tmp/fsimage:

xfs_db -c 'sb 0' -c 'p' /tmp/fsimage
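
As an aside, the restored image also lets you compare the secondary
superblocks, e.g.

  xfs_db -c 'sb 1' -c 'p' /tmp/fsimage

and likewise 'sb 2', 'sb 3', ... -- there is one superblock per
allocation group.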

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: filesystem dead, xfs_repair won't help
  2017-04-11 12:00           ` Emmanuel Florac
@ 2017-04-11 12:03             ` Avi Kivity
  2017-04-11 12:49               ` Emmanuel Florac
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-11 12:03 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Brian Foster, linux-xfs



On 04/11/2017 03:00 PM, Emmanuel Florac wrote:
> Le Tue, 11 Apr 2017 14:40:15 +0300
> Avi Kivity <avi@scylladb.com> écrivait:
>
>> $ sudo xfs_db -c "sb 0" /tmp/fs
>> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic
>> number 0x5846534d)
>> Use -F to force a read attempt.
>> $ sudo xfs_db -F -c "sb 0" /tmp/fs
>> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic
>> number 0x5846534d)
>> xfs_db: V1 inodes unsupported. Please try an older xfsprogs.
> Oops, I forgot one important part, sorry, you must restore the
> meta_dump to a file first:
>
> xfs_mdrestore /tmp/fs /tmp/fsimage
>
> then run xfs_db on the /tmp/fsimage:
>
> xfs_db -c 'sb 0' -c 'p' /tmp/fsimage
>

magicnum = 0x58465342
blocksize = 4096
dblocks = 125026902
rblocks = 0
rextents = 0
uuid = 50b25ad8-3eb9-4273-b7f2-d0a435b3a08f
logstart = 67108869
rootino = 96
rbmino = 97
rsumino = 98
rextsize = 1
agblocks = 31256726
agcount = 4
rbmblocks = 0
logblocks = 61048
versionnum = 0xb4b5
sectsize = 512
inodesize = 512
inopblock = 8
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 9
inopblog = 3
agblklog = 25
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 1959744
ifree = 89
fdblocks = 91586587
frextents = 0
uquotino = null
gquotino = null
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 4
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 1
features2 = 0x18a
bad_features2 = 0x18a
features_compat = 0
features_ro_compat = 0x1
features_incompat = 0x1
features_log_incompat = 0
crc = 0x3ebf41de (correct)
spino_align = 0
pquotino = null
lsn = 0x70002828b
meta_uuid = 00000000-0000-0000-0000-000000000000




* Re: filesystem dead, xfs_repair won't help
  2017-04-11 12:03             ` Avi Kivity
@ 2017-04-11 12:49               ` Emmanuel Florac
  2017-04-11 13:07                 ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Emmanuel Florac @ 2017-04-11 12:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Brian Foster, linux-xfs


Le Tue, 11 Apr 2017 15:03:12 +0300
Avi Kivity <avi@scylladb.com> écrivait:

> On 04/11/2017 03:00 PM, Emmanuel Florac wrote:
> > Le Tue, 11 Apr 2017 14:40:15 +0300
> > Avi Kivity <avi@scylladb.com> écrivait:
> >  
> >> $ sudo xfs_db -c "sb 0" /tmp/fs
> >> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic
> >> number 0x5846534d)
> >> Use -F to force a read attempt.
> >> $ sudo xfs_db -F -c "sb 0" /tmp/fs
> >> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic
> >> number 0x5846534d)
> >> xfs_db: V1 inodes unsupported. Please try an older xfsprogs.  
> > Oops, I forgot one important part, sorry, you must restore the
> > meta_dump to a file first:
> >
> > xfs_mdrestore /tmp/fs /tmp/fsimage
> >
> > then run xfs_db on the /tmp/fsimage:
> >
> > xfs_db -c 'sb 0' -c 'p' /tmp/fsimage
> >  
> 
> magicnum = 0x58465342
> blocksize = 4096
> dblocks = 125026902
> rblocks = 0
> rextents = 0
> uuid = 50b25ad8-3eb9-4273-b7f2-d0a435b3a08f
> logstart = 67108869
> rootino = 96
> rbmino = 97
> rsumino = 98
> rextsize = 1
> agblocks = 31256726
> agcount = 4
> rbmblocks = 0
> logblocks = 61048
> versionnum = 0xb4b5
> sectsize = 512
> inodesize = 512
> inopblock = 8
> fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
> blocklog = 12
> sectlog = 9
> inodelog = 9
> inopblog = 3
> agblklog = 25
> rextslog = 0
> inprogress = 0
> imax_pct = 25
> icount = 1959744
> ifree = 89
> fdblocks = 91586587
> frextents = 0
> uquotino = null
> gquotino = null
> qflags = 0
> flags = 0
> shared_vn = 0
> inoalignmt = 4
> unit = 0
> width = 0
> dirblklog = 0
> logsectlog = 0
> logsectsize = 0
> logsunit = 1
> features2 = 0x18a
> bad_features2 = 0x18a
> features_compat = 0
> features_ro_compat = 0x1
> features_incompat = 0x1
> features_log_incompat = 0
> crc = 0x3ebf41de (correct)
> spino_align = 0
> pquotino = null
> lsn = 0x70002828b
> meta_uuid = 00000000-0000-0000-0000-000000000000
> 
> 

That looks reasonable enough... Heck, what's happening? You could try
to run an integrity check from xfs_db (still using the dump) to locate
the error:

xfs_db -c 'sb 0' -c 'check' /tmp/fsimage

What does it report?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help)
  2017-04-10 18:05     ` L A Walsh
@ 2017-04-11 12:57       ` Emmanuel Florac
  2017-04-11 13:34         ` Eric Sandeen
  0 siblings, 1 reply; 32+ messages in thread
From: Emmanuel Florac @ 2017-04-11 12:57 UTC (permalink / raw)
  To: L A Walsh; +Cc: Eric Sandeen, linux-xfs


Le Mon, 10 Apr 2017 11:05:38 -0700
L A Walsh <xfs@tlinx.org> écrivait:

> 	I'm sure it wouldn't be trivial, but creating a separate
> file system, "XFS2" from the original XFS sources that responded
> to data or metadata corruption by returning empty data where
> it was impossible to return anything useful instead of flagging
> the disk as "bad", would be a way to allow data recovery to
> the extent that it made sense (assuming the original sources
> couldn't do the same toggling off a config-flag).

It would probably be much easier to add an option to mount the filesystem
without crc, similar to "norecovery", that doesn't replay the journal.
It would of course be read-only, but in a case like this it would be much
easier and more practical for everyone.
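
For reference, the existing "norecovery" behaviour is the model I have
in mind, i.e. something like

  mount -o ro,norecovery /dev/nvme0n1 /mnt

read-only and without replaying the log.  A hypothetical "skip CRC
verification" option could follow the same read-only pattern; to be
clear, no such mount option exists today, that's just the shape of the
idea.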

So far I believed that metadata CRCs were a promise of safer
filesystems; now that I've set up several multi-hundred-terabyte
volumes with CRC enabled, I'm getting nervous...

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: filesystem dead, xfs_repair won't help
  2017-04-11 12:49               ` Emmanuel Florac
@ 2017-04-11 13:07                 ` Avi Kivity
  2017-04-11 16:13                   ` Emmanuel Florac
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-11 13:07 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Brian Foster, linux-xfs



On 04/11/2017 03:49 PM, Emmanuel Florac wrote:
> Le Tue, 11 Apr 2017 15:03:12 +0300
> Avi Kivity <avi@scylladb.com> écrivait:
>
>> On 04/11/2017 03:00 PM, Emmanuel Florac wrote:
>>> Le Tue, 11 Apr 2017 14:40:15 +0300
>>> Avi Kivity <avi@scylladb.com> écrivait:
>>>   
>>>> $ sudo xfs_db -c "sb 0" /tmp/fs
>>>> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic
>>>> number 0x5846534d)
>>>> Use -F to force a read attempt.
>>>> $ sudo xfs_db -F -c "sb 0" /tmp/fs
>>>> xfs_db: /tmp/fs is not a valid XFS filesystem (unexpected SB magic
>>>> number 0x5846534d)
>>>> xfs_db: V1 inodes unsupported. Please try an older xfsprogs.
>>> Oops, I forgot one important part, sorry, you must restore the
>>> meta_dump to a file first:
>>>
>>> xfs_mdrestore /tmp/fs /tmp/fsimage
>>>
>>> then run xfs_db on the /tmp/fsimage:
>>>
>>> xfs_db -c 'sb 0' -c 'p' /tmp/fsimage
>>>   
>> magicnum = 0x58465342
>> blocksize = 4096
>> dblocks = 125026902
>> rblocks = 0
>> rextents = 0
>> uuid = 50b25ad8-3eb9-4273-b7f2-d0a435b3a08f
>> logstart = 67108869
>> rootino = 96
>> rbmino = 97
>> rsumino = 98
>> rextsize = 1
>> agblocks = 31256726
>> agcount = 4
>> rbmblocks = 0
>> logblocks = 61048
>> versionnum = 0xb4b5
>> sectsize = 512
>> inodesize = 512
>> inopblock = 8
>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
>> blocklog = 12
>> sectlog = 9
>> inodelog = 9
>> inopblog = 3
>> agblklog = 25
>> rextslog = 0
>> inprogress = 0
>> imax_pct = 25
>> icount = 1959744
>> ifree = 89
>> fdblocks = 91586587
>> frextents = 0
>> uquotino = null
>> gquotino = null
>> qflags = 0
>> flags = 0
>> shared_vn = 0
>> inoalignmt = 4
>> unit = 0
>> width = 0
>> dirblklog = 0
>> logsectlog = 0
>> logsectsize = 0
>> logsunit = 1
>> features2 = 0x18a
>> bad_features2 = 0x18a
>> features_compat = 0
>> features_ro_compat = 0x1
>> features_incompat = 0x1
>> features_log_incompat = 0
>> crc = 0x3ebf41de (correct)
>> spino_align = 0
>> pquotino = null
>> lsn = 0x70002828b
>> meta_uuid = 00000000-0000-0000-0000-000000000000
>>
>>
> That looks reasonable enough... Heck, what's happening? You could try
> to run an integrity check from xfs_db (still using the dump) to locate
> the error:
>
> xfs_db -c 'sb 0' -c 'check' /tmp/fsimage
>
> What does it report?
>


$ sudo xfs_db -c 'sb 0' -c 'check' /tmp/fsimage
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_db.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.


xfs_repair did not recognize the superblock and started hunting for a 
secondary one, emitting dots in the process.  I stopped it, since the same 
search had already failed on the live disk.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help)
  2017-04-11 12:57       ` Emmanuel Florac
@ 2017-04-11 13:34         ` Eric Sandeen
  2017-04-11 16:18           ` Emmanuel Florac
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Sandeen @ 2017-04-11 13:34 UTC (permalink / raw)
  To: Emmanuel Florac, L A Walsh; +Cc: linux-xfs


[-- Attachment #1.1: Type: text/plain, Size: 2366 bytes --]

On 4/11/17 7:57 AM, Emmanuel Florac wrote:
> Le Mon, 10 Apr 2017 11:05:38 -0700
> L A Walsh <xfs@tlinx.org> écrivait:
> 
>> 	I'm sure it wouldn't be trivial, but creating a separate
>> file system, "XFS2" from the original XFS sources that responded
>> to data or metadata corruption by returning empty data where
>> it was impossible to return anything useful instead of flagging
>> the disk as "bad", would be a way to allow data recovery to
>> the extent that it made sense (assuming the original sources
>> couldn't do the same by toggling off a config flag).
> 
> It would probably be much easier to add an option to mount the filesystem
> without CRC checking, similar to "norecovery", which doesn't replay the
> journal.  It would of course be read-only, but in a case like this it
> would be much easier and more practical for everyone.

Yes, I actually whipped up a patch to do just that, because I was curious.
Although I don't think it would fly, I may send it just to have
a record out on the list.

> Until now I'd believed that metadata CRCs were a promise of safer
> filesystems; now that I've set up several multi-hundred-terabyte
> volumes with CRCs enabled, I'm getting nervous...

Why?

So far there's been a lot of fear & speculation from some quarters, but no
reports of any significant real-world downside to CRC integrity
checking.

A few amendments to my possibly too-quick reply yesterday, though...

One, not every CRC error will shut down your filesystem - far from it.
As a quick test of Linda's first scenario, you can corrupt a timestamp
without updating the CRC to match, using xfs_db's expert mode.  That inode
will be inaccessible until it's fixed with xfs_repair, but the filesystem
continues on happily.

Two, after talking with Darrick I realized that I misrepresented
things a bit; we checksum the entire sector of metadata, so yes,
even a bitflip in an unused portion of that location could cause
a CRC mismatch and therefore a metadata read error.  Again, though, this
would render that data structure inaccessible until repair; it would not
take the entire filesystem offline.

Three, none of this has anything to do with the email that started
this thread.  Bad firmware turned Avi's SSD into a vat of goo, and CRCs
are not in any way related to his inability to recover his filesystem.

Thanks,
-Eric

 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-11 13:07                 ` Avi Kivity
@ 2017-04-11 16:13                   ` Emmanuel Florac
  2017-04-11 16:44                     ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Emmanuel Florac @ 2017-04-11 16:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Brian Foster, linux-xfs

[-- Attachment #1: Type: text/plain, Size: 1148 bytes --]

Le Tue, 11 Apr 2017 16:07:56 +0300
Avi Kivity <avi@scylladb.com> écrivait:

> $ sudo xfs_db -c 'sb 0' -c 'check' /tmp/fsimage
> ERROR: The filesystem has valuable metadata changes in a log which
> needs to be replayed.  Mount the filesystem to replay the log, and
> unmount it before re-running xfs_db.  If you are unable to mount the
> filesystem, then use the xfs_repair -L option to destroy the log and
> attempt a repair. Note that destroying the log may cause corruption
> -- please attempt a mount of the filesystem before doing this.
> 
> 
> xfs_repair did not recognize the superblock, and started hunting for
> the second one, emitting dots in the process.  I stopped it, since it
> failed on the live disk.
> 

Can you mount the image, or does it fail immediately because of the
CRC error?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

[-- Attachment #2: Signature digitale OpenPGP --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help)
  2017-04-11 13:34         ` Eric Sandeen
@ 2017-04-11 16:18           ` Emmanuel Florac
  2017-04-11 16:34             ` Eric Sandeen
  0 siblings, 1 reply; 32+ messages in thread
From: Emmanuel Florac @ 2017-04-11 16:18 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: L A Walsh, linux-xfs

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

Le Tue, 11 Apr 2017 08:34:44 -0500
Eric Sandeen <sandeen@sandeen.net> écrivait:

> Three, none of this has anything to do with the email that started
> this thread.  Bad firmware turned Avi's SSD into a vat of goo, and
> CRCs are not in any way related to his inability to recover his
> filesystem.

OK, but xfs_db finds and reads the sb OK, and it looks fine at first
glance; so why does xfs_repair fail completely? I'm not actually certain
that Avi's SSD is a "vat of goo"...

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

[-- Attachment #2: Signature digitale OpenPGP --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help)
  2017-04-11 16:18           ` Emmanuel Florac
@ 2017-04-11 16:34             ` Eric Sandeen
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Sandeen @ 2017-04-11 16:34 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: L A Walsh, linux-xfs


[-- Attachment #1.1: Type: text/plain, Size: 2410 bytes --]



On 4/11/17 11:18 AM, Emmanuel Florac wrote:
> Le Tue, 11 Apr 2017 08:34:44 -0500
> Eric Sandeen <sandeen@sandeen.net> écrivait:
> 
>> Three, none of this has anything to do with the email that started
>> this thread.  Bad firmware turned Avi's SSD into a vat of goo, and
>> CRCs are not in any way related to his inability to recover his
>> filesystem.
> 
> OK, but xfs_db finds and reads the sb OK, and it looks fine at first
> glance; so why does xfs_repair fail completely? I'm not actually certain
> that Avi's SSD is a "vat of goo"...

Well, the AGFL printed out by the kernel on the mount attempt was
nowhere close to a valid AGFL structure.

Ok, "vat of goo" may have been too strong, but there is at least
one core filesystem structure that is completely scrambled.

That's just the one that was obvious from the mount attempt; I assumed
there were likely more areas of extreme damage, but that was only an
assumption on my part.

Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Metadata CRC error detected at xfs_agfl_read_verify+0xcd/0x100 [xfs], xfs_agfl block 0x2cb68e13
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Unmount and run xfs_repair
Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): First 64 bytes of corrupted metadata buffer:
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75400: 23 40 8f 28 5b 50 3a b4 f8 54 1e 31 97 f4 fe ed  #@.([P:..T.1....
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75410: 62 87 57 51 ee 9d 31 02 ec 2c 10 46 6c 93 db 09  b.WQ..1..,.Fl...
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75420: ae 7a ea b3 91 49 7e d3 99 a4 25 49 11 c5 8b be  .z...I~...%I....
Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75430: e4 2e 14 d4 8a f8 5f 98 66 d8 67 72 ec c9 1a d5  ......_.f.gr.... 

Again, don't fixate on the "CRC" error.  The above is /not/ an AGFL for this filesystem.

typedef struct xfs_agfl {
        __be32          agfl_magicnum;
        __be32          agfl_seqno;
        uuid_t          agfl_uuid;
        __be64          agfl_lsn;
        __be32          agfl_crc;
        __be32          agfl_bno[];     /* actually XFS_AGFL_SIZE(mp) */
} __attribute__((packed)) xfs_agfl_t;

The magicnum is wrong.
The seqno is invalid.
The UUID data in agfl_uuid does not match this filesystem.

etc...
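
As a reference point (the expected magic value is from the kernel's
xfs_format.h; the offsets are just the struct layout above):

# a v5 AGFL block should begin with the ASCII magic "XAFL"
$ printf 'XAFL' | od -An -tx1
 58 41 46 4c
# bytes 0-3 of the dumped buffer are 23 40 8f 28 instead, bytes 4-7
# (agfl_seqno) are 5b 50 3a b4, and bytes 8-23 (agfl_uuid) bear no
# resemblance to the superblock uuid 50b25ad8-3eb9-4273-b7f2-d0a435b3a08f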

-Eric


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-11 16:13                   ` Emmanuel Florac
@ 2017-04-11 16:44                     ` Avi Kivity
  2017-04-11 16:48                       ` Eric Sandeen
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-11 16:44 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Brian Foster, linux-xfs



On 04/11/2017 07:13 PM, Emmanuel Florac wrote:
> Le Tue, 11 Apr 2017 16:07:56 +0300
> Avi Kivity <avi@scylladb.com> écrivait:
>
>> $ sudo xfs_db -c 'sb 0' -c 'check' /tmp/fsimage
>> ERROR: The filesystem has valuable metadata changes in a log which
>> needs to be replayed.  Mount the filesystem to replay the log, and
>> unmount it before re-running xfs_db.  If you are unable to mount the
>> filesystem, then use the xfs_repair -L option to destroy the log and
>> attempt a repair. Note that destroying the log may cause corruption
>> -- please attempt a mount of the filesystem before doing this.
>>
>>
>> xfs_repair did not recognize the superblock, and started hunting for
>> the second one, emitting dots in the process.  I stopped it, since it
>> failed on the live disk.
>>
> Can you mount the image, or does it fail immediately because of the
> CRC error?
>

Fails immediately.

I'll probably format it with ext4, wait for the firmware update, and 
then reformat it with xfs.  Since the firmware bug has been acknowledged, 
I don't know what more we can gain from it.  My disk is mostly a git and 
imap mirror anyway, plus a large ccache repository and a throwaway database.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-11 16:44                     ` Avi Kivity
@ 2017-04-11 16:48                       ` Eric Sandeen
  2017-04-12 15:15                         ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Sandeen @ 2017-04-11 16:48 UTC (permalink / raw)
  To: Avi Kivity, Emmanuel Florac; +Cc: Brian Foster, linux-xfs



On 4/11/17 11:44 AM, Avi Kivity wrote:
> 
> 
> On 04/11/2017 07:13 PM, Emmanuel Florac wrote:
>> Le Tue, 11 Apr 2017 16:07:56 +0300
>> Avi Kivity <avi@scylladb.com> écrivait:
>>
>>> $ sudo xfs_db -c 'sb 0' -c 'check' /tmp/fsimage
>>> ERROR: The filesystem has valuable metadata changes in a log which
>>> needs to be replayed.  Mount the filesystem to replay the log, and
>>> unmount it before re-running xfs_db.  If you are unable to mount the
>>> filesystem, then use the xfs_repair -L option to destroy the log and
>>> attempt a repair. Note that destroying the log may cause corruption
>>> -- please attempt a mount of the filesystem before doing this.
>>>
>>>
>>> xfs_repair did not recognize the superblock, and started hunting for
>>> the second one, emitting dots in the process.  I stopped it, since it
>>> failed on the live disk.
>>>
>> Can you mount the image, or does it fail immediately because of the
>> CRC error?
>>
> 
> Fails immediately.
> 
> I'll probably format it with ext4, wait for the firmware update, and
> then reformat it with xfs. Since the firmware bug has been acknowledged,
> I don't know what more we can gain from it. My disk is mostly a git and
> imap mirror anyway, plus a large ccache repository and a throwaway database.

Honestly, I'd be a little leery of ext4 too - I don't know what the underlying
problem is, but it must be related to some IO pattern that is more common
on xfs; still, it's a leap to say that it's never present on any other fs...

In other words: a drive with a firmware bug that corrupts data probably
can't be trusted with any filesystem.

As an experiment, though, if you want to play, it might be interesting to
mkfs.xfs it with a 4096-byte sector size, and see if that makes it happier.
By default, xfs does metadata IO in 512-byte chunks, something other
filesystems won't do by default.
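
Something along these lines, if you do try it (device name assumed, and it
of course destroys whatever is currently on the device):

$ mkfs.xfs -f -s size=4096 /dev/nvme0n1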

I guess you don't know how to provoke the corruption, though, to be able
to run that test reliably...

-Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-11 16:48                       ` Eric Sandeen
@ 2017-04-12 15:15                         ` Christoph Hellwig
  2017-04-12 15:34                           ` Eric Sandeen
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2017-04-12 15:15 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Avi Kivity, Emmanuel Florac, Brian Foster, linux-xfs

On Tue, Apr 11, 2017 at 11:48:19AM -0500, Eric Sandeen wrote:
> As an experiment, though, if you want to play, it might be interesting to
> mkfs.xfs it with a 4096 byte sector size, and see if that makes it happier.
> By default, xfs is doing metadata IO in 512 chunks, something other filesystems
> won't do by default.
> 
> I guess you don't know how to provoke the corruption, though, to be able
> to run that test reliably...

Btw, it might be a good idea to move to a 4k sector size as the default:
on pretty much any modern hardware, sector sizes are 4k or larger
internally, and 512-byte writes will always involve read-modify-write
cycles.  And unlike SATA or SCSI disks, NVMe doesn't have a physical
block size attribute, so we can't even look at that.
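
What the block layer ends up advertising is visible in sysfs; on an NVMe
namespace formatted with 512-byte LBAs both values typically come back as
512 (device name assumed):

$ cat /sys/block/nvme0n1/queue/logical_block_size
$ cat /sys/block/nvme0n1/queue/physical_block_size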

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 15:15                         ` Christoph Hellwig
@ 2017-04-12 15:34                           ` Eric Sandeen
  2017-04-12 15:45                             ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Sandeen @ 2017-04-12 15:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Avi Kivity, Emmanuel Florac, Brian Foster, linux-xfs

On 4/12/17 10:15 AM, Christoph Hellwig wrote:
> On Tue, Apr 11, 2017 at 11:48:19AM -0500, Eric Sandeen wrote:
>> As an experiment, though, if you want to play, it might be interesting to
>> mkfs.xfs it with a 4096 byte sector size, and see if that makes it happier.
>> By default, xfs is doing metadata IO in 512 chunks, something other filesystems
>> won't do by default.
>>
>> I guess you don't know how to provoke the corruption, though, to be able
>> to run that test reliably...
> 
> Btw, it might be a good idea to move to 4k sector size as the default,
> on pretty much any modern hardware sector sizes are 4k or larger
> internally, and 512 byte writes will always involve read-modify-write
> cycles.  And unlike SATA or SCSI disks NVMe doesn't have a physical
> block size attribute, so we can't even look at that.

Is it safe to do that on a device that /actually/ has only 512-byte sectors?

I /think/ Brian's tear detection helps, but </handwave> is it legit
to do metadata IO larger than the fundamental IO size of the storage?

-Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 15:34                           ` Eric Sandeen
@ 2017-04-12 15:45                             ` Christoph Hellwig
  2017-04-12 16:15                               ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2017-04-12 15:45 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, Avi Kivity, Emmanuel Florac, Brian Foster, linux-xfs

On Wed, Apr 12, 2017 at 10:34:47AM -0500, Eric Sandeen wrote:
> Is it safe to do that on a device that /actually/ has only 512-byte sectors?

Except for NVMe, none of the storage standards actually guarantees
sector atomicity, although the whole storage stack traditionally relies
on it.

Maybe we should claim a 4k physical block size for NVMe devices that
have 512-byte LBAs and an "Atomic Write Unit Power Fail" setting of at
least 8, so that the mkfs sector size logic triggers.
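
The relevant field can be queried with nvme-cli, roughly like this
(device name assumed; note that AWUPF is a 0-based value in the identify
data, so a raw value of 7 would mean 8 LBAs):

$ nvme id-ctrl /dev/nvme0 | grep -i awupf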

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 15:45                             ` Christoph Hellwig
@ 2017-04-12 16:15                               ` Avi Kivity
  2017-04-12 16:20                                 ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-12 16:15 UTC (permalink / raw)
  To: Christoph Hellwig, Eric Sandeen; +Cc: Emmanuel Florac, Brian Foster, linux-xfs

On 04/12/2017 06:45 PM, Christoph Hellwig wrote:
> On Wed, Apr 12, 2017 at 10:34:47AM -0500, Eric Sandeen wrote:
>> Is it safe to do that on a device that /actually/ has only 512 sectors?
> Except for NVMe none of the storage standards actually guarantees
> sector atomicy, although the whole storage stack traditionally relies on
> it..
>
> Maybe we should claim a 4k physical block size for NVMe devices that
> have 512-byte LBAs and an "Atomic Write Unit Power Fail" setting of at
> least 8, so that the mkfs sector size logic triggers.

This preserves the ability to do O_DIRECT reads at 512-byte granularity, 
yes?  We make use of that (it's probably less important on NVMe; still, 
why waste bandwidth needlessly).
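
For concreteness, the kind of small direct read in question (the file
name is made up; dd just stands in for the application's own O_DIRECT
reads):

$ dd if=/some/datafile of=/dev/null bs=512 count=1 iflag=direct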


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 16:15                               ` Avi Kivity
@ 2017-04-12 16:20                                 ` Christoph Hellwig
  2017-04-12 16:22                                   ` Eric Sandeen
  2017-04-12 16:22                                   ` Avi Kivity
  0 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-04-12 16:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Christoph Hellwig, Eric Sandeen, Emmanuel Florac, Brian Foster,
	linux-xfs

On Wed, Apr 12, 2017 at 07:15:40PM +0300, Avi Kivity wrote:
> This preserves the ability to do O_DIRECT reads at 512 byte granularity,
> yes?

No.

> We make use of that (it's probably less important on NVMe; still why
> waste bandwidth needlessly).

In that case it would be the wrong thing for you.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 16:20                                 ` Christoph Hellwig
@ 2017-04-12 16:22                                   ` Eric Sandeen
  2017-04-12 16:24                                     ` Avi Kivity
  2017-04-12 16:22                                   ` Avi Kivity
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Sandeen @ 2017-04-12 16:22 UTC (permalink / raw)
  To: Christoph Hellwig, Avi Kivity; +Cc: Emmanuel Florac, Brian Foster, linux-xfs

On 4/12/17 11:20 AM, Christoph Hellwig wrote:
> On Wed, Apr 12, 2017 at 07:15:40PM +0300, Avi Kivity wrote:
>> This preserves the ability to do O_DIRECT reads at 512 byte granularity,
>> yes?
> 
> No.
> 
>> We make use of that (it's probably less important on NVMe; still why
>> waste bandwidth needlessly).
> 
> In that case it would be the wrong thing for you.

And it would be really interesting to see if the 512-byte DIOs you
issue under ext4 might trigger the same problem in the firmware.

(This is all pure speculation, but that's all we've got) ;)

-Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 16:20                                 ` Christoph Hellwig
  2017-04-12 16:22                                   ` Eric Sandeen
@ 2017-04-12 16:22                                   ` Avi Kivity
  2017-04-12 17:41                                     ` Christoph Hellwig
  1 sibling, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2017-04-12 16:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, Emmanuel Florac, Brian Foster, linux-xfs



On 04/12/2017 07:20 PM, Christoph Hellwig wrote:
> On Wed, Apr 12, 2017 at 07:15:40PM +0300, Avi Kivity wrote:
>> This preserves the ability to do O_DIRECT reads at 512 byte granularity,
>> yes?
> No.

:(

>
>> We make use of that (it's probably less important on NVMe; still why
>> waste bandwidth needlessly).
> In that case it would be the wrong thing for you.

Would it be under our control?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 16:22                                   ` Eric Sandeen
@ 2017-04-12 16:24                                     ` Avi Kivity
  0 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2017-04-12 16:24 UTC (permalink / raw)
  To: Eric Sandeen, Christoph Hellwig; +Cc: Emmanuel Florac, Brian Foster, linux-xfs



On 04/12/2017 07:22 PM, Eric Sandeen wrote:
> On 4/12/17 11:20 AM, Christoph Hellwig wrote:
>> On Wed, Apr 12, 2017 at 07:15:40PM +0300, Avi Kivity wrote:
>>> This preserves the ability to do O_DIRECT reads at 512 byte granularity,
>>> yes?
>> No.
>>
>>> We make use of that (it's probably less important on NVMe; still why
>>> waste bandwidth needlessly).
>> In that case it would be the wrong thing for you.
> And it would be really interesting to see if the 512-byte DIOs you
> issue under ext4 might trigger the same problem in the firmware.

We only issue 512-byte reads; writes are always 4096-byte aligned (and 
usually 128k).

The disk that crashed was my /home; it did see some database loads, but 
not much.

>
> (This is all pure speculation, but that's all we've got) ;)
>
> -Eric


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: filesystem dead, xfs_repair won't help
  2017-04-12 16:22                                   ` Avi Kivity
@ 2017-04-12 17:41                                     ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-04-12 17:41 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Christoph Hellwig, Eric Sandeen, Emmanuel Florac, Brian Foster,
	linux-xfs

On Wed, Apr 12, 2017 at 07:22:20PM +0300, Avi Kivity wrote:
> > > We make use of that (it's probably less important on NVMe; still why
> > > waste bandwidth needlessly).
> > In that case it would be the wrong thing for you.
> 
> Would it be under our control?

Yes, even if we switch the mkfs default, you could still manually override
it as long as the device supports a smaller logical block size, similar
to how we treat 512-byte-logical / 4k-physical SAS and SATA drives today.
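
i.e. even with a hypothetical 4k default, something like this would keep
working on a 512-byte-LBA device (device name assumed):

$ mkfs.xfs -f -s size=512 /dev/nvme0n1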

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread

Thread overview: 32+ messages
2017-04-10  9:23 filesystem dead, xfs_repair won't help Avi Kivity
2017-04-10  9:42 ` Avi Kivity
2017-04-10 15:35   ` Brian Foster
2017-04-11  7:46     ` Avi Kivity
2017-04-11 11:30       ` Emmanuel Florac
2017-04-11 11:40         ` Avi Kivity
2017-04-11 12:00           ` Emmanuel Florac
2017-04-11 12:03             ` Avi Kivity
2017-04-11 12:49               ` Emmanuel Florac
2017-04-11 13:07                 ` Avi Kivity
2017-04-11 16:13                   ` Emmanuel Florac
2017-04-11 16:44                     ` Avi Kivity
2017-04-11 16:48                       ` Eric Sandeen
2017-04-12 15:15                         ` Christoph Hellwig
2017-04-12 15:34                           ` Eric Sandeen
2017-04-12 15:45                             ` Christoph Hellwig
2017-04-12 16:15                               ` Avi Kivity
2017-04-12 16:20                                 ` Christoph Hellwig
2017-04-12 16:22                                   ` Eric Sandeen
2017-04-12 16:24                                     ` Avi Kivity
2017-04-12 16:22                                   ` Avi Kivity
2017-04-12 17:41                                     ` Christoph Hellwig
2017-04-10  9:43 ` allow mounting w/crc-checking disabled? (was Re: filesystem dead, xfs_repair won't help) L A Walsh
2017-04-10 16:01   ` Eric Sandeen
2017-04-10 18:05     ` L A Walsh
2017-04-11 12:57       ` Emmanuel Florac
2017-04-11 13:34         ` Eric Sandeen
2017-04-11 16:18           ` Emmanuel Florac
2017-04-11 16:34             ` Eric Sandeen
2017-04-10 15:49 ` filesystem dead, xfs_repair won't help Eric Sandeen
2017-04-10 16:23   ` Christoph Hellwig
2017-04-11  7:48   ` Avi Kivity
