* mount before xfs_repair hangs
@ 2020-03-07 20:36 Bart Brashers
  2020-03-08 19:43 ` Bart Brashers
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Brashers @ 2020-03-07 20:36 UTC (permalink / raw)
  To: linux-xfs

(Re-sending without the attachment, in case that's why my previous
message appears to have been silently rejected.)

I can't find a list for non-devel-related posts like this, so sorry if
this is going to the wrong place.

I have an older JBOD attached to a Microsemi 3164-8e RAID card (with
battery backup) on a CentOS 7 server. It has been up and running for a
few weeks now, but backups have failed due to operator errors.

While I was working yesterday evening doing some moderate I/O, the
first backups (bacula) started, and 3 of the disks in a RAID6 set (7
data + 2 parity) suddenly went offline at the same time. I rebooted,
and the RAID card’s command line utility (arcconf) showed the drives
were back online – no longer marked “failed”. So I gave the command:

$ arcconf setstate 1 logicaldrive 3 optimal

To bring the array back online. The next step is to mount and unmount
it to replay the journal, then run xfs_repair.
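
In case it helps, here is a sketch of the sequence I intend to follow,
using the same devices and mount point described below (not a verbatim
transcript of what I typed):

# Replay the external log by mounting, then unmount and run the repair.
mount -t xfs -o inode64,logdev=/dev/md/nvme2 /dev/volgrp4TB/lvol4TB /export/lvol4TB
umount /export/lvol4TB
xfs_repair -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB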

I gave the mount command and waited ~10 hours, but it’s still hung. I
can’t figure out why – any help would be greatly appreciated.

I realize now I should have cycled the power on the JBOD only (not
rebooted the server) to bring the drives back online.

Details:

I created the RAID arrays (7 data + 2 parity + 1 hot spare) like this
- taken from my notes:

<notes>
# create the RAID array 2 with physical labels 3.x (4TB disks):
arcconf create 1 logicaldrive stripesize 256 MAX 6 0  5 0 13 0 14 0 34 0 35 0 43 0 44 0 52 0 53 noprompt
# create the RAID array 3 with physical labels 2.x (4TB disks):
arcconf create 1 logicaldrive stripesize 256 MAX 6 0  7 0  8 0 16 0 17 0 25 0 26 0 46 0 47 0 56 noprompt
# create the RAID array 4 with physical labels 1.x (4TB disks):
arcconf create 1 logicaldrive stripesize 256 MAX 6 0 10 0 11 0 19 0 20 0 28 0 29 0 37 0 38 0 59 noprompt

# Combine the three 4TB-disk arrays (each called a "logicaldrive" in arcconf).
# The three 4TB RAID arrays from arcconf appear as sde, sdf, and sdg.
pvcreate /dev/sde /dev/sdf /dev/sdg
vgcreate volgrp4TB /dev/sde /dev/sdf /dev/sdg
lvcreate -l 100%VG -n lvol4TB volgrp4TB # creates /dev/volgrp4TB/lvol4TB

# Store the journal on a mirrored (RAID1) pair of M.2 NVMe partitions:

parted /dev/nvme0n1 mkpart primary  0% 25%
parted /dev/nvme0n1 mkpart primary 25% 50%
parted /dev/nvme0n1 mkpart primary 50% 75%
parted /dev/nvme0n1 mkpart primary 75% 100%

parted /dev/nvme1n1 mkpart primary  0% 25%
parted /dev/nvme1n1 mkpart primary 25% 50%
parted /dev/nvme1n1 mkpart primary 50% 75%
parted /dev/nvme1n1 mkpart primary 75% 100%

parted /dev/nvme0n1 set 1 raid on
parted /dev/nvme0n1 set 2 raid on
parted /dev/nvme0n1 set 3 raid on
parted /dev/nvme0n1 set 4 raid on

parted /dev/nvme1n1 set 1 raid on
parted /dev/nvme1n1 set 2 raid on
parted /dev/nvme1n1 set 3 raid on
parted /dev/nvme1n1 set 4 raid on

mdadm --create /dev/md/nvme1 --level=mirror --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1
mdadm --create /dev/md/nvme2 --level=mirror --raid-devices=2 /dev/nvme0n1p2 /dev/nvme1n1p2
mdadm --create /dev/md/nvme3 --level=mirror --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --create /dev/md/nvme4 --level=mirror --raid-devices=2 /dev/nvme0n1p4 /dev/nvme1n1p4
mdadm --examine --scan --config=/etc/mdadm.conf | tail -4 >> /etc/mdadm.conf

mkfs.xfs -l logdev=/dev/md/nvme2,size=500000b -d su=256k,sw=7 /dev/volgrp4TB/lvol4TB

mkdir -p /export/lvol4TB
echo "/dev/volgrp4TB/lvol4TB    /export/lvol4TB         auto
inode64,logdev=/dev/md/nvme2    1 2" >> /etc/fstab
mount -a
</notes>

After a reboot, I can still see the PVs and LVs:

$ pvs; lvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/sdc   volgrp6TB lvm2 a--  <38.21t    0
  /dev/sdd   volgrp6TB lvm2 a--  <38.21t    0
  /dev/sde   volgrp4TB lvm2 a--   25.47t    0
  /dev/sdf   volgrp4TB lvm2 a--   25.47t    0
  /dev/sdg   volgrp4TB lvm2 a--   25.47t    0
  LV      VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol4TB volgrp4TB -wi-ao---- 76.41t
  lvol6TB volgrp6TB -wi-ao---- 76.41t

 When I try to mount the problematic filesystem using

% mount -o inode64,logdev=/dev/md/nvme2 -t xfs /dev/volgrp4TB/lvol4TB /export/lvol4TB

or by using mount -a with the following in /etc/fstab:

/dev/volgrp4TB/lvol4TB  /export/lvol4TB         xfs     inode64,logdev=/dev/md/nvme2        1 2

I see this in dmesg:

[  417.033013] XFS (dm-0): Mounting V5 Filesystem
[  417.086868] XFS (dm-0): Starting recovery (logdev: /dev/md/nvme2)
[  417.094291] XFS (dm-0): Metadata corruption detected at
xfs_inode_buf_verify+0x79/0x100 [xfs], xfs_inode block
0x108605a000
[  417.094342] XFS (dm-0): Unmount and run xfs_repair
[  417.094360] XFS (dm-0): First 64 bytes of corrupted metadata buffer:
[  417.094383] ffffbd6f5ba68000: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.094412] ffffbd6f5ba68010: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.094439] ffffbd6f5ba68020: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.094467] ffffbd6f5ba68030: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.094512] XFS (dm-0): Metadata corruption detected at
xfs_inode_buf_verify+0x79/0x100 [xfs], xfs_inode block 0x108605a000
        (repeat like that many times)
[  417.209070] XFS (dm-0): Unmount and run xfs_repair
[  417.209662] XFS (dm-0): First 64 bytes of corrupted metadata buffer:
[  417.210263] ffffbd6f5ba68000: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.210893] ffffbd6f5ba68010: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.211518] ffffbd6f5ba68020: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.212121] ffffbd6f5ba68030: aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa  ................
[  417.212709] XFS (dm-0): metadata I/O error: block 0x108605a000
("xlog_recover_do..(read#2)") error 117 numblks 32
[  417.213415] XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at
line 143 of file fs/xfs/libxfs/xfs_dir2_data.c.  Caller
xfs_dir3_data_verify+0x9a/0xb0 [xfs]
[  417.214652] CPU: 19 PID: 11323 Comm: mount Kdump: loaded Tainted: G
          OE  ------------   3.10.0-1062.el7.x86_64 #1
[  417.214653] Hardware name: Supermicro SYS-6028R-TR/X10DRi, BIOS
3.1b 05/16/2019
[  417.214654] Call Trace:
[  417.214663]  [<ffffffffa6779262>] dump_stack+0x19/0x1b
[  417.214680]  [<ffffffffc06af26b>] xfs_error_report+0x3b/0x40 [xfs]
[  417.214690]  [<ffffffffc069192a>] ? xfs_dir3_data_verify+0x9a/0xb0 [xfs]
[  417.214700]  [<ffffffffc0691779>] __xfs_dir3_data_check+0x4b9/0x5d0 [xfs]
[  417.214710]  [<ffffffffc069192a>] xfs_dir3_data_verify+0x9a/0xb0 [xfs]
[  417.214719]  [<ffffffffc0691971>] xfs_dir3_data_write_verify+0x31/0xc0 [xfs]
[  417.214730]  [<ffffffffc06acc9e>] ?
xfs_buf_delwri_submit_buffers+0x12e/0x240 [xfs]
[  417.214740]  [<ffffffffc06aada7>] _xfs_buf_ioapply+0x97/0x460 [xfs]
[  417.214744]  [<ffffffffa60da0b0>] ? wake_up_state+0x20/0x20
[  417.214754]  [<ffffffffc06acc9e>] ?
xfs_buf_delwri_submit_buffers+0x12e/0x240 [xfs]
[  417.214764]  [<ffffffffc06ac8c1>] __xfs_buf_submit+0x61/0x210 [xfs]
[  417.214773]  [<ffffffffc06acc9e>]
xfs_buf_delwri_submit_buffers+0x12e/0x240 [xfs]
[  417.214783]  [<ffffffffc06ad97c>] ? xfs_buf_delwri_submit+0x3c/0xf0 [xfs]
[  417.214793]  [<ffffffffc06ad97c>] xfs_buf_delwri_submit+0x3c/0xf0 [xfs]
[  417.214806]  [<ffffffffc06daa9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
[  417.214819]  [<ffffffffc06dae59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
[  417.214830]  [<ffffffffc06daed1>] xlog_do_recover+0x31/0x180 [xfs]
[  417.214842]  [<ffffffffc06dbfef>] xlog_recover+0xbf/0x190 [xfs]
[  417.214854]  [<ffffffffc06ce58f>] xfs_log_mount+0xff/0x310 [xfs]
[  417.214866]  [<ffffffffc06c51b0>] xfs_mountfs+0x520/0x8e0 [xfs]
[  417.214877]  [<ffffffffc06c82a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
[  417.214881]  [<ffffffffa624c893>] mount_bdev+0x1b3/0x1f0
[  417.214892]  [<ffffffffc06c7e90>] ?
xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
[  417.214903]  [<ffffffffc06c6aa5>] xfs_fs_mount+0x15/0x20 [xfs]
[  417.214905]  [<ffffffffa624d1fe>] mount_fs+0x3e/0x1b0
[  417.214908]  [<ffffffffa626b377>] vfs_kern_mount+0x67/0x110
[  417.214910]  [<ffffffffa626dacf>] do_mount+0x1ef/0xce0
[  417.214913]  [<ffffffffa624521a>] ? __check_object_size+0x1ca/0x250
[  417.214916]  [<ffffffffa622368c>] ? kmem_cache_alloc_trace+0x3c/0x200
[  417.214918]  [<ffffffffa626e903>] SyS_mount+0x83/0xd0
[  417.214921]  [<ffffffffa678bede>] system_call_fastpath+0x25/0x2a
[  417.214932] XFS (dm-0): Metadata corruption detected at
xfs_dir3_data_write_verify+0xad/0xc0 [xfs], xfs_dir3_data block
0x19060071f8
[  417.216244] XFS (dm-0): Unmount and run xfs_repair
[  417.216901] XFS (dm-0): First 64 bytes of corrupted metadata buffer:
[  417.217550] ffff98b8b5edc000: 58 44 44 33 1f 58 d3 06 00 00 00 19
06 00 71 f8  XDD3.X........q.
[  417.218208] ffff98b8b5edc010: 00 00 00 01 00 11 d1 0e 77 df d5 2b
c0 8a 44 38  ........w..+..D8
[  417.218864] ffff98b8b5edc020: b6 6b 44 d5 a5 17 96 e6 00 00 00 19
06 00 d6 85  .kD.............
[  417.219527] ffff98b8b5edc030: 07 30 08 d0 00 40 00 20 00 00 00 00
00 00 20 00  .0...@. ...... .
[  417.220203] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line
1393 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06aadd7
[  417.220205] XFS (dm-0): Corruption of in-memory data detected.
Shutting down filesystem
[  417.220897] XFS (dm-0): Please umount the filesystem and rectify
the problem(s)

I would try to umount, but the mount command has hung. Ctrl-C does not
kill it. I can see it in `ps aux`, but it’s not using any CPU. `kill
-9 [pid]` does not kill the process. At this point I can only reboot
to kill off the hung mount process.
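
(For what it's worth, here is roughly how I confirm the process is stuck
in uninterruptible sleep - a sketch, with <pid> filled in from ps:)

$ ps -o pid,stat,wchan:30,cmd -C mount   # STAT "D" = uninterruptible sleep
$ cat /proc/<pid>/stack                  # kernel stack of the hung task (needs root)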

If I try to run xfs_repair before trying to mount it, it says:

$ xfs_repair -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using external log on /dev/md/nvme2
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
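
(If it does come to the last resort described there, my understanding is
it would look something like this - a sketch only, I have not run it:)

$ xfs_repair -L -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB   # -L zeroes the log; recent changes may be lost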

If I try to run xfs_repair after trying to mount it, it says:

$ xfs_repair -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB
xfs_repair: cannot open /dev/volgrp4TB/lvol4TB: Device or resource busy

I can see in arcconf that the array is rebuilding.

I did the “echo w > /proc/sysrq-trigger” and captured the output of
dmesg, attached to this email.

Here are most of the details that are requested at
https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

$ uname -a
Linux hostname1.localdomain 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7
18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ xfs_repair -V
xfs_repair version 4.5.0

$ cat /proc/cpuinfo | grep processor | wc -l
56

$ cat /proc/meminfo
MemTotal:       263856296 kB
MemFree:        261564216 kB
MemAvailable:   261016076 kB
Buffers:            4444 kB
Cached:           183392 kB
SwapCached:            0 kB
Active:           109128 kB
Inactive:         147332 kB
Active(anon):      69304 kB
Inactive(anon):     9992 kB
Active(file):      39824 kB
Inactive(file):   137340 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         68780 kB
Mapped:            29312 kB
Shmem:             10568 kB
Slab:             146228 kB
SReclaimable:      44360 kB
SUnreclaim:       101868 kB
KernelStack:       10176 kB
PageTables:         5712 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    136122448 kB
Committed_AS:     370852 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      712828 kB
VmallocChunk:   34224732156 kB
HardwareCorrupted:     0 kB
AnonHugePages:     14336 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      288568 kB
DirectMap2M:     6940672 kB
DirectMap1G:    263192576 kB

$ cat /proc/partitions
major minor  #blocks  name
   8       16 1953514584 sdb
   8       17 1895636992 sdb1
   8       18   52428800 sdb2
   8        0 1953514584 sda
   8        1     204800 sda1
   8        2    1047552 sda2
   8        3 1895636992 sda3
   8        4   52428800 sda4
   8        5    4194304 sda5
   9      127   52395008 md127
   8       32 41023425112 sdc
   8       48 41023425112 sdd
   8       64 27348895576 sde
   8       80 27348895576 sdf
   8       96 27348895576 sdg
259        1  244198584 nvme0n1
259        4   61048832 nvme0n1p1
259        6   61049856 nvme0n1p2
259        8   61048832 nvme0n1p3
259        9   61049856 nvme0n1p4
259        0  244198584 nvme1n1
259        2   61048832 nvme1n1p1
259        3   61049856 nvme1n1p2
259        5   61048832 nvme1n1p3
259        7   61049856 nvme1n1p4
   9      126 1895504896 md126
   9      125   61016064 md125
   9      124   61015040 md124
   9      123   61016064 md123
   9      122   61015040 md122
253        0 82046681088 dm-0
253        1 82046844928 dm-1

$ xfs_info /dev/volgrp4TB/
xfs_info: /dev/volgrp4TB/ is not a mounted XFS filesystem

Bart
206-550-2606 cell, textable

* Re: mount before xfs_repair hangs
  2020-03-07 20:36 mount before xfs_repair hangs Bart Brashers
@ 2020-03-08 19:43 ` Bart Brashers
  2020-03-08 22:26   ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Brashers @ 2020-03-08 19:43 UTC (permalink / raw)
  To: linux-xfs

An update:

Mounting the degraded xfs filesystem still hangs, so I can't replay
the journal, so I don't yet want to run xfs_repair.

I can mount the degraded xfs filesystem like this:

$ mount -t xfs -o ro,norecovery,inode64,logdev=/dev/md/nvme2 /dev/volgrp4TB/lvol4TB /export/lvol4TB/

If I do a "du" on the contents, I see 3822 files with either
"Structure needs cleaning" or "No such file or directory".

Is what I mounted what I would get if I used the xfs_repair -L option,
and discarded the journal? Or would there be more corruption, e.g. to
the directory structure?

Some of the instances of "No such file or directory" are for files
that are not in their correct directory - I can tell by the filetype
and the directory name. Does that by itself imply directory
corruption?

At this point, can I do a backup, either using rsync or xfsdump or
xfs_copy? I have a separate RAID array on the same server where I
could put the 7.8 TB of data, though the destination already has data
on it - so I don't think xfs_copy is right. Is xfsdump to a directory
faster/better than rsync? Or would it be best to use something like

$ tar cf - /export/lvol4TB/directory | (cd /export/lvol6TB/ ; tar xfp -)
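
(Another option I am considering - just a sketch, I have not run it here
- is rsync, which by default keeps copying past unreadable files and
reports them on stderr:)

$ rsync -aHAX /export/lvol4TB/directory/ /export/lvol6TB/directory/ 2> /tmp/rsync-errors.log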

Thanks,

Bart

On Sat, Mar 7, 2020 at 12:36 PM Bart Brashers <bart.brashers@gmail.com> wrote:
> [quoted text trimmed]

* Re: mount before xfs_repair hangs
  2020-03-08 19:43 ` Bart Brashers
@ 2020-03-08 22:26   ` Dave Chinner
  2020-03-09  1:32     ` Bart Brashers
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2020-03-08 22:26 UTC (permalink / raw)
  To: Bart Brashers; +Cc: linux-xfs

On Sun, Mar 08, 2020 at 12:43:29PM -0700, Bart Brashers wrote:
> An update:
> 
> Mounting the degraded xfs filesystem still hangs, so I can't replay
> the journal, so I don't yet want to run xfs_repair.

echo w > /proc/sysrq-trigger

and dump dmesg to find where it is hung. If it is not hung and is
instead stuck in a loop, use 'echo l > /proc/sysrq-trigger'.
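
i.e. roughly this, as root (the output file name is just an example):

echo w > /proc/sysrq-trigger     # dump stack traces of blocked (D state) tasks
dmesg > /tmp/blocked-tasks.log   # capture the traces so they can be posted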

> I can mount the degraded xfs filesystem like this:
> 
> $ mount -t xfs -o ro,norecovery,inode64,logdev=/dev/md/nvme2
> /dev/volgrp4TB/lvol4TB /export/lvol4TB/
> 
> If I do a "du" on the contents, I see 3822 files with either
> "Structure needs cleaning" or "No such file or directory".

To be expected - you mounted an inconsistent filesystem image and
it's falling off the end of structures that are incomplete and
require recovery to make consistent.

> Is what I mounted what I would get if I used the xfs_repair -L option,
> and discarded the journal? Or would there be more corruption, e.g. to
> the directory structure?

Maybe. Maybe more, maybe less. Maybe.

> Some of the instances of "No such file or directory" are for files
> that are not in their correct directory - I can tell by the filetype
> and the directory name. Does that by itself imply directory
> corruption?

Maybe.

It also may imply log recovery has not been run and so things
like renames are not complete on disk, and recovery would fix that.

But keep in mind your array had a triple disk failure, so there is
going to be -something- lost and not recoverable. That may well be
in the journal, at which point repair is your only option...

> At this point, can I do a backup, either using rsync or xfsdump or
> xfs_copy?

Do it any way you want.

> I have a separate RAID array on the same server where I
> could put the 7.8 TB of data, though the destination already has data
> on it - so I don't think xfs_copy is right. Is xfsdump to a directory
> faster/better than rsync? Or would it be best to use something like
> 
> $ tar cf - /export/lvol4TB/directory | (cd /export/lvol6TB/ ; tar xfp -)

Do it however you are confident the data gets copied reliably in
the face of filesystem traversal errors.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: mount before xfs_repair hangs
  2020-03-08 22:26   ` Dave Chinner
@ 2020-03-09  1:32     ` Bart Brashers
  2020-03-11 23:11       ` Bart Brashers
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Brashers @ 2020-03-09  1:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Thanks Dave!

We had what I think was a power fluctuation, and several more drives
went offline in my JBOD. I had to power-cycle the JBOD to make them
show "online" again. I unmounted the arrays first, though.

After doing the "echo w > /proc/sysrq-trigger" I was able to mount the
problematic filesystem directly, no having to read dmesg output. If
that was due to the power cycling and forcing logicalvolumes to be
"optimal" (online) again, I don't know.

I was able to run xfs_repair on both filesystems, and have tons of
files in lost+found to parse now, but at least I have most of my data
back.

Thanks!

Bart
---
Bart Brashers
3039 NW 62nd St
Seattle WA 98107
206-789-1120 Home
425-412-1812 Work
206-550-2606 Mobile


On Sun, Mar 8, 2020 at 3:26 PM Dave Chinner <david@fromorbit.com> wrote:
> [quoted text trimmed]

* Re: mount before xfs_repair hangs
  2020-03-09  1:32     ` Bart Brashers
@ 2020-03-11 23:11       ` Bart Brashers
  2020-03-11 23:25         ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Brashers @ 2020-03-11 23:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

After working fine for 2 days, it happened again. Drives went offline
for no apparent reason, and a logicaldevice (as arcconf calls them)
failed. arcconf listed the hard drives as all online by the time I had
logged on.

The server connected to the JBOD had rebooted by the time I noticed the problem.

There are two xfs filesystems on this server. I can mount one of them,
and ran xfs_repair on it.

I first tried mounting the other read-only,no-recovery. That worked.
Trying to mount normally hangs. I see in ps aux | grep mount that it's
not using CPU. Here's the mount command I gave:

mount -t xfs -o inode64,logdev=/dev/md/nvme2 /dev/volgrp4TB/lvol4TB /export/lvol4TB/

I did an echo w > /proc/sysrq-trigger while I was watching the
console, it said "SysRq : Show Blocked State". Here's what the output
of dmesg looks like, starting with that line. Then it gives blocks
about what's happening on each CPU, some of which mention "xfs".

[  228.927915] SysRq : Show Blocked State
[  228.928525]   task                        PC stack   pid father
[  228.928605] mount           D ffff96f79a553150     0 11341  11254 0x00000080
[  228.928609] Call Trace:
[  228.928617]  [<ffffffffb0b7f1c9>] schedule+0x29/0x70
[  228.928624]  [<ffffffffb0b7cb51>] schedule_timeout+0x221/0x2d0
[  228.928626]  [<ffffffffb0b7f57d>] wait_for_completion+0xfd/0x140
[  228.928633]  [<ffffffffb04da0b0>] ? wake_up_state+0x20/0x20
[  228.928667]  [<ffffffffc04c599e>] ? xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  228.928682]  [<ffffffffc04c3217>] xfs_buf_iowait+0x27/0xb0 [xfs]
[  228.928696]  [<ffffffffc04c599e>] xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  228.928712]  [<ffffffffc04f2a9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
[  228.928727]  [<ffffffffc04f2e59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
[  228.928742]  [<ffffffffc04f2ed1>] xlog_do_recover+0x31/0x180 [xfs]
[  228.928758]  [<ffffffffc04f3fef>] xlog_recover+0xbf/0x190 [xfs]
[  228.928772]  [<ffffffffc04e658f>] xfs_log_mount+0xff/0x310 [xfs]
[  228.928801]  [<ffffffffc04dd1b0>] xfs_mountfs+0x520/0x8e0 [xfs]
[  228.928814]  [<ffffffffc04e02a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
[  228.928818]  [<ffffffffb064c893>] mount_bdev+0x1b3/0x1f0
[  228.928831]  [<ffffffffc04dfe90>] ?
xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
[  228.928842]  [<ffffffffc04deaa5>] xfs_fs_mount+0x15/0x20 [xfs]
[  228.928845]  [<ffffffffb064d1fe>] mount_fs+0x3e/0x1b0
[  228.928850]  [<ffffffffb066b377>] vfs_kern_mount+0x67/0x110
[  228.928852]  [<ffffffffb066dacf>] do_mount+0x1ef/0xce0
[  228.928855]  [<ffffffffb064521a>] ? __check_object_size+0x1ca/0x250
[  228.928858]  [<ffffffffb062368c>] ? kmem_cache_alloc_trace+0x3c/0x200
[  228.928860]  [<ffffffffb066e903>] SyS_mount+0x83/0xd0
[  228.928863]  [<ffffffffb0b8bede>] system_call_fastpath+0x25/0x2a
[  228.928884] Sched Debug Version: v0.11, 3.10.0-1062.el7.x86_64 #1
[  228.928886] ktime                                   : 228605.351961
[  228.928887] sched_clk                               : 228928.883526
[  228.928888] cpu_clk                                 : 228928.883743
[  228.928889] jiffies                                 : 4294895902
[  228.928891] sched_clock_stable()                    : 1

[  228.928893] sysctl_sched
[  228.928894]   .sysctl_sched_latency                    : 24.000000
[  228.928896]   .sysctl_sched_min_granularity            : 10.000000
[  228.928897]   .sysctl_sched_wakeup_granularity         : 15.000000
[  228.928898]   .sysctl_sched_child_runs_first           : 0
[  228.928899]   .sysctl_sched_features                   : 56955
[  228.928900]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)

Every 120 seconds, it adds to dmesg:

[  241.320468] INFO: task mount:11341 blocked for more than 120 seconds.
[  241.321253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  241.321862] mount           D ffff96f79a553150     0 11341  11254 0x00000080
[  241.321866] Call Trace:
[  241.321873]  [<ffffffffb0b7f1c9>] schedule+0x29/0x70
[  241.321879]  [<ffffffffb0b7cb51>] schedule_timeout+0x221/0x2d0
[  241.321881]  [<ffffffffb0b7f57d>] wait_for_completion+0xfd/0x140
[  241.321887]  [<ffffffffb04da0b0>] ? wake_up_state+0x20/0x20
[  241.321931]  [<ffffffffc04c599e>] ? xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  241.321945]  [<ffffffffc04c3217>] xfs_buf_iowait+0x27/0xb0 [xfs]
[  241.321962]  [<ffffffffc04c599e>] xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  241.321976]  [<ffffffffc04f2a9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
[  241.321990]  [<ffffffffc04f2e59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
[  241.322003]  [<ffffffffc04f2ed1>] xlog_do_recover+0x31/0x180 [xfs]
[  241.322017]  [<ffffffffc04f3fef>] xlog_recover+0xbf/0x190 [xfs]
[  241.322030]  [<ffffffffc04e658f>] xfs_log_mount+0xff/0x310 [xfs]
[  241.322043]  [<ffffffffc04dd1b0>] xfs_mountfs+0x520/0x8e0 [xfs]
[  241.322057]  [<ffffffffc04e02a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
[  241.322064]  [<ffffffffb064c893>] mount_bdev+0x1b3/0x1f0
[  241.322077]  [<ffffffffc04dfe90>] ?
xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
[  241.322090]  [<ffffffffc04deaa5>] xfs_fs_mount+0x15/0x20 [xfs]
[  241.322092]  [<ffffffffb064d1fe>] mount_fs+0x3e/0x1b0
[  241.322095]  [<ffffffffb066b377>] vfs_kern_mount+0x67/0x110
[  241.322097]  [<ffffffffb066dacf>] do_mount+0x1ef/0xce0
[  241.322099]  [<ffffffffb064521a>] ? __check_object_size+0x1ca/0x250
[  241.322102]  [<ffffffffb062368c>] ? kmem_cache_alloc_trace+0x3c/0x200
[  241.322104]  [<ffffffffb066e903>] SyS_mount+0x83/0xd0
[  241.322107]  [<ffffffffb0b8bede>] system_call_fastpath+0x25/0x2a

Can anyone suggest what is causing mount to hang?

Bart

On Sun, Mar 8, 2020 at 6:32 PM Bart Brashers <bart.brashers@gmail.com> wrote:
> [quoted text trimmed]

* Re: mount before xfs_repair hangs
  2020-03-11 23:11       ` Bart Brashers
@ 2020-03-11 23:25         ` Dave Chinner
  2020-03-11 23:27           ` Bart Brashers
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2020-03-11 23:25 UTC (permalink / raw)
  To: Bart Brashers; +Cc: linux-xfs

On Wed, Mar 11, 2020 at 04:11:27PM -0700, Bart Brashers wrote:
> After working fine for 2 days, it happened again. Drives went offline
> for no apparent reason, and a logicaldevice (as arcconf calls them)
> failed. arcconf listed the hard drives as all online by the time I had
> logged on.
> 
> The server connected to the JBOD had rebooted by the time I noticed the problem.
> 
> There are two xfs filesystems on this server. I can mount one of them,
> and ran xfs_repair on it.
> 
> I first tried mounting the other read-only,no-recovery. That worked.
> Trying to mount normally hangs. I see in ps aux | grep mount that it's
> not using CPU. Here's the mount command I gave:
> 
> mount -t xfs -o inode64,logdev=/dev/md/nvme2 /dev/volgrp4TB/lvol4TB
> /export/lvol4TB/
> 
> I did an echo w > /proc/sysrq-trigger while I was watching the
> console, it said "SysRq : Show Blocked State". Here's what the output
> of dmesg looks like, starting with that line. Then it gives blocks
> about what's happening on each CPU, some of which mention "xfs".
> 
> [  228.927915] SysRq : Show Blocked State
> [  228.928525]   task                        PC stack   pid father
> [  228.928605] mount           D ffff96f79a553150     0 11341  11254 0x00000080
> [  228.928609] Call Trace:
> [  228.928617]  [<ffffffffb0b7f1c9>] schedule+0x29/0x70
> [  228.928624]  [<ffffffffb0b7cb51>] schedule_timeout+0x221/0x2d0
> [  228.928626]  [<ffffffffb0b7f57d>] wait_for_completion+0xfd/0x140
> [  228.928633]  [<ffffffffb04da0b0>] ? wake_up_state+0x20/0x20
> [  228.928667]  [<ffffffffc04c599e>] ? xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
> [  228.928682]  [<ffffffffc04c3217>] xfs_buf_iowait+0x27/0xb0 [xfs]
> [  228.928696]  [<ffffffffc04c599e>] xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
> [  228.928712]  [<ffffffffc04f2a9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
> [  228.928727]  [<ffffffffc04f2e59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
> [  228.928742]  [<ffffffffc04f2ed1>] xlog_do_recover+0x31/0x180 [xfs]
> [  228.928758]  [<ffffffffc04f3fef>] xlog_recover+0xbf/0x190 [xfs]
> [  228.928772]  [<ffffffffc04e658f>] xfs_log_mount+0xff/0x310 [xfs]
> [  228.928801]  [<ffffffffc04dd1b0>] xfs_mountfs+0x520/0x8e0 [xfs]
> [  228.928814]  [<ffffffffc04e02a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
> [  228.928818]  [<ffffffffb064c893>] mount_bdev+0x1b3/0x1f0
> [  228.928831]  [<ffffffffc04dfe90>] ?
> xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
> [  228.928842]  [<ffffffffc04deaa5>] xfs_fs_mount+0x15/0x20 [xfs]
> [  228.928845]  [<ffffffffb064d1fe>] mount_fs+0x3e/0x1b0
> [  228.928850]  [<ffffffffb066b377>] vfs_kern_mount+0x67/0x110
> [  228.928852]  [<ffffffffb066dacf>] do_mount+0x1ef/0xce0
> [  228.928855]  [<ffffffffb064521a>] ? __check_object_size+0x1ca/0x250
> [  228.928858]  [<ffffffffb062368c>] ? kmem_cache_alloc_trace+0x3c/0x200
> [  228.928860]  [<ffffffffb066e903>] SyS_mount+0x83/0xd0
> [  228.928863]  [<ffffffffb0b8bede>] system_call_fastpath+0x25/0x2a

It's waiting for the metadata writes for recovered changes to
complete. This implies the underlying device is either hung or
extremely slow. I'd suggest "extremely slow" because it's doing its
own internal rebuild and may well be blocking new writes until it
has recovered the regions the new writes are being directed at...

This all looks like HW raid controller problems, nothing to do with
Linux or the filesystem.
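
One way to tell hung from slow (a sketch - device names taken from your
earlier mails) is to watch whether any I/O completes on the backing
devices while the mount is stuck:

iostat -x sde sdf sdg 5        # from sysstat: watch w/s, await and %util
cat /sys/block/sde/inflight    # outstanding reads/writes queued to the device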

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: mount before xfs_repair hangs
  2020-03-11 23:25         ` Dave Chinner
@ 2020-03-11 23:27           ` Bart Brashers
  2020-03-12  5:45             ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Brashers @ 2020-03-11 23:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

When this happened over the weekend, I left the mount running from
10pm till 7am, with no apparent progress.

That implies the HW raid controller card gets in some sort of hung
state, doesn't it?

Bart

On Wed, Mar 11, 2020 at 4:25 PM Dave Chinner <david@fromorbit.com> wrote:
> [quoted text trimmed]

* Re: mount before xfs_repair hangs
  2020-03-11 23:27           ` Bart Brashers
@ 2020-03-12  5:45             ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2020-03-12  5:45 UTC (permalink / raw)
  To: Bart Brashers; +Cc: linux-xfs

On Wed, Mar 11, 2020 at 04:27:27PM -0700, Bart Brashers wrote:
> When this happened over the weekend, I left the mount running from
> 10pm till 7am, with no apparent progress.
> 
> That implies the HW raid controller card gets in some sort of hung
> state, doesn't it?

Yes, that seems quite probable from the evidence presented. Can you
query the raid controller when the mount is in this state?
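
e.g. with whatever status commands the vendor tool offers - from your
earlier mails I would guess something along these lines (the exact
arcconf subcommands are an assumption on my part):

arcconf getstatus 1       # background task / rebuild progress on controller 1
arcconf getconfig 1 ld    # state of each logicaldrive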

-Dave.
-- 
Dave Chinner
david@fromorbit.com
