* is it safe to xfs_repair this volume? do i have a different first step?
From: David T-G @ 2019-02-07 13:25 UTC
To: Linux-XFS list

Good morning!

I have a four-disk RAID5 volume with an ~11T filesystem that suddenly
won't mount

  diskfarm:root:4:~> mount -v /mnt/4Traid5md/
  mount: mount /dev/md0p1 on /mnt/4Traid5md failed: Bad message

after a power outage :-(  Because of the GPT errors I see

  diskfarm:root:4:~> fdisk -l /dev/md0
  The backup GPT table is corrupt, but the primary appears OK, so that will be used.
  Disk /dev/md0: 10.9 TiB, 12001551581184 bytes, 23440530432 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
  Disklabel type: gpt
  Disk identifier: 8D29E2FB-1A26-4C46-B284-99FA7163B89D

  Device      Start         End     Sectors  Size  Type
  /dev/md0p1   2048 23440530398 23440528351 10.9T  Linux filesystem

  diskfarm:root:4:~> parted /dev/md0 print
  Error: end of file while reading /dev/md0
  Retry/Ignore/Cancel? ignore
  Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used.
  OK/Cancel? ok
  Model: Linux Software RAID Array (md)
  Disk /dev/md0: 12.0TB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags:

  Number  Start   End     Size    File system  Name              Flags
   1      1049kB  12.0TB  12.0TB  xfs          Linux filesystem

when poking, I at first thought that this was a RAID issue, but all of
the md reports look good and apparently the GPT table issue is common, so
I'll leave all of that out unless someone asks for it.

dmesg reports some XFS problems

  diskfarm:root:5:~> dmesg | egrep 'md[:/0]'
  [  117.999012] md/raid:md127: device sdg2 operational as raid disk 1
  [  117.999014] md/raid:md127: device sdh2 operational as raid disk 2
  [  117.999015] md/raid:md127: device sdd2 operational as raid disk 0
  [  117.999246] md/raid:md127: raid level 5 active with 3 out of 3 devices, algorithm 2
  [  120.820661] md/raid:md0: not clean -- starting background reconstruction
  [  120.821279] md/raid:md0: device sdf1 operational as raid disk 2
  [  120.821282] md/raid:md0: device sda1 operational as raid disk 3
  [  120.821283] md/raid:md0: device sdb1 operational as raid disk 0
  [  120.821284] md/raid:md0: device sde1 operational as raid disk 1
  [  120.822028] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
  [  120.822063] md0: detected capacity change from 0 to 12001551581184
  [  120.888841]  md0: p1
  [  202.230961] XFS (md0p1): Mounting V4 Filesystem
  [  203.182567] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
  [  203.367581] XFS (md0p1): failed to locate log tail
  [  203.367587] XFS (md0p1): log mount/recovery failed: error -74
  [  203.367712] XFS (md0p1): log mount failed
  [  285.893728] XFS (md0p1): Mounting V4 Filesystem
  [  286.057829] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
  [  286.203436] XFS (md0p1): failed to locate log tail
  [  286.203440] XFS (md0p1): log mount/recovery failed: error -74
  [  286.203497] XFS (md0p1): log mount failed

but doesn't tell me a whole lot -- or at least not a whole lot that makes
enough sense to me :-)  I tried an xfs_repair dry run and here

  diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md 2>&1 | egrep -v 'agno = '
  Phase 1 - find and verify superblock...
          - reporting progress in intervals of 15 minutes
  Phase 2 - using internal log
          - zero log...
          - scan filesystem freespace and inode maps...
  sb_fdblocks 471930978, counted 471939170
          - 09:18:47: scanning filesystem freespace - 48 of 48 allocation groups done
          - found root inode chunk
  Phase 3 - for each AG...
          - scan (but don't clear) agi unlinked lists...
          - 09:18:47: scanning agi unlinked lists - 48 of 48 allocation groups done
          - process known inodes and perform inode discovery...
          - 09:24:17: process known inodes and inode discovery - 4466560 of 4466560 inodes done
          - process newly discovered inodes...
          - 09:24:17: process newly discovered inodes - 48 of 48 allocation groups done
  Phase 4 - check for duplicate blocks...
          - setting up duplicate extent list...
          - 09:24:17: setting up duplicate extent list - 48 of 48 allocation groups done
          - check for inodes claiming duplicate blocks...
          - 09:29:44: check for inodes claiming duplicate blocks - 4466560 of 4466560 inodes done
  No modify flag set, skipping phase 5
  Phase 6 - check inode connectivity...
          - traversing filesystem ...
          - traversal finished ...
          - moving disconnected inodes to lost+found ...
  Phase 7 - verify link counts...
          - 09:34:02: verify and correct link counts - 48 of 48 allocation groups done
  No modify flag set, skipping filesystem flush and exiting.

is the trimmed output that can fit on one screen.  Since I don't have a
second copy of all of this data, I'm a bit nervous about pulling the
trigger to write changes and want to make sure that I take the right
steps!  How should I proceed?

I'm not subscribed to this list, so please do cc/bcc me on your replies.
I didn't see any other lists and did see some discussion here, so I hope
that I'm in the right place, but please feel free also to point me in
another direction if that's better.

TIA & HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
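
[Note: one conservative first look, not mentioned in the thread: XFS
supports a read-only mount that skips log recovery entirely. Whether it
succeeds here depends on the state of the log, and because the dirty log
is not replayed, recently modified files may look stale or missing:

  mount -o ro,norecovery /dev/md0p1 /mnt/4Traid5md

This only reads; it is a way to eyeball the data before deciding on any
repair.]
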
* Re: is it safe to xfs_repair this volume? do i have a different first step?
From: Brian Foster @ 2019-02-07 14:52 UTC
To: David T-G
Cc: Linux-XFS list

On Thu, Feb 07, 2019 at 08:25:34AM -0500, David T-G wrote:
> Good morning!
>
> I have a four-disk RAID5 volume with an ~11T filesystem that suddenly
> won't mount
>
>   diskfarm:root:4:~> mount -v /mnt/4Traid5md/
>   mount: mount /dev/md0p1 on /mnt/4Traid5md failed: Bad message
>
> after a power outage :-(  Because of the GPT errors I see
>
>   diskfarm:root:4:~> fdisk -l /dev/md0
>   The backup GPT table is corrupt, but the primary appears OK, so that will be used.
>   Disk /dev/md0: 10.9 TiB, 12001551581184 bytes, 23440530432 sectors
>   Units: sectors of 1 * 512 = 512 bytes
>   Sector size (logical/physical): 512 bytes / 4096 bytes
>   I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
>   Disklabel type: gpt
>   Disk identifier: 8D29E2FB-1A26-4C46-B284-99FA7163B89D
>
>   Device      Start         End     Sectors  Size  Type
>   /dev/md0p1   2048 23440530398 23440528351 10.9T  Linux filesystem
>
>   diskfarm:root:4:~> parted /dev/md0 print
>   Error: end of file while reading /dev/md0
>   Retry/Ignore/Cancel? ignore
>   Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used.
>   OK/Cancel? ok
>   Model: Linux Software RAID Array (md)
>   Disk /dev/md0: 12.0TB
>   Sector size (logical/physical): 512B/4096B
>   Partition Table: gpt
>   Disk Flags:
>
>   Number  Start   End     Size    File system  Name              Flags
>    1      1049kB  12.0TB  12.0TB  xfs          Linux filesystem
>
> when poking, I at first thought that this was a RAID issue, but all of
> the md reports look good and apparently the GPT table issue is common, so
> I'll leave all of that out unless someone asks for it.
>

I'd be curious if the MD metadata format contends with GPT metadata. Is
the above something you've ever tried before running into this problem
and thus can confirm whether it preexisted the mount problem or not?

If not, I'd suggest some more investigation into this before you make
any future partition or raid changes to this storage. I thought there
were different MD formats to accommodate precisely this sort of
incompatibility, but I don't know for sure. linux-raid is probably more
of a help here.

> dmesg reports some XFS problems
>
>   diskfarm:root:5:~> dmesg | egrep 'md[:/0]'
>   [  117.999012] md/raid:md127: device sdg2 operational as raid disk 1
>   [  117.999014] md/raid:md127: device sdh2 operational as raid disk 2
>   [  117.999015] md/raid:md127: device sdd2 operational as raid disk 0
>   [  117.999246] md/raid:md127: raid level 5 active with 3 out of 3 devices, algorithm 2
>   [  120.820661] md/raid:md0: not clean -- starting background reconstruction
>   [  120.821279] md/raid:md0: device sdf1 operational as raid disk 2
>   [  120.821282] md/raid:md0: device sda1 operational as raid disk 3
>   [  120.821283] md/raid:md0: device sdb1 operational as raid disk 0
>   [  120.821284] md/raid:md0: device sde1 operational as raid disk 1
>   [  120.822028] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
>   [  120.822063] md0: detected capacity change from 0 to 12001551581184
>   [  120.888841]  md0: p1
>   [  202.230961] XFS (md0p1): Mounting V4 Filesystem
>   [  203.182567] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
>   [  203.367581] XFS (md0p1): failed to locate log tail
>   [  203.367587] XFS (md0p1): log mount/recovery failed: error -74
>   [  203.367712] XFS (md0p1): log mount failed
>   [  285.893728] XFS (md0p1): Mounting V4 Filesystem
>   [  286.057829] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
>   [  286.203436] XFS (md0p1): failed to locate log tail
>   [  286.203440] XFS (md0p1): log mount/recovery failed: error -74
>   [  286.203497] XFS (md0p1): log mount failed
>
> but doesn't tell me a whole lot -- or at least not a whole lot that makes
> enough sense to me :-)  I tried an xfs_repair dry run and here
>

Hmm. So part of the on-disk log is invalid. We attempt to deal with this
problem by truncating off the rest of the log after the point of the
corruption, but this apparently removes too much to perform a recovery.
I'd guess that the torn write is due to interleaving log writes across
raid devices or something, but we can't really tell from just this.

>   diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md 2>&1 | egrep -v 'agno = '
>   Phase 1 - find and verify superblock...
>           - reporting progress in intervals of 15 minutes
>   Phase 2 - using internal log
>           - zero log...
>           - scan filesystem freespace and inode maps...
>   sb_fdblocks 471930978, counted 471939170

The above said, the corruption here looks extremely minor. You basically
have an accounting mismatch between what the superblock says is
available for free space and what xfs_repair actually found via its
scans and not much else going on.

>           - 09:18:47: scanning filesystem freespace - 48 of 48 allocation groups done
>           - found root inode chunk
>   Phase 3 - for each AG...
>           - scan (but don't clear) agi unlinked lists...
>           - 09:18:47: scanning agi unlinked lists - 48 of 48 allocation groups done
>           - process known inodes and perform inode discovery...
>           - 09:24:17: process known inodes and inode discovery - 4466560 of 4466560 inodes done
>           - process newly discovered inodes...
>           - 09:24:17: process newly discovered inodes - 48 of 48 allocation groups done
>   Phase 4 - check for duplicate blocks...
>           - setting up duplicate extent list...
>           - 09:24:17: setting up duplicate extent list - 48 of 48 allocation groups done
>           - check for inodes claiming duplicate blocks...
>           - 09:29:44: check for inodes claiming duplicate blocks - 4466560 of 4466560 inodes done
>   No modify flag set, skipping phase 5
>   Phase 6 - check inode connectivity...
>           - traversing filesystem ...
>           - traversal finished ...
>           - moving disconnected inodes to lost+found ...
>   Phase 7 - verify link counts...
>           - 09:34:02: verify and correct link counts - 48 of 48 allocation groups done
>   No modify flag set, skipping filesystem flush and exiting.
>
> is the trimmed output that can fit on one screen.  Since I don't have a
> second copy of all of this data, I'm a bit nervous about pulling the
> trigger to write changes and want to make sure that I take the right
> steps!  How should I proceed?
>

What do you mean by trimmed output? Was there more output from
xfs_repair that is not shown here?

In general, if you're concerned about what xfs_repair might do to a
particular filesystem you can always do a normal xfs_repair run against
a metadump of the filesystem before the original copy. Collect a
metadump of the fs:

  xfs_metadump -go <dev> <outputmdimg>

Note that the metadump collects everything except file data so it will
require a decent amount of space depending on how much metadata
populates your fs vs. data.
Then restore the metadump to a sparse file (on some other
filesystem/storage):

  xfs_mdrestore -g <mdfile> <sparsefiletarget>

Then you can mount/xfs_repair the restored sparse image, see what
xfs_repair does, mount the before/after img, etc. Note again that file
data is absent from the restored metadata image so don't expect to be
able to look at file content in the metadump image.

Brian

> I'm not subscribed to this list, so please do cc/bcc me on your replies.
> I didn't see any other lists and did see some discussion here, so I hope
> that I'm in the right place, but please feel free also to point me in
> another direction if that's better.
>
>
> TIA & HAND
>
> :-D
> -- 
> David T-G
> See http://justpickone.org/davidtg/email/
> See http://justpickone.org/davidtg/tofu.txt
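
[Note: pulled together, Brian's rehearsal workflow looks roughly like the
sketch below. The /scratch paths are illustrative, not from the thread,
and the scratch filesystem must support a sparse file as large as the
original fs:

  # capture metadata only (-g: show progress, -o: don't obfuscate names)
  xfs_metadump -go /dev/md0p1 /scratch/md0p1.metadump

  # restore into a sparse image file on some other filesystem
  xfs_mdrestore -g /scratch/md0p1.metadump /scratch/md0p1.img

  # rehearse the repair against the image, not the real volume
  xfs_repair /scratch/md0p1.img

  # optionally loop-mount the result to inspect the namespace
  mount -o loop,ro /scratch/md0p1.img /mnt/check

File data isn't captured, so file contents in the mounted image are not
meaningful.]
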
* Re: is it safe to xfs_repair this volume? do i have a different first step?
From: David T-G @ 2019-02-08 02:25 UTC
To: Linux-XFS list

Brian, et al --

...and then Brian Foster said...
%
% On Thu, Feb 07, 2019 at 08:25:34AM -0500, David T-G wrote:
% >
% > I have a four-disk RAID5 volume with an ~11T filesystem that suddenly
% > won't mount
...
% > when poking, I at first thought that this was a RAID issue, but all of
% > the md reports look good and apparently the GPT table issue is common, so
% > I'll leave all of that out unless someone asks for it.
%
% I'd be curious if the MD metadata format contends with GPT metadata. Is
% the above something you've ever tried before running into this problem
% and thus can confirm whether it preexisted the mount problem or not?

There's a lot I don't know, so it's quite possible that it doesn't line
up.  Here's what mdadm tells me:

  diskfarm:root:6:~> mdadm --detail /dev/md0
  /dev/md0:
          Version : 1.2
    Creation Time : Mon Feb  6 00:56:35 2017
       Raid Level : raid5
       Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
    Used Dev Size : 3906755072 (3725.77 GiB 4000.52 GB)
     Raid Devices : 4
    Total Devices : 4
      Persistence : Superblock is persistent

      Update Time : Fri Jan 25 03:32:18 2019
            State : clean
   Active Devices : 4
  Working Devices : 4
   Failed Devices : 0
    Spare Devices : 0

           Layout : left-symmetric
       Chunk Size : 512K

             Name : diskfarm:0  (local to host diskfarm)
             UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
           Events : 48211

      Number   Major   Minor   RaidDevice State
         0       8       17        0      active sync   /dev/sdb1
         1       8       65        1      active sync   /dev/sde1
         3       8       81        2      active sync   /dev/sdf1
         4       8        1        3      active sync   /dev/sda1
  diskfarm:root:6:~>
  diskfarm:root:6:~> for D in a1 b1 e1 f1 ; do mdadm --examine /dev/sd$D | egrep "$D|Role|State|Checksum|Events" ; done
  /dev/sda1:
            State : clean
      Device UUID : f05a143b:50c9b024:36714b9a:44b6a159
         Checksum : 4561f58b - correct
           Events : 48211
      Device Role : Active device 3
      Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
  /dev/sdb1:
            State : clean
         Checksum : 4654df78 - correct
           Events : 48211
      Device Role : Active device 0
      Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
  /dev/sde1:
            State : clean
         Checksum : c4ec7cb6 - correct
           Events : 48211
      Device Role : Active device 1
      Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
  /dev/sdf1:
            State : clean
         Checksum : 349cf800 - correct
           Events : 48211
      Device Role : Active device 2
      Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

Does that set off any alarms for you?

%
% If not, I'd suggest some more investigation into this before you make
% any future partition or raid changes to this storage. I thought there
% were different MD formats to accommodate precisely this sort of
% incompatibility, but I don't know for sure. linux-raid is probably more
% of a help here.

Thanks :-)  I have no plans to partition, but I will eventually want to
grow it, so I'll definitely have to check on that.

%
% > dmesg reports some XFS problems
% >
% >   diskfarm:root:5:~> dmesg | egrep 'md[:/0]'
...
% >   [  202.230961] XFS (md0p1): Mounting V4 Filesystem
% >   [  203.182567] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
% >   [  203.367581] XFS (md0p1): failed to locate log tail
% >   [  203.367587] XFS (md0p1): log mount/recovery failed: error -74
% >   [  203.367712] XFS (md0p1): log mount failed
...
%
% Hmm. So part of the on-disk log is invalid. We attempt to deal with this
...
% I'd guess that the torn write is due to interleaving log writes across
% raid devices or something, but we can't really tell from just this.

The filesystem *shouldn't* see that there are distinct devices under
there, since that's handled by the md driver, but there's STILL a lot
that I don't know :-)

%
% >   diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md 2>&1 | egrep -v 'agno = '
...
% >           - scan filesystem freespace and inode maps...
% >   sb_fdblocks 471930978, counted 471939170
%
% The above said, the corruption here looks extremely minor. You basically
...
% scans and not much else going on.

That sounds hopeful! :-)

%
% >           - 09:18:47: scanning filesystem freespace - 48 of 48 allocation groups done
...
% >   Phase 7 - verify link counts...
% >           - 09:34:02: verify and correct link counts - 48 of 48 allocation groups done
% >   No modify flag set, skipping filesystem flush and exiting.
% >
% > is the trimmed output that can fit on one screen.  Since I don't have a
...
%
% What do you mean by trimmed output? Was there more output from
% xfs_repair that is not shown here?

Yes.  Note the

  | egrep -v 'agno = '

on the command line above.  The full output

  diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md >/tmp/xfs_repair.out 2>&1
  diskfarm:root:4:~> wc -l /tmp/xfs_repair.out
  124 /tmp/xfs_repair.out

was quite long.  Shall I attach that file or post a link?

%
% In general, if you're concerned about what xfs_repair might do to a
% particular filesystem you can always do a normal xfs_repair run against
% a metadump of the filesystem before the original copy. Collect a
% metadump of the fs:
%
%   xfs_metadump -go <dev> <outputmdimg>

Hey, cool!  I like that :-)  It generated a sizeable output file

  diskfarm:root:8:~> xfs_metadump /dev/disk/by-label/4Traid5md /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out >/mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr 2>&1
  diskfarm:root:8:~> ls -goh /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out
  -rw-r--r-- 1 3.5G Feb  7 17:57 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out
  diskfarm:root:8:~> wc -l /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
  239 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr

as well as quite a few errors.  Here

  diskfarm:root:8:~> head /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot init perag data (5). Continuing anyway.
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read dir2 block 39/132863 (2617378559)
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read dir2 block 41/11461784 (2762925208)
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read dir2 block 41/4237562 (2755700986)
  xfs_metadump: error - read only 0 of 4096 bytes

  diskfarm:root:8:~> tail /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read superblock for ag 47
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read agf block for ag 47
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read agi block for ag 47
  xfs_metadump: error - read only 0 of 4096 bytes
  xfs_metadump: cannot read agfl block for ag 47
  xfs_metadump: Filesystem log is dirty; image will contain unobfuscated metadata in log.
  cache_purge: shake on cache 0x4ee1c0 left 117 nodes!?

is a glance at the contents.  Should I post/paste the full copy?

%
% Note that the metadump collects everything except file data so it will
% require a decent amount of space depending on how much metadata
% populates your fs vs. data.
%
% Then restore the metadump to a sparse file (on some other
% filesystem/storage):
%
%   xfs_mdrestore -g <mdfile> <sparsefiletarget>

I tried this

  diskfarm:root:11:~> dd if=/dev/zero bs=1 count=0 seek=4G of=/mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
  0+0 records in
  0+0 records out
  0 bytes copied, 6.7252e-05 s, 0.0 kB/s
  diskfarm:root:11:~> ls -goh /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
  -rw-r--r-- 1 4.0G Feb  7 21:15 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
  diskfarm:root:11:~> xfs_mdrestore /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
  xfs_mdrestore: cannot set filesystem image size: File too large

and got an error :-(  Should a 4G file be large enough for a 3.5G
metadata dump?

%
% Then you can mount/xfs_repair the restored sparse image, see what
% xfs_repair does, mount the before/after img, etc. Note again that file
% data is absent from the restored metadata image so don't expect to be
% able to look at file content in the metadump image.

Right.  That sounds like a great middle step, though.  Thanks!

%
% Brian

HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
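
[Note: "read only 0 of 4096 bytes" means the read() on the block device
itself came up short; that can be an I/O error or an attempted read past
the end of the device (which would also fit parted's "end of file while
reading /dev/md0"). A read-only probe of the very end of the array, where
the backup GPT should live, can help distinguish the two; the sector
count is taken from the fdisk output above:

  dd if=/dev/md0 of=/dev/null bs=512 skip=$((23440530432-2048)) count=2048 iflag=direct

then check dmesg: a hard I/O error will show up there, while a short read
with no error suggests the device ends sooner than the partition table
claims.]
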
* Re: is it safe to xfs_repair this volume? do i have a different first step?
From: Brian Foster @ 2019-02-08 13:00 UTC
To: David T-G
Cc: Linux-XFS list

On Thu, Feb 07, 2019 at 09:25:13PM -0500, David T-G wrote:
> Brian, et al --
>
> ...and then Brian Foster said...
> %
> % On Thu, Feb 07, 2019 at 08:25:34AM -0500, David T-G wrote:
> % >
> % > I have a four-disk RAID5 volume with an ~11T filesystem that suddenly
> % > won't mount
> ...
> % > when poking, I at first thought that this was a RAID issue, but all of
> % > the md reports look good and apparently the GPT table issue is common, so
> % > I'll leave all of that out unless someone asks for it.
> %
> % I'd be curious if the MD metadata format contends with GPT metadata. Is
> % the above something you've ever tried before running into this problem
> % and thus can confirm whether it preexisted the mount problem or not?
>
> There's a lot I don't know, so it's quite possible that it doesn't line
> up.  Here's what mdadm tells me:
>
>   diskfarm:root:6:~> mdadm --detail /dev/md0
>   /dev/md0:
>           Version : 1.2
>     Creation Time : Mon Feb  6 00:56:35 2017
>        Raid Level : raid5
>        Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
>     Used Dev Size : 3906755072 (3725.77 GiB 4000.52 GB)
>      Raid Devices : 4
>     Total Devices : 4
>       Persistence : Superblock is persistent
>
>       Update Time : Fri Jan 25 03:32:18 2019
>             State : clean
>    Active Devices : 4
>   Working Devices : 4
>    Failed Devices : 0
>     Spare Devices : 0
>
>            Layout : left-symmetric
>        Chunk Size : 512K
>
>              Name : diskfarm:0  (local to host diskfarm)
>              UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
>            Events : 48211
>
>       Number   Major   Minor   RaidDevice State
>          0       8       17        0      active sync   /dev/sdb1
>          1       8       65        1      active sync   /dev/sde1
>          3       8       81        2      active sync   /dev/sdf1
>          4       8        1        3      active sync   /dev/sda1
>   diskfarm:root:6:~>
>   diskfarm:root:6:~> for D in a1 b1 e1 f1 ; do mdadm --examine /dev/sd$D | egrep "$D|Role|State|Checksum|Events" ; done
>   /dev/sda1:
>             State : clean
>       Device UUID : f05a143b:50c9b024:36714b9a:44b6a159
>          Checksum : 4561f58b - correct
>            Events : 48211
>       Device Role : Active device 3
>       Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
>   /dev/sdb1:
>             State : clean
>          Checksum : 4654df78 - correct
>            Events : 48211
>       Device Role : Active device 0
>       Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
>   /dev/sde1:
>             State : clean
>          Checksum : c4ec7cb6 - correct
>            Events : 48211
>       Device Role : Active device 1
>       Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
>   /dev/sdf1:
>             State : clean
>          Checksum : 349cf800 - correct
>            Events : 48211
>       Device Role : Active device 2
>       Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
>
> Does that set off any alarms for you?
>

It looks normal to me, but I'm not an MD person. I also don't think an
MD format / GPT format conflict is something that mdadm will show. It
may not appear until/unless you change the geometry on one side or the
other. Again, I'd strongly suggest to validate your configuration with
linux-raid before making any such changes.

>
> %
> % If not, I'd suggest some more investigation into this before you make
> % any future partition or raid changes to this storage. I thought there
> % were different MD formats to accommodate precisely this sort of
> % incompatibility, but I don't know for sure. linux-raid is probably more
> % of a help here.
>
> Thanks :-)  I have no plans to partition, but I will eventually want to
> grow it, so I'll definitely have to check on that.
>
> %
> % > dmesg reports some XFS problems
> % >
> % >   diskfarm:root:5:~> dmesg | egrep 'md[:/0]'
> ...
> % >   [  202.230961] XFS (md0p1): Mounting V4 Filesystem
> % >   [  203.182567] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
> % >   [  203.367581] XFS (md0p1): failed to locate log tail
> % >   [  203.367587] XFS (md0p1): log mount/recovery failed: error -74
> % >   [  203.367712] XFS (md0p1): log mount failed
> ...
> %
> % Hmm. So part of the on-disk log is invalid. We attempt to deal with this
> ...
> % I'd guess that the torn write is due to interleaving log writes across
> % raid devices or something, but we can't really tell from just this.
>
> The filesystem *shouldn't* see that there are distinct devices under
> there, since that's handled by the md driver, but there's STILL a lot
> that I don't know :-)
>

It doesn't see multiple devices, but it does see a contiguous range of
filesystem blocks (such as the fs log) that happens to map to multiple
physical devices in the underlying storage layer.

> %
> % > diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md 2>&1 | egrep -v 'agno = '
> ...
> % >           - scan filesystem freespace and inode maps...
> % > sb_fdblocks 471930978, counted 471939170
> %
> % The above said, the corruption here looks extremely minor. You basically
> ...
> % scans and not much else going on.
>
> That sounds hopeful! :-)
>
> %
> % >           - 09:18:47: scanning filesystem freespace - 48 of 48 allocation groups done
> ...
> % > Phase 7 - verify link counts...
> % >           - 09:34:02: verify and correct link counts - 48 of 48 allocation groups done
> % > No modify flag set, skipping filesystem flush and exiting.
> % >
> % > is the trimmed output that can fit on one screen.  Since I don't have a
> ...
> %
> % What do you mean by trimmed output? Was there more output from
> % xfs_repair that is not shown here?
>
> Yes.  Note the
>
>   | egrep -v 'agno = '
>
> on the command line above.  The full output
>
>   diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md >/tmp/xfs_repair.out 2>&1
>   diskfarm:root:4:~> wc -l /tmp/xfs_repair.out
>   124 /tmp/xfs_repair.out
>
> was quite long.  Shall I attach that file or post a link?
>

Please post the full repair output.

> %
> % In general, if you're concerned about what xfs_repair might do to a
> % particular filesystem you can always do a normal xfs_repair run against
> % a metadump of the filesystem before the original copy. Collect a
> % metadump of the fs:
> %
> %   xfs_metadump -go <dev> <outputmdimg>
>
> Hey, cool!  I like that :-)  It generated a sizeable output file
>
>   diskfarm:root:8:~> xfs_metadump /dev/disk/by-label/4Traid5md /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out >/mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr 2>&1
>   diskfarm:root:8:~> ls -goh /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out
>   -rw-r--r-- 1 3.5G Feb  7 17:57 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out
>   diskfarm:root:8:~> wc -l /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
>   239 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
>
> as well as quite a few errors.  Here
>
>   diskfarm:root:8:~> head /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot init perag data (5). Continuing anyway.
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read dir2 block 39/132863 (2617378559)
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read dir2 block 41/11461784 (2762925208)
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read dir2 block 41/4237562 (2755700986)
>   xfs_metadump: error - read only 0 of 4096 bytes
>
>   diskfarm:root:8:~> tail /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read superblock for ag 47
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read agf block for ag 47
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read agi block for ag 47
>   xfs_metadump: error - read only 0 of 4096 bytes
>   xfs_metadump: cannot read agfl block for ag 47
>   xfs_metadump: Filesystem log is dirty; image will contain unobfuscated metadata in log.
>   cache_purge: shake on cache 0x4ee1c0 left 117 nodes!?
>
> is a glance at the contents.  Should I post/paste the full copy?
>

It couldn't hurt. Perhaps this suggests there are other issues beyond
what was shown in the original repair output.

> %
> % Note that the metadump collects everything except file data so it will
> % require a decent amount of space depending on how much metadata
> % populates your fs vs. data.
> %
> % Then restore the metadump to a sparse file (on some other
> % filesystem/storage):
> %
> %   xfs_mdrestore -g <mdfile> <sparsefiletarget>
>
> I tried this
>
>   diskfarm:root:11:~> dd if=/dev/zero bs=1 count=0 seek=4G of=/mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
>   0+0 records in
>   0+0 records out
>   0 bytes copied, 6.7252e-05 s, 0.0 kB/s
>   diskfarm:root:11:~> ls -goh /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
>   -rw-r--r-- 1 4.0G Feb  7 21:15 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
>   diskfarm:root:11:~> xfs_mdrestore /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
>   xfs_mdrestore: cannot set filesystem image size: File too large
>
> and got an error :-(  Should a 4G file be large enough for a 3.5G
> metadata dump?
>

The output file size is too large for, and not supported by, the
underlying filesystem. Note that the output file size will match the
size of the original fs despite the fact that the image may only consume
3.5G worth of space. What is the underlying fs? You might need to find
somewhere you can restore this file on another XFS fs.

Brian

> %
> % Then you can mount/xfs_repair the restored sparse image, see what
> % xfs_repair does, mount the before/after img, etc. Note again that file
> % data is absent from the restored metadata image so don't expect to be
> % able to look at file content in the metadump image.
>
> Right.  That sounds like a great middle step, though.  Thanks!
>
> %
> % Brian
>
> HAND
>
> :-D
> -- 
> David T-G
> See http://justpickone.org/davidtg/email/
> See http://justpickone.org/davidtg/tofu.txt
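
[Note: concretely, xfs_mdrestore sets the target's apparent size to the
full ~10.9T of the original filesystem (most likely the ftruncate() call
behind "cannot set filesystem image size"), even though only ~3.5G of
blocks get written, so the host filesystem must allow single files that
large. The /mnt/bigxfs target below is illustrative:

  # on a filesystem with a large enough maximum file size (e.g. another XFS)
  xfs_mdrestore -g /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out /mnt/bigxfs/4Traid5md.img

  # apparent size (~10.9T) vs. blocks actually allocated (~3.5G)
  ls -lsh /mnt/bigxfs/4Traid5md.img

Pre-seeding the target with dd isn't needed; the restore creates and
sizes the file itself.]
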
* Re: is it safe to xfs_repair this volume? do i have a different first step?
From: Chris Murphy @ 2019-02-08 19:45 UTC
To: David T-G
Cc: Linux-XFS list

On Thu, Feb 7, 2019 at 7:25 PM David T-G <davidtg@justpickone.org> wrote:
>
>   diskfarm:root:6:~> mdadm --detail /dev/md0
>   /dev/md0:
>           Version : 1.2

Version 1.2 metadata is 4K offset from the start of the member device.
The member devices in your case:

>       Number   Major   Minor   RaidDevice State
>          0       8       17        0      active sync   /dev/sdb1
>          1       8       65        1      active sync   /dev/sde1
>          3       8       81        2      active sync   /dev/sdf1
>          4       8        1        3      active sync   /dev/sda1

That means those member devices are partitioned. The primary GPT will be
in the first 34 512-byte sectors, and the backup GPT in the last 34
512-byte sectors, on each physical drive. The mdadm v1.2 superblock is
located at 4K from the start of the partition designated as a member of
the array. And mdadm will only consider the partition as the area that
can be written to, which means each member device's backup GPT should be
immune from being written to by md and XFS.

Since there's a 512KiB chunk size and the array is clearly also
partitioned, the array's primary GPT is on one member device soon after
the mdadm superblock, and the array's backup GPT is on a different
member device immediately before that device's own backup GPT. I can't
think of a reason for a conflict off the top of my head. And yet there's
a conflict somewhere, as you have independent corruptions: XFS and GPT.

Just - whatever you do, don't fix anything. Here's an idea for setting
up an overlay so you can test your repairs by writing changes elsewhere,
and not touch the original drives:

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

I suggest getting advice on the linux-raid list before proceeding, to
find out why it appears XFS and the array backup GPT are being stepped
on. They'll want to see the partitioning for every device (both primary
and backup if they aren't identical, i.e. one is corrupt), the full
superblock for each device, and the GPT for the array. And what version
of mdadm was used to create the array.

They'll also want smartctl -x for each drive. And they'll want
'smartctl -l scterc' from each drive. And they'll want to know what the
kernel command timer is set to for each drive:

  # cat /sys/block/sdX/device/timeout

I imagine you're gonna get asked by someone why bother partitioning each
drive with one partition, and then partitioning the array too, also with
one partition. That's overly complicated and serves no purpose. Next
time, make each whole drive an mdadm member, and then format the array.

People lose their data all the time due to user error, so I can't
recommend enough that you sanity check what you've done and what you
intend to do, on each applicable list, using linux-raid for the mdadm
stuff. And for godsake, if you care at all about this data, you need at
least one backup copy.

-- 
Chris Murphy
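
[Note: a minimal sketch of the overlay approach from the wiki page Chris
links, with illustrative sizes and paths. The device-mapper snapshot
target diverts all writes into a sparse copy-on-write file, so the array
itself is never modified:

  truncate -s 50G /scratch/md0p1-cow.img              # sparse COW file
  loopdev=$(losetup -f --show /scratch/md0p1-cow.img)
  sectors=$(blockdev --getsz /dev/md0p1)
  dmsetup create md0p1-overlay \
      --table "0 $sectors snapshot /dev/md0p1 $loopdev P 8"

  # experiment here; writes land in the overlay, not on the raid
  xfs_repair /dev/mapper/md0p1-overlay

Tear down with "dmsetup remove md0p1-overlay" and "losetup -d $loopdev"
when finished.]
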
* Re: is it safe to xfs_repair this volume? do i have a different first step?
From: Chris Murphy @ 2019-02-08 18:40 UTC
To: David T-G
Cc: Linux-XFS list

On Thu, Feb 7, 2019 at 6:30 AM David T-G <davidtg@justpickone.org> wrote:
>
>   diskfarm:root:4:~> parted /dev/md0 print
>   Error: end of file while reading /dev/md0
>   Retry/Ignore/Cancel? ignore
>   Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used.

[snip]

> when poking, I at first thought that this was a RAID issue, but all of
> the md reports look good and apparently the GPT table issue is common, so
> I'll leave all of that out unless someone asks for it.

A corrupt backup GPT is a huge red flag that there's user confusion,
which has then led to the storage stack itself becoming confused. Since
GPT-partitioning an array, in particular with just one partition, seems
unnecessarily complicated and thus pointless, I'm suspicious that
/dev/md0 is not in fact partitioned - that GPT very well may belong to
the first member device of the array, not the array. And the reason the
backup is "corrupt" is because parted and fdisk are looking at the end
of /dev/md0 rather than the end of the device this GPT actually belongs
to.

So I suspect GPT and XFS have stepped on each other, possibly more than
once each, which is why both have corruption while the mdadm metadata
doesn't. It's even possible that one or more signatures in this storage
stack are stale, not having previously been properly wiped, and are now
haunting this storage stack.

I wouldn't make any writes until you've double checked what the layout
is supposed to be. First check whether the individual member drives are
GPT partitioned, and whether their primary and backup tables are valid
(not corrupt); if there's corruption, don't fix it yet. Right now you
just need to focus on what all of the on-disk metadata says is true, and
then you'll be able to discover what metadata is wrong and contributing
to all this confusion.

-- 
Chris Murphy
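
[Note: one read-only way to take the inventory Chris describes, using the
device names from this thread; both tools below only report, nothing is
written:

  for d in /dev/sda /dev/sdb /dev/sde /dev/sdf /dev/md0; do
      echo "== $d =="
      wipefs $d      # with no options, wipefs merely lists signatures
      sgdisk -v $d   # verifies primary and backup GPT without writing
  done
]
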