* Help ironing out persistent mismatches on raid6
@ 2021-12-02 22:04 Matt Garretson
2021-12-02 23:33 ` Eyal Lebedinsky
2021-12-03 18:00 ` Piergiorgio Sartor
0 siblings, 2 replies; 5+ messages in thread
From: Matt Garretson @ 2021-12-02 22:04 UTC
To: linux-raid
Hi, I have this RAID6 array of 6x 8TB drives:
/dev/md1:
Version : 1.2
Creation Time : Fri Jul 6 23:20:38 2018
Raid Level : raid6
Array Size : 31255166976 (29.11 TiB 32.01 TB)
Used Dev Size : 7813791744 (7.28 TiB 8.00 TB)
Raid Devices : 6
There is an ext4 fs on the device (no lvm).
For over a year, the array has had 40 contiguous mismatched sectors in the same spot:
md1: mismatch sector in range 2742891144-2742891152
md1: mismatch sector in range 2742891152-2742891160
md1: mismatch sector in range 2742891160-2742891168
md1: mismatch sector in range 2742891168-2742891176
md1: mismatch sector in range 2742891176-2742891184
Sector size is 512, so I guess this works out to be five 4KiB blocks, or
20KiB of space.
The array is checked weekly, but has never been "repaired". The ext4
filesystem has been fsck'd a lot over the years, with no problems. But
I worry about what file might potentially have bad data in it. There
are a lot of files.
I have done:
dd status=none if=/dev/md1 ibs=512 skip=2742891144 count=40 |hexdump -C
... and I don't see anything meaningful to me.
I have done dumpe2fs -h /dev/md1 and it tells me block size is 4096 and
the first block is 0. So does....
2742891144 * 512 / 4096 = 342861393
...mean we are dealing with blocks # 342861393 - 342861397 of the
filesystem? If so, is there a way for me to see what file(s) use those
blocks?
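The conversion asked about above can be sketched as a small shell calculation. This is only a sketch: the helper names `sector_to_block` and `blocks_in_range` are made up for illustration, and it assumes the filesystem starts at sector 0 of the md device and that the reported range is aligned to the filesystem block size. Treating the five reported ranges as one half-open span of 40 sectors, it yields five blocks, 342861393 through 342861397 inclusive.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm or e2fsprogs).

# Convert an md sector number to an ext4 block number.
# Assumes the filesystem begins at sector 0 of the md device; the optional
# fourth argument is the "First block" value reported by dumpe2fs -h.
sector_to_block() {
    sector=$1; sector_size=$2; block_size=$3; first_block=${4:-0}
    echo $(( sector * sector_size / block_size + first_block ))
}

# Number of filesystem blocks covered by the half-open sector range
# [start, end). Assumes the range is aligned to the block size.
blocks_in_range() {
    start=$1; end=$2; sector_size=$3; block_size=$4
    echo $(( (end - start) * sector_size / block_size ))
}

first=$(sector_to_block 2742891144 512 4096 0)
count=$(blocks_in_range 2742891144 2742891184 512 4096)
echo "blocks $first through $(( first + count - 1 )) ($count blocks)"
```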
Thanks in advance for any tips...
-Matt
* Re: Help ironing out persistent mismatches on raid6
2021-12-02 22:04 Help ironing out persistent mismatches on raid6 Matt Garretson
@ 2021-12-02 23:33 ` Eyal Lebedinsky
2021-12-03 18:30 ` listy
2021-12-03 18:00 ` Piergiorgio Sartor
1 sibling, 1 reply; 5+ messages in thread
From: Eyal Lebedinsky @ 2021-12-02 23:33 UTC
To: linux-raid
On 03/12/2021 09.04, Matt Garretson wrote:
> Hi, I have this RAID6 array of 6x 8TB drives:
>
> /dev/md1:
> Version : 1.2
> Creation Time : Fri Jul 6 23:20:38 2018
> Raid Level : raid6
> Array Size : 31255166976 (29.11 TiB 32.01 TB)
> Used Dev Size : 7813791744 (7.28 TiB 8.00 TB)
> Raid Devices : 6
>
> There is an ext4 fs on the device (no lvm).
>
> The array for over a year has had 40 contiguous mismatches in the same spot:
>
> md1: mismatch sector in range 2742891144-2742891152
> md1: mismatch sector in range 2742891152-2742891160
> md1: mismatch sector in range 2742891160-2742891168
> md1: mismatch sector in range 2742891168-2742891176
> md1: mismatch sector in range 2742891176-2742891184
>
> Sector size is 512, so I guess this works out to be five 4KiB blocks, or
> 20KiB of space.
>
> The array is checked weekly, but never been "repaired". The ext4
> filesystem has been fsck'd a lot over the years, with no problems. But
> I worry about what file might potentially have bad data in it. There
> are a lot of files.
>
> I have done:
>
> dd status=none if=/dev/md1 ibs=512 skip=2742891144 count=40 |hexdump -C
>
> ... and I don't see anything meaningful to me.
>
> I have done dumpe2fs -h /dev/md1 and it tells me block size is 4096 and
> the first block is 0. So does....
>
> 2742891144 * 512 / 4096 = 342861393
>
> ...mean we are dealing with blocks # 342861393 - 342861398 of the
> filesystem? If so, is there a way for me to see what file(s) use those
> blocks?
>
> Thanks in advance for any tips...
> -Matt
I use debugfs to do this. Knowing each fs block range (lo hi) calculated from the raid mismatch notice:
I first identify the relevant blocks in each reported range with
debugfs -R "testb $lo $((hi-lo))" $device
then locate the associated inodes with
debugfs -R "icheck $list" $device
and finally discover files in these locations with
debugfs -R "ncheck $inode" $device
Some of the above debugfs requests can take a very long time to perform. I actually have a script that
does everything and can be left to run for a day (or longer) but it is very locally specific for my setup.
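As a rough illustration of how the three debugfs requests can be chained (this is not the script mentioned above, just a minimal sketch), the helpers below expand a block range into a list, ask icheck for the owning inodes, and pass those to ncheck. The function names are hypothetical. The parsing assumes that icheck and ncheck print one header line followed by tab-separated rows, and that a block with no owner shows a non-numeric value in the inode column; check this against your e2fsprogs version before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of e2fsprogs.

# Expand a start block and a count into a space-separated block list.
block_list() {
    b=$1; end=$(( $1 + $2 ))
    list=""
    while [ "$b" -lt "$end" ]; do
        list="${list:+$list }$b"
        b=$(( b + 1 ))
    done
    echo "$list"
}

# Print "inode<TAB>path" rows for files owning blocks in [lo, lo + count).
# Assumption: one header line, then tab-separated rows, from both requests.
blocks_to_files() {
    device=$1; lo=$2; count=$3
    inodes=$(debugfs -R "icheck $(block_list "$lo" "$count")" "$device" 2>/dev/null |
        awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }' | sort -un | tr '\n' ' ')
    [ -n "$inodes" ] || return 0
    # ncheck accepts several inode numbers in one request.
    debugfs -R "ncheck $inodes" "$device" 2>/dev/null | awk 'NR > 1'
}

# Example (needs read access to the device; can be slow on a large fs):
#   blocks_to_files /dev/md1 342861393 5
```

The testb step is left out here because icheck already reports blocks that belong to no inode; running testb first is still a cheap way to see whether a block is allocated at all.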
HTH
--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: Help ironing out persistent mismatches on raid6
2021-12-02 22:04 Help ironing out persistent mismatches on raid6 Matt Garretson
2021-12-02 23:33 ` Eyal Lebedinsky
@ 2021-12-03 18:00 ` Piergiorgio Sartor
1 sibling, 0 replies; 5+ messages in thread
From: Piergiorgio Sartor @ 2021-12-03 18:00 UTC
To: Matt Garretson; +Cc: linux-raid
On Thu, Dec 02, 2021 at 05:04:57PM -0500, Matt Garretson wrote:
> Hi, I have this RAID6 array of 6x 8TB drives:
>
> /dev/md1:
> Version : 1.2
> Creation Time : Fri Jul 6 23:20:38 2018
> Raid Level : raid6
> Array Size : 31255166976 (29.11 TiB 32.01 TB)
> Used Dev Size : 7813791744 (7.28 TiB 8.00 TB)
> Raid Devices : 6
>
> There is an ext4 fs on the device (no lvm).
>
> The array for over a year has had 40 contiguous mismatches in the same spot:
>
> md1: mismatch sector in range 2742891144-2742891152
> md1: mismatch sector in range 2742891152-2742891160
> md1: mismatch sector in range 2742891160-2742891168
> md1: mismatch sector in range 2742891168-2742891176
> md1: mismatch sector in range 2742891176-2742891184
If you dare to try, you can use
"raid6check" which should ship
together with "mdadm" (at least
in source form).
If I recall correctly, this could
check a given range (whole array will
take forever) and report, if possible,
*which* drive is having the mismatch.
It is also possible to use it to
repair the mismatch, if caused by a
single drive problem.
Of course, standard disclaimer is that
nothing is guaranteed to work as advertised.
Use at your own risk...
bye,
pg
>
> Sector size is 512, so I guess this works out to be five 4KiB blocks, or
> 20KiB of space.
>
> The array is checked weekly, but never been "repaired". The ext4
> filesystem has been fsck'd a lot over the years, with no problems. But
> I worry about what file might potentially have bad data in it. There
> are a lot of files.
>
> I have done:
>
> dd status=none if=/dev/md1 ibs=512 skip=2742891144 count=40 |hexdump -C
>
> ... and I don't see anything meaningful to me.
>
> I have done dumpe2fs -h /dev/md1 and it tells me block size is 4096 and
> the first block is 0. So does....
>
> 2742891144 * 512 / 4096 = 342861393
>
> ...mean we are dealing with blocks # 342861393 - 342861398 of the
> filesystem? If so, is there a way for me to see what file(s) use those
> blocks?
>
> Thanks in advance for any tips...
> -Matt
--
piergiorgio
* Re: Help ironing out persistent mismatches on raid6
2021-12-02 23:33 ` Eyal Lebedinsky
@ 2021-12-03 18:30 ` listy
0 siblings, 0 replies; 5+ messages in thread
From: listy @ 2021-12-03 18:30 UTC
To: linux-raid
On Thu, Dec 2, 2021, at 18:33, Eyal Lebedinsky wrote:
> I use debugfs to do this. Knowing each fs block range (lo hi)
> calculated from the raid mismatch notice:
>
> I first identify the relevant blocks in each reported range with
> debugfs -R "testb $lo $((hi-lo))" $device
> then locate the associated inodes with
> debugfs -R "icheck $list" $device
> and finally discover files in these locations with
> debugfs -R "ncheck $inode" $device
Thank you very much... this is exactly what I was looking for and it worked brilliantly. For posterity, here's a summary of the process in my case. First I converted the md sector numbers to ext4 block numbers:
2742891144 * 512 / 4096 + 0 = 342861393
And md also told me the mismatches basically covered five contiguous 4KiB blocks. So...
debugfs -R "testb 342861393 5" /dev/md1
told me all 5 blocks were in use. Then using all 5 block numbers...
debugfs -R "icheck 342861393 342861394 342861395 342861396 342861397" /dev/md1
gave me a single inode number. Then...
debugfs -R "ncheck 299893552" /dev/md1
gave me the filename (e.g.) /videos/00001.MTS. Then...
cp -a /videos/00001.MTS /tmp && rm /videos/00001.MTS ; debugfs -R "testb 342861393 5" /dev/md1
resulted in those same 5 blocks now *not* being in use. Then
echo 2742891144 > /sys/block/md1/md/sync_min
echo 2743074816 > /sys/block/md1/md/sync_max # rounded up to next 512KiB chunk
echo repair > /sys/block/md1/md/sync_action
cat /sys/block/md1/md/sync_completed
# only took a couple seconds
echo idle > /sys/block/md1/md/sync_action
mv /tmp/00001.MTS /videos
I was surprised that md did not report the mismatches while the repair was hitting those sectors, and I worried that I had calculated the wrong blocks. But then I ran a check of the entire array overnight, and no mismatches were reported, for the first time in a year or two of weekly checks. So I guess it worked. I am actually still a bit confused about when md components deal in bytes, sectors, or chunks, but in this case it worked out.
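For anyone repeating the ranged repair, the sysfs steps can be wrapped roughly as follows. This is a sketch under assumptions, not a tested tool: the function names are hypothetical, the 1024-sector chunk (512KiB at 512 bytes per sector) is taken from the comment in the commands above, and it assumes sync_min and sync_max accept the same sector numbers md printed. The kernel md documentation asks for both values to be multiples of the chunk size, so the helpers round the start down and the end up. Rounding 2742891184 up this way gives 2742891520; the 2743074816 used above is a later chunk boundary that also covers the range.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative.

# Round a sector number down or up to a multiple of the chunk size (sectors).
round_down() { echo $(( $1 / $2 * $2 )); }
round_up()   { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# Start a repair limited to the given sector range.
# md_dir is the array's sysfs directory, e.g. /sys/block/md1/md.
start_ranged_repair() {
    md_dir=$1; min=$2; max=$3
    echo "$min" > "$md_dir/sync_min"
    echo "$max" > "$md_dir/sync_max"
    echo repair > "$md_dir/sync_action"
}

# Example (as root; chunk size is an assumption, check mdadm --detail):
#   chunk=1024
#   start_ranged_repair /sys/block/md1/md \
#       "$(round_down 2742891144 "$chunk")" "$(round_up 2742891184 "$chunk")"
#   cat /sys/block/md1/md/sync_completed
#   echo idle > /sys/block/md1/md/sync_action
```

Taking the sysfs directory as a parameter keeps the function easy to dry-run against a scratch directory before pointing it at a real array.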
Anyway, thank you, Eyal!!
-Matt
2021-12-02 22:04 Help ironing out persistent mismatches on raid6 Matt Garretson
2021-12-02 23:33 ` Eyal Lebedinsky
2021-12-03 18:30 ` listy
2021-12-03 18:00 ` Piergiorgio Sartor
2021-12-02 22:06 listy