* Help ironing out persistent mismatches on raid6
@ 2021-12-02 22:04 Matt Garretson
  2021-12-02 23:33 ` Eyal Lebedinsky
  2021-12-03 18:00 ` Piergiorgio Sartor
  0 siblings, 2 replies; 5+ messages in thread
From: Matt Garretson @ 2021-12-02 22:04 UTC (permalink / raw)
  To: linux-raid

Hi, I have this RAID6 array of 6x 8TB drives:

/dev/md1:
           Version : 1.2
     Creation Time : Fri Jul  6 23:20:38 2018
        Raid Level : raid6
        Array Size : 31255166976 (29.11 TiB 32.01 TB)
     Used Dev Size : 7813791744 (7.28 TiB 8.00 TB)
      Raid Devices : 6

There is an ext4 fs on the device (no lvm).

The array for over a year has had 40 contiguous mismatches in the same spot:

md1: mismatch sector in range 2742891144-2742891152
md1: mismatch sector in range 2742891152-2742891160
md1: mismatch sector in range 2742891160-2742891168
md1: mismatch sector in range 2742891168-2742891176
md1: mismatch sector in range 2742891176-2742891184

Sector size is 512, so I guess this works out to be five 4KiB blocks, or
20KiB of space.

The array is checked weekly, but has never been "repaired".  The ext4
filesystem has been fsck'd a lot over the years, with no problems.  But
I worry about what file might potentially have bad data in it.  There
are a lot of files.

I have done:

dd status=none if=/dev/md1 ibs=512 skip=2742891144 count=40  |hexdump -C

... and I don't see anything meaningful to me.

I have done  dumpe2fs -h /dev/md1 and it tells me block size is 4096 and
the first block is 0.  So does....

2742891144 * 512 / 4096 = 342861393

...mean we are dealing with blocks # 342861393 - 342861398 of the
filesystem?  If so, is there a way for me to see what file(s) use those
blocks?
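
The conversion asked about above can be checked with a few lines of shell arithmetic. This is a minimal sketch, assuming 512-byte sectors, 4096-byte blocks and a first block of 0 (the values reported in this message), that the reported sector numbers are offsets into /dev/md1 itself (if md reports offsets within each member device instead, the mapping to filesystem blocks would differ), and that each reported range is end-exclusive. The function and variable names are made up for illustration.

```shell
#!/bin/sh
# Convert md sector numbers to ext4 block numbers.
# Assumptions: 512-byte sectors, 4096-byte fs blocks, first fs block = 0.
SECTOR_SIZE=512
BLOCK_SIZE=4096
FIRST_BLOCK=0

# sector_to_block SECTOR -> fs block containing that sector
sector_to_block() {
    echo $(( $1 * SECTOR_SIZE / BLOCK_SIZE + FIRST_BLOCK ))
}

start_sector=2742891144   # start of the first reported range
end_sector=2742891184     # end of the last reported range (treated as exclusive)

lo=$(sector_to_block "$start_sector")
hi=$(sector_to_block $(( end_sector - 1 )))   # block holding the last sector
count=$(( hi - lo + 1 ))

echo "first block: $lo"
echo "last block:  $hi"
echo "block count: $count"
```

Under those assumptions the 40 sectors map to five blocks, 342861393 through 342861397 inclusive; 342861398 would be the first block past the range.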

Thanks in advance for any tips...
-Matt


* Re: Help ironing out persistent mismatches on raid6
  2021-12-02 22:04 Help ironing out persistent mismatches on raid6 Matt Garretson
@ 2021-12-02 23:33 ` Eyal Lebedinsky
  2021-12-03 18:30   ` listy
  2021-12-03 18:00 ` Piergiorgio Sartor
  1 sibling, 1 reply; 5+ messages in thread
From: Eyal Lebedinsky @ 2021-12-02 23:33 UTC (permalink / raw)
  To: linux-raid



On 03/12/2021 09.04, Matt Garretson wrote:
> [...]
> ...mean we are dealing with blocks # 342861393 - 342861398 of the
> filesystem?  If so, is there a way for me to see what file(s) use those
> blocks?
> 
> Thanks in advance for any tips...
> -Matt

I use debugfs to do this, given each fs block range (lo, hi) calculated from the raid mismatch notice.

I first identify the relevant blocks in each reported range with
	debugfs -R "testb $lo $((hi-lo))" $device
then locate the associated inodes with
	debugfs -R "icheck $list" $device
and finally discover files in these locations with
	debugfs -R "ncheck $inode" $device

Some of the above debugfs requests can take a very long time to run. I actually have a script that
does everything and can be left to run for a day (or longer), but it is very specific to my local setup.
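
The three steps above can be strung together roughly as follows. This is only a sketch, not the script mentioned in this message: the device and block numbers are example values taken from elsewhere in the thread, the helper name block_list is hypothetical, and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Sketch of the testb / icheck / ncheck sequence as a dry run.
# Assumptions: example device and block range; nothing is executed here.
device=/dev/md1
lo=342861393
count=5

# block_list LO COUNT -> "LO LO+1 ... LO+COUNT-1", the explicit list for icheck
block_list() {
    b=$1
    end=$(( $1 + $2 ))
    list=""
    while [ "$b" -lt "$end" ]; do
        list="$list $b"
        b=$(( b + 1 ))
    done
    echo "${list# }"
}

# Step 1: which blocks in the range are in use?
printf 'debugfs -R "%s" %s\n' "testb $lo $count" "$device"
# Step 2: which inodes own those blocks?
printf 'debugfs -R "%s" %s\n' "icheck $(block_list "$lo" "$count")" "$device"
# Step 3: which path names refer to the inode that icheck reported?
printf 'debugfs -R "%s" %s\n' "ncheck <inode from icheck>" "$device"
```

Printing the commands first makes it easy to confirm the block list before letting the slower icheck and ncheck requests run against a large filesystem.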

HTH

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)


* Re: Help ironing out persistent mismatches on raid6
  2021-12-02 22:04 Help ironing out persistent mismatches on raid6 Matt Garretson
  2021-12-02 23:33 ` Eyal Lebedinsky
@ 2021-12-03 18:00 ` Piergiorgio Sartor
  1 sibling, 0 replies; 5+ messages in thread
From: Piergiorgio Sartor @ 2021-12-03 18:00 UTC (permalink / raw)
  To: Matt Garretson; +Cc: linux-raid

On Thu, Dec 02, 2021 at 05:04:57PM -0500, Matt Garretson wrote:
> Hi, I have this RAID6 array of 6x 8TB drives:
> 
> /dev/md1:
>            Version : 1.2
>      Creation Time : Fri Jul  6 23:20:38 2018
>         Raid Level : raid6
>         Array Size : 31255166976 (29.11 TiB 32.01 TB)
>      Used Dev Size : 7813791744 (7.28 TiB 8.00 TB)
>       Raid Devices : 6
> 
> There is an ext4 fs on the device (no lvm).
> 
> The array for over a year has had 40 contiguous mismatches in the same spot:
> 
> md1: mismatch sector in range 2742891144-2742891152
> md1: mismatch sector in range 2742891152-2742891160
> md1: mismatch sector in range 2742891160-2742891168
> md1: mismatch sector in range 2742891168-2742891176
> md1: mismatch sector in range 2742891176-2742891184

If you dare to try, you can use
"raid6check", which should ship
together with "mdadm" (at least
in source form).

If I recall correctly, this could
check a given range (whole array will
take forever) and report, if possible,
*which* drive is having the mismatch.

It is also possible to use it to
repair the mismatch, if caused by a
single drive problem.

Of course, the standard disclaimer is that
nothing is guaranteed to work as advertised.
Use at your own risk...

bye,

pg


-- 

piergiorgio


* Re: Help ironing out persistent mismatches on raid6
  2021-12-02 23:33 ` Eyal Lebedinsky
@ 2021-12-03 18:30   ` listy
  0 siblings, 0 replies; 5+ messages in thread
From: listy @ 2021-12-03 18:30 UTC (permalink / raw)
  To: linux-raid

On Thu, Dec 2, 2021, at 18:33, Eyal Lebedinsky wrote:
> I use debugfs to do this. Knowing each fs block range (lo hi) 
> calculated from the raid mismatch notice:
>
> I first identify the relevant blocks in each reported range with
> 	debugfs -R "testb $lo $((hi-lo))" $device
> then locate the associated inodes with
> 	debugfs -R "icheck $list" $device
> and finally discover files in these locations with
> 	debugfs -R "ncheck $inode" $device


Thank you very much... this is exactly what I was looking for, and it worked brilliantly.  For posterity, here's a summary of the process in my case.

First I converted the md sector numbers to ext4 block numbers:

   2742891144 * 512 / 4096 + 0 = 342861393

And md also told me the mismatches covered five contiguous 4KiB blocks.  So...
 
   debugfs -R "testb 342861393 5" /dev/md1

told me all 5 blocks were in use.  Then using all 5 block numbers...

   debugfs -R "icheck 342861393 342861394 342861395 342861396 342861397" /dev/md1

gave me a single inode number. Then...

   debugfs -R "ncheck 299893552" /dev/md1

gave me the filename (e.g.) /videos/00001.MTS.  Then...

   cp -a /videos/00001.MTS   /tmp  && rm /videos/00001.MTS ;  debugfs -R "testb 342861393 5" /dev/md1

resulted in those same 5 blocks now *not* being in use.  Then

   echo 2742891144 > /sys/block/md1/md/sync_min
   echo 2743074816 > /sys/block/md1/md/sync_max   # rounded up to next 512KiB chunk
   echo repair > /sys/block/md1/md/sync_action
   cat /sys/block/md1/md/sync_completed
   # only took a couple seconds
   echo idle > /sys/block/md1/md/sync_action
   mv /tmp/00001.MTS  /videos
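
For anyone repeating this, the window passed to sync_min and sync_max can be computed rather than rounded by hand. The following is a rough sketch, assuming 512-byte sectors and the 512KiB chunk size mentioned above, and assuming (as I understand the kernel's md documentation) that these values are sector counts that should sit on chunk boundaries. The helper names are hypothetical, and the commands are printed instead of being run.

```shell
#!/bin/sh
# Sketch: align a mismatch sector range to chunk boundaries for a limited repair.
# Assumptions: 512-byte sectors, 512KiB chunks, array name md1 as in the thread.
CHUNK_SECTORS=$(( 512 * 1024 / 512 ))   # 1024 sectors per 512KiB chunk

# Round a sector number down / up to a multiple of the chunk size.
align_down() { echo $(( $1 / CHUNK_SECTORS * CHUNK_SECTORS )); }
align_up()   { echo $(( ($1 + CHUNK_SECTORS - 1) / CHUNK_SECTORS * CHUNK_SECTORS )); }

md_dev=md1
sync_min=$(align_down 2742891144)   # first mismatching sector
sync_max=$(align_up 2742891184)     # end of the last reported range

# Print the commands for review; run them by hand once they look right.
echo "echo $sync_min > /sys/block/$md_dev/md/sync_min"
echo "echo $sync_max > /sys/block/$md_dev/md/sync_max"
echo "echo repair > /sys/block/$md_dev/md/sync_action"
```

With these inputs the sketch yields a window of exactly one chunk, 2742890496 to 2742891520, which covers all five reported ranges.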

I was surprised that md did not report the mismatches while the repair was passing over those sectors, and I worried that I had calculated the wrong blocks.  But then I ran a check of the entire array overnight, and no mismatches were reported, for the first time in a year or two of weekly checks.  So I guess it worked.  I am actually still a bit confused about when md components deal in bytes, sectors, or chunks, but in this case it worked out.
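
Besides the kernel log, the result of a finished check can also be read from the array's mismatch_cnt file in sysfs. The snippet below is a small sketch under that assumption; the function name mismatch_status is made up, and it takes the md sysfs directory as an argument so it can be tried against any path.

```shell
#!/bin/sh
# Sketch: summarise md's mismatch counter after a check has finished.
# Assumption: the given directory holds a mismatch_cnt file with one integer.
mismatch_status() {
    md_dir=$1
    cnt=$(cat "$md_dir/mismatch_cnt") || return 2
    if [ "$cnt" -eq 0 ]; then
        echo "clean"
    else
        echo "$cnt mismatches counted"
    fi
}

# Only query the real array if it exists on this machine.
md_dir=/sys/block/md1/md
if [ -r "$md_dir/mismatch_cnt" ]; then
    mismatch_status "$md_dir"
fi
```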

Anyway, thank you, Eyal!!

-Matt


