* [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
@ 2023-11-11  0:00 Bhanu Victor DiCara
  2023-11-11  6:25 ` Bagas Sanjaya
  2023-11-16 16:24 ` Song Liu
  0 siblings, 2 replies; 8+ messages in thread
From: Bhanu Victor DiCara @ 2023-11-11  0:00 UTC (permalink / raw)
  To: linux-raid; +Cc: regressions, song, jiangguoqing, jgq516

A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.


#regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
(The problem does not occur in the previous commit.)

In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.

Similarly, in commit ffc253263a1375a65fa6c9f62a893e9767fbebfa (v6.6), file drivers/md/raid5.c, when line 6200 is removed, the problem does not occur.
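
For background, md_account_bio() exists only for iostat accounting: it
clones the incoming bio, submits the clone in place of the original, and
the clone's completion handler, md_end_clone_io() (the v6.6 name; the
original commit named it differently), copies the clone's completion
status back into the original bio before ending it. A rough, trimmed
paraphrase of that handler, not the verbatim drivers/md/md.c source:

static void md_end_clone_io(struct bio *bio)
{
	struct md_io_clone *md_io_clone = bio->bi_private;
	struct bio *orig_bio = md_io_clone->orig_bio;

	/* unconditional copy: any status already recorded in orig_bio
	 * is overwritten by the clone's status */
	orig_bio->bi_status = bio->bi_status;

	if (md_io_clone->start_time)
		bio_end_io_acct(orig_bio, md_io_clone->start_time);

	bio_put(bio);
	bio_endio(orig_bio);
}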


Steps to reproduce the problem (using bash or similar):
1. Create a degraded RAID 4/5/6 array:
fallocate -l 2056M test_array_part_1.img
fallocate -l 2056M test_array_part_2.img
lo1=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_1.img)
lo2=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_2.img)
# The RAID level must be 4 or 5 or 6 with at least 1 missing drive in any order. The following configuration seems to be the most effective:
mdadm --create /dev/md/tmp_test_array --level=4 --raid-devices=3 --chunk=1M --size=2G  $lo1 missing $lo2

2. Create the test file system and clone it to the degraded array:
fallocate -l 4G test_fs.img
mke2fs -t ext4 -b 4096 -i 65536 -m 0 -E stride=256,stripe_width=512 -L test_fs  test_fs.img
lo3=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_fs.img)
mount $lo3 /mnt/1
python3 create_test_fs.py /mnt/1
umount /mnt/1
cat test_fs.img > /dev/md/tmp_test_array
cmp -l test_fs.img /dev/md/tmp_test_array  # Optionally verify the clone
mount --read-only $lo3 /mnt/1

3. Mount the degraded array:
mount --read-only /dev/md/tmp_test_array /mnt/2

4. Compare the files:
diff -q /mnt/1 /mnt/2

If no files are detected as different, do `umount /mnt/2` and `echo 2 > /proc/sys/vm/drop_caches`, and then go to step 3.
(Doing `echo 3 > /proc/sys/vm/drop_caches` and then going to step 4 is less effective.)
(Only doing `umount /mnt/2` and/or `echo 1 > /proc/sys/vm/drop_caches` is much less effective and the effectiveness wears off.)


create_test_fs.py:
import errno
import itertools
import os
import random
import sys


def main(test_fs_path):
	rng = random.Random(0)
	try:
		for i in itertools.count():
			size = int(2**rng.uniform(12, 24))
			with open(os.path.join(test_fs_path, str(i).zfill(4) + '.bin'), 'xb') as f:
				f.write(b'\xff' * size)
			print(f'Created file {f.name!r} with size {size}')
	except OSError as e:
		if e.errno != errno.ENOSPC:
			raise
		print(f'Done: {e.strerror} (partially created file {f.name!r})')


if __name__ == '__main__':
	main(sys.argv[1])





* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-11  0:00 [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted Bhanu Victor DiCara
@ 2023-11-11  6:25 ` Bagas Sanjaya
  2023-11-13  0:41   ` Song Liu
  2023-11-16 16:24 ` Song Liu
  1 sibling, 1 reply; 8+ messages in thread
From: Bagas Sanjaya @ 2023-11-11  6:25 UTC (permalink / raw)
  To: Bhanu Victor DiCara, linux-raid; +Cc: regressions, song, jiangguoqing, jgq516


On Sat, Nov 11, 2023 at 09:00:00AM +0900, Bhanu Victor DiCara wrote:
> A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
> 
> 
> #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
> (The problem does not occur in the previous commit.)
> 

regzbot dashboard shows that commit cc22b5407e9ca7 ("md: raid0: account
for split bio in iostat accounting") should have fixed your regression.
Can you try that commit?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara



* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-11  6:25 ` Bagas Sanjaya
@ 2023-11-13  0:41   ` Song Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2023-11-13  0:41 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Bhanu Victor DiCara, linux-raid, regressions, jiangguoqing, jgq516

Thanks for the report!

I will look into this.

On Fri, Nov 10, 2023 at 11:25 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On Sat, Nov 11, 2023 at 09:00:00AM +0900, Bhanu Victor DiCara wrote:
> > A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
> >
> >
> > #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
> > (The problem does not occur in the previous commit.)
> >
>
> regzbot dashboard shows that commit cc22b5407e9ca7 ("md: raid0: account
> for split bio in iostat accounting") should have fixed your regression.
> Can you try that commit?

The 6.6 kernel already includes cc22b5407e9ca7. Also, cc22b5407e9ca7 only
changes the behavior of raid0, so it wouldn't fix anything in raid5.

Thanks,
Song

>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara


* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-11  0:00 [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted Bhanu Victor DiCara
  2023-11-11  6:25 ` Bagas Sanjaya
@ 2023-11-16 16:24 ` Song Liu
  2023-11-17  1:05   ` Yu Kuai
  2023-11-17  9:08   ` Xiao Ni
  1 sibling, 2 replies; 8+ messages in thread
From: Song Liu @ 2023-11-16 16:24 UTC (permalink / raw)
  To: Bhanu Victor DiCara, Xiao Ni, Mateusz Grzonka
  Cc: linux-raid, regressions, jiangguoqing, jgq516

+ more folks.

On Fri, Nov 10, 2023 at 7:00 PM Bhanu Victor DiCara
<00bvd0+linux@gmail.com> wrote:
>
> A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
>
>
> #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
> (The problem does not occur in the previous commit.)
>
> In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.

The patch below should fix it. Please give it more thorough tests and
let me know whether it fixes everything. I will send patch later with
more details.

Thanks,
Song

diff --git i/drivers/md/md.c w/drivers/md/md.c
index 68f3bb6e89cb..d4fb1aa5c86f 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -8674,7 +8674,8 @@ static void md_end_clone_io(struct bio *bio)
        struct bio *orig_bio = md_io_clone->orig_bio;
        struct mddev *mddev = md_io_clone->mddev;

-       orig_bio->bi_status = bio->bi_status;
+       if (bio->bi_status)
+               orig_bio->bi_status = bio->bi_status;

        if (md_io_clone->start_time)
                bio_end_io_acct(orig_bio, md_io_clone->start_time);


>
> Similarly, in commit ffc253263a1375a65fa6c9f62a893e9767fbebfa (v6.6), file drivers/md/raid5.c, when line 6200 is removed, the problem does not occur.
>
>
> Steps to reproduce the problem (using bash or similar):
> 1. Create a degraded RAID 4/5/6 array:
> fallocate -l 2056M test_array_part_1.img
> fallocate -l 2056M test_array_part_2.img
> lo1=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_1.img)
> lo2=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_2.img)
> # The RAID level must be 4 or 5 or 6 with at least 1 missing drive in any order. The following configuration seems to be the most effective:
> mdadm --create /dev/md/tmp_test_array --level=4 --raid-devices=3 --chunk=1M --size=2G  $lo1 missing $lo2
>
> 2. Create the test file system and clone it to the degraded array:
> fallocate -l 4G test_fs.img
> mke2fs -t ext4 -b 4096 -i 65536 -m 0 -E stride=256,stripe_width=512 -L test_fs  test_fs.img
> lo3=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_fs.img)
> mount $lo3 /mnt/1
> python3 create_test_fs.py /mnt/1
> umount /mnt/1
> cat test_fs.img > /dev/md/tmp_test_array
> cmp -l test_fs.img /dev/md/tmp_test_array  # Optionally verify the clone
> mount --read-only $lo3 /mnt/1
>
> 3. Mount the degraded array:
> mount --read-only /dev/md/tmp_test_array /mnt/2
>
> 4. Compare the files:
> diff -q /mnt/1 /mnt/2
>
> If no files are detected as different, do `umount /mnt/2` and `echo 2 > /proc/sys/vm/drop_caches`, and then go to step 3.
> (Doing `echo 3 > /proc/sys/vm/drop_caches` and then going to step 4 is less effective.)
> (Only doing `umount /mnt/2` and/or `echo 1 > /proc/sys/vm/drop_caches` is much less effective and the effectiveness wears off.)
>
>
> create_test_fs.py:
> import errno
> import itertools
> import os
> import random
> import sys
>
>
> def main(test_fs_path):
>         rng = random.Random(0)
>         try:
>                 for i in itertools.count():
>                         size = int(2**rng.uniform(12, 24))
>                         with open(os.path.join(test_fs_path, str(i).zfill(4) + '.bin'), 'xb') as f:
>                                 f.write(b'\xff' * size)
>                         print(f'Created file {f.name!r} with size {size}')
>         except OSError as e:
>                 if e.errno != errno.ENOSPC:
>                         raise
>                 print(f'Done: {e.strerror} (partially created file {f.name!r})')
>
>
> if __name__ == '__main__':
>         main(sys.argv[1])
>
>
>


* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-16 16:24 ` Song Liu
@ 2023-11-17  1:05   ` Yu Kuai
  2023-11-18  0:32     ` Song Liu
  2023-11-17  9:08   ` Xiao Ni
  1 sibling, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2023-11-17  1:05 UTC (permalink / raw)
  To: Song Liu, Bhanu Victor DiCara, Xiao Ni, Mateusz Grzonka
  Cc: linux-raid, regressions, jiangguoqing, jgq516, yukuai (C)

Hi,

On 2023/11/17 0:24, Song Liu wrote:
> + more folks.
> 
> On Fri, Nov 10, 2023 at 7:00 PM Bhanu Victor DiCara
> <00bvd0+linux@gmail.com> wrote:
>>
>> A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
>>
>>
>> #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
>> (The problem does not occur in the previous commit.)
>>
>> In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.
> 
> The patch below should fix it. Please give it more thorough tests and
> let me know whether it fixes everything. I will send patch later with
> more details.
> 
> Thanks,
> Song
> 
> diff --git i/drivers/md/md.c w/drivers/md/md.c
> index 68f3bb6e89cb..d4fb1aa5c86f 100644
> --- i/drivers/md/md.c
> +++ w/drivers/md/md.c
> @@ -8674,7 +8674,8 @@ static void md_end_clone_io(struct bio *bio)
>          struct bio *orig_bio = md_io_clone->orig_bio;
>          struct mddev *mddev = md_io_clone->mddev;
> 
> -       orig_bio->bi_status = bio->bi_status;
> +       if (bio->bi_status)
> +               orig_bio->bi_status = bio->bi_status;

I'm confused, do you mean that orig_bio can have error while bio
doesn't? If this is the case, can you explain more how this is
possible?

Thanks,
Kuai

> 
>          if (md_io_clone->start_time)
>                  bio_end_io_acct(orig_bio, md_io_clone->start_time);
> 
> 
>>
>> Similarly, in commit ffc253263a1375a65fa6c9f62a893e9767fbebfa (v6.6), file drivers/md/raid5.c, when line 6200 is removed, the problem does not occur.
>>
>>
>> Steps to reproduce the problem (using bash or similar):
>> 1. Create a degraded RAID 4/5/6 array:
>> fallocate -l 2056M test_array_part_1.img
>> fallocate -l 2056M test_array_part_2.img
>> lo1=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_1.img)
>> lo2=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_2.img)
>> # The RAID level must be 4 or 5 or 6 with at least 1 missing drive in any order. The following configuration seems to be the most effective:
>> mdadm --create /dev/md/tmp_test_array --level=4 --raid-devices=3 --chunk=1M --size=2G  $lo1 missing $lo2
>>
>> 2. Create the test file system and clone it to the degraded array:
>> fallocate -l 4G test_fs.img
>> mke2fs -t ext4 -b 4096 -i 65536 -m 0 -E stride=256,stripe_width=512 -L test_fs  test_fs.img
>> lo3=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_fs.img)
>> mount $lo3 /mnt/1
>> python3 create_test_fs.py /mnt/1
>> umount /mnt/1
>> cat test_fs.img > /dev/md/tmp_test_array
>> cmp -l test_fs.img /dev/md/tmp_test_array  # Optionally verify the clone
>> mount --read-only $lo3 /mnt/1
>>
>> 3. Mount the degraded array:
>> mount --read-only /dev/md/tmp_test_array /mnt/2
>>
>> 4. Compare the files:
>> diff -q /mnt/1 /mnt/2
>>
>> If no files are detected as different, do `umount /mnt/2` and `echo 2 > /proc/sys/vm/drop_caches`, and then go to step 3.
>> (Doing `echo 3 > /proc/sys/vm/drop_caches` and then going to step 4 is less effective.)
>> (Only doing `umount /mnt/2` and/or `echo 1 > /proc/sys/vm/drop_caches` is much less effective and the effectiveness wears off.)
>>
>>
>> create_test_fs.py:
>> import errno
>> import itertools
>> import os
>> import random
>> import sys
>>
>>
>> def main(test_fs_path):
>>          rng = random.Random(0)
>>          try:
>>                  for i in itertools.count():
>>                          size = int(2**rng.uniform(12, 24))
>>                          with open(os.path.join(test_fs_path, str(i).zfill(4) + '.bin'), 'xb') as f:
>>                                  f.write(b'\xff' * size)
>>                          print(f'Created file {f.name!r} with size {size}')
>>          except OSError as e:
>>                  if e.errno != errno.ENOSPC:
>>                          raise
>>                  print(f'Done: {e.strerror} (partially created file {f.name!r})')
>>
>>
>> if __name__ == '__main__':
>>          main(sys.argv[1])
>>
>>
>>
> .
> 



* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-16 16:24 ` Song Liu
  2023-11-17  1:05   ` Yu Kuai
@ 2023-11-17  9:08   ` Xiao Ni
  1 sibling, 0 replies; 8+ messages in thread
From: Xiao Ni @ 2023-11-17  9:08 UTC (permalink / raw)
  To: Song Liu
  Cc: Bhanu Victor DiCara, Mateusz Grzonka, linux-raid, regressions,
	jiangguoqing, jgq516

Hi all

I can reproduce this quickly in my environment with the commands
mentioned at https://www.spinics.net/lists/raid/msg73521.html. After
several hours of testing with the patch applied, the problem has not
reappeared. This patch works for me.

Tested-by: Xiao Ni <xni@redhat.com>

On Fri, Nov 17, 2023 at 12:25 AM Song Liu <song@kernel.org> wrote:
>
> + more folks.
>
> On Fri, Nov 10, 2023 at 7:00 PM Bhanu Victor DiCara
> <00bvd0+linux@gmail.com> wrote:
> >
> > A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
> >
> >
> > #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
> > (The problem does not occur in the previous commit.)
> >
> > In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.
>
> The patch below should fix it. Please give it more thorough tests and
> let me know whether it fixes everything. I will send patch later with
> more details.
>
> Thanks,
> Song
>
> diff --git i/drivers/md/md.c w/drivers/md/md.c
> index 68f3bb6e89cb..d4fb1aa5c86f 100644
> --- i/drivers/md/md.c
> +++ w/drivers/md/md.c
> @@ -8674,7 +8674,8 @@ static void md_end_clone_io(struct bio *bio)
>         struct bio *orig_bio = md_io_clone->orig_bio;
>         struct mddev *mddev = md_io_clone->mddev;
>
> -       orig_bio->bi_status = bio->bi_status;
> +       if (bio->bi_status)
> +               orig_bio->bi_status = bio->bi_status;
>
>         if (md_io_clone->start_time)
>                 bio_end_io_acct(orig_bio, md_io_clone->start_time);
>
>
> >
> > Similarly, in commit ffc253263a1375a65fa6c9f62a893e9767fbebfa (v6.6), file drivers/md/raid5.c, when line 6200 is removed, the problem does not occur.
> >
> >
> > Steps to reproduce the problem (using bash or similar):
> > 1. Create a degraded RAID 4/5/6 array:
> > fallocate -l 2056M test_array_part_1.img
> > fallocate -l 2056M test_array_part_2.img
> > lo1=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_1.img)
> > lo2=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_2.img)
> > # The RAID level must be 4 or 5 or 6 with at least 1 missing drive in any order. The following configuration seems to be the most effective:
> > mdadm --create /dev/md/tmp_test_array --level=4 --raid-devices=3 --chunk=1M --size=2G  $lo1 missing $lo2
> >
> > 2. Create the test file system and clone it to the degraded array:
> > fallocate -l 4G test_fs.img
> > mke2fs -t ext4 -b 4096 -i 65536 -m 0 -E stride=256,stripe_width=512 -L test_fs  test_fs.img
> > lo3=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_fs.img)
> > mount $lo3 /mnt/1
> > python3 create_test_fs.py /mnt/1
> > umount /mnt/1
> > cat test_fs.img > /dev/md/tmp_test_array
> > cmp -l test_fs.img /dev/md/tmp_test_array  # Optionally verify the clone
> > mount --read-only $lo3 /mnt/1
> >
> > 3. Mount the degraded array:
> > mount --read-only /dev/md/tmp_test_array /mnt/2
> >
> > 4. Compare the files:
> > diff -q /mnt/1 /mnt/2
> >
> > If no files are detected as different, do `umount /mnt/2` and `echo 2 > /proc/sys/vm/drop_caches`, and then go to step 3.
> > (Doing `echo 3 > /proc/sys/vm/drop_caches` and then going to step 4 is less effective.)
> > (Only doing `umount /mnt/2` and/or `echo 1 > /proc/sys/vm/drop_caches` is much less effective and the effectiveness wears off.)
> >
> >
> > create_test_fs.py:
> > import errno
> > import itertools
> > import os
> > import random
> > import sys
> >
> >
> > def main(test_fs_path):
> >         rng = random.Random(0)
> >         try:
> >                 for i in itertools.count():
> >                         size = int(2**rng.uniform(12, 24))
> >                         with open(os.path.join(test_fs_path, str(i).zfill(4) + '.bin'), 'xb') as f:
> >                                 f.write(b'\xff' * size)
> >                         print(f'Created file {f.name!r} with size {size}')
> >         except OSError as e:
> >                 if e.errno != errno.ENOSPC:
> >                         raise
> >                 print(f'Done: {e.strerror} (partially created file {f.name!r})')
> >
> >
> > if __name__ == '__main__':
> >         main(sys.argv[1])
> >
> >
> >
>



* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-17  1:05   ` Yu Kuai
@ 2023-11-18  0:32     ` Song Liu
  2023-11-20  1:04       ` Yu Kuai
  0 siblings, 1 reply; 8+ messages in thread
From: Song Liu @ 2023-11-18  0:32 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Bhanu Victor DiCara, Xiao Ni, Mateusz Grzonka, linux-raid,
	regressions, jiangguoqing, jgq516, yukuai (C)

On Thu, Nov 16, 2023 at 5:05 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/11/17 0:24, Song Liu wrote:
> > + more folks.
> >
> > On Fri, Nov 10, 2023 at 7:00 PM Bhanu Victor DiCara
> > <00bvd0+linux@gmail.com> wrote:
> >>
> >> A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
> >>
> >>
> >> #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
> >> (The problem does not occur in the previous commit.)
> >>
> >> In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.
> >
> > The patch below should fix it. Please give it more thorough tests and
> > let me know whether it fixes everything. I will send patch later with
> > more details.
> >
> > Thanks,
> > Song
> >
> > diff --git i/drivers/md/md.c w/drivers/md/md.c
> > index 68f3bb6e89cb..d4fb1aa5c86f 100644
> > --- i/drivers/md/md.c
> > +++ w/drivers/md/md.c
> > @@ -8674,7 +8674,8 @@ static void md_end_clone_io(struct bio *bio)
> >          struct bio *orig_bio = md_io_clone->orig_bio;
> >          struct mddev *mddev = md_io_clone->mddev;
> >
> > -       orig_bio->bi_status = bio->bi_status;
> > +       if (bio->bi_status)
> > +               orig_bio->bi_status = bio->bi_status;
>
> I'm confused, do you mean that orig_bio can have error while bio
> doesn't? If this is the case, can you explain more how this is
> possible?

Yes, this is possible.

Basically, a big bio is split by md_submit_bio => bio_split_to_limits, so
we end up with two bios going into md_clone_bio(). Let's call them
parent_orig_bio and split_orig_bio. Errors from split_orig_bio are
propagated to parent_orig_bio by __bio_chain_endio(). If the clone of
parent_orig_bio then completes successfully, md_end_clone_io() overwrites
parent_orig_bio->bi_status with that success, wiping out the error
reported by split_orig_bio. Does this make sense?
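
To make the ordering concrete, below is a self-contained toy model of
that sequence (plain C, userspace, all names made up; it is not kernel
code): the split child records its error in the parent via the chain
endio, then the parent's accounting clone completes successfully and,
with the old unconditional copy, clears that error.

#include <stdio.h>

/* toy stand-in for struct bio; only the status field matters here */
struct toy_bio {
	int status;	/* 0 = success, non-zero = error */
};

/* cf. __bio_chain_endio(): a failed split child marks its parent */
static void toy_chain_endio(struct toy_bio *child, struct toy_bio *parent)
{
	if (child->status && !parent->status)
		parent->status = child->status;
}

/* old md_end_clone_io(): unconditionally copy clone status to original */
static void toy_end_clone_io_old(struct toy_bio *clone, struct toy_bio *orig)
{
	orig->status = clone->status;
}

/* patched md_end_clone_io(): only propagate errors */
static void toy_end_clone_io_new(struct toy_bio *clone, struct toy_bio *orig)
{
	if (clone->status)
		orig->status = clone->status;
}

int main(void)
{
	struct toy_bio parent_orig = {0};
	struct toy_bio split_orig = {10};	/* the split part failed */
	struct toy_bio parent_clone = {0};	/* the accounting clone succeeded */

	toy_chain_endio(&split_orig, &parent_orig);	/* error reaches the parent */
	toy_end_clone_io_old(&parent_clone, &parent_orig);
	printf("old: parent_orig.status = %d (error lost)\n", parent_orig.status);

	parent_orig.status = 0;
	toy_chain_endio(&split_orig, &parent_orig);
	toy_end_clone_io_new(&parent_clone, &parent_orig);
	printf("new: parent_orig.status = %d (error kept)\n", parent_orig.status);
	return 0;
}

With the old behaviour the parent ends up reporting success even though
part of the request failed, which would match the reported symptom of
reads returning 0s instead of the actual data.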

Thanks,
Song


* Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.
  2023-11-18  0:32     ` Song Liu
@ 2023-11-20  1:04       ` Yu Kuai
  0 siblings, 0 replies; 8+ messages in thread
From: Yu Kuai @ 2023-11-20  1:04 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: Bhanu Victor DiCara, Xiao Ni, Mateusz Grzonka, linux-raid,
	regressions, jiangguoqing, jgq516, yukuai (C)

Hi,

On 2023/11/18 8:32, Song Liu wrote:
> On Thu, Nov 16, 2023 at 5:05 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/11/17 0:24, Song Liu wrote:
>>> + more folks.
>>>
>>> On Fri, Nov 10, 2023 at 7:00 PM Bhanu Victor DiCara
>>> <00bvd0+linux@gmail.com> wrote:
>>>>
>>>> A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
>>>>
>>>>
>>>> #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
>>>> (The problem does not occur in the previous commit.)
>>>>
>>>> In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.
>>>
>>> The patch below should fix it. Please give it more thorough tests and
>>> let me know whether it fixes everything. I will send patch later with
>>> more details.
>>>
>>> Thanks,
>>> Song
>>>
>>> diff --git i/drivers/md/md.c w/drivers/md/md.c
>>> index 68f3bb6e89cb..d4fb1aa5c86f 100644
>>> --- i/drivers/md/md.c
>>> +++ w/drivers/md/md.c
>>> @@ -8674,7 +8674,8 @@ static void md_end_clone_io(struct bio *bio)
>>>           struct bio *orig_bio = md_io_clone->orig_bio;
>>>           struct mddev *mddev = md_io_clone->mddev;
>>>
>>> -       orig_bio->bi_status = bio->bi_status;
>>> +       if (bio->bi_status)
>>> +               orig_bio->bi_status = bio->bi_status;
>>
>> I'm confused, do you mean that orig_bio can have error while bio
>> doesn't? If this is the case, can you explain more how this is
>> possible?
> 
> Yes, this is possible.
> 
> Basically, a big bio is split by md_submit_bio => bio_split_to_limits, so
> we end up with two bios going into md_clone_bio(). Let's call them
> parent_orig_bio and split_orig_bio. Errors from split_orig_bio are
> propagated to parent_orig_bio by __bio_chain_endio(). If the clone of
> parent_orig_bio then completes successfully, md_end_clone_io() overwrites
> parent_orig_bio->bi_status with that success, wiping out the error
> reported by split_orig_bio. Does this make sense?

Yes, this is a good catch; there is a similar condition in
__bio_chain_endio().
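
For reference, the propagation step in __bio_chain_endio() (block/bio.c)
looks roughly like this (paraphrased from memory, not a verbatim copy),
i.e. the same "only propagate errors" rule the patch now applies in
md_end_clone_io():

	if (bio->bi_status && !parent->bi_status)
		parent->bi_status = bio->bi_status;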

Thanks,
Kuai

> 
> Thanks,
> Song
> .
> 


