Linux-BTRFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH] btrfs: raid56: data corruption on a device removal
@ 2018-12-12  0:25 Dmitriy Gorokh
  2018-12-12  9:09 ` Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dmitriy Gorokh @ 2018-12-12  0:25 UTC (permalink / raw)
  To: linux-btrfs

I found that RAID5 or RAID6 filesystem might be got corrupted in the following scenario:

1. Create 4 disks RAID6 filesystem
2. Preallocate 16 10Gb files
3. Run fio: 'fio --name=testload --directory=./ --size=10G --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256 --time_based --runtime=3600’
4. After few minutes pull out two drives: 'echo 1 > /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’

About 5 of 10 times the test is run, it led to silent data corruption of a random extent, resulting in ‘IO Error’ and ‘csum failed’ messages while trying to read the affected file. It usually affects only small portion of the files and only one underlying extent of a file. When I converted logical address of the damaged extent to physical address and dumped a stripe directly from drives, I saw specific pattern, always the same when the issue occurs.

I found that few bios which were being processed right during the drives removal, contained non zero bio->bi_iter.bi_done field despite of  EIO bi_status. bi_sector field was also increased from original one by that 'bi_done' value. Looks like this is a quite rare condition. Subsequently, in the raid_rmw_end_io handler that failed bio can be translated to a wrong stripe number and fail wrong rbio.


---
 fs/btrfs/raid56.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 3c8093757497..94ae70715195 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1451,6 +1451,9 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
        struct btrfs_bio_stripe *stripe;

        physical <<= 9;
+       // Since the failed bio can return partial data, bi_sector might be incremented
+       // by that value. We need to revert it back to the state before the bio was submitted.
+       physical -= bio->bi_iter.bi_done;

        for (i = 0; i < rbio->bbio->num_stripes; i++) {
                stripe = &rbio->bbio->stripes[i];
-- 
2.17.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] btrfs: raid56: data corruption on a device removal
  2018-12-12  0:25 [PATCH] btrfs: raid56: data corruption on a device removal Dmitriy Gorokh
@ 2018-12-12  9:09 ` Johannes Thumshirn
  2018-12-12 15:53 ` David Sterba
  2018-12-14 17:48 ` [PATCH v2] " Dmitriy Gorokh
  2 siblings, 0 replies; 11+ messages in thread
From: Johannes Thumshirn @ 2018-12-12  9:09 UTC (permalink / raw)
  To: Dmitriy Gorokh, linux-btrfs

On 12/12/2018 01:25, Dmitriy Gorokh wrote:
> I found that RAID5 or RAID6 filesystem might be got corrupted in the following scenario:
> 
> 1. Create 4 disks RAID6 filesystem
> 2. Preallocate 16 10Gb files
> 3. Run fio: 'fio --name=testload --directory=./ --size=10G --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256 --time_based --runtime=3600’
> 4. After few minutes pull out two drives: 'echo 1 > /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’
> 
> About 5 of 10 times the test is run, it led to silent data corruption of a random extent, resulting in ‘IO Error’ and ‘csum failed’ messages while trying to read the affected file. It usually affects only small portion of the files and only one underlying extent of a file. When I converted logical address of the damaged extent to physical address and dumped a stripe directly from drives, I saw specific pattern, always the same when the issue occurs.
> 
> I found that few bios which were being processed right during the drives removal, contained non zero bio->bi_iter.bi_done field despite of  EIO bi_status. bi_sector field was also increased from original one by that 'bi_done' value. Looks like this is a quite rare condition. Subsequently, in the raid_rmw_end_io handler that failed bio can be translated to a wrong stripe number and fail wrong rbio.
> 
> 

Please wrap the lines in you commit message at 75 and provide a
Signed-off-by line, see [1].

[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html?highlight=signed%20off#sign-your-work-the-developer-s-certificate-of-origin

[...]

>         physical <<= 9;
> +       // Since the failed bio can return partial data, bi_sector might be incremented
> +       // by that value. We need to revert it back to the state before the bio was submitted.
> +       physical -= bio->bi_iter.bi_done;

Please no C++ Style comments.

Otherwise,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] btrfs: raid56: data corruption on a device removal
  2018-12-12  0:25 [PATCH] btrfs: raid56: data corruption on a device removal Dmitriy Gorokh
  2018-12-12  9:09 ` Johannes Thumshirn
@ 2018-12-12 15:53 ` David Sterba
  2018-12-14 17:48 ` [PATCH v2] " Dmitriy Gorokh
  2 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2018-12-12 15:53 UTC (permalink / raw)
  To: Dmitriy Gorokh; +Cc: linux-btrfs

On Wed, Dec 12, 2018 at 12:25:55AM +0000, Dmitriy Gorokh wrote:
> I found that RAID5 or RAID6 filesystem might be got corrupted in the following scenario:
> 
> 1. Create 4 disks RAID6 filesystem
> 2. Preallocate 16 10Gb files
> 3. Run fio: 'fio --name=testload --directory=./ --size=10G --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256 --time_based --runtime=3600’
> 4. After few minutes pull out two drives: 'echo 1 > /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’
> 
> About 5 of 10 times the test is run, it led to silent data corruption
> of a random extent, resulting in ‘IO Error’ and ‘csum failed’ messages
> while trying to read the affected file. It usually affects only small
> portion of the files and only one underlying extent of a file. When I
> converted logical address of the damaged extent to physical address
> and dumped a stripe directly from drives, I saw specific pattern,
> always the same when the issue occurs.
> 
> I found that few bios which were being processed right during the
> drives removal, contained non zero bio->bi_iter.bi_done field despite
> of  EIO bi_status. bi_sector field was also increased from original
> one by that 'bi_done' value. Looks like this is a quite rare
> condition. Subsequently, in the raid_rmw_end_io handler that failed
> bio can be translated to a wrong stripe number and fail wrong rbio.

Thanks for the analysis, sounds correct to me and the fix too. Would be
good if you can attach the logs or portions of the dumps you've used to
understand the problem, like the pattern you mention above.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] btrfs: raid56: data corruption on a device removal
  2018-12-12  0:25 [PATCH] btrfs: raid56: data corruption on a device removal Dmitriy Gorokh
  2018-12-12  9:09 ` Johannes Thumshirn
  2018-12-12 15:53 ` David Sterba
@ 2018-12-14 17:48 ` " Dmitriy Gorokh
  2018-12-26  0:15   ` Liu Bo
  2019-01-04 16:49   ` David Sterba
  2 siblings, 2 replies; 11+ messages in thread
From: Dmitriy Gorokh @ 2018-12-14 17:48 UTC (permalink / raw)
  To: linux-btrfs

RAID5 or RAID6 filesystem might get corrupted in the following scenario:

1. Create 4 disks RAID6 filesystem
2. Preallocate 16 10Gb files
3. Run fio: 'fio --name=testload --directory=./ --size=10G
--numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256
--time_based --runtime=3600’
4. After few minutes pull out two drives: 'echo 1 >
/sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’

About 5 of 10 times the test is run, it led to a silent data
corruption of a random stripe, resulting in ‘IO Error’ and ‘csum
failed’ messages while trying to read the affected file. It usually
affects only small portion of the files.

It is possible that few bios which were being processed during the
drives removal, contained non zero bio->bi_iter.bi_done field despite
of EIO bi_status. bi_sector field was also increased from original one
by that 'bi_done' value. Looks like this is a quite rare condition.
Subsequently, in the raid_rmw_end_io handler that failed bio can be
translated to a wrong stripe number and fail wrong rbio.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dmitriy Gorokh <dmitriy.gorokh@wdc.com>
---
 fs/btrfs/raid56.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 3c8093757497..cd2038315feb 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1451,6 +1451,12 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
  struct btrfs_bio_stripe *stripe;

  physical <<= 9;
+ /*
+  * Since the failed bio can return partial data, bi_sector might be
+  * incremented by that value. We need to revert it back to the
+  * state before the bio was submitted.
+  */
+ physical -= bio->bi_iter.bi_done;

  for (i = 0; i < rbio->bbio->num_stripes; i++) {
  stripe = &rbio->bbio->stripes[i];
-- 
2.17.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2018-12-14 17:48 ` [PATCH v2] " Dmitriy Gorokh
@ 2018-12-26  0:15   ` Liu Bo
  2019-01-04 16:49   ` David Sterba
  1 sibling, 0 replies; 11+ messages in thread
From: Liu Bo @ 2018-12-26  0:15 UTC (permalink / raw)
  To: Dmitriy Gorokh; +Cc: linux-btrfs

On Fri, Dec 14, 2018 at 9:51 AM Dmitriy Gorokh <dmitriy.gorokh@gmail.com> wrote:
>
> RAID5 or RAID6 filesystem might get corrupted in the following scenario:
>
> 1. Create 4 disks RAID6 filesystem
> 2. Preallocate 16 10Gb files
> 3. Run fio: 'fio --name=testload --directory=./ --size=10G
> --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256
> --time_based --runtime=3600’
> 4. After few minutes pull out two drives: 'echo 1 >
> /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’
>
> About 5 of 10 times the test is run, it led to a silent data
> corruption of a random stripe, resulting in ‘IO Error’ and ‘csum
> failed’ messages while trying to read the affected file. It usually
> affects only small portion of the files.
>
> It is possible that few bios which were being processed during the
> drives removal, contained non zero bio->bi_iter.bi_done field despite
> of EIO bi_status. bi_sector field was also increased from original one
> by that 'bi_done' value. Looks like this is a quite rare condition.
> Subsequently, in the raid_rmw_end_io handler that failed bio can be
> translated to a wrong stripe number and fail wrong rbio.
>

Looks good.

Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>

thanks,
liubo

> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Dmitriy Gorokh <dmitriy.gorokh@wdc.com>
> ---
>  fs/btrfs/raid56.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 3c8093757497..cd2038315feb 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1451,6 +1451,12 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
>   struct btrfs_bio_stripe *stripe;
>
>   physical <<= 9;
> + /*
> +  * Since the failed bio can return partial data, bi_sector might be
> +  * incremented by that value. We need to revert it back to the
> +  * state before the bio was submitted.
> +  */
> + physical -= bio->bi_iter.bi_done;
>
>   for (i = 0; i < rbio->bbio->num_stripes; i++) {
>   stripe = &rbio->bbio->stripes[i];
> --
> 2.17.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2018-12-14 17:48 ` [PATCH v2] " Dmitriy Gorokh
  2018-12-26  0:15   ` Liu Bo
@ 2019-01-04 16:49   ` David Sterba
  2019-01-07 11:03     ` Johannes Thumshirn
  1 sibling, 1 reply; 11+ messages in thread
From: David Sterba @ 2019-01-04 16:49 UTC (permalink / raw)
  To: Dmitriy Gorokh; +Cc: linux-btrfs, jthumshirn

On Fri, Dec 14, 2018 at 08:48:50PM +0300, Dmitriy Gorokh wrote:
> RAID5 or RAID6 filesystem might get corrupted in the following scenario:
> 
> 1. Create 4 disks RAID6 filesystem
> 2. Preallocate 16 10Gb files
> 3. Run fio: 'fio --name=testload --directory=./ --size=10G
> --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256
> --time_based --runtime=3600’
> 4. After few minutes pull out two drives: 'echo 1 >
> /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’
> 
> About 5 of 10 times the test is run, it led to a silent data
> corruption of a random stripe, resulting in ‘IO Error’ and ‘csum
> failed’ messages while trying to read the affected file. It usually
> affects only small portion of the files.
> 
> It is possible that few bios which were being processed during the
> drives removal, contained non zero bio->bi_iter.bi_done field despite
> of EIO bi_status. bi_sector field was also increased from original one
> by that 'bi_done' value. Looks like this is a quite rare condition.
> Subsequently, in the raid_rmw_end_io handler that failed bio can be
> translated to a wrong stripe number and fail wrong rbio.
> 
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Dmitriy Gorokh <dmitriy.gorokh@wdc.com>
> ---
>  fs/btrfs/raid56.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 3c8093757497..cd2038315feb 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1451,6 +1451,12 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
>   struct btrfs_bio_stripe *stripe;
> 
>   physical <<= 9;
> + /*
> +  * Since the failed bio can return partial data, bi_sector might be
> +  * incremented by that value. We need to revert it back to the
> +  * state before the bio was submitted.
> +  */
> + physical -= bio->bi_iter.bi_done;

The bi_done member has been removed in recent block layer changes
commit 7759eb23fd9808a2e4498cf36a798ed65cde78ae ("block: remove
bio_rewind_iter()"). I wonder what kind of block-magic do we need to do
as the iterators seem to be local and there's nothing available in the
call chain leading to find_bio_stripe. Johannes, any ideas?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2019-01-04 16:49   ` David Sterba
@ 2019-01-07 11:03     ` Johannes Thumshirn
  2019-01-07 15:34       ` David Sterba
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2019-01-07 11:03 UTC (permalink / raw)
  To: dsterba, Dmitriy Gorokh, linux-btrfs; +Cc: Ming Lei

[+Cc Ming and full quote for reference]

On 04/01/2019 17:49, David Sterba wrote:
> On Fri, Dec 14, 2018 at 08:48:50PM +0300, Dmitriy Gorokh wrote:
>> RAID5 or RAID6 filesystem might get corrupted in the following scenario:
>>
>> 1. Create 4 disks RAID6 filesystem
>> 2. Preallocate 16 10Gb files
>> 3. Run fio: 'fio --name=testload --directory=./ --size=10G
>> --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256
>> --time_based --runtime=3600’
>> 4. After few minutes pull out two drives: 'echo 1 >
>> /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’
>>
>> About 5 of 10 times the test is run, it led to a silent data
>> corruption of a random stripe, resulting in ‘IO Error’ and ‘csum
>> failed’ messages while trying to read the affected file. It usually
>> affects only small portion of the files.
>>
>> It is possible that few bios which were being processed during the
>> drives removal, contained non zero bio->bi_iter.bi_done field despite
>> of EIO bi_status. bi_sector field was also increased from original one
>> by that 'bi_done' value. Looks like this is a quite rare condition.
>> Subsequently, in the raid_rmw_end_io handler that failed bio can be
>> translated to a wrong stripe number and fail wrong rbio.
>>
>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>> Signed-off-by: Dmitriy Gorokh <dmitriy.gorokh@wdc.com>
>> ---
>>  fs/btrfs/raid56.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
>> index 3c8093757497..cd2038315feb 100644
>> --- a/fs/btrfs/raid56.c
>> +++ b/fs/btrfs/raid56.c
>> @@ -1451,6 +1451,12 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
>>   struct btrfs_bio_stripe *stripe;
>>
>>   physical <<= 9;
>> + /*
>> +  * Since the failed bio can return partial data, bi_sector might be
>> +  * incremented by that value. We need to revert it back to the
>> +  * state before the bio was submitted.
>> +  */
>> + physical -= bio->bi_iter.bi_done;
> 
> The bi_done member has been removed in recent block layer changes
> commit 7759eb23fd9808a2e4498cf36a798ed65cde78ae ("block: remove
> bio_rewind_iter()"). I wonder what kind of block-magic do we need to do
> as the iterators seem to be local and there's nothing available in the
> call chain leading to find_bio_stripe. Johannes, any ideas?

Right, what we could do here is go the same way Ming did in
7759eb23fd980 ("block: remove bio_rewind_iter()") and save a bvec_iter
somewhere before submission and then see if we returned partial data,
but this somehow feels wrong to me (at least to do in btrfs code instead
of the block layer).

Ming can we resurrect ->bi_done, or do you have a different suggestion
for finding about partial written bios?

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2019-01-07 11:03     ` Johannes Thumshirn
@ 2019-01-07 15:34       ` David Sterba
  2019-01-10 16:49         ` Johannes Thumshirn
  0 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2019-01-07 15:34 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: dsterba, Dmitriy Gorokh, linux-btrfs, Ming Lei

On Mon, Jan 07, 2019 at 12:03:43PM +0100, Johannes Thumshirn wrote:
> >> + /*
> >> +  * Since the failed bio can return partial data, bi_sector might be
> >> +  * incremented by that value. We need to revert it back to the
> >> +  * state before the bio was submitted.
> >> +  */
> >> + physical -= bio->bi_iter.bi_done;
> > 
> > The bi_done member has been removed in recent block layer changes
> > commit 7759eb23fd9808a2e4498cf36a798ed65cde78ae ("block: remove
> > bio_rewind_iter()"). I wonder what kind of block-magic do we need to do
> > as the iterators seem to be local and there's nothing available in the
> > call chain leading to find_bio_stripe. Johannes, any ideas?
> 
> Right, what we could do here is go the same way Ming did in
> 7759eb23fd980 ("block: remove bio_rewind_iter()") and save a bvec_iter
> somewhere before submission and then see if we returned partial data,
> but this somehow feels wrong to me (at least to do in btrfs code instead
> of the block layer).

> Ming can we resurrect ->bi_done, or do you have a different suggestion
> for finding about partial written bios?

I don't think bi_done can be resurrected
(https://marc.info/?l=linux-block&m=153549921116441&w=2) but I still am
not sure that saving the iterator is the right thing (or at least how to
do that right, not that the whole idea is wrong).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2019-01-07 15:34       ` David Sterba
@ 2019-01-10 16:49         ` Johannes Thumshirn
  2019-01-11  8:08           ` Johannes Thumshirn
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2019-01-10 16:49 UTC (permalink / raw)
  To: dsterba, Dmitriy Gorokh, linux-btrfs, Ming Lei

On 07/01/2019 16:34, David Sterba wrote:
> On Mon, Jan 07, 2019 at 12:03:43PM +0100, Johannes Thumshirn wrote:
>>>> + /*
>>>> +  * Since the failed bio can return partial data, bi_sector might be
>>>> +  * incremented by that value. We need to revert it back to the
>>>> +  * state before the bio was submitted.
>>>> +  */
>>>> + physical -= bio->bi_iter.bi_done;
>>>
>>> The bi_done member has been removed in recent block layer changes
>>> commit 7759eb23fd9808a2e4498cf36a798ed65cde78ae ("block: remove
>>> bio_rewind_iter()"). I wonder what kind of block-magic do we need to do
>>> as the iterators seem to be local and there's nothing available in the
>>> call chain leading to find_bio_stripe. Johannes, any ideas?
>>
>> Right, what we could do here is go the same way Ming did in
>> 7759eb23fd980 ("block: remove bio_rewind_iter()") and save a bvec_iter
>> somewhere before submission and then see if we returned partial data,
>> but this somehow feels wrong to me (at least to do in btrfs code instead
>> of the block layer).
> 
>> Ming can we resurrect ->bi_done, or do you have a different suggestion
>> for finding about partial written bios?
> 
> I don't think bi_done can be resurrected
> (https://marc.info/?l=linux-block&m=153549921116441&w=2) but I still am
> not sure that saving the iterator is the right thing (or at least how to
> do that right, not that the whole idea is wrong).

Right, but we're looking for the number of already completed bytes to
rewind here, so from bvec.h's docs it is bi_bvec_done.

Dmitriy can you see if this works for you:

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index e74455eb42f9..2d0e2eec5413 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1350,6 +1350,7 @@ static int find_bio_stripe(struct btrfs_raid_bio
*rbio,
        struct btrfs_bio_stripe *stripe;

        physical <<= 9;
+       physical -= bio->bi_iter.bi_bvec_done;

        for (i = 0; i < rbio->bbio->num_stripes; i++) {
                stripe = &rbio->bbio->stripes[i];


-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2019-01-10 16:49         ` Johannes Thumshirn
@ 2019-01-11  8:08           ` Johannes Thumshirn
  2019-01-11  9:26             ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2019-01-11  8:08 UTC (permalink / raw)
  To: dsterba, Dmitriy Gorokh, linux-btrfs, Ming Lei

On 10/01/2019 17:49, Johannes Thumshirn wrote:
> Right, but we're looking for the number of already completed bytes to
> rewind here, so from bvec.h's docs it is bi_bvec_done.
> 
> Dmitriy can you see if this works for you:
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index e74455eb42f9..2d0e2eec5413 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1350,6 +1350,7 @@ static int find_bio_stripe(struct btrfs_raid_bio
> *rbio,
>         struct btrfs_bio_stripe *stripe;
> 
>         physical <<= 9;
> +       physical -= bio->bi_iter.bi_bvec_done;

OK talked to Hannes about this issue, he says the only way is to save
the iterator state before submitting the bio.

So the above is bogus too.

This also is what Kent said in
https://marc.info/?l=linux-block&m=153549921116441&w=2


-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: raid56: data corruption on a device removal
  2019-01-11  8:08           ` Johannes Thumshirn
@ 2019-01-11  9:26             ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2019-01-11  9:26 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: dsterba, Dmitriy Gorokh, linux-btrfs

On Fri, Jan 11, 2019 at 09:08:27AM +0100, Johannes Thumshirn wrote:
> On 10/01/2019 17:49, Johannes Thumshirn wrote:
> > Right, but we're looking for the number of already completed bytes to
> > rewind here, so from bvec.h's docs it is bi_bvec_done.
> > 
> > Dmitriy can you see if this works for you:
> > 
> > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > index e74455eb42f9..2d0e2eec5413 100644
> > --- a/fs/btrfs/raid56.c
> > +++ b/fs/btrfs/raid56.c
> > @@ -1350,6 +1350,7 @@ static int find_bio_stripe(struct btrfs_raid_bio
> > *rbio,
> >         struct btrfs_bio_stripe *stripe;
> > 
> >         physical <<= 9;
> > +       physical -= bio->bi_iter.bi_bvec_done;
> 
> OK talked to Hannes about this issue, he says the only way is to save
> the iterator state before submitting the bio.
> 
> So the above is bogus too.
> 
> This also is what Kent said in
> https://marc.info/?l=linux-block&m=153549921116441&w=2

Yeah, .bi_bvec_done can't help, seems you need to take the similar
approach in 7759eb23fd9808a2e ("block: remove bio_rewind_iter()").

In theory, the follow way might work, but still like a hack:

	#define BVEC_ITER_ALL_INIT (struct bvec_iter)                  \
	{                                                              \
	        .bi_sector      = 0,                                   \
	        .bi_size        = UINT_MAX,                            \
	        .bi_idx         = 0,                                   \
	        .bi_bvec_done   = 0,                                   \
	}
	
	old_bi_sector = bio->bi_iter.bi_sector;
	bio->bi_iter = BVEC_ITER_ALL_INIT;
	bio->bi_iter.bi_sector = old_bi_sector;
	bio_advance(bio, where_to_rewind);

This bio can't be the fast-cloned one, which should be satisfied
by fs code.

Also 'where_to_rewind' needs to point to the offset relative to start
of this bio.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12  0:25 [PATCH] btrfs: raid56: data corruption on a device removal Dmitriy Gorokh
2018-12-12  9:09 ` Johannes Thumshirn
2018-12-12 15:53 ` David Sterba
2018-12-14 17:48 ` [PATCH v2] " Dmitriy Gorokh
2018-12-26  0:15   ` Liu Bo
2019-01-04 16:49   ` David Sterba
2019-01-07 11:03     ` Johannes Thumshirn
2019-01-07 15:34       ` David Sterba
2019-01-10 16:49         ` Johannes Thumshirn
2019-01-11  8:08           ` Johannes Thumshirn
2019-01-11  9:26             ` Ming Lei

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox