Re: [PATCH] btrfs: repair all bad mirrors

From: Nikolay Borisov <nborisov@suse.com>
To: Christoph Hellwig <hch@lst.de>,
	clm@fb.com, josef@toxicpanda.com, dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: repair all bad mirrors
Date: Tue, 21 Jun 2022 18:19:19 +0300	[thread overview]
Message-ID: <b633dedf-b322-2d8c-adfb-8ab88af5652e@suse.com> (raw)
In-Reply-To: <20220619082821.2151052-1-hch@lst.de>


On 19.06.22 г. 11:28 ч., Christoph Hellwig wrote:
> When there is more than a single level of redundancy there can also be
> mutiple bad mirrors, and the current read repair code only repairs the
> last bad one.
> 
> Restructure btrfs_repair_one_sector so that it records the original bad
> bad mirror, and the number of copies, and then repair all bad copies
> until we reach the original bad copy in clean_io_failure.  Note that
> this also means the read repair reads will always start from the next
> bad mirror and not mirror 0.
> 
> This fixes btrfs/265 in xfstests.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 127 +++++++++++++++++++++----------------------
>   fs/btrfs/extent_io.h |   3 +-
>   2 files changed, 63 insertions(+), 67 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 854999c2e139b..9775ac5d28a9e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2432,6 +2432,20 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
>   	return ret;
>   }
>   
> +static int next_mirror(struct io_failure_record *failrec, int cur_mirror)
> +{
> +	if (cur_mirror == failrec->num_copies)
> +		return cur_mirror + 1 - failrec->num_copies;
> +	return cur_mirror + 1;
> +}
> +
> +static int prev_mirror(struct io_failure_record *failrec, int cur_mirror)
> +{
> +	if (cur_mirror == 1)
> +		return failrec->num_copies;
> +	return cur_mirror - 1;
> +}
> +
>   /*
>    * each time an IO finishes, we do a fast check in the IO failure tree
>    * to see if we need to process or clean up an io_failure_record
> @@ -2444,7 +2458,7 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
>   	u64 private;
>   	struct io_failure_record *failrec;
>   	struct extent_state *state;
> -	int num_copies;
> +	int mirror;
>   	int ret;
>   
>   	private = 0;
> @@ -2468,20 +2482,19 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
>   					    EXTENT_LOCKED);
>   	spin_unlock(&io_tree->lock);
>   
> -	if (state && state->start <= failrec->start &&
> -	    state->end >= failrec->start + failrec->len - 1) {
> -		num_copies = btrfs_num_copies(fs_info, failrec->logical,
> -					      failrec->len);
> -		if (num_copies > 1)  {
> -			repair_io_failure(fs_info, ino, start, failrec->len,
> -					  failrec->logical, page, pg_offset,
> -					  failrec->failed_mirror);
> -		}
> -	}
> +	if (!state || state->start > failrec->start ||
> +	    state->end < failrec->start + failrec->len - 1)
> +		goto out;
> +
> +	mirror = failrec->this_mirror;
> +	do {
> +		mirror = prev_mirror(failrec, mirror);
> +		repair_io_failure(fs_info, ino, start, failrec->len,
> +				  failrec->logical, page, pg_offset, mirror);
> +	} while (mirror != failrec->orig_mirror);

But does this work as intended? Say we have a raid1c4 and we read from 
mirror 3 which is bad, in this case failrec->orig_mirror = 3 and 
->this_mirror = 4. The read from mirror 4 returns good data and 
clean_io_failure is called with mirror= 3 in which case only mirror 3 is 
repaired (assume 1/2 were also bad we don't know it yet, because the 
original bio request didn't submit to them based on the PID policy).

<snip>

> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index c0f1fb63eeae7..5bd23bb239dae 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -262,8 +262,9 @@ struct io_failure_record {
>   	u64 len;
>   	u64 logical;
>   	enum btrfs_compression_type compress_type;
> +	int orig_mirror;
nit: how about we name this failed_mirror
>   	int this_mirror;
nit: this name is also somewhat uninformative and one has to go and see 
how it's used in order to figure out the usage.
> -	int failed_mirror;
> +	int num_copies;
>   };
>   
>   int btrfs_repair_one_sector(struct inode *inode,