All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Naohiro Aota <naohiro.aota@wdc.com>,
	Josef Bacik <josef@toxicpanda.com>
Subject: [PATCH v14 38/42] btrfs: relocate block group to repair IO failure in ZONED
Date: Tue, 26 Jan 2021 11:25:16 +0900	[thread overview]
Message-ID: <09a953f3fc7f068588250ecee58a529f04279df7.1611627788.git.naohiro.aota@wdc.com> (raw)
In-Reply-To: <cover.1611627788.git.naohiro.aota@wdc.com>

When btrfs find a checksum error and if the file system has a mirror of the
damaged data, btrfs read the correct data from the mirror and write the
data to damaged blocks. This repairing, however, is against the sequential
write required rule.

We can consider three methods to repair an IO failure in ZONED mode:
(1) Reset and rewrite the damaged zone
(2) Allocate new device extent and replace the damaged device extent to the
    new extent
(3) Relocate the corresponding block group

Method (1) is most similar to a behavior done with regular devices.
However, it also wipes non-damaged data in the same device extent, and so
it unnecessary degrades non-damaged data.

Method (2) is much like device replacing but done in the same device. It is
safe because it keeps the device extent until the replacing finish.
However, extending device replacing is non-trivial. It assumes
"src_dev->physical == dst_dev->physical". Also, the extent mapping
replacing function should be extended to support replacing device extent
position in one device.

Method (3) invokes relocation of the damaged block group, so it is
straightforward to implement. It relocates all the mirrored device extents,
so it is, potentially, a more costly operation than method (1) or (2). But
it relocates only using extents which reduce the total IO size.

Let's apply method (3) for now. In the future, we can extend device-replace
and apply method (2).

For protecting a block group gets relocated multiple time with multiple IO
errors, this commit introduces "relocating_repair" bit to show it's now
relocating to repair IO failures. Also it uses a new kthread
"btrfs-relocating-repair", not to block IO path with relocating process.

This commit also supports repairing in the scrub process.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/extent_io.c   |  3 ++
 fs/btrfs/scrub.c       |  3 ++
 fs/btrfs/volumes.c     | 71 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h     |  1 +
 5 files changed, 79 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 3dec66ed36cb..36654bcd2a83 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -96,6 +96,7 @@ struct btrfs_block_group {
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
 	unsigned int to_copy:1;
+	unsigned int relocating_repair:1;
 
 	int disk_cache_state;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8de609d1897a..49c7bf78c82e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2259,6 +2259,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
 	BUG_ON(!mirror_num);
 
+	if (btrfs_is_zoned(fs_info))
+		return btrfs_repair_one_zone(fs_info, logical);
+
 	bio = btrfs_io_bio_alloc(1);
 	bio->bi_iter.bi_size = 0;
 	map_length = length;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2f577f3b1c31..d0c47ef72d46 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -857,6 +857,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	have_csum = sblock_to_check->pagev[0]->have_csum;
 	dev = sblock_to_check->pagev[0]->dev;
 
+	if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace)
+		return btrfs_repair_one_zone(fs_info, logical);
+
 	/*
 	 * We must use GFP_NOFS because the scrub task might be waiting for a
 	 * worker task executing this function and in turn a transaction commit
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a99735dda515..0f6a79e67666 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7990,3 +7990,74 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr)
 	spin_unlock(&fs_info->swapfile_pins_lock);
 	return node != NULL;
 }
+
+static int relocating_repair_kthread(void *data)
+{
+	struct btrfs_block_group *cache = (struct btrfs_block_group *) data;
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	u64 target;
+	int ret = 0;
+
+	target = cache->start;
+	btrfs_put_block_group(cache);
+
+	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
+		btrfs_info(fs_info,
+			   "zoned: skip relocating block group %llu to repair: EBUSY",
+			   target);
+		return -EBUSY;
+	}
+
+	mutex_lock(&fs_info->delete_unused_bgs_mutex);
+
+	/* Ensure Block Group still exists */
+	cache = btrfs_lookup_block_group(fs_info, target);
+	if (!cache)
+		goto out;
+
+	if (!cache->relocating_repair)
+		goto out;
+
+	ret = btrfs_may_alloc_data_chunk(fs_info, target);
+	if (ret < 0)
+		goto out;
+
+	btrfs_info(fs_info, "zoned: relocating block group %llu to repair IO failure",
+		   target);
+	ret = btrfs_relocate_chunk(fs_info, target);
+
+out:
+	if (cache)
+		btrfs_put_block_group(cache);
+	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+	btrfs_exclop_finish(fs_info);
+
+	return ret;
+}
+
+int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group *cache;
+
+	/* Do not attempt to repair in degraded state */
+	if (btrfs_test_opt(fs_info, DEGRADED))
+		return 0;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+	if (!cache)
+		return 0;
+
+	spin_lock(&cache->lock);
+	if (cache->relocating_repair) {
+		spin_unlock(&cache->lock);
+		btrfs_put_block_group(cache);
+		return 0;
+	}
+	cache->relocating_repair = 1;
+	spin_unlock(&cache->lock);
+
+	kthread_run(relocating_repair_kthread, cache,
+		    "btrfs-relocating-repair");
+
+	return 0;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 0bcf87a9e594..54f475e0c702 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -597,5 +597,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 const char *btrfs_bg_type_to_raid_name(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
+int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
 
 #endif
-- 
2.27.0


  parent reply	other threads:[~2021-01-26 19:22 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26  2:24 [PATCH v14 00/42] btrfs: zoned block device support Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 01/42] block: add bio_add_zone_append_page Naohiro Aota
2021-01-26 16:08   ` Jens Axboe
2021-01-26  2:24 ` [PATCH v14 02/42] iomap: support REQ_OP_ZONE_APPEND Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 03/42] btrfs: defer loading zone info after opening trees Naohiro Aota
2021-01-30 22:09   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 04/42] btrfs: use regular SB location on emulated zoned mode Naohiro Aota
2021-01-30 22:28   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 05/42] btrfs: release path before calling into btrfs_load_block_group_zone_info Naohiro Aota
2021-01-30 23:21   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 06/42] btrfs: do not load fs_info->zoned from incompat flag Naohiro Aota
2021-01-30 23:40   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 07/42] btrfs: disallow fitrim in ZONED mode Naohiro Aota
2021-01-30 23:44   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 08/42] btrfs: allow zoned mode on non-zoned block devices Naohiro Aota
2021-01-31  1:17   ` Anand Jain
2021-02-01 11:06     ` Johannes Thumshirn
2021-02-02  1:49   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 09/42] btrfs: implement zoned chunk allocator Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 10/42] btrfs: verify device extent is aligned to zone Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 11/42] btrfs: load zone's allocation offset Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 12/42] btrfs: calculate allocation offset for conventional zones Naohiro Aota
2021-01-27 18:03   ` Josef Bacik
2021-02-03  5:19   ` Anand Jain
2021-02-03  6:10     ` Damien Le Moal
2021-02-03  6:56       ` Anand Jain
2021-02-03  7:10         ` Damien Le Moal
2021-01-26  2:24 ` [PATCH v14 13/42] btrfs: track unusable bytes for zones Naohiro Aota
2021-01-27 18:06   ` Josef Bacik
2021-01-26  2:24 ` [PATCH v14 14/42] btrfs: do sequential extent allocation in ZONED mode Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 15/42] btrfs: redirty released extent buffers " Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 16/42] btrfs: advance allocation pointer after tree log node Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 17/42] btrfs: enable to mount ZONED incompat flag Naohiro Aota
2021-01-31 12:21   ` Anand Jain
2021-01-26  2:24 ` [PATCH v14 18/42] btrfs: reset zones of unused block groups Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 19/42] btrfs: extract page adding function Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 20/42] btrfs: use bio_add_zone_append_page for zoned btrfs Naohiro Aota
2021-01-26  2:24 ` [PATCH v14 21/42] btrfs: handle REQ_OP_ZONE_APPEND as writing Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 22/42] btrfs: split ordered extent when bio is sent Naohiro Aota
2021-01-27 19:00   ` Josef Bacik
2021-01-26  2:25 ` [PATCH v14 23/42] btrfs: check if bio spans across an ordered extent Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 24/42] btrfs: extend btrfs_rmap_block for specifying a device Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 25/42] btrfs: cache if block-group is on a sequential zone Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 26/42] btrfs: save irq flags when looking up an ordered extent Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 27/42] btrfs: use ZONE_APPEND write for ZONED btrfs Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 28/42] btrfs: enable zone append writing for direct IO Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 29/42] btrfs: introduce dedicated data write path for ZONED mode Naohiro Aota
2021-02-02 15:00   ` David Sterba
2021-02-04  8:25     ` Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 30/42] btrfs: serialize meta IOs on " Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 31/42] btrfs: wait existing extents before truncating Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 32/42] btrfs: avoid async metadata checksum on ZONED mode Naohiro Aota
2021-02-02 14:54   ` David Sterba
2021-02-02 16:50     ` Johannes Thumshirn
2021-02-02 19:28       ` David Sterba
2021-01-26  2:25 ` [PATCH v14 33/42] btrfs: mark block groups to copy for device-replace Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 34/42] btrfs: implement cloning for ZONED device-replace Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 35/42] btrfs: implement copying " Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 36/42] btrfs: support dev-replace in ZONED mode Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 37/42] btrfs: enable relocation " Naohiro Aota
2021-01-26  2:25 ` Naohiro Aota [this message]
2021-01-26  2:25 ` [PATCH v14 39/42] btrfs: split alloc_log_tree() Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 40/42] btrfs: extend zoned allocator to use dedicated tree-log block group Naohiro Aota
2021-01-26  2:25 ` [PATCH v14 41/42] btrfs: serialize log transaction on ZONED mode Naohiro Aota
2021-01-27 19:01   ` Josef Bacik
2021-02-01 15:48   ` Filipe Manana
2021-01-26  2:25 ` [PATCH v14 42/42] btrfs: reorder log node allocation Naohiro Aota
2021-02-01 15:48   ` Filipe Manana
2021-02-01 15:54     ` Johannes Thumshirn
2021-01-29  7:56 ` [PATCH v14 00/42] btrfs: zoned block device support Johannes Thumshirn
2021-01-29 20:44   ` David Sterba
2021-01-30 11:30     ` Johannes Thumshirn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=09a953f3fc7f068588250ecee58a529f04279df7.1611627788.git.naohiro.aota@wdc.com \
    --to=naohiro.aota@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=darrick.wong@oracle.com \
    --cc=dsterba@suse.com \
    --cc=hare@suse.com \
    --cc=hch@infradead.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.