* [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption
@ 2016-10-17  1:27 Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 01/14] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Qu Wenruo
                   ` (13 more replies)
  0 siblings, 14 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

***Just RFC patch for early evaluation, please don't merge it***

For anyone who wants to try it, it can be fetched from my repo:
https://github.com/adam900710/btrfs-progs/tree/fsck_scrub

Currently I have only tested it on SINGLE/DUP/RAID1/RAID5 filesystems,
with a mirror, parity or data stripe corrupted.
The tool is able to detect all of them and gives a recoverability
report.

Several reports of the kernel scrub corrupting good data stripes have
been on the mailing list for some time.

The reason seems to be the lack of csum checks before and after
reconstruction; unfinished parity writes also seem to be involved.

To have a reference point for the kernel scrub, we need a user-space
tool that acts as a baseline against which the differing behaviors can
be compared.

So here is the RFC patch set for user-space scrub.

It can do the following:

1) All mirror/copy check for non-parity based profiles
   Which means for RAID1/DUP/RAID10 we really check every mirror, not
   just the first good one.

   The current "--check-data-csum" option will eventually be replaced by
   scrub, as it doesn't really check all mirrors: once it hits a good
   copy, the remaining copies are simply ignored.

2) Comprehensive RAID5 full stripe check
   It checks csums before reconstructing from parity, and if too many
   data stripes have csum mismatches there is no point in reconstructing
   anyway.

   After reconstruction, it also checks the recovered data against its
   csum, to ensure we didn't recover a wrong result.

   When all csums match, it re-calculates the parity and compares it
   with the on-disk parity, to detect parity errors.
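
As a rough illustration of the above (a toy model in plain C, not btrfs
code, with just 2 data stripes): parity alone cannot tell which stripe
is bad, the csums identify it, and the rebuilt stripe is only trusted
once its csum matches again:

#include <stdio.h>
#include <string.h>

#define NR_DATA    2
#define STRIPE_LEN 16

/* Stand-in for a real checksum, only used to identify the bad stripe */
static unsigned int toy_csum(const unsigned char *buf)
{
        unsigned int sum = 0;
        int i;

        for (i = 0; i < STRIPE_LEN; i++)
                sum = sum * 31 + buf[i];
        return sum;
}

int main(void)
{
        unsigned char data[NR_DATA][STRIPE_LEN];
        unsigned char parity[STRIPE_LEN];
        unsigned int csum[NR_DATA];
        int bad = -1;
        int i, j;

        for (i = 0; i < NR_DATA; i++) {
                memset(data[i], 'A' + i, STRIPE_LEN);
                csum[i] = toy_csum(data[i]);
        }
        for (j = 0; j < STRIPE_LEN; j++)
                parity[j] = data[0][j] ^ data[1][j];

        memset(data[1], 0, STRIPE_LEN);          /* silent corruption */

        for (i = 0; i < NR_DATA; i++)            /* csum check before repair */
                if (toy_csum(data[i]) != csum[i])
                        bad = i;
        if (bad >= 0) {
                for (j = 0; j < STRIPE_LEN; j++) /* rebuild from parity */
                        data[bad][j] = data[1 - bad][j] ^ parity[j];
                /* csum check after repair, never trust the rebuild blindly */
                printf("stripe %d rebuilt, csum %s\n", bad,
                       toy_csum(data[bad]) == csum[bad] ? "matches" : "still bad");
        }
        return 0;
}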

In fact, it has already exposed one new btrfs kernel bug: after a data
stripe is corrupted, the kernel repairs it using parity, but the
recovered full stripe ends up with a wrong parity, and a second scrub
is needed to fix it.

This patchset also introduces a new map_block() function, which is more
flexible than the current btrfs_map_block() and has a unified interface
for all profiles.
See the 1st and 2nd patches for details.

It is already used by the RAID5/6 scrub, but it can be used for the
other profiles too.

Since it's just an evaluation patchset, it still has a long to-do list:

1) Repair support
   The current tool can already report recoverability, so repair is not
   hard to implement.

2) RAID6 support
   The mathematics behind RAID6 recovery is more complex than RAID5.
   Some more code is needed to recover data stripes, beyond just
   calculating P and Q.

3) Test cases
   The test infrastructure needs to be able to handle multi-device
   filesystems first.

4) Cleaner code and refined logic
   A better shared scrub logic for all profiles is needed, using the new
   map_block_v2() to replace the old code.

5) Make btrfsck able to handle RAID5 with missing device
   Currently it doesn't even open a RAID5 btrfs with a missing device,
   even though scrub should be able to handle it.

Qu Wenruo (14):
  btrfs-progs: Introduce new btrfs_map_block function which returns more
    unified result.
  btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
  btrfs-progs: check/csum: Introduce function to read out one data csum
  btrfs-progs: check/scrub: Introduce structures to support fsck scrub
  btrfs-progs: check/scrub: Introduce function to scrub mirror based
    tree block
  btrfs-progs: check/scrub: Introduce function to scrub mirror based
    data blocks
  btrfs-progs: check/scrub: Introduce function to scrub one extent
  btrfs-progs: check/scrub: Introduce function to scrub one data stripe
  btrfs-progs: check/scrub: Introduce function to verify parities
  btrfs-progs: extent-tree: Introduce function to check if there is any
    extent in given range.
  btrfs-progs: check/scrub: Introduce function to recover data parity
  btrfs-progs: check/scrub: Introduce a function to scrub one full
    stripe
  btrfs-progs: check/scrub: Introduce function to check a whole block
    group
  btrfs-progs: fsck: Introduce offline scrub function

 Documentation/btrfs-check.asciidoc |   8 +
 Makefile.in                        |   6 +-
 check/check.h                      |  23 ++
 check/csum.c                       |  96 +++++
 check/scrub.c                      | 812 +++++++++++++++++++++++++++++++++++++
 cmds-check.c                       |  12 +-
 ctree.h                            |   2 +
 disk-io.c                          |   4 +-
 disk-io.h                          |   2 +
 extent-tree.c                      |  52 +++
 volumes.c                          | 282 +++++++++++++
 volumes.h                          |  49 +++
 12 files changed, 1343 insertions(+), 5 deletions(-)
 create mode 100644 check/check.h
 create mode 100644 check/csum.c
 create mode 100644 check/scrub.c

-- 
2.10.0





* [RFC PATCH v0.8 01/14] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result.
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Qu Wenruo
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, __btrfs_map_block_v2().

Unlike the old btrfs_map_block(), which needs different parameters to
handle different RAID profiles, this new function uses a unified
btrfs_map_block structure to handle all RAID profiles in a more
meaningful way:

It returns the physical address along with the logical address for each
stripe.

For RAID1/Single/DUP (non-striped):
The result would look like:
Map block: Logical 128M, Len 10M, Type RAID1, Stripe len 0, Nr_stripes 2
Stripe 0: Logical 128M, Physical X, Len: 10M Dev dev1
Stripe 1: Logical 128M, Physical Y, Len: 10M Dev dev2

The result will cover as large a range as possible, since it's not
striped at all.

For RAID0/10 (striped, without parity):
The result will be aligned to the full stripe size:
Map block: Logical 64K, Len 128K, Type RAID10, Stripe len 64K, Nr_stripes 4
Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
Stripe 1: Logical 64K, Physical Y, Len 64K Dev dev2
Stripe 2: Logical 128K, Physical Z, Len 64K Dev dev3
Stripe 3: Logical 128K, Physical W, Len 64K Dev dev4

For RAID5/6 (striped, with parity and device rotation):
The result will be aligned to the full stripe size:
Map block: Logical 64K, Len 128K, Type RAID6, Stripe len 64K, Nr_stripes 4
Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
Stripe 1: Logical 128K, Physical Y, Len 64K Dev dev2
Stripe 2: Logical RAID5_P, Physical Z, Len 64K Dev dev3
Stripe 3: Logical RAID6_Q, Physical W, Len 64K Dev dev4

The new unified layout should be very flexible and can even handle
things like N-way RAID1 (which the old mirror_num based interface can't
handle well).
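
As an illustration only (not part of this patch), a caller could consume
the result roughly like this, assuming fs_info, logical and length are
already in scope:

        struct btrfs_map_block *map_block = NULL;
        int ret;
        int i;

        ret = __btrfs_map_block_v2(fs_info, WRITE, logical, length,
                                   &map_block);
        if (ret < 0)
                return ret;
        for (i = 0; i < map_block->num_stripes; i++) {
                struct btrfs_map_stripe *stripe = &map_block->stripes[i];

                /*
                 * For RAID5/6, P/Q stripes report BTRFS_RAID5_P_STRIPE or
                 * BTRFS_RAID6_Q_STRIPE as their logical address.
                 */
                printf("stripe %d: logical %llu physical %llu len %llu\n",
                       i, stripe->logical, stripe->physical, stripe->length);
        }
        free(map_block);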

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 volumes.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 volumes.h |  49 +++++++++++++++++
 2 files changed, 230 insertions(+)

diff --git a/volumes.c b/volumes.c
index a7abd92..94f3e42 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1542,6 +1542,187 @@ out:
 	return 0;
 }
 
+static inline struct btrfs_map_block *alloc_map_block(int num_stripes)
+{
+	struct btrfs_map_block *ret;
+	int size;
+
+	size = sizeof(struct btrfs_map_stripe) * num_stripes +
+		sizeof(struct btrfs_map_block);
+	ret = malloc(size);
+	if (!ret)
+		return NULL;
+	memset(ret, 0, size);
+	return ret;
+}
+
+static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
+			       struct btrfs_map_block *map_block)
+{
+	u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	u64 bg_start = map->ce.start;
+	u64 bg_end = bg_start + map->ce.size;
+	u64 bg_offset = start - bg_start; /* offset inside the block group */
+	u64 fstripe_logical = 0;	/* Full stripe start logical bytenr */
+	u64 fstripe_size = 0;		/* Full stripe logical size */
+	u64 fstripe_phy_off = 0;	/* Full stripe offset in each dev */
+	u32 stripe_len = map->stripe_len;
+	int sub_stripes = map->sub_stripes;
+	int data_stripes = nr_data_stripes(map);
+	int dev_rotation;
+	int i;
+
+	map_block->num_stripes = map->num_stripes;
+	map_block->type = profile;
+
+	/*
+	 * Common full stripe data for stripe based profiles
+	 */
+	if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10 |
+		       BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+		fstripe_size = stripe_len * data_stripes;
+		if (sub_stripes)
+			fstripe_size /= sub_stripes;
+		fstripe_logical = round_down(bg_offset, fstripe_size) +
+				    bg_start;
+		fstripe_phy_off = bg_offset / fstripe_size * stripe_len;
+	}
+
+	switch (profile) {
+	case BTRFS_BLOCK_GROUP_DUP:
+	case BTRFS_BLOCK_GROUP_RAID1:
+	case 0: /* SINGLE */
+		/*
+		 * Non-striped mode (Single, DUP and RAID1)
+		 * Just use offset to fill map_block
+		 */
+		map_block->stripe_len = 0;
+		map_block->start = start;
+		map_block->length = min(bg_end, start + length) - start;
+		for (i = 0; i < map->num_stripes; i++) {
+			struct btrfs_map_stripe *stripe;
+
+			stripe = &map_block->stripes[i];
+
+			stripe->dev = map->stripes[i].dev;
+			stripe->logical = start;
+			stripe->physical = map->stripes[i].physical + bg_offset;
+			stripe->length = map_block->length;
+		}
+		break;
+	case BTRFS_BLOCK_GROUP_RAID10:
+	case BTRFS_BLOCK_GROUP_RAID0:
+		/*
+		 * Stripe modes without parity(0 and 10)
+		 * Return the whole full stripe
+		 */
+
+		map_block->start = fstripe_logical;
+		map_block->length = fstripe_size;
+		map_block->stripe_len = map->stripe_len;
+		for (i = 0; i < map->num_stripes; i++) {
+			struct btrfs_map_stripe *stripe;
+			u64 cur_offset;
+
+			/* Handle RAID10 sub stripes */
+			if (sub_stripes)
+				cur_offset = i / sub_stripes * stripe_len;
+			else
+				cur_offset = stripe_len * i;
+			stripe = &map_block->stripes[i];
+
+			stripe->dev = map->stripes[i].dev;
+			stripe->logical = fstripe_logical + cur_offset;
+			stripe->length = stripe_len;
+			stripe->physical = map->stripes[i].physical +
+					   fstripe_phy_off;
+		}
+		break;
+	case BTRFS_BLOCK_GROUP_RAID5:
+	case BTRFS_BLOCK_GROUP_RAID6:
+		/*
+		 * Stripe modes with parity and device rotation(5 and 6)
+		 *
+		 * Return the whole full stripe
+		 */
+
+		dev_rotation = (bg_offset / fstripe_size) % map->num_stripes;
+
+		map_block->start = fstripe_logical;
+		map_block->length = fstripe_size;
+		map_block->stripe_len = map->stripe_len;
+		for (i = 0; i < map->num_stripes; i++) {
+			struct btrfs_map_stripe *stripe;
+			int dest_index;
+			u64 cur_offset = stripe_len * i;
+
+			stripe = &map_block->stripes[i];
+
+			dest_index = (i + dev_rotation) % map->num_stripes;
+			stripe->dev = map->stripes[dest_index].dev;
+			stripe->length = stripe_len;
+			stripe->physical = map->stripes[dest_index].physical +
+					   fstripe_phy_off;
+			if (i < data_stripes) {
+				/* data stripe */
+				stripe->logical = fstripe_logical +
+						  cur_offset;
+			} else if (i == data_stripes) {
+				/* P */
+				stripe->logical = BTRFS_RAID5_P_STRIPE;
+			} else {
+				/* Q */
+				stripe->logical = BTRFS_RAID6_Q_STRIPE;
+			}
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
+			 u64 length, struct btrfs_map_block **map_ret)
+{
+	struct cache_extent *ce;
+	struct map_lookup *map;
+	struct btrfs_map_block *map_block;
+	int ret;
+
+	/* Early parameter check */
+	if (!length || !map_ret) {
+		error("wrong parameter for %s", __func__);
+		return -EINVAL;
+	}
+
+	ce = search_cache_extent(&fs_info->mapping_tree.cache_tree, logical);
+	if (!ce)
+		return -ENOENT;
+	if (ce->start > logical)
+		return -ENOENT;
+
+	map = container_of(ce, struct map_lookup, ce);
+	/*
+	 * Allocate a full map_block anyway
+	 *
+	 * For write, we need the full map_block anyway.
+	 * For read, it will be striped to the needed stripe before returning.
+	 */
+	map_block = alloc_map_block(map->num_stripes);
+	if (!map_block)
+		return -ENOMEM;
+	ret = fill_full_map_block(map, logical, length, map_block);
+	if (ret < 0) {
+		free(map_block);
+		return ret;
+	}
+	/* TODO: Remove unrelated map_stripes for READ operation */
+
+	*map_ret = map_block;
+	return 0;
+}
+
 struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
 				       u8 *uuid, u8 *fsid)
 {
diff --git a/volumes.h b/volumes.h
index d7b7d3c..82b8757 100644
--- a/volumes.h
+++ b/volumes.h
@@ -108,6 +108,51 @@ struct map_lookup {
 	struct btrfs_bio_stripe stripes[];
 };
 
+struct btrfs_map_stripe {
+	struct btrfs_device *dev;
+
+	/*
+	 * Logical address of the stripe start.
+	 * Caller should check if this logical is the desired map start.
+	 * It's possible that the logical is smaller or larger than desired
+	 * map range.
+	 *
+	 * For P/Q stripes, it will be BTRFS_RAID5_P_STRIPE
+	 * and BTRFS_RAID6_Q_STRIPE.
+	 */
+	u64 logical;
+
+	u64 physical;
+
+	/* The length of the stripe */
+	u64 length;
+};
+
+struct btrfs_map_block {
+	/*
+	 * The logical start of the whole map block.
+	 * For RAID5/6 it will be the bytenr of the full stripe start,
+	 * so it's possible that @start is smaller than desired map range
+	 * start.
+	 */
+	u64 start;
+
+	/*
+	 * The logical length of the map block.
+	 * For RAID5/6 it will be total data stripe size
+	 */
+	u64 length;
+
+	/* Block group type */
+	u64 type;
+
+	/* Stripe length, for non-striped mode it will be 0 */
+	u32 stripe_len;
+
+	int num_stripes;
+	struct btrfs_map_stripe stripes[];
+};
+
 #define btrfs_multi_bio_size(n) (sizeof(struct btrfs_multi_bio) + \
 			    (sizeof(struct btrfs_bio_stripe) * (n)))
 #define btrfs_map_lookup_size(n) (sizeof(struct map_lookup) + \
@@ -170,6 +215,10 @@ int btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
 		    u64 logical, u64 *length,
 		    struct btrfs_multi_bio **multi_ret, int mirror_num,
 		    u64 **raid_map_ret);
+
+/* TODO: Use this map_block_v2 to replace __btrfs_map_block() */
+int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
+			 u64 length, struct btrfs_map_block **map_ret);
 int btrfs_next_bg(struct btrfs_mapping_tree *map_tree, u64 *logical,
 		     u64 *size, u64 type);
 static inline int btrfs_next_bg_metadata(struct btrfs_mapping_tree *map_tree,
-- 
2.10.0





* [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 01/14] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-20  9:23   ` Sanidhya Solanki
  2016-10-17  1:27 ` [RFC PATCH v0.8 03/14] btrfs-progs: check/csum: Introduce function to read out one data csum Qu Wenruo
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

For READ, callers normally want only what they requested, not the full
stripe map.

In that case we should remove the unrelated stripes, as in the
following example:
               32K               96K
               |<-request range->|
         0              64k           128K
RAID0:   |    Data 1    |   Data 2    |
              disk1         disk2
Before this patch, we return the full stripe:
Stripe 0: Logical 0, Physical X, Len 64K, Dev disk1
Stripe 1: Logical 64k, Physical Y, Len 64K, Dev disk2

After this patch, we limit the stripe result to the request range:
Stripe 0: Logical 32K, Physical X+32K, Len 32K, Dev disk1
Stripe 1: Logical 64k, Physical Y, Len 32K, Dev disk2

For a RAID5/6 READ we just handle it like RAID0, ignoring the parities.

This should make the function easier for callers to use.
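
The per-stripe adjustment is essentially the following sketch, with
[start, start + length) being the requested range and stripe being one
overlapping entry of the map block:

        u64 orig_logical = stripe->logical;
        u64 orig_end = stripe->logical + stripe->length;

        stripe->logical = max(orig_logical, start);
        stripe->length = min(orig_end, start + length) - stripe->logical;
        stripe->physical += stripe->logical - orig_logical;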

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 volumes.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/volumes.c b/volumes.c
index 94f3e42..ba16d19 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1682,6 +1682,107 @@ static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
 	return 0;
 }
 
+static void del_one_stripe(struct btrfs_map_block *map_block, int i)
+{
+	int cur_nr = map_block->num_stripes;
+	int size_left = (cur_nr - 1 - i) * sizeof(struct btrfs_map_stripe);
+
+	memmove(&map_block->stripes[i], &map_block->stripes[i + 1], size_left);
+	map_block->num_stripes--;
+}
+
+static void remove_unrelated_stripes(struct map_lookup *map,
+				     int rw, u64 start, u64 length,
+				     struct btrfs_map_block *map_block)
+{
+	int i = 0;
+	/*
+	 * RAID5/6 write must use full stripe.
+	 * No need to do anything.
+	 */
+	if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6) &&
+	    rw == WRITE)
+		return;
+
+	/*
+	 * For RAID0/1/10/DUP, whatever read/write, we can remove unrelated
+	 * stripes without causing anything wrong.
+	 * RAID5/6 READ is just like RAID0, we don't care parity unless we need
+	 * to recovery.
+	 * For recovery, rw should be set to WRITE.
+	 */
+	while (i < map_block->num_stripes) {
+		struct btrfs_map_stripe *stripe;
+		u64 orig_logical; /* Original stripe logical start */
+		u64 orig_end; /* Original stripe logical end */
+
+		stripe = &map_block->stripes[i];
+
+		/*
+		 * For READ, we don't really care parity
+		 */
+		if (stripe->logical == BTRFS_RAID5_P_STRIPE ||
+		    stripe->logical == BTRFS_RAID6_Q_STRIPE) {
+			del_one_stripe(map_block, i);
+			continue;
+		}
+		/* Completely unrelated stripe */
+		if (stripe->logical >= start + length ||
+		    stripe->logical + stripe->length <= start) {
+			del_one_stripe(map_block, i);
+			continue;
+		}
+		/* Covered stripe, modify its logical and physical */
+		orig_logical = stripe->logical;
+		orig_end = stripe->logical + stripe->length;
+		if (start + length <= orig_end) {
+			/*
+			 * |<--range-->|
+			 *   |  stripe   |
+			 * Or
+			 *     |<range>|
+			 *   |  stripe   |
+			 */
+			stripe->logical = max(orig_logical, start);
+			stripe->length = start + length - stripe->logical;
+			stripe->physical += stripe->logical - orig_logical;
+		} else if (start >= orig_logical) {
+			/*
+			 *     |<-range--->|
+			 * |  stripe     |
+			 * Or
+			 *     |<range>|
+			 * |  stripe     |
+			 */
+			stripe->logical = start;
+			stripe->length = min(orig_end, start + length) - start;
+			stripe->physical += stripe->logical - orig_logical;
+		}
+		/*
+		 * Remaining case:
+		 * |<----range----->|
+		 *   | stripe |
+		 * No need to do any modification
+		 */
+		i++;
+	}
+
+	/* Recalculate the map_block range from the remaining stripes */
+	map_block->start = (u64)-1;
+	map_block->length = 0;
+	for (i = 0; i < map_block->num_stripes; i++) {
+		struct btrfs_map_stripe *stripe;
+
+		stripe = &map_block->stripes[i];
+		if (stripe->logical < map_block->start)
+			map_block->start = stripe->logical;
+		if (stripe->logical + stripe->length > map_block->length)
+			map_block->length = stripe->logical + stripe->length;
+	}
+	/* Up to here @length held the end bytenr, turn it into a length */
+	map_block->length -= map_block->start;
+}
+
 int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
 			 u64 length, struct btrfs_map_block **map_ret)
 {
@@ -1717,7 +1818,7 @@ int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
 		free(map_block);
 		return ret;
 	}
-	/* TODO: Remove unrelated map_stripes for READ operation */
+	remove_unrelated_stripes(map, rw, logical, length, map_block);
 
 	*map_ret = map_block;
 	return 0;
-- 
2.10.0





* [RFC PATCH v0.8 03/14] btrfs-progs: check/csum: Introduce function to read out one data csum
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 01/14] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 04/14] btrfs-progs: check/scrub: Introduce structures to support fsck scrub Qu Wenruo
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, btrfs_read_one_data_csum(), to read just one
data csum for check usage.

Unlike the original implementation in cmds-check.c, which checks csums
one CSUM_EXTENT at a time, this just reads out one csum (4 bytes).
It is not fast, but it makes the code easier to read.

It will be used by the later fsck scrub code.
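
For example, checking a single data sector against its csum looks
roughly like this (the same pattern the later scrub_data_mirror() uses;
buf is assumed to hold the sector contents):

        u32 expected;
        u32 calc = ~(u32)0;
        int ret;

        ret = btrfs_read_one_data_csum(fs_info, bytenr, &expected);
        if (ret < 0)
                return ret;
        if (ret > 0)
                return 0;       /* no csum for this sector (nodatasum) */

        calc = btrfs_csum_data(NULL, buf, calc, sectorsize);
        btrfs_csum_final(calc, (u8 *)&calc);
        if (calc != expected)
                return -EIO;    /* csum mismatch */
        return 0;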

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 Makefile.in   |  6 ++--
 check/check.h | 21 +++++++++++++
 check/csum.c  | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 2 deletions(-)
 create mode 100644 check/check.h
 create mode 100644 check/csum.c

diff --git a/Makefile.in b/Makefile.in
index b53cf2c..6e2407f 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -63,6 +63,7 @@ CFLAGS = @CFLAGS@ \
 	 -fPIC \
 	 -I$(TOPDIR) \
 	 -I$(TOPDIR)/kernel-lib \
+	 -I$(TOPDIR)/check \
 	 $(EXTRAWARN_CFLAGS) \
 	 $(DEBUG_CFLAGS_INTERNAL) \
 	 $(EXTRA_CFLAGS)
@@ -93,7 +94,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
 	  extent-cache.o extent_io.o volumes.o utils.o repair.o \
 	  qgroup.o raid56.o free-space-cache.o kernel-lib/list_sort.o props.o \
 	  ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
-	  inode.o file.o find-root.o free-space-tree.o help.o
+	  inode.o file.o find-root.o free-space-tree.o help.o \
+	  check/csum.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
@@ -463,7 +465,7 @@ clean-all: clean clean-doc clean-gen
 clean: $(CLEANDIRS)
 	@echo "Cleaning"
 	$(Q)$(RM) -f -- $(progs) cscope.out *.o *.o.d \
-		kernel-lib/*.o kernel-lib/*.o.d \
+		kernel-lib/*.o kernel-lib/*.o.d check/*.o check/*.o.d \
 	      dir-test ioctl-test quick-test send-test library-test library-test-static \
 	      btrfs.static mkfs.btrfs.static \
 	      $(check_defs) \
diff --git a/check/check.h b/check/check.h
new file mode 100644
index 0000000..61d1cac
--- /dev/null
+++ b/check/check.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2016 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+/* check/csum.c */
+int btrfs_read_one_data_csum(struct btrfs_fs_info *fs_info, u64 bytenr,
+			     void *csum_ret);
diff --git a/check/csum.c b/check/csum.c
new file mode 100644
index 0000000..53195ea
--- /dev/null
+++ b/check/csum.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2016 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "utils.h"
+/*
+ * TODO:
+ * 1) Add write support for csum
+ *    So we can write new data extents and add csum into csum tree
+ * 2) Add csum range search function
+ *    So we don't need to search csum tree in a per-sectorsize loop.
+ */
+
+int btrfs_read_one_data_csum(struct btrfs_fs_info *fs_info, u64 bytenr,
+			     void *csum_ret)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_root *csum_root = fs_info->csum_root;
+	u32 item_offset;
+	u32 item_size;
+	u32 final_offset;
+	u32 sectorsize = fs_info->tree_root->sectorsize;
+	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+	int ret;
+
+	if (!csum_ret) {
+		error("wrong parameter for %s", __func__);
+		return -EINVAL;
+	}
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+	key.type = BTRFS_EXTENT_CSUM_KEY;
+	key.offset = bytenr;
+
+	ret = btrfs_search_slot(NULL, csum_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	if (ret == 0) {
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		if (!IS_ALIGNED(key.offset, sectorsize)) {
+			error("csum item bytenr %llu is not aligned to %u",
+			      key.offset, sectorsize);
+			ret = -EIO;
+			goto out;
+		}
+		u32 offset = btrfs_item_ptr_offset(path->nodes[0],
+						      path->slots[0]);
+
+		read_extent_buffer(path->nodes[0], csum_ret, offset, csum_size);
+		goto out;
+	}
+	ret = btrfs_previous_item(csum_root, path, BTRFS_EXTENT_CSUM_OBJECTID,
+				  BTRFS_EXTENT_CSUM_KEY);
+	if (ret)
+		goto out;
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	if (!IS_ALIGNED(key.offset, sectorsize)) {
+		error("csum item bytenr %llu is not aligned to %u",
+		      key.offset, sectorsize);
+		ret = -EIO;
+		goto out;
+	}
+	item_offset = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
+	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
+	if (key.offset + item_size / csum_size * sectorsize <= bytenr) {
+		ret = 1;
+		goto out;
+	}
+
+	final_offset = (bytenr - key.offset) / sectorsize * csum_size +
+		       item_offset;
+	read_extent_buffer(path->nodes[0], csum_ret, final_offset, csum_size);
+	ret = 0;
+out:
+	btrfs_free_path(path);
+	return ret;
+}
-- 
2.10.0





* [RFC PATCH v0.8 04/14] btrfs-progs: check/scrub: Introduce structures to support fsck scrub
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (2 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 03/14] btrfs-progs: check/csum: Introduce function to read out one data csum Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 05/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based tree block Qu Wenruo
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce new local structures, scrub_full_stripe and scrub_stripe, for
the incoming offline scrub support.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 Makefile.in   |   2 +-
 check/scrub.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 check/scrub.c

diff --git a/Makefile.in b/Makefile.in
index 6e2407f..b30880a 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -95,7 +95,7 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
 	  qgroup.o raid56.o free-space-cache.o kernel-lib/list_sort.o props.o \
 	  ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
 	  inode.o file.o find-root.o free-space-tree.o help.o \
-	  check/csum.o
+	  check/csum.o check/scrub.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
diff --git a/check/scrub.c b/check/scrub.c
new file mode 100644
index 0000000..acfe213
--- /dev/null
+++ b/check/scrub.c
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2016 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <unistd.h>
+#include "ctree.h"
+#include "volumes.h"
+#include "disk-io.h"
+#include "utils.h"
+#include "check.h"
+
+struct scrub_stripe {
+	/* For P/Q logical start will be BTRFS_RAID5/6_P/Q_STRIPE */
+	u64 logical;
+
+	/* Device is missing */
+	unsigned int dev_missing:1;
+
+	/* Any tree/data csum mismatches */
+	unsigned int csum_mismatch:1;
+
+	/* Some data doesn't have csum(nodatasum) */
+	unsigned int csum_missing:1;
+
+	char *data;
+};
+
+struct scrub_full_stripe {
+	u64 logical_start;
+	u64 logical_len;
+	u64 bg_type;
+	u32 nr_stripes;
+	u32 stripe_len;
+
+	/* Read error stripes */
+	u32 err_read_stripes;
+
+	/* Csum error data stripes */
+	u32 err_csum_dstripes;
+
+	/* Missing csum data stripes */
+	u32 missing_csum_dstripes;
+
+	/* Missing stripe index */
+	int missing_stripes[2];
+
+	struct scrub_stripe stripes[];
+};
+
+static void free_full_stripe(struct scrub_full_stripe *fstripe)
+{
+	int i;
+
+	for (i = 0; i < fstripe->nr_stripes; i++)
+		free(fstripe->stripes[i].data);
+	free(fstripe);
+}
+
+static struct scrub_full_stripe *alloc_full_stripe(int nr_stripes,
+						    u32 stripe_len)
+{
+	struct scrub_full_stripe *ret;
+	int size = sizeof(*ret) + nr_stripes * sizeof(struct scrub_stripe);
+	int i;
+
+	ret = malloc(size);
+	if (!ret)
+		return NULL;
+
+	memset(ret, 0, size);
+	ret->nr_stripes = nr_stripes;
+	ret->stripe_len = stripe_len;
+
+	/* Alloc data memory for each stripe */
+	for (i = 0; i < nr_stripes; i++) {
+		struct scrub_stripe *stripe = &ret->stripes[i];
+
+		stripe->data = malloc(stripe_len);
+		if (!stripe->data) {
+			free_full_stripe(ret);
+			return NULL;
+		}
+	}
+	return ret;
+}
+
-- 
2.10.0





* [RFC PATCH v0.8 05/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based tree block
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (3 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 04/14] btrfs-progs: check/scrub: Introduce structures to support fsck scrub Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 06/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based data blocks Qu Wenruo
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, scrub_tree_mirror(), to scrub mirror based
tree blocks (Single/DUP/RAID0/1/10).

This function can be used either on in-memory tree blocks, by passing
the @data parameter (for RAID5/6 full stripes), or directly by @bytenr
for the other profiles.
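
Both call patterns show up in the later patches, roughly:

        /* Mirror based profiles: read the tree block at @bytenr from disk */
        ret = scrub_tree_mirror(fs_info, scrub_ctx, NULL, bytenr, mirror);

        /* RAID5/6: check a tree block already read into a stripe buffer */
        data = stripe->data + bytenr - stripe->logical;
        ret = scrub_tree_mirror(fs_info, scrub_ctx, data, bytenr, 0);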

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 disk-io.c     |  4 ++--
 disk-io.h     |  2 ++
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/check/scrub.c b/check/scrub.c
index acfe213..ce8d5e5 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -98,3 +98,62 @@ static struct scrub_full_stripe *alloc_full_stripe(int nr_stripes,
 	return ret;
 }
 
+static inline int is_data_stripe(struct scrub_stripe *stripe)
+{
+	u64 bytenr = stripe->logical;
+
+	if (bytenr == BTRFS_RAID5_P_STRIPE || bytenr == BTRFS_RAID6_Q_STRIPE)
+		return 0;
+	return 1;
+}
+
+static int scrub_tree_mirror(struct btrfs_fs_info *fs_info,
+			     struct btrfs_scrub_progress *scrub_ctx,
+			     char *data, u64 bytenr, int mirror)
+{
+	struct extent_buffer *eb;
+	u32 nodesize = fs_info->tree_root->nodesize;
+	int ret;
+
+	if (!IS_ALIGNED(bytenr, fs_info->tree_root->sectorsize)) {
+		/* Such error will be reported by check_tree_block() */
+		scrub_ctx->verify_errors++;
+		return -EIO;
+	}
+
+	eb = btrfs_find_create_tree_block(fs_info, bytenr, nodesize);
+	if (!eb)
+		return -ENOMEM;
+	if (data) {
+		memcpy(eb->data, data, nodesize);
+	} else {
+		ret = read_whole_eb(fs_info, eb, mirror);
+		if (ret) {
+			scrub_ctx->read_errors++;
+			error("failed to read tree block %llu mirror %d",
+			      bytenr, mirror);
+			goto out;
+		}
+	}
+
+	scrub_ctx->tree_bytes_scrubbed += nodesize;
+	if (csum_tree_block(fs_info->tree_root, eb, 1)) {
+		error("tree block %llu mirror %d checksum mismatch", bytenr,
+			mirror);
+		scrub_ctx->csum_errors++;
+		ret = -EIO;
+		goto out;
+	}
+	ret = check_tree_block(fs_info, eb);
+	if (ret < 0) {
+		error("tree block %llu mirror %d is invalid", bytenr, mirror);
+		scrub_ctx->verify_errors++;
+		goto out;
+	}
+
+	scrub_ctx->tree_extents_scrubbed++;
+out:
+	free_extent_buffer(eb);
+	return ret;
+}
+
diff --git a/disk-io.c b/disk-io.c
index f24567b..2750e6e 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -51,8 +51,8 @@ static u32 max_nritems(u8 level, u32 nodesize)
 		sizeof(struct btrfs_key_ptr));
 }
 
-static int check_tree_block(struct btrfs_fs_info *fs_info,
-			    struct extent_buffer *buf)
+int check_tree_block(struct btrfs_fs_info *fs_info,
+		     struct extent_buffer *buf)
 {
 
 	struct btrfs_fs_devices *fs_devices;
diff --git a/disk-io.h b/disk-io.h
index 245626c..43ce9c9 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -113,6 +113,8 @@ static inline struct extent_buffer* read_tree_block(
 			parent_transid);
 }
 
+int check_tree_block(struct btrfs_fs_info *fs_info,
+		     struct extent_buffer *buf);
 int read_extent_data(struct btrfs_root *root, char *data, u64 logical,
 		     u64 *len, int mirror);
 void readahead_tree_block(struct btrfs_root *root, u64 bytenr, u32 blocksize,
-- 
2.10.0





* [RFC PATCH v0.8 06/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based data blocks
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (4 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 05/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based tree block Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 07/14] btrfs-progs: check/scrub: Introduce function to scrub one extent Qu Wenruo
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, scrub_data_mirror(), to check mirror based
data blocks.
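
The later scrub_one_extent() will call it once per mirror, roughly:

        int num_copies;
        int corrupted = 0;
        int mirror;
        int ret;

        num_copies = btrfs_num_copies(&fs_info->mapping_tree, start, len);
        for (mirror = 1; mirror <= num_copies; mirror++) {
                ret = scrub_data_mirror(fs_info, scrub_ctx, NULL, start, len,
                                        mirror);
                if (ret < 0)
                        corrupted++;
        }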

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/check/scrub.c b/check/scrub.c
index ce8d5e5..5cd8bc4 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -157,3 +157,70 @@ out:
 	return ret;
 }
 
+static int scrub_data_mirror(struct btrfs_fs_info *fs_info,
+			     struct btrfs_scrub_progress *scrub_ctx,
+			     char *data, u64 start, u64 len, int mirror)
+{
+	u64 cur = 0;
+	u32 csum;
+	u32 sectorsize = fs_info->tree_root->sectorsize;
+	char *buf = NULL;
+	int ret = 0;
+	int err = 0;
+
+	if (!data) {
+		buf = malloc(len);
+		if (!buf)
+			return -ENOMEM;
+		/* Read out as much data as possible to speed up read */
+		while (cur < len) {
+			u64 read_len = len - cur;
+
+			ret = read_extent_data(fs_info->tree_root, buf + cur,
+					start + cur, &read_len, mirror);
+			if (ret < 0) {
+				error("failed to read out data at logical bytenr %llu mirror %d",
+				      start + cur, mirror);
+				scrub_ctx->read_errors++;
+				goto out;
+			}
+			scrub_ctx->data_bytes_scrubbed += read_len;
+			cur += read_len;
+		}
+	} else {
+		buf = data;
+	}
+
+	/* Check csum per-sectorsize */
+	cur = 0;
+	while (cur < len) {
+		u32 data_csum = ~(u32)0;
+
+		ret = btrfs_read_one_data_csum(fs_info, start + cur, &csum);
+		if (ret > 0) {
+			scrub_ctx->csum_discards++;
+			/* In case some csum are missing */
+			goto next;
+		}
+		data_csum = btrfs_csum_data(NULL, buf + cur, data_csum,
+					    sectorsize);
+		btrfs_csum_final(data_csum, (u8 *)&data_csum);
+		if (data_csum != csum) {
+			error("data at bytenr %llu mirror %d csum mismatch, have %u expect %u",
+			      start + cur, mirror, data_csum, csum);
+			err = 1;
+			scrub_ctx->csum_errors++;
+			cur += sectorsize;
+			continue;
+		}
+		scrub_ctx->data_bytes_scrubbed += sectorsize;
+next:
+		cur += sectorsize;
+	}
+out:
+	if (!data)
+		free(buf);
+	if (!ret && err)
+		return -EIO;
+	return ret;
+}
-- 
2.10.0





* [RFC PATCH v0.8 07/14] btrfs-progs: check/scrub: Introduce function to scrub one extent
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (5 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 06/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based data blocks Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 08/14] btrfs-progs: check/scrub: Introduce function to scrub one data stripe Qu Wenruo
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, scrub_one_extent(), as a wrapper to check one
extent.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/check/scrub.c b/check/scrub.c
index 5cd8bc4..cdba469 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -224,3 +224,76 @@ out:
 		return -EIO;
 	return ret;
 }
+
+/*
+ * Check all copies of range @start, @len.
+ * Caller must ensure the range is covered by the EXTENT_ITEM/METADATA_ITEM
+ * specified by path.
+ * If @report is set, it will report if the range is recoverable or totally
+ * corrupted if it has corrupted mirror.
+ *
+ * Return 0 if the range is all OK or recoverable.
+ * Return <0 if the range can't be recovered.
+ */
+static int scrub_one_extent(struct btrfs_fs_info *fs_info,
+			    struct btrfs_scrub_progress *scrub_ctx,
+			    struct btrfs_path *path, u64 start, u64 len,
+			    int report)
+{
+	struct btrfs_key key;
+	struct btrfs_extent_item *ei;
+	struct extent_buffer *leaf = path->nodes[0];
+	int slot = path->slots[0];
+	int num_copies;
+	int corrupted = 0;
+	u64 extent_start;
+	u64 extent_len;
+	int metadata = 0;
+	int i;
+	int ret;
+
+	btrfs_item_key_to_cpu(leaf, &key, slot);
+	if (key.type != BTRFS_METADATA_ITEM_KEY &&
+	    key.type != BTRFS_EXTENT_ITEM_KEY)
+		goto invalid_arg;
+
+	extent_start = key.objectid;
+	if (key.type == BTRFS_METADATA_ITEM_KEY) {
+		extent_len = fs_info->tree_root->nodesize;
+		metadata = 1;
+	} else {
+		extent_len = key.offset;
+		ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
+		if (btrfs_extent_flags(leaf, ei) & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+			metadata = 1;
+	}
+	if (start >= extent_start + extent_len ||
+	    start + len <= extent_start)
+		goto invalid_arg;
+	num_copies = btrfs_num_copies(&fs_info->mapping_tree, start, len);
+	for (i = 1; i <= num_copies; i++) {
+		if (metadata)
+			ret = scrub_tree_mirror(fs_info, scrub_ctx,
+					NULL, extent_start, i);
+		else
+			ret = scrub_data_mirror(fs_info, scrub_ctx, NULL,
+						start, len, i);
+		if (ret < 0)
+			corrupted++;
+	}
+
+	if (report) {
+		if (corrupted && corrupted < num_copies)
+			printf("bytenr %llu len %llu has corrupted mirror, but is recoverable\n",
+				start, len);
+		else if (corrupted >= num_copies)
+			error("bytenr %llu len %llu has corrupted mirror, can't be recovered",
+				start, len);
+	}
+	if (corrupted < num_copies)
+		return 0;
+	return -EIO;
+invalid_arg:
+	error("invalid parameter for %s", __func__);
+	return -EINVAL;
+}
-- 
2.10.0





* [RFC PATCH v0.8 08/14] btrfs-progs: check/scrub: Introduce function to scrub one data stripe
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (6 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 07/14] btrfs-progs: check/scrub: Introduce function to scrub one extent Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 09/14] btrfs-progs: check/scrub: Introduce function to verify parities Qu Wenruo
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, scrub_one_data_stripe(), to check all data
and tree blocks inside a data stripe.
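
A sketch of how the full stripe scrub is expected to use it, with
fstripe being the scrub_full_stripe introduced earlier (error handling
trimmed):

        for (i = 0; i < fstripe->nr_stripes; i++) {
                struct scrub_stripe *stripe = &fstripe->stripes[i];

                if (!is_data_stripe(stripe))    /* skip P/Q stripes */
                        continue;
                ret = scrub_one_data_stripe(fs_info, scrub_ctx, stripe,
                                            fstripe->stripe_len);
                if (ret < 0)
                        fstripe->err_csum_dstripes++;
                if (stripe->csum_missing)
                        fstripe->missing_csum_dstripes++;
        }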

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/check/scrub.c b/check/scrub.c
index cdba469..f29effa 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -297,3 +297,114 @@ invalid_arg:
 	error("invalid parameter for %s", __func__);
 	return -EINVAL;
 }
+
+static int scrub_one_data_stripe(struct btrfs_fs_info *fs_info,
+				 struct btrfs_scrub_progress *scrub_ctx,
+				 struct scrub_stripe *stripe, u32 stripe_len)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_key key;
+	u64 extent_start;
+	u64 extent_len;
+	u64 orig_csum_discards;
+	int ret;
+
+	if (!is_data_stripe(stripe))
+		return -EINVAL;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = stripe->logical + stripe_len;
+	key.offset = 0;
+	key.type = 0;
+
+	ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	while (1) {
+		struct btrfs_extent_item *ei;
+		struct extent_buffer *eb;
+		char *data;
+		int slot;
+		int metadata = 0;
+		u64 check_start;
+		u64 check_len;
+
+		ret = btrfs_previous_extent_item(extent_root, path, 0);
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+		if (ret < 0)
+			goto out;
+		eb = path->nodes[0];
+		slot = path->slots[0];
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		extent_start = key.objectid;
+		ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item);
+
+		/* tree block scrub */
+		if (key.type == BTRFS_METADATA_ITEM_KEY ||
+		    btrfs_extent_flags(eb, ei) & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			extent_len = extent_root->nodesize;
+			metadata = 1;
+		} else {
+			extent_len = key.offset;
+			metadata = 0;
+		}
+
+		/* Current extent is out of our range, loop comes to end */
+		if (extent_start + extent_len <= stripe->logical)
+			break;
+
+		if (metadata) {
+			/*
+			 * Check crossing stripe first, which can't be scrubbed
+			 */
+			if (check_crossing_stripes(extent_start,
+					extent_root->nodesize)) {
+				error("tree block at %llu is crossing stripe boundary, unable to scrub",
+					extent_start);
+				ret = -EIO;
+				goto out;
+			}
+			data = stripe->data + extent_start - stripe->logical;
+			ret = scrub_tree_mirror(fs_info, scrub_ctx,
+						data, extent_start, 0);
+			/* Any csum/verify error means the stripe is screwed */
+			if (ret < 0) {
+				stripe->csum_mismatch = 1;
+				ret = -EIO;
+				goto out;
+			}
+			ret = 0;
+			continue;
+		}
+		/* Restrict the extent range to fit stripe range */
+		check_start = max(extent_start, stripe->logical);
+		check_len = min(extent_start + extent_len, stripe->logical +
+				stripe_len) - check_start;
+
+		/* Record original csum_discards to detect missing csum case */
+		orig_csum_discards = scrub_ctx->csum_discards;
+
+		data = stripe->data + check_start - stripe->logical;
+		ret = scrub_data_mirror(fs_info, scrub_ctx, data, check_start,
+					check_len, 0);
+		/* Csum mismatch, no need to continue anyway*/
+		if (ret < 0) {
+			stripe->csum_mismatch = 1;
+			goto out;
+		}
+		/* Check if there is any missing csum for data */
+		if (scrub_ctx->csum_discards != orig_csum_discards)
+			stripe->csum_missing = 1;
+		ret = 0;
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
-- 
2.10.0





* [RFC PATCH v0.8 09/14] btrfs-progs: check/scrub: Introduce function to verify parities
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (7 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 08/14] btrfs-progs: check/scrub: Introduce function to scrub one data stripe Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 10/14] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Qu Wenruo
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, verify_parities(), to check whether the
parities match for a full stripe whose data stripes all match their
csums.
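
For reference, RAID5 parity is just the byte-wise XOR of the data
stripes, so the RAID5 half of the check is conceptually the following
(the code uses raid5_gen_result() instead; ptrs[] holds the data stripe
buffers and ondisk_p the on-disk parity):

        for (i = 0; i < stripe_len; i++) {
                u8 p = 0;

                for (j = 0; j < nr_data_stripes; j++)
                        p ^= ((u8 *)ptrs[j])[i];
                if (p != ((u8 *)ondisk_p)[i])
                        return -EIO;    /* parity mismatch */
        }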

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/check/scrub.c b/check/scrub.c
index f29effa..d8182d6 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -408,3 +408,62 @@ out:
 	btrfs_free_path(path);
 	return ret;
 }
+
+static int verify_parities(struct btrfs_fs_info *fs_info,
+			   struct btrfs_scrub_progress *scrub_ctx,
+			   struct scrub_full_stripe *fstripe)
+{
+	void **ptrs;
+	void *ondisk_p = NULL;
+	void *ondisk_q = NULL;
+	void *buf_p;
+	void *buf_q;
+	int nr_stripes = fstripe->nr_stripes;
+	int stripe_len = BTRFS_STRIPE_LEN;
+	int i;
+	int ret = 0;
+
+	ptrs = malloc(sizeof(void *) * fstripe->nr_stripes);
+	buf_p = malloc(fstripe->stripe_len);
+	buf_q = malloc(fstripe->stripe_len);
+	if (!ptrs || !buf_p || !buf_q) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < fstripe->nr_stripes; i++) {
+		struct scrub_stripe *stripe = &fstripe->stripes[i];
+
+		if (stripe->logical == BTRFS_RAID5_P_STRIPE) {
+			ondisk_p = stripe->data;
+			ptrs[i] = buf_p;
+			continue;
+		} else if (stripe->logical == BTRFS_RAID6_Q_STRIPE) {
+			ondisk_q = stripe->data;
+			ptrs[i] = buf_q;
+			continue;
+		} else {
+			ptrs[i] = stripe->data;
+			continue;
+		}
+	}
+	/* RAID6 */
+	if (ondisk_q) {
+		raid6_gen_syndrome(nr_stripes, stripe_len, ptrs);
+		if (memcmp(ondisk_q, ptrs[nr_stripes - 1], stripe_len) ||
+		    memcmp(ondisk_p, ptrs[nr_stripes - 2], stripe_len))
+			ret = -EIO;
+	} else {
+		ret = raid5_gen_result(nr_stripes, stripe_len, nr_stripes - 1,
+					ptrs);
+		if (ret < 0)
+			goto out;
+		if (memcmp(ondisk_p, ptrs[nr_stripes - 1], stripe_len))
+			ret = -EIO;
+	}
+out:
+	free(buf_p);
+	free(buf_q);
+	free(ptrs);
+	return ret;
+}
-- 
2.10.0





* [RFC PATCH v0.8 10/14] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range.
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (8 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 09/14] btrfs-progs: check/scrub: Introduce function to verify parities Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 11/14] btrfs-progs: check/scrub: Introduce function to recover data parity Qu Wenruo
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

It will be used by the later scrub code.
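
The later full stripe scrub uses it to skip full stripes that contain no
extents at all:

        ret = btrfs_check_extent_exists(fs_info, start, len);
        if (ret < 0)
                return ret;
        if (ret == 0)
                return 0;       /* no extent in the range, nothing to scrub */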

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 ctree.h       |  2 ++
 extent-tree.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/ctree.h b/ctree.h
index c76b1f1..d22e520 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2372,6 +2372,8 @@ int exclude_super_stripes(struct btrfs_root *root,
 u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 		       struct btrfs_fs_info *info, u64 start, u64 end);
 u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
+int btrfs_check_extent_exists(struct btrfs_fs_info *fs_info, u64 start,
+			      u64 len);
 
 /* ctree.c */
 int btrfs_comp_cpu_keys(struct btrfs_key *k1, struct btrfs_key *k2);
diff --git a/extent-tree.c b/extent-tree.c
index f6d0a7c..88b91df 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -4244,3 +4244,55 @@ u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 
 	return total_added;
 }
+
+int btrfs_check_extent_exists(struct btrfs_fs_info *fs_info, u64 start,
+			      u64 len)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	u64 extent_start;
+	u64 extent_len;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = start + len;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, fs_info->extent_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	/*
+	 * Now we're pointing at the slot whose key.objectid >= end, skip to previous
+	 * extent.
+	 */
+	ret = btrfs_previous_extent_item(fs_info->extent_root, path, 0);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = 0;
+		goto out;
+	}
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	extent_start = key.objectid;
+	if (key.type == BTRFS_METADATA_ITEM_KEY)
+		extent_len = fs_info->extent_root->nodesize;
+	else
+		extent_len = key.offset;
+
+	/*
+	 * search_slot() and previous_extent_item() have ensured that our
+	 * extent_start < start + len, we only need to care extent end.
+	 */
+	if (extent_start + extent_len <= start)
+		ret = 0;
+	else
+		ret = 1;
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
-- 
2.10.0





* [RFC PATCH v0.8 11/14] btrfs-progs: check/scrub: Introduce function to recover data parity
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (9 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 10/14] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 12/14] btrfs-progs: check/scrub: Introduce a function to scrub one full stripe Qu Wenruo
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a function, recover_from_parities(), to recover data stripes.

This function only supports RAID5 so far, but that should be good
enough for the scrub framework.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/check/scrub.c b/check/scrub.c
index d8182d6..c965328 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -58,6 +58,9 @@ struct scrub_full_stripe {
 	/* Missing stripe index */
 	int missing_stripes[2];
 
+	/* Has already been recovered using parities */
+	unsigned int recovered:1;
+
 	struct scrub_stripe stripes[];
 };
 
@@ -467,3 +470,49 @@ out:
 	free(ptrs);
 	return ret;
 }
+
+static int recovery_from_parities(struct btrfs_fs_info *fs_info,
+				  struct btrfs_scrub_progress *scrub_ctx,
+				  struct scrub_full_stripe *fstripe)
+{
+	void **ptrs;
+	int nr_stripes = fstripe->nr_stripes;
+	int corrupted = -1;
+	int stripe_len = BTRFS_STRIPE_LEN;
+	int i;
+	int ret;
+
+	/* No need to recover */
+	if (!fstripe->err_read_stripes && !fstripe->err_csum_dstripes)
+		return 0;
+
+	/* Already recovered once, no more chance */
+	if (fstripe->recovered)
+		return -EINVAL;
+
+	if (fstripe->bg_type == BTRFS_BLOCK_GROUP_RAID6) {
+		/* Need to recover 2 stripes, not supported yet */
+		error("recover data stripes for RAID6 is not supported yet");
+		return -ENOTTY;
+	}
+
+	/* Out of repair */
+	if (fstripe->err_read_stripes + fstripe->err_csum_dstripes > 1)
+		return -EINVAL;
+
+	ptrs = malloc(sizeof(void *) * fstripe->nr_stripes);
+	if (!ptrs)
+		return -ENOMEM;
+
+	/* Construct ptrs */
+	for (i = 0; i < nr_stripes; i++)
+		ptrs[i] = fstripe->stripes[i].data;
+	corrupted = fstripe->missing_stripes[0];
+
+	/* Recover the corrupted data csum */
+	ret = raid5_gen_result(nr_stripes, stripe_len, corrupted, ptrs);
+
+	fstripe->recovered = 1;
+	free(ptrs);
+	return ret;
+}
-- 
2.10.0





* [RFC PATCH v0.8 12/14] btrfs-progs: check/scrub: Introduce a function to scrub one full stripe
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (10 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 11/14] btrfs-progs: check/scrub: Introduce function to recover data parity Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 13/14] btrfs-progs: check/scrub: Introduce function to check a whole block group Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 14/14] btrfs-progs: fsck: Introduce offline scrub function Qu Wenruo
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, scrub_one_full_stripe(), to check a full
stripe.

It can handle the following cases:
1) Device missing
   Will try to recover, then check against csum

2) Csum mismatch
   Will try to recover, then check against csum

3) All csum match
   Will check against parity to ensure it's OK

4) Csum missing
   Just check against parity.

Not implemented:
1) RAID6 recovery.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 193 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 183 insertions(+), 10 deletions(-)

diff --git a/check/scrub.c b/check/scrub.c
index c965328..1c8e440 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -55,8 +55,9 @@ struct scrub_full_stripe {
 	/* Missing csum data stripes */
 	u32 missing_csum_dstripes;
 
-	/* Missing stripe index */
-	int missing_stripes[2];
+	/* Corrupted stripe index */
+	int corrupted_index[2];
+	int nr_corrupted_stripes;
 
 	/* Has already been recovered using parities */
 	unsigned int recovered:1;
@@ -87,6 +88,8 @@ static struct scrub_full_stripe *alloc_full_stripe(int nr_stripes,
 	memset(ret, 0, size);
 	ret->nr_stripes = nr_stripes;
 	ret->stripe_len = stripe_len;
+	ret->corrupted_index[0] = -1;
+	ret->corrupted_index[1] = -1;
 
 	/* Alloc data memory for each stripe */
 	for (i = 0; i < nr_stripes; i++) {
@@ -471,7 +474,7 @@ out:
 	return ret;
 }
 
-static int recovery_from_parities(struct btrfs_fs_info *fs_info,
+static int recover_from_parities(struct btrfs_fs_info *fs_info,
 				  struct btrfs_scrub_progress *scrub_ctx,
 				  struct scrub_full_stripe *fstripe)
 {
@@ -483,22 +486,28 @@ static int recovery_from_parities(struct btrfs_fs_info *fs_info,
 	int ret;
 
 	/* No need to recover */
-	if (!fstripe->err_read_stripes && !fstripe->err_csum_dstripes)
+	if (!fstripe->nr_corrupted_stripes)
 		return 0;
 
-	/* Already recovered once, no more chance */
-	if (fstripe->recovered)
+	if (fstripe->recovered) {
+		error("full stripe %llu has been recovered before, no more chance to recover",
+		      fstripe->logical_start);
 		return -EINVAL;
+	}
 
-	if (fstripe->bg_type == BTRFS_BLOCK_GROUP_RAID6) {
+	if (fstripe->bg_type == BTRFS_BLOCK_GROUP_RAID6 &&
+	    fstripe->nr_corrupted_stripes == 2) {
 		/* Need to recover 2 stripes, not supported yet */
-		error("recovering data stripes for RAID6 is not supported yet");
+		error("recovering 2 data stripes for RAID6 is not supported yet");
 		return -ENOTTY;
 	}
 
 	/* Out of repair */
-	if (fstripe->err_read_stripes + fstripe->err_csum_dstripes > 1)
+	if (fstripe->nr_corrupted_stripes > 1) {
+		error("full stripe %llu has too many missing stripes and csum mismatches, unable to recover",
+		      fstripe->logical_start);
 		return -EINVAL;
+	}
 
 	ptrs = malloc(sizeof(void *) * fstripe->nr_stripes);
 	if (!ptrs)
@@ -507,7 +516,7 @@ static int recovery_from_parities(struct btrfs_fs_info *fs_info,
 	/* Construct ptrs */
 	for (i = 0; i < nr_stripes; i++)
 		ptrs[i] = fstripe->stripes[i].data;
-	corrupted = fstripe->missing_stripes[0];
+	corrupted = fstripe->corrupted_index[0];
 
 	/* Recover the corrupted data csum */
 	ret = raid5_gen_result(nr_stripes, stripe_len, corrupted, ptrs);
@@ -516,3 +525,167 @@ static int recovery_from_parities(struct btrfs_fs_info *fs_info,
 	free(ptrs);
 	return ret;
 }
+
+static void record_corrupted_stripe(struct scrub_full_stripe *fstripe,
+				    int index)
+{
+	int i = 0;
+
+	for (i = 0; i < 2; i++) {
+		if (fstripe->corrupted_index[i] == -1) {
+			fstripe->corrupted_index[i] = index;
+			break;
+		}
+	}
+	fstripe->nr_corrupted_stripes++;
+}
+
+static int scrub_one_full_stripe(struct btrfs_fs_info *fs_info,
+				 struct btrfs_scrub_progress *scrub_ctx,
+				 u64 start, u64 *next_ret)
+{
+	struct scrub_full_stripe *fstripe;
+	struct btrfs_map_block *map_block = NULL;
+	u32 stripe_len = BTRFS_STRIPE_LEN;
+	u64 bg_type;
+	u64 len;
+	int max_tolerance;
+	int i;
+	int ret;
+
+	if (!next_ret) {
+		error("invalid argument for %s", __func__);
+		return -EINVAL;
+	}
+
+	ret = __btrfs_map_block_v2(fs_info, WRITE, start, stripe_len,
+				   &map_block);
+	if (ret < 0)
+		return ret;
+	start = map_block->start;
+	len = map_block->length;
+	*next_ret = start + len;
+	bg_type = map_block->type;
+	if (bg_type == BTRFS_BLOCK_GROUP_RAID5)
+		max_tolerance = 1;
+	else if (bg_type == BTRFS_BLOCK_GROUP_RAID6)
+		max_tolerance = 2;
+	else {
+		free(map_block);
+		return -EINVAL;
+	}
+
+	/* Before going on, check if there is any extent in the range */
+	ret = btrfs_check_extent_exists(fs_info, start, len);
+	if (ret < 0) {
+		free(map_block);
+		return ret;
+	}
+	/* No extents in range, no need to check */
+	if (ret == 0) {
+		free(map_block);
+		return 0;
+	}
+
+	fstripe = alloc_full_stripe(map_block->num_stripes,
+				    map_block->stripe_len);
+	if (!fstripe)
+		return -ENOMEM;
+
+	/* Fill scrub_full_stripes */
+	fstripe->logical_start = map_block->start;
+	fstripe->nr_stripes = map_block->num_stripes;
+	fstripe->stripe_len = stripe_len;
+	fstripe->bg_type = bg_type;
+
+	/* Fill each stripe, including its data */
+	for (i = 0; i < map_block->num_stripes; i++) {
+		struct scrub_stripe *s_stripe = &fstripe->stripes[i];
+		struct btrfs_map_stripe *m_stripe = &map_block->stripes[i];
+
+		s_stripe->logical = m_stripe->logical;
+
+		if (m_stripe->dev->fd == -1) {
+			s_stripe->dev_missing = 1;
+			record_corrupted_stripe(fstripe, i);
+			fstripe->err_read_stripes++;
+			continue;
+		}
+
+		ret = pread(m_stripe->dev->fd, s_stripe->data, stripe_len,
+			    m_stripe->physical);
+		if (ret < stripe_len) {
+			record_corrupted_stripe(fstripe, i);
+			fstripe->err_read_stripes++;
+			continue;
+		}
+	}
+	if (fstripe->nr_corrupted_stripes > max_tolerance) {
+		error("full stripe at bytenr: %llu has too many read errors, can't be recovered",
+			start);
+		ret = -EIO;
+		goto out;
+	}
+
+	ret = recover_from_parities(fs_info, scrub_ctx, fstripe);
+	if (ret < 0) {
+		error("failed to recover full stripe %llu: %s\n",
+		      fstripe->logical_start, strerror(-ret));
+		goto out;
+	}
+
+	/* Check data stripes against csum tree */
+	for (i = 0; i < map_block->num_stripes; i++) {
+		struct scrub_stripe *stripe = &fstripe->stripes[i];
+
+		if (!is_data_stripe(stripe))
+			continue;
+		ret = scrub_one_data_stripe(fs_info, scrub_ctx, stripe,
+					    stripe_len);
+		if (ret < 0)
+			fstripe->err_csum_dstripes++;
+		if (stripe->csum_missing)
+			fstripe->missing_csum_dstripes++;
+	}
+	if (fstripe->err_csum_dstripes == 0) {
+		/*
+		 * No csum error, data stripes are all OK, only need to
+		 * check parity
+		 */
+		ret = verify_parities(fs_info, scrub_ctx, fstripe);
+		if (ret < 0 && fstripe->missing_csum_dstripes == 0) {
+			error("full stripe at bytenr: %llu has correct data, but corrupted P/Q stripe",
+				start);
+			ret = 0;
+		} else if (ret < 0 && fstripe->missing_csum_dstripes) {
+			error("full stripe at bytenr: %llu has mismatch P/Q stripes, but csum is not enough to determine which is correct",
+				start);
+			ret = -EIO;
+		}
+		goto out;
+	}
+
+	/* Csum mismatch, try recover */
+	ret = recover_from_parities(fs_info, scrub_ctx, fstripe);
+	if (ret < 0) {
+		error("failed to recover full stripe %llu: %s\n",
+		      fstripe->logical_start, strerror(-ret));
+		goto out;
+	}
+
+	/* Recheck recovered stripes */
+	ret = scrub_one_data_stripe(fs_info, scrub_ctx,
+			&fstripe->stripes[fstripe->corrupted_index[0]],
+			stripe_len);
+	if (ret < 0)
+		error("full stripe %llu has unrecoverable csum mismatch",
+		      fstripe->logical_start);
+	else
+		error("full stripe %llu has csum mismatch, but can be recovered from parity",
+		      fstripe->logical_start);
+	ret = 0;
+out:
+	free_full_stripe(fstripe);
+	free(map_block);
+	return ret;
+}
-- 
2.10.0




^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v0.8 13/14] btrfs-progs: check/scrub: Introduce function to check a whole block group
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (11 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 12/14] btrfs-progs: check/scrub: Introduce a function to scrub one full stripe Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  2016-10-17  1:27 ` [RFC PATCH v0.8 14/14] btrfs-progs: fsck: Introduce offline scrub function Qu Wenruo
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, scrub_one_block_group(), to scrub a block group.

For SINGLE/DUP/RAID0/RAID1/RAID10, we use the old mirror-number-based
map_block and check extent by extent.

For parity-based profiles (RAID5/6), we use the new map_block_v2() and
check full stripe by full stripe.
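
The RAID5/6 case boils down to a cursor loop over full stripes.  A
stand-alone sketch of that loop (the full-stripe scrub itself is
stubbed out; the 128K step just models the two 64K data stripes of a
3-disk RAID5):

#include <errno.h>
#include <stdio.h>

/* Stand-in for scrub_one_full_stripe(): scrub the full stripe that
 * covers @cur and report where the next one starts via @next_ret. */
static int toy_scrub_one_full_stripe(unsigned long long cur,
				     unsigned long long *next_ret)
{
	printf("scrub full stripe at %llu\n", cur);
	*next_ret = cur + 2 * 64 * 1024;
	return 0;
}

static int toy_scrub_raid56_block_group(unsigned long long bg_start,
					unsigned long long bg_len)
{
	unsigned long long cur = bg_start;
	unsigned long long next;
	int ret;

	while (cur < bg_start + bg_len) {
		ret = toy_scrub_one_full_stripe(cur, &next);
		/* -EIO means "corrupted but recorded", keep scrubbing */
		if (ret < 0 && ret != -EIO)
			return ret;
		cur = next;
	}
	return 0;
}

int main(void)
{
	return toy_scrub_raid56_block_group(1024 * 1024, 8 * 64 * 1024);
}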

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 check/scrub.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/check/scrub.c b/check/scrub.c
index 1c8e440..94f8744 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -689,3 +689,88 @@ out:
 	free(map_block);
 	return ret;
 }
+
+static int scrub_one_block_group(struct btrfs_fs_info *fs_info,
+				 struct btrfs_scrub_progress *scrub_ctx,
+				 struct btrfs_block_group_cache *bg_cache)
+{
+	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	u64 bg_start = bg_cache->key.objectid;
+	u64 bg_len = bg_cache->key.offset;
+	u64 cur;
+	u64 next;
+	int ret;
+
+	if (bg_cache->flags &
+	    (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+		/* RAID5/6 check full stripe by full stripe */
+		cur = bg_cache->key.objectid;
+
+		while (cur < bg_start + bg_len) {
+			ret = scrub_one_full_stripe(fs_info, scrub_ctx, cur,
+						    &next);
+			/* Ignore any non-fatal error */
+			if (ret < 0 && ret != -EIO) {
+				error("fatal error while checking one full stripe at bytenr %llu: %s",
+					cur, strerror(-ret));
+				return ret;
+			}
+			cur = next;
+		}
+		/* Ignore any -EIO error, such errors will be reported at the end */
+		return 0;
+	}
+	/* Non-parity based profile, check extent by extent */
+	key.objectid = bg_start;
+	key.type = 0;
+	key.offset = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+	ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	while (1) {
+		struct extent_buffer *eb = path->nodes[0];
+		int slot = path->slots[0];
+		u64 extent_start;
+		u64 extent_len;
+
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		if (key.objectid >= bg_start + bg_len)
+			break;
+		if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+		    key.type != BTRFS_METADATA_ITEM_KEY)
+			goto next;
+
+		extent_start = key.objectid;
+		if (key.type == BTRFS_METADATA_ITEM_KEY)
+			extent_len = extent_root->nodesize;
+		else
+			extent_len = key.offset;
+
+		ret = scrub_one_extent(fs_info, scrub_ctx, path, extent_start,
+					extent_len, 1);
+		if (ret < 0 && ret != -EIO) {
+			error("fatal error checking extent bytenr %llu len %llu: %s",
+				extent_start, extent_len, strerror(-ret));
+			goto out;
+		}
+		ret = 0;
+next:
+		ret = btrfs_next_extent_item(extent_root, path, bg_start +
+					     bg_len);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
-- 
2.10.0




^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v0.8 14/14] btrfs-progs: fsck: Introduce offline scrub function
  2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
                   ` (12 preceding siblings ...)
  2016-10-17  1:27 ` [RFC PATCH v0.8 13/14] btrfs-progs: check/scrub: Introduce function to check a whole block group Qu Wenruo
@ 2016-10-17  1:27 ` Qu Wenruo
  13 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-17  1:27 UTC (permalink / raw)
  To: linux-btrfs

Now, btrfs check has a kernel scrub equivalent.

It even goes further: it applies stricter csum checks to both the
reconstructed data and the existing data stripes, which avoids the
silent data corruption that kernel scrub can cause.

For now it only supports a read-only check, but it can already report
whether the data is recoverable.
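
For example, with this patch applied, an offline scrub could be run on
an unmounted filesystem with something like (the device name is only an
illustration):

  # btrfs check --scrub /dev/sdb

It stays read-only, prints the scrub summary implemented in
scrub_btrfs() below, and sets a non-zero return code if any read,
verify or csum error was found.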

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 Documentation/btrfs-check.asciidoc |  8 ++++++++
 check/check.h                      |  2 ++
 check/scrub.c                      | 36 ++++++++++++++++++++++++++++++++++++
 cmds-check.c                       | 12 +++++++++++-
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-check.asciidoc b/Documentation/btrfs-check.asciidoc
index a32e1c7..98681ff 100644
--- a/Documentation/btrfs-check.asciidoc
+++ b/Documentation/btrfs-check.asciidoc
@@ -78,6 +78,14 @@ respective superblock offset is within the device size
 This can be used to use a different starting point if some of the primary
 superblock is damaged.
 
+--scrub::
+Kernel scrub equivalent.
++
+Offline scrub performs a stricter reconstruction check than the kernel scrub,
+and will not cause the silent data corruption that is possible with RAID5.
++
+NOTE: Support for RAID6 recovery is not fully implemented yet.
+
 DANGEROUS OPTIONS
 -----------------
 
diff --git a/check/check.h b/check/check.h
index 61d1cac..7c14716 100644
--- a/check/check.h
+++ b/check/check.h
@@ -19,3 +19,5 @@
 /* check/csum.c */
 int btrfs_read_one_data_csum(struct btrfs_fs_info *fs_info, u64 bytenr,
 			     void *csum_ret);
+/* check/scrub.c */
+int scrub_btrfs(struct btrfs_fs_info *fs_info);
diff --git a/check/scrub.c b/check/scrub.c
index 94f8744..3327791 100644
--- a/check/scrub.c
+++ b/check/scrub.c
@@ -774,3 +774,39 @@ out:
 	btrfs_free_path(path);
 	return ret;
 }
+
+int scrub_btrfs(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_block_group_cache *bg_cache;
+	struct btrfs_scrub_progress scrub_ctx = {0};
+	int ret = 0;
+
+	bg_cache = btrfs_lookup_first_block_group(fs_info, 0);
+	if (!bg_cache) {
+		error("no block group is found");
+		return -ENOENT;
+	}
+
+	while (1) {
+		ret = scrub_one_block_group(fs_info, &scrub_ctx, bg_cache);
+		if (ret < 0 && ret != -EIO)
+			break;
+
+		bg_cache = btrfs_lookup_first_block_group(fs_info,
+				bg_cache->key.objectid + bg_cache->key.offset);
+		if (!bg_cache)
+			break;
+	}
+
+	printf("Scrub result:\n");
+	printf("Tree bytes scrubbed: %llu\n", scrub_ctx.tree_bytes_scrubbed);
+	printf("Data bytes scrubbed: %llu\n", scrub_ctx.data_bytes_scrubbed);
+	printf("Read error: %llu\n", scrub_ctx.read_errors);
+	printf("Verify error: %llu\n", scrub_ctx.verify_errors);
+	if (scrub_ctx.csum_errors || scrub_ctx.read_errors ||
+	    scrub_ctx.uncorrectable_errors || scrub_ctx.verify_errors)
+		ret = 1;
+	else
+		ret = 0;
+	return ret;
+}
diff --git a/cmds-check.c b/cmds-check.c
index 670ccd1..a081e82 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -41,6 +41,7 @@
 #include "rbtree-utils.h"
 #include "backref.h"
 #include "ulist.h"
+#include "check.h"
 
 enum task_position {
 	TASK_EXTENTS,
@@ -11252,6 +11253,7 @@ int cmd_check(int argc, char **argv)
 	int readonly = 0;
 	int qgroup_report = 0;
 	int qgroups_repaired = 0;
+	int scrub = 0;
 	unsigned ctree_flags = OPEN_CTREE_EXCLUSIVE;
 
 	while(1) {
@@ -11259,7 +11261,7 @@ int cmd_check(int argc, char **argv)
 		enum { GETOPT_VAL_REPAIR = 257, GETOPT_VAL_INIT_CSUM,
 			GETOPT_VAL_INIT_EXTENT, GETOPT_VAL_CHECK_CSUM,
 			GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE,
-			GETOPT_VAL_MODE };
+			GETOPT_VAL_MODE, GETOPT_VAL_SCRUB };
 		static const struct option long_options[] = {
 			{ "super", required_argument, NULL, 's' },
 			{ "repair", no_argument, NULL, GETOPT_VAL_REPAIR },
@@ -11279,6 +11281,7 @@ int cmd_check(int argc, char **argv)
 			{ "progress", no_argument, NULL, 'p' },
 			{ "mode", required_argument, NULL,
 				GETOPT_VAL_MODE },
+			{ "scrub", no_argument, NULL, GETOPT_VAL_SCRUB },
 			{ NULL, 0, NULL, 0}
 		};
 
@@ -11350,6 +11353,9 @@ int cmd_check(int argc, char **argv)
 					exit(1);
 				}
 				break;
+			case GETOPT_VAL_SCRUB:
+				scrub = 1;
+				break;
 		}
 	}
 
@@ -11402,6 +11408,10 @@ int cmd_check(int argc, char **argv)
 	global_info = info;
 	root = info->fs_root;
 
+	if (scrub) {
+		ret = scrub_btrfs(info);
+		goto err_out;
+	}
 	/*
 	 * repair mode will force us to commit transaction which
 	 * will make us fail to load log tree when mounting.
-- 
2.10.0




^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
  2016-10-17  1:27 ` [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Qu Wenruo
@ 2016-10-20  9:23   ` Sanidhya Solanki
  2016-10-20  9:35     ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Sanidhya Solanki @ 2016-10-20  9:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, 17 Oct 2016 09:27:31 +0800
Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:

> For READ, caller normally hopes to get what they request, other than
> full stripe map.
> 
> In this case, we should remove unrelated stripe map, just like the
> following case:
>                32K               96K
>                |<-request range->|
>          0              64k           128K
> RAID0:   |    Data 1    |   Data 2    |
>               disk1         disk2
> Before this patch, we return the full stripe:
> Stripe 0: Logical 0, Physical X, Len 64K, Dev disk1
> Stripe 1: Logical 64k, Physical Y, Len 64K, Dev disk2
> 
> After this patch, we limit the stripe result to the request range:
> Stripe 0: Logical 32K, Physical X+32K, Len 32K, Dev disk1
> Stripe 1: Logical 64k, Physical Y, Len 32K, Dev disk2
> 
> And if it's a RAID5/6 stripe, we just handle it like RAID0, ignoring
> parities.
> 
> This should make caller easier to use.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  volumes.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/volumes.c b/volumes.c
> index 94f3e42..ba16d19 100644
> --- a/volumes.c
> +++ b/volumes.c
> @@ -1682,6 +1682,107 @@ static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
>  	return 0;
>  }
>  
> +static void del_one_stripe(struct btrfs_map_block *map_block, int i)
> +{
> +	int cur_nr = map_block->num_stripes;
> +	int size_left = (cur_nr - 1 - i) * sizeof(struct btrfs_map_stripe);
> +
> +	memmove(&map_block->stripes[i], &map_block->stripes[i + 1], size_left);
> +	map_block->num_stripes--;
> +}
> +
> +static void remove_unrelated_stripes(struct map_lookup *map,
> +				     int rw, u64 start, u64 length,
> +				     struct btrfs_map_block *map_block)
> +{
> +	int i = 0;
> +	/*
> +	 * RAID5/6 write must use full stripe.
> +	 * No need to do anything.
> +	 */
> +	if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6) &&
> +	    rw == WRITE)
> +		return;

I believe it should be an "or" operation in the loop condition, rather than a
single pipe.

Sanidhya

> +	/*
> +	 * For RAID0/1/10/DUP, whatever read/write, we can remove unrelated
> +	 * stripes without causing anything wrong.
> +	 * RAID5/6 READ is just like RAID0, we don't care parity unless we need
> +	 * to recovery.
> +	 * For recovery, rw should be set to WRITE.
> +	 */
> +	while (i < map_block->num_stripes) {
> +		struct btrfs_map_stripe *stripe;
> +		u64 orig_logical; /* Original stripe logical start */
> +		u64 orig_end; /* Original stripe logical end */
> +
> +		stripe = &map_block->stripes[i];
> +
> +		/*
> +		 * For READ, we don't really care parity
> +		 */
> +		if (stripe->logical == BTRFS_RAID5_P_STRIPE ||
> +		    stripe->logical == BTRFS_RAID6_Q_STRIPE) {
> +			del_one_stripe(map_block, i);
> +			continue;
> +		}
> +		/* Completely unrelated stripe */
> +		if (stripe->logical >= start + length ||
> +		    stripe->logical + stripe->length <= start) {
> +			del_one_stripe(map_block, i);
> +			continue;
> +		}
> +		/* Covered stripe, modify its logical and physical */
> +		orig_logical = stripe->logical;
> +		orig_end = stripe->logical + stripe->length;
> +		if (start + length <= orig_end) {
> +			/*
> +			 * |<--range-->|
> +			 *   |  stripe   |
> +			 * Or
> +			 *     |<range>|
> +			 *   |  stripe   |
> +			 */
> +			stripe->logical = max(orig_logical, start);
> +			stripe->length = start + length;
> +			stripe->physical += stripe->logical - orig_logical;
> +		} else if (start >= orig_logical) {
> +			/*
> +			 *     |<-range--->|
> +			 * |  stripe     |
> +			 * Or
> +			 *     |<range>|
> +			 * |  stripe     |
> +			 */
> +			stripe->logical = start;
> +			stripe->length = min(orig_end, start + length);
> +			stripe->physical += stripe->logical - orig_logical;
> +		}
> +		/*
> +		 * Remaining case:
> +		 * |<----range----->|
> +		 *   | stripe |
> +		 * No need to do any modification
> +		 */
> +		i++;
> +	}
> +
> +	/* Recaculate map_block size */
> +	map_block->start = 0;
> +	map_block->length = 0;
> +	for (i = 0; i < map_block->num_stripes; i++) {
> +		struct btrfs_map_stripe *stripe;
> +
> +		stripe = &map_block->stripes[i];
> +		if (stripe->logical > map_block->start)
> +			map_block->start = stripe->logical;
> +		if (stripe->logical + stripe->length >
> +		    map_block->start + map_block->length)
> +			map_block->length = stripe->logical + stripe->length -
> +					    map_block->start;
> +	}
> +}
> +
>  int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
>  			 u64 length, struct btrfs_map_block **map_ret)
>  {
> @@ -1717,7 +1818,7 @@ int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
>  		free(map_block);
>  		return ret;
>  	}
> -	/* TODO: Remove unrelated map_stripes for READ operation */
> +	remove_unrelated_stripes(map, rw, logical, length, map_block);
>  
>  	*map_ret = map_block;
>  	return 0;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
  2016-10-20  9:23   ` Sanidhya Solanki
@ 2016-10-20  9:35     ` Qu Wenruo
  2016-10-20 10:12       ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2016-10-20  9:35 UTC (permalink / raw)
  To: Sanidhya Solanki; +Cc: linux-btrfs



At 10/20/2016 05:23 PM, Sanidhya Solanki wrote:
> On Mon, 17 Oct 2016 09:27:31 +0800
> Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>> For READ, caller normally hopes to get what they request, other than
>> full stripe map.
>>
>> In this case, we should remove unrelated stripe map, just like the
>> following case:
>>                32K               96K
>>                |<-request range->|
>>          0              64k           128K
>> RAID0:   |    Data 1    |   Data 2    |
>>               disk1         disk2
>> Before this patch, we return the full stripe:
>> Stripe 0: Logical 0, Physical X, Len 64K, Dev disk1
>> Stripe 1: Logical 64k, Physical Y, Len 64K, Dev disk2
>>
>> After this patch, we limit the stripe result to the request range:
>> Stripe 0: Logical 32K, Physical X+32K, Len 32K, Dev disk1
>> Stripe 1: Logical 64k, Physical Y, Len 32K, Dev disk2
>>
>> And if it's a RAID5/6 stripe, we just handle it like RAID0, ignoring
>> parities.
>>
>> This should make caller easier to use.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>  volumes.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>
>> diff --git a/volumes.c b/volumes.c
>> index 94f3e42..ba16d19 100644
>> --- a/volumes.c
>> +++ b/volumes.c
>> @@ -1682,6 +1682,107 @@ static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
>>  	return 0;
>>  }
>>
>> +static void del_one_stripe(struct btrfs_map_block *map_block, int i)
>> +{
>> +	int cur_nr = map_block->num_stripes;
>> +	int size_left = (cur_nr - 1 - i) * sizeof(struct btrfs_map_stripe);
>> +
>> +	memmove(&map_block->stripes[i], &map_block->stripes[i + 1], size_left);
>> +	map_block->num_stripes--;
>> +}
>> +
>> +static void remove_unrelated_stripes(struct map_lookup *map,
>> +				     int rw, u64 start, u64 length,
>> +				     struct btrfs_map_block *map_block)
>> +{
>> +	int i = 0;
>> +	/*
>> +	 * RAID5/6 write must use full stripe.
>> +	 * No need to do anything.
>> +	 */
>> +	if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6) &&
>> +	    rw == WRITE)
>> +		return;
>
> I believe it should be an "or" operation in the loop condition, rather than a
> single pipe.
>
> Sanidhya

Nope, it's a bitwise OR.

Here we first bitwise-OR the RAID5 and RAID6 flags, then bitwise-AND
the result with the type.
RAID5 and RAID6 are different bits in the flag bitmap, so I don't see
anything wrong here.

If map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6) is
non-zero, it means the profile is either RAID5 or RAID6.
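
A minimal stand-alone illustration (the TOY_* flag values are just
stand-ins for the real BTRFS_BLOCK_GROUP_* bits defined in ctree.h):

#include <stdio.h>

#define TOY_BLOCK_GROUP_RAID5 (1ULL << 7)
#define TOY_BLOCK_GROUP_RAID6 (1ULL << 8)

int main(void)
{
	unsigned long long type = TOY_BLOCK_GROUP_RAID6;

	/* Bitwise OR builds the mask, bitwise AND tests the type against
	 * it: the result is non-zero if either the RAID5 or the RAID6 bit
	 * is set in type. */
	if (type & (TOY_BLOCK_GROUP_RAID5 | TOY_BLOCK_GROUP_RAID6))
		printf("parity-based profile\n");

	return 0;
}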

Or did I miss something?

Thanks,
Qu
>
>> +	/*
>> +	 * For RAID0/1/10/DUP, whatever read/write, we can remove unrelated
>> +	 * stripes without causing anything wrong.
>> +	 * RAID5/6 READ is just like RAID0, we don't care parity unless we need
>> +	 * to recovery.
>> +	 * For recovery, rw should be set to WRITE.
>> +	 */
>> +	while (i < map_block->num_stripes) {
>> +		struct btrfs_map_stripe *stripe;
>> +		u64 orig_logical; /* Original stripe logical start */
>> +		u64 orig_end; /* Original stripe logical end */
>> +
>> +		stripe = &map_block->stripes[i];
>> +
>> +		/*
>> +		 * For READ, we don't really care parity
>> +		 */
>> +		if (stripe->logical == BTRFS_RAID5_P_STRIPE ||
>> +		    stripe->logical == BTRFS_RAID6_Q_STRIPE) {
>> +			del_one_stripe(map_block, i);
>> +			continue;
>> +		}
>> +		/* Completely unrelated stripe */
>> +		if (stripe->logical >= start + length ||
>> +		    stripe->logical + stripe->length <= start) {
>> +			del_one_stripe(map_block, i);
>> +			continue;
>> +		}
>> +		/* Covered stripe, modify its logical and physical */
>> +		orig_logical = stripe->logical;
>> +		orig_end = stripe->logical + stripe->length;
>> +		if (start + length <= orig_end) {
>> +			/*
>> +			 * |<--range-->|
>> +			 *   |  stripe   |
>> +			 * Or
>> +			 *     |<range>|
>> +			 *   |  stripe   |
>> +			 */
>> +			stripe->logical = max(orig_logical, start);
>> +			stripe->length = start + length;
>> +			stripe->physical += stripe->logical - orig_logical;
>> +		} else if (start >= orig_logical) {
>> +			/*
>> +			 *     |<-range--->|
>> +			 * |  stripe     |
>> +			 * Or
>> +			 *     |<range>|
>> +			 * |  stripe     |
>> +			 */
>> +			stripe->logical = start;
>> +			stripe->length = min(orig_end, start + length);
>> +			stripe->physical += stripe->logical - orig_logical;
>> +		}
>> +		/*
>> +		 * Remaining case:
>> +		 * |<----range----->|
>> +		 *   | stripe |
>> +		 * No need to do any modification
>> +		 */
>> +		i++;
>> +	}
>> +
>> +	/* Recaculate map_block size */
>> +	map_block->start = 0;
>> +	map_block->length = 0;
>> +	for (i = 0; i < map_block->num_stripes; i++) {
>> +		struct btrfs_map_stripe *stripe;
>> +
>> +		stripe = &map_block->stripes[i];
>> +		if (stripe->logical > map_block->start)
>> +			map_block->start = stripe->logical;
>> +		if (stripe->logical + stripe->length >
>> +		    map_block->start + map_block->length)
>> +			map_block->length = stripe->logical + stripe->length -
>> +					    map_block->start;
>> +	}
>> +}
>> +
>>  int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
>>  			 u64 length, struct btrfs_map_block **map_ret)
>>  {
>> @@ -1717,7 +1818,7 @@ int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
>>  		free(map_block);
>>  		return ret;
>>  	}
>> -	/* TODO: Remove unrelated map_stripes for READ operation */
>> +	remove_unrelated_stripes(map, rw, logical, length, map_block);
>>
>>  	*map_ret = map_block;
>>  	return 0;
>
>
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
  2016-10-20  9:35     ` Qu Wenruo
@ 2016-10-20 10:12       ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2016-10-20 10:12 UTC (permalink / raw)
  To: Qu Wenruo, Sanidhya Solanki; +Cc: linux-btrfs



On 10/20/2016 05:35 PM, Qu Wenruo wrote:
>
>
> At 10/20/2016 05:23 PM, Sanidhya Solanki wrote:
>> On Mon, 17 Oct 2016 09:27:31 +0800
>> Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>> For READ, caller normally hopes to get what they request, other than
>>> full stripe map.
>>>
>>> In this case, we should remove unrelated stripe map, just like the
>>> following case:
>>>                32K               96K
>>>                |<-request range->|
>>>          0              64k           128K
>>> RAID0:   |    Data 1    |   Data 2    |
>>>               disk1         disk2
>>> Before this patch, we return the full stripe:
>>> Stripe 0: Logical 0, Physical X, Len 64K, Dev disk1
>>> Stripe 1: Logical 64k, Physical Y, Len 64K, Dev disk2
>>>
>>> After this patch, we limit the stripe result to the request range:
>>> Stripe 0: Logical 32K, Physical X+32K, Len 32K, Dev disk1
>>> Stripe 1: Logical 64k, Physical Y, Len 32K, Dev disk2
>>>
>>> And if it's a RAID5/6 stripe, we just handle it like RAID0, ignoring
>>> parities.
>>>
>>> This should make caller easier to use.
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>>  volumes.c | 103
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/volumes.c b/volumes.c
>>> index 94f3e42..ba16d19 100644
>>> --- a/volumes.c
>>> +++ b/volumes.c
>>> @@ -1682,6 +1682,107 @@ static int fill_full_map_block(struct
>>> map_lookup *map, u64 start, u64 length,
>>>      return 0;
>>>  }
>>>
>>> +static void del_one_stripe(struct btrfs_map_block *map_block, int i)
>>> +{
>>> +    int cur_nr = map_block->num_stripes;
>>> +    int size_left = (cur_nr - 1 - i) * sizeof(struct btrfs_map_stripe);
>>> +
>>> +    memmove(&map_block->stripes[i], &map_block->stripes[i + 1],
>>> size_left);
>>> +    map_block->num_stripes--;
>>> +}
>>> +
>>> +static void remove_unrelated_stripes(struct map_lookup *map,
>>> +                     int rw, u64 start, u64 length,
>>> +                     struct btrfs_map_block *map_block)
>>> +{
>>> +    int i = 0;
>>> +    /*
>>> +     * RAID5/6 write must use full stripe.
>>> +     * No need to do anything.
>>> +     */
>>> +    if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
>>> BTRFS_BLOCK_GROUP_RAID6) &&
>>> +        rw == WRITE)
>>> +        return;
>>
>> I believe it should be an "or" operation in the loop condition, rather
>> than a
>> single pipe.
>>
>> Sanidhya

Sorry for not getting your point correctly.

If you mean the check should be in the loop, then it's still not the case.

The map_block at this step is always a full map.

For RAID5/6 it is exactly one full stripe, and if we're doing a write,
then we must return the full stripe without removing any stripes.

Only for other profiles, or for a read operation, do we need to strip
the unrelated stripes in that loop.

Feel free to comment if you still have any questions.

>>
>>> +    /*
>>> +     * For RAID0/1/10/DUP, whatever read/write, we can remove unrelated
>>> +     * stripes without causing anything wrong.
>>> +     * RAID5/6 READ is just like RAID0, we don't care parity unless
>>> we need
>>> +     * to recovery.
>>> +     * For recovery, rw should be set to WRITE.
>>> +     */
>>> +    while (i < map_block->num_stripes) {
>>> +        struct btrfs_map_stripe *stripe;
>>> +        u64 orig_logical; /* Original stripe logical start */
>>> +        u64 orig_end; /* Original stripe logical end */
>>> +
>>> +        stripe = &map_block->stripes[i];
>>> +
>>> +        /*
>>> +         * For READ, we don't really care parity
>>> +         */
>>> +        if (stripe->logical == BTRFS_RAID5_P_STRIPE ||
>>> +            stripe->logical == BTRFS_RAID6_Q_STRIPE) {
>>> +            del_one_stripe(map_block, i);
>>> +            continue;
>>> +        }
>>> +        /* Completely unrelated stripe */
>>> +        if (stripe->logical >= start + length ||
>>> +            stripe->logical + stripe->length <= start) {
>>> +            del_one_stripe(map_block, i);
>>> +            continue;
>>> +        }
>>> +        /* Covered stripe, modify its logical and physical */
>>> +        orig_logical = stripe->logical;
>>> +        orig_end = stripe->logical + stripe->length;
>>> +        if (start + length <= orig_end) {
>>> +            /*
>>> +             * |<--range-->|
>>> +             *   |  stripe   |
>>> +             * Or
>>> +             *     |<range>|
>>> +             *   |  stripe   |
>>> +             */
>>> +            stripe->logical = max(orig_logical, start);
>>> +            stripe->length = start + length;
>>> +            stripe->physical += stripe->logical - orig_logical;
>>> +        } else if (start >= orig_logical) {
>>> +            /*
>>> +             *     |<-range--->|
>>> +             * |  stripe     |
>>> +             * Or
>>> +             *     |<range>|
>>> +             * |  stripe     |
>>> +             */
>>> +            stripe->logical = start;
>>> +            stripe->length = min(orig_end, start + length);
>>> +            stripe->physical += stripe->logical - orig_logical;
>>> +        }
>>> +        /*
>>> +         * Remaining case:
>>> +         * |<----range----->|
>>> +         *   | stripe |
>>> +         * No need to do any modification
>>> +         */
>>> +        i++;
>>> +    }
>>> +
>>> +    /* Recaculate map_block size */
>>> +    map_block->start = 0;
>>> +    map_block->length = 0;
>>> +    for (i = 0; i < map_block->num_stripes; i++) {
>>> +        struct btrfs_map_stripe *stripe;
>>> +
>>> +        stripe = &map_block->stripes[i];
>>> +        if (stripe->logical > map_block->start)
>>> +            map_block->start = stripe->logical;
>>> +        if (stripe->logical + stripe->length >
>>> +            map_block->start + map_block->length)
>>> +            map_block->length = stripe->logical + stripe->length -
>>> +                        map_block->start;
>>> +    }
>>> +}
>>> +
>>>  int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64
>>> logical,
>>>               u64 length, struct btrfs_map_block **map_ret)
>>>  {
>>> @@ -1717,7 +1818,7 @@ int __btrfs_map_block_v2(struct btrfs_fs_info
>>> *fs_info, int rw, u64 logical,
>>>          free(map_block);
>>>          return ret;
>>>      }
>>> -    /* TODO: Remove unrelated map_stripes for READ operation */
>>> +    remove_unrelated_stripes(map, rw, logical, length, map_block);
>>>
>>>      *map_ret = map_block;
>>>      return 0;
>>
>>
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-10-20 10:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17  1:27 [RFC PATCH v0.8 00/14] Offline scrub support, and hint to solve kernel scrub data silent corruption Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 01/14] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 02/14] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Qu Wenruo
2016-10-20  9:23   ` Sanidhya Solanki
2016-10-20  9:35     ` Qu Wenruo
2016-10-20 10:12       ` Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 03/14] btrfs-progs: check/csum: Introduce function to read out one data csum Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 04/14] btrfs-progs: check/scrub: Introduce structures to support fsck scrub Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 05/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based tree block Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 06/14] btrfs-progs: check/scrub: Introduce function to scrub mirror based data blocks Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 07/14] btrfs-progs: check/scrub: Introduce function to scrub one extent Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 08/14] btrfs-progs: check/scrub: Introduce function to scrub one data stripe Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 09/14] btrfs-progs: check/scrub: Introduce function to verify parities Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 10/14] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 11/14] btrfs-progs: check/scrub: Introduce function to recover data parity Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 12/14] btrfs-progs: check/scrub: Introduce a function to scrub one full stripe Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 13/14] btrfs-progs: check/scrub: Introduce function to check a whole block group Qu Wenruo
2016-10-17  1:27 ` [RFC PATCH v0.8 14/14] btrfs-progs: fsck: Introduce offline scrub function Qu Wenruo
