* [PATCH 1/8] btrfs-progs: remove the unnecessary BTRFS_SUPER_INFO_OFFSET path for tree block read
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 2/8] btrfs-progs: extract metadata restore read code into its own helper Qu Wenruo
` (8 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
We used to use read_whole_eb() to read super block, but it's no longer
the case (so long that I can not even find out which commit did the
conversion).
Thus there is no need for read_whole_eb() to handle super block read
anymore.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/disk-io.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 2a9b2681a043..f2492547f77d 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -302,8 +302,7 @@ int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirr
read_len = bytes_left;
device = NULL;
- if (!info->on_restoring &&
- eb->start != BTRFS_SUPER_INFO_OFFSET) {
+ if (!info->on_restoring) {
ret = btrfs_map_block(info, READ, eb->start + offset,
&read_len, &multi, mirror, NULL);
if (ret) {
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/8] btrfs-progs: extract metadata restore read code into its own helper
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
2022-04-05 12:48 ` [PATCH 1/8] btrfs-progs: remove the unnecessary BTRFS_SUPER_INFO_OFFSET path for tree block read Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 3/8] btrfs-progs: don't use write_extent_to_disk() directly Qu Wenruo
` (7 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
For metadata restore, our logical address is mapped to a single device
with logical address 1:1 mapped to device physical address.
Move this part of code into a helper, this will make later extent buffer
read path refactoer much easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/disk-io.c | 69 +++++++++++++++++++++++++----------------
1 file changed, 42 insertions(+), 27 deletions(-)
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index f2492547f77d..e6f5d554f13a 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -288,6 +288,30 @@ out:
}
+static int read_on_restore(struct extent_buffer *eb)
+{
+ struct btrfs_fs_info *fs_info = eb->fs_info;
+ struct btrfs_device *device;
+ int ret;
+
+ /*
+ * For on_restoring mode, there should be only one device, and logical
+ * address is mapped 1:1 to device physical offset.
+ */
+ list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
+ if (device->devid == 1)
+ break;
+ }
+ device->total_ios++;
+
+ ret = btrfs_pread(device->fd, eb->data, eb->len, eb->start,
+ eb->fs_info->zoned);
+ if (ret != eb->len)
+ ret = -EIO;
+ else
+ ret = 0;
+ return ret;
+}
int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirror)
{
@@ -302,38 +326,29 @@ int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirr
read_len = bytes_left;
device = NULL;
- if (!info->on_restoring) {
- ret = btrfs_map_block(info, READ, eb->start + offset,
- &read_len, &multi, mirror, NULL);
- if (ret) {
- printk("Couldn't map the block %llu\n", eb->start + offset);
- kfree(multi);
- return -EIO;
- }
- device = multi->stripes[0].dev;
-
- if (device->fd <= 0) {
- kfree(multi);
- return -EIO;
- }
+ if (info->on_restoring)
+ return read_on_restore(eb);
- eb->fd = device->fd;
- device->total_ios++;
- eb->dev_bytenr = multi->stripes[0].physical;
+ ret = btrfs_map_block(info, READ, eb->start + offset,
+ &read_len, &multi, mirror, NULL);
+ if (ret) {
+ printk("Couldn't map the block %llu\n", eb->start + offset);
kfree(multi);
- multi = NULL;
- } else {
- /* special case for restore metadump */
- list_for_each_entry(device, &info->fs_devices->devices, dev_list) {
- if (device->devid == 1)
- break;
- }
+ return -EIO;
+ }
+ device = multi->stripes[0].dev;
- eb->fd = device->fd;
- eb->dev_bytenr = eb->start;
- device->total_ios++;
+ if (device->fd <= 0) {
+ kfree(multi);
+ return -EIO;
}
+ eb->fd = device->fd;
+ device->total_ios++;
+ eb->dev_bytenr = multi->stripes[0].physical;
+ kfree(multi);
+ multi = NULL;
+
if (read_len > bytes_left)
read_len = bytes_left;
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/8] btrfs-progs: don't use write_extent_to_disk() directly
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
2022-04-05 12:48 ` [PATCH 1/8] btrfs-progs: remove the unnecessary BTRFS_SUPER_INFO_OFFSET path for tree block read Qu Wenruo
2022-04-05 12:48 ` [PATCH 2/8] btrfs-progs: extract metadata restore read code into its own helper Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 4/8] btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk() Qu Wenruo
` (6 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
There are two call sites using write_extent_to_disk() directly:
- debug_corrupt_block() in btrfs-corrupt-block.c
- corrupt_keys() in btrfs-corrupt-block.c
The problem of write_extent_to_disk() is, it can only handle plain
profiles (All profiles except P/Q stripes of RAID56).
Calling it directly can corrupted RAID56 P/Q, and in the future we're
going to remove eb::fd/eb::dev_bytes, so remove such call sites with
write_and_map_eb().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
btrfs-corrupt-block.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 8162c0dfd554..8ec6d63f060c 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -76,7 +76,7 @@ static int debug_corrupt_block(struct extent_buffer *eb,
printf("corrupting %llu copy %d\n", eb->start,
mirror_num);
memset(eb->data, 0, eb->len);
- ret = write_extent_to_disk(eb);
+ ret = write_and_map_eb(eb->fs_info, eb);
if (ret < 0) {
errno = -ret;
error("cannot write eb bytenr %llu: %m",
@@ -159,7 +159,7 @@ static void corrupt_keys(struct btrfs_trans_handle *trans,
u16 csum_type = fs_info->csum_type;
csum_tree_block_size(eb, csum_size, 0, csum_type);
- write_extent_to_disk(eb);
+ write_and_map_eb(eb->fs_info, eb);
}
}
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/8] btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (2 preceding siblings ...)
2022-04-05 12:48 ` [PATCH 3/8] btrfs-progs: don't use write_extent_to_disk() directly Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 5/8] btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data() Qu Wenruo
` (5 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
Function write_extent_to_disk() is just writing the content of a tree
block to disk.
It can not handle RAID56, and its work is the same as
write_data_to_disk().
Thus we can replace write_extent_to_disk() with write_data_to_disk()
easily.
There is only one special call site in write_raid56_with_parity(), which
can easily be replace with btrfs_pwrite() directly.
This reduce the write entrance, and make later eb::fd removal easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/disk-io.c | 29 +++++++++++++++--------------
kernel-shared/extent_io.c | 17 +----------------
kernel-shared/extent_io.h | 1 -
kernel-shared/volumes.c | 4 +++-
4 files changed, 19 insertions(+), 32 deletions(-)
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index e6f5d554f13a..f68284264319 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -510,12 +510,12 @@ err:
int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
{
int ret;
- int dev_nr;
+ int mirror_num;
+ int max_mirror;
u64 length;
u64 *raid_map = NULL;
struct btrfs_multi_bio *multi = NULL;
- dev_nr = 0;
length = eb->len;
ret = btrfs_map_block(fs_info, WRITE, eb->start, &length,
&multi, 0, &raid_map);
@@ -526,6 +526,7 @@ int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
goto out;
}
+ /* RAID56 write back need RMW */
if (raid_map) {
ret = write_raid56_with_parity(fs_info, eb, multi,
length, raid_map);
@@ -534,28 +535,28 @@ int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
error(
"failed to write raid56 stripe for bytenr %llu length %llu: %m",
eb->start, length);
- goto out;
}
- } else while (dev_nr < multi->num_stripes) {
- eb->fd = multi->stripes[dev_nr].dev->fd;
- eb->dev_bytenr = multi->stripes[dev_nr].physical;
- multi->stripes[dev_nr].dev->total_ios++;
- dev_nr++;
- ret = write_extent_to_disk(eb);
+ goto out;
+ }
+
+ /* For non-RAID56, we just writeback data to each mirror */
+ max_mirror = btrfs_num_copies(fs_info, eb->start, eb->len);
+ for (mirror_num = 1; mirror_num <= max_mirror; mirror_num++) {
+ ret = write_data_to_disk(fs_info, eb->data, eb->start, eb->len,
+ mirror_num);
if (ret < 0) {
errno = -ret;
error(
-"failed to write bytenr %llu length %u devid %llu dev_bytenr %llu: %m",
- eb->start, eb->len,
- multi->stripes[dev_nr].dev->devid,
- eb->dev_bytenr);
+ "failed to write bytenr %llu length %u to mirror %d: %m",
+ eb->start, eb->len, mirror_num);
goto out;
}
}
+
out:
kfree(raid_map);
kfree(multi);
- return 0;
+ return ret;
}
int write_tree_block(struct btrfs_trans_handle *trans,
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index fba0b050a7e1..ccc8b98107b4 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -808,22 +808,6 @@ out:
return ret;
}
-int write_extent_to_disk(struct extent_buffer *eb)
-{
- int ret;
- ret = btrfs_pwrite(eb->fd, eb->data, eb->len, eb->dev_bytenr,
- eb->fs_info->zoned);
- if (ret < 0)
- goto out;
- if (ret != eb->len) {
- ret = -EIO;
- goto out;
- }
- ret = 0;
-out:
- return ret;
-}
-
int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
u64 bytes, int mirror)
{
@@ -934,6 +918,7 @@ int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
dev_bytenr = multi->stripes[dev_nr].physical;
this_len = min(this_len, bytes_left);
dev_nr++;
+ device->total_ios++;
ret = btrfs_pwrite(device->fd, buf + total_write,
this_len, dev_bytenr, info->zoned);
diff --git a/kernel-shared/extent_io.h b/kernel-shared/extent_io.h
index a4c21360a9e8..b787c19ef049 100644
--- a/kernel-shared/extent_io.h
+++ b/kernel-shared/extent_io.h
@@ -152,7 +152,6 @@ void free_extent_buffer(struct extent_buffer *eb);
void free_extent_buffer_nocache(struct extent_buffer *eb);
int read_extent_from_disk(struct extent_buffer *eb,
unsigned long offset, unsigned long len);
-int write_extent_to_disk(struct extent_buffer *eb);
int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
unsigned long start, unsigned long len);
void read_extent_buffer(const struct extent_buffer *eb, void *dst,
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index 4bf63266879b..7d1e7ea00903 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -30,6 +30,7 @@
#include "kernel-shared/volumes.h"
#include "zoned.h"
#include "common/utils.h"
+#include "common/device-utils.h"
#include "kernel-lib/raid56.h"
const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
@@ -2719,7 +2720,8 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
}
for (i = 0; i < multi->num_stripes; i++) {
- ret = write_extent_to_disk(ebs[i]);
+ ret = btrfs_pwrite(ebs[i]->fd, ebs[i]->data, ebs[i]->len,
+ ebs[i]->dev_bytenr, info->zoned);
if (ret < 0)
goto out_free_split;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/8] btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data()
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (3 preceding siblings ...)
2022-04-05 12:48 ` [PATCH 4/8] btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk() Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 6/8] btrfs-progs: remove extent_buffer::fd and extent_buffer::dev_bytes Qu Wenruo
` (4 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
The function read_extent_from_disk() is only a wrapper to read tree
block.
And read_extent_data() is just a while loop to eliminate short read
caused by stripe boundary.
In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing extra loop by
themselves.
This patch will replace those two functions with read_data_from_disk(),
making it the only entrance for data/metadata read.
And update read_data_from_disk() to return the read bytes, so caller can
do a simple while loop.
For the few callers of read_extent_data(), open-code a small while loop
for them.
This will allow later RAID56 read repair using P/Q much easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
btrfs-corrupt-block.c | 29 +++--------
btrfs-map-logical.c | 5 +-
btrfstune.c | 3 +-
check/main.c | 2 +-
check/mode-common.c | 4 +-
cmds/restore.c | 4 +-
image/main.c | 2 +-
kernel-shared/disk-io.c | 73 +++-------------------------
kernel-shared/disk-io.h | 2 -
kernel-shared/extent_io.c | 83 +++++++++++---------------------
kernel-shared/extent_io.h | 6 +--
kernel-shared/file.c | 20 ++++----
kernel-shared/free-space-cache.c | 20 ++++++--
13 files changed, 80 insertions(+), 173 deletions(-)
diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 8ec6d63f060c..92d608e6b9c0 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -40,33 +40,18 @@ static int debug_corrupt_block(struct extent_buffer *eb,
struct btrfs_root *root, u64 bytenr, u32 blocksize, u64 copy)
{
int ret;
- u64 length;
- struct btrfs_multi_bio *multi = NULL;
- struct btrfs_device *device;
int num_copies;
int mirror_num = 1;
- length = blocksize;
while (1) {
- ret = btrfs_map_block(root->fs_info, READ, eb->start, &length,
- &multi, mirror_num, NULL);
- if (ret) {
- error("cannot map block %llu length %llu mirror %d: %d",
- eb->start, length, mirror_num, ret);
- return ret;
- }
- device = multi->stripes[0].dev;
- eb->fd = device->fd;
- device->total_ios++;
- eb->dev_bytenr = multi->stripes[0].physical;
-
- fprintf(stdout,
- "mirror %d logical %llu physical %llu device %s\n",
- mirror_num, bytenr, eb->dev_bytenr, device->name);
- free(multi);
-
if (!copy || mirror_num == copy) {
- ret = read_extent_from_disk(eb, 0, eb->len);
+ u64 read_len = eb->len;
+
+ ret = read_data_from_disk(eb->fs_info, eb->data,
+ eb->start, &read_len,
+ mirror_num);
+ if (read_len < eb->len)
+ ret = -EIO;
if (ret < 0) {
errno = -ret;
error("cannot read eb bytenr %llu: %m",
diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
index b3a7526b22a4..860c196d6d9b 100644
--- a/btrfs-map-logical.c
+++ b/btrfs-map-logical.c
@@ -173,8 +173,9 @@ static int write_extent_content(struct btrfs_fs_info *fs_info, int out_fd,
while (cur_offset < length) {
cur_len = min_t(u64, length - cur_offset, BUFFER_SIZE);
- ret = read_extent_data(fs_info, buffer,
- logical + cur_offset, &cur_len, mirror);
+ ret = read_data_from_disk(fs_info, buffer,
+ logical + cur_offset, &cur_len,
+ mirror);
if (ret < 0) {
errno = -ret;
fprintf(stderr,
diff --git a/btrfstune.c b/btrfstune.c
index 33c83bf16291..c9a92349a44c 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -333,7 +333,8 @@ static int populate_csum(struct btrfs_trans_handle *trans,
while (offset < len) {
sectorsize = fs_info->sectorsize;
- ret = read_extent_data(fs_info, buf, start + offset, §orsize, 0);
+ ret = read_data_from_disk(fs_info, buf, start + offset,
+ §orsize, 0);
if (ret)
break;
ret = btrfs_csum_file_block(trans, start + len, start + offset,
diff --git a/check/main.c b/check/main.c
index 954d02413f82..b23e6b2e7b2b 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5757,7 +5757,7 @@ static int check_extent_csums(struct btrfs_root *root, u64 bytenr,
for (mirror = 1; mirror <= num_copies; mirror++) {
read_len = num_bytes - offset;
/* read as much space once a time */
- ret = read_extent_data(gfs_info, (char *)data + offset,
+ ret = read_data_from_disk(gfs_info, (char *)data + offset,
bytenr + offset, &read_len, mirror);
if (ret)
goto out;
diff --git a/check/mode-common.c b/check/mode-common.c
index a906a5852a46..facc672ccd21 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -1202,8 +1202,8 @@ static int populate_csum(struct btrfs_trans_handle *trans,
while (offset < len) {
sectorsize = gfs_info->sectorsize;
- ret = read_extent_data(gfs_info, buf, start + offset,
- §orsize, 0);
+ ret = read_data_from_disk(gfs_info, buf, start + offset,
+ §orsize, 0);
if (ret)
break;
ret = btrfs_csum_file_block(trans, start + len, start + offset,
diff --git a/cmds/restore.c b/cmds/restore.c
index 81ca6cd57cb5..5923d571c986 100644
--- a/cmds/restore.c
+++ b/cmds/restore.c
@@ -407,8 +407,8 @@ again:
cur = bytenr;
while (cur < bytenr + size_left) {
length = bytenr + size_left - cur;
- ret = read_extent_data(root->fs_info, inbuf + cur - bytenr, cur,
- &length, mirror_num);
+ ret = read_data_from_disk(root->fs_info, inbuf + cur - bytenr, cur,
+ &length, mirror_num);
if (ret < 0) {
mirror_num++;
if (mirror_num > num_copies) {
diff --git a/image/main.c b/image/main.c
index 75f9203fab12..88ed54f5b129 100644
--- a/image/main.c
+++ b/image/main.c
@@ -615,7 +615,7 @@ static int read_data_extent(struct metadump_struct *md,
for (cur_mirror = 1; cur_mirror <= num_copies; cur_mirror++) {
while (bytes_left) {
read_len = bytes_left;
- ret = read_extent_data(fs_info,
+ ret = read_data_from_disk(fs_info,
(char *)(async->buffer + offset),
logical, &read_len, cur_mirror);
if (ret < 0)
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index f68284264319..78fe2b39da4c 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -316,45 +316,20 @@ static int read_on_restore(struct extent_buffer *eb)
int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirror)
{
unsigned long offset = 0;
- struct btrfs_multi_bio *multi = NULL;
- struct btrfs_device *device;
int ret = 0;
- u64 read_len;
unsigned long bytes_left = eb->len;
while (bytes_left) {
- read_len = bytes_left;
- device = NULL;
+ u64 read_len = bytes_left;
if (info->on_restoring)
return read_on_restore(eb);
- ret = btrfs_map_block(info, READ, eb->start + offset,
- &read_len, &multi, mirror, NULL);
- if (ret) {
- printk("Couldn't map the block %llu\n", eb->start + offset);
- kfree(multi);
- return -EIO;
- }
- device = multi->stripes[0].dev;
-
- if (device->fd <= 0) {
- kfree(multi);
- return -EIO;
- }
-
- eb->fd = device->fd;
- device->total_ios++;
- eb->dev_bytenr = multi->stripes[0].physical;
- kfree(multi);
- multi = NULL;
-
- if (read_len > bytes_left)
- read_len = bytes_left;
-
- ret = read_extent_from_disk(eb, offset, read_len);
- if (ret)
- return -EIO;
+ ret = read_data_from_disk(info, eb->data + offset,
+ eb->start + offset, &read_len,
+ mirror);
+ if (ret < 0)
+ return ret;
offset += read_len;
bytes_left -= read_len;
}
@@ -471,42 +446,6 @@ struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
return ERR_PTR(ret);
}
-int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical,
- u64 *len, int mirror)
-{
- u64 offset = 0;
- struct btrfs_multi_bio *multi = NULL;
- struct btrfs_device *device;
- int ret = 0;
- u64 max_len = *len;
-
- ret = btrfs_map_block(fs_info, READ, logical, len, &multi, mirror,
- NULL);
- if (ret) {
- fprintf(stderr, "Couldn't map the block %llu\n",
- logical + offset);
- goto err;
- }
- device = multi->stripes[0].dev;
-
- if (*len > max_len)
- *len = max_len;
- if (device->fd < 0) {
- ret = -EIO;
- goto err;
- }
-
- ret = btrfs_pread(device->fd, data, *len, multi->stripes[0].physical,
- fs_info->zoned);
- if (ret != *len)
- ret = -EIO;
- else
- ret = 0;
-err:
- kfree(multi);
- return ret;
-}
-
int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
{
int ret;
diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
index e07141a9596d..bba97fc1a814 100644
--- a/kernel-shared/disk-io.h
+++ b/kernel-shared/disk-io.h
@@ -141,8 +141,6 @@ int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirr
struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
u64 parent_transid);
-int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical,
- u64 *len, int mirror);
void readahead_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
u64 parent_transid);
struct extent_buffer* btrfs_find_create_tree_block(
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index ccc8b98107b4..c8bb30f62c2d 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -28,6 +28,7 @@
#include "kernel-lib/list.h"
#include "kernel-shared/ctree.h"
#include "kernel-shared/volumes.h"
+#include "kernel-shared/disk-io.h"
#include "common/utils.h"
#include "common/device-utils.h"
#include "common/internal.h"
@@ -789,69 +790,41 @@ struct extent_buffer *alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
return ret;
}
-int read_extent_from_disk(struct extent_buffer *eb,
- unsigned long offset, unsigned long len)
-{
- int ret;
- ret = btrfs_pread(eb->fd, eb->data + offset, len, eb->dev_bytenr,
- eb->fs_info->zoned);
- if (ret < 0) {
- ret = -errno;
- goto out;
- }
- if (ret != len) {
- ret = -EIO;
- goto out;
- }
- ret = 0;
-out:
- return ret;
-}
-
-int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
- u64 bytes, int mirror)
+int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 logical,
+ u64 *len, int mirror)
{
struct btrfs_multi_bio *multi = NULL;
struct btrfs_device *device;
- u64 bytes_left = bytes;
- u64 read_len;
- u64 total_read = 0;
+ u64 read_len = *len;
int ret;
- while (bytes_left) {
- read_len = bytes_left;
- ret = btrfs_map_block(info, READ, offset, &read_len, &multi,
- mirror, NULL);
- if (ret) {
- fprintf(stderr, "Couldn't map the block %llu\n",
- offset);
- return -EIO;
- }
- device = multi->stripes[0].dev;
-
- read_len = min(bytes_left, read_len);
- if (device->fd <= 0) {
- kfree(multi);
- return -EIO;
- }
+ ret = btrfs_map_block(info, READ, logical, &read_len, &multi, mirror,
+ NULL);
+ if (ret) {
+ fprintf(stderr, "Couldn't map the block %llu\n", logical);
+ return -EIO;
+ }
+ device = multi->stripes[0].dev;
- ret = btrfs_pread(device->fd, buf + total_read, read_len,
- multi->stripes[0].physical, info->zoned);
+ read_len = min(*len, read_len);
+ if (device->fd <= 0) {
kfree(multi);
- if (ret < 0) {
- fprintf(stderr, "Error reading %llu, %d\n", offset,
- ret);
- return ret;
- }
- if (ret != read_len) {
- fprintf(stderr, "Short read for %llu, read %d, "
- "read_len %llu\n", offset, ret, read_len);
- return -EIO;
- }
+ return -EIO;
+ }
- bytes_left -= read_len;
- offset += read_len;
- total_read += read_len;
+ ret = btrfs_pread(device->fd, buf, read_len,
+ multi->stripes[0].physical, info->zoned);
+ kfree(multi);
+ if (ret < 0) {
+ fprintf(stderr, "Error reading %llu, %d\n", logical,
+ ret);
+ return ret;
+ }
+ if (ret != read_len) {
+ fprintf(stderr,
+ "Short read for %llu, read %d, read_len %llu\n",
+ logical, ret, read_len);
+ return -EIO;
}
return 0;
diff --git a/kernel-shared/extent_io.h b/kernel-shared/extent_io.h
index b787c19ef049..cd5e893b165a 100644
--- a/kernel-shared/extent_io.h
+++ b/kernel-shared/extent_io.h
@@ -150,8 +150,6 @@ struct extent_buffer *alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
u64 bytenr, u32 blocksize);
void free_extent_buffer(struct extent_buffer *eb);
void free_extent_buffer_nocache(struct extent_buffer *eb);
-int read_extent_from_disk(struct extent_buffer *eb,
- unsigned long offset, unsigned long len);
int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
unsigned long start, unsigned long len);
void read_extent_buffer(const struct extent_buffer *eb, void *dst,
@@ -169,8 +167,8 @@ int extent_buffer_test_bit(struct extent_buffer *eb, unsigned long start,
unsigned long nr);
int set_extent_buffer_dirty(struct extent_buffer *eb);
int clear_extent_buffer_dirty(struct extent_buffer *eb);
-int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
- u64 bytes, int mirror);
+int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 logical,
+ u64 *len, int mirror);
int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
u64 bytes, int mirror);
void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start,
diff --git a/kernel-shared/file.c b/kernel-shared/file.c
index a31728102afa..59d82a1dd5f7 100644
--- a/kernel-shared/file.c
+++ b/kernel-shared/file.c
@@ -225,11 +225,11 @@ int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
memset(dest, 0, len);
while (1) {
struct btrfs_file_extent_item *fi;
+ u64 offset = 0;
u64 extent_start;
u64 extent_len;
u64 read_start;
u64 read_len;
- u64 read_len_ret;
u64 disk_bytenr;
leaf = path.nodes[0];
@@ -282,14 +282,16 @@ int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi) +
btrfs_file_extent_offset(leaf, fi);
- read_len_ret = read_len;
- ret = read_extent_data(fs_info, dest + read_start - start, disk_bytenr,
- &read_len_ret, 0);
- if (ret < 0)
- break;
- /* Short read, something went wrong */
- if (read_len_ret != read_len)
- return -EIO;
+ while (offset < read_len) {
+ u64 read_len_ret = read_len - offset;
+
+ ret = read_data_from_disk(fs_info,
+ dest + read_start - start + offset,
+ disk_bytenr + offset, &read_len_ret, 0);
+ if (ret < 0)
+ goto out;
+ offset += read_len_ret;
+ }
read += read_len;
next:
ret = btrfs_next_item(root, &path);
diff --git a/kernel-shared/free-space-cache.c b/kernel-shared/free-space-cache.c
index 6d7fee9c86fb..50eeb4381ed4 100644
--- a/kernel-shared/free-space-cache.c
+++ b/kernel-shared/free-space-cache.c
@@ -118,6 +118,8 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, struct btrfs_root *root,
}
while (total_read < io_ctl->total_size) {
+ u64 offset = 0;
+
if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
ret = btrfs_next_leaf(root, path);
if (ret) {
@@ -150,11 +152,19 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, struct btrfs_root *root,
bytenr = btrfs_file_extent_disk_bytenr(leaf, fi) +
btrfs_file_extent_offset(leaf, fi);
len = btrfs_file_extent_num_bytes(leaf, fi);
- ret = read_data_from_disk(root->fs_info,
- io_ctl->buffer + key.offset, bytenr,
- len, 0);
- if (ret)
- break;
+ while (offset < len) {
+ u64 read_len = len - offset;
+
+ ret = read_data_from_disk(root->fs_info,
+ io_ctl->buffer + key.offset + offset,
+ bytenr + offset,
+ &read_len, 0);
+ if (ret < 0) {
+ btrfs_release_path(path);
+ return ret;
+ }
+ offset += read_len;
+ }
total_read += len;
path->slots[0]++;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 6/8] btrfs-progs: remove extent_buffer::fd and extent_buffer::dev_bytes
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (4 preceding siblings ...)
2022-04-05 12:48 ` [PATCH 5/8] btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data() Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 7/8] btrfs-progs: allow read_data_from_disk() to rebuild RAID56 using P/Q Qu Wenruo
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
Those two members are a shortcut for non-RAID56 profiles.
But we should not use such shortcut, and move all our logical address
read/write to the unified read_data_from_disk()/write_data_to_disk().
With previous refactors, now we're safe to remove them.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
btrfs-corrupt-block.c | 5 ++---
kernel-shared/ctree.c | 5 ++---
kernel-shared/extent_io.c | 2 --
kernel-shared/extent_io.h | 2 --
kernel-shared/volumes.c | 12 +++---------
5 files changed, 7 insertions(+), 19 deletions(-)
diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 92d608e6b9c0..b17e0bf8e1c9 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -55,7 +55,7 @@ static int debug_corrupt_block(struct extent_buffer *eb,
if (ret < 0) {
errno = -ret;
error("cannot read eb bytenr %llu: %m",
- eb->dev_bytenr);
+ eb->start);
return ret;
}
printf("corrupting %llu copy %d\n", eb->start,
@@ -65,10 +65,9 @@ static int debug_corrupt_block(struct extent_buffer *eb,
if (ret < 0) {
errno = -ret;
error("cannot write eb bytenr %llu: %m",
- eb->dev_bytenr);
+ eb->start);
return ret;
}
- fsync(eb->fd);
}
num_copies = btrfs_num_copies(root->fs_info, eb->start,
diff --git a/kernel-shared/ctree.c b/kernel-shared/ctree.c
index 42bbd50d86a1..ecc37a42eecc 100644
--- a/kernel-shared/ctree.c
+++ b/kernel-shared/ctree.c
@@ -590,10 +590,9 @@ static void generic_err(const struct extent_buffer *buf, int slot,
{
va_list args;
- fprintf(stderr, "corrupt %s: root=%lld block=%llu physical=%llu slot=%d, ",
+ fprintf(stderr, "corrupt %s: root=%lld block=%llu slot=%d, ",
btrfs_header_level(buf) == 0 ? "leaf": "node",
- btrfs_header_owner(buf), btrfs_header_bytenr(buf),
- buf->dev_bytenr, slot);
+ btrfs_header_owner(buf), btrfs_header_bytenr(buf), slot);
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index c8bb30f62c2d..b8ded5cf7373 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -616,8 +616,6 @@ static struct extent_buffer *__alloc_extent_buffer(struct btrfs_fs_info *info,
eb->len = blocksize;
eb->refs = 1;
eb->flags = 0;
- eb->fd = -1;
- eb->dev_bytenr = (u64)-1;
eb->cache_node.start = bytenr;
eb->cache_node.size = blocksize;
eb->fs_info = info;
diff --git a/kernel-shared/extent_io.h b/kernel-shared/extent_io.h
index cd5e893b165a..aa4f34e187ba 100644
--- a/kernel-shared/extent_io.h
+++ b/kernel-shared/extent_io.h
@@ -88,13 +88,11 @@ struct extent_state {
struct extent_buffer {
struct cache_extent cache_node;
u64 start;
- u64 dev_bytenr;
struct list_head lru;
struct list_head recow;
u32 len;
int refs;
u32 flags;
- int fd;
struct btrfs_fs_info *fs_info;
char data[] __attribute__((aligned(8)));
};
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index 7d1e7ea00903..cb49609cc60c 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -2620,8 +2620,6 @@ static int split_eb_for_raid56(struct btrfs_fs_info *info,
eb->len = stripe_len;
eb->refs = 1;
eb->flags = 0;
- eb->fd = -1;
- eb->dev_bytenr = (u64)-1;
eb->fs_info = info;
this_eb_start = raid_map[i];
@@ -2676,9 +2674,6 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
for (i = 0; i < multi->num_stripes; i++) {
struct extent_buffer *new_eb;
if (raid_map[i] < BTRFS_RAID5_P_STRIPE) {
- ebs[i]->dev_bytenr = multi->stripes[i].physical;
- ebs[i]->fd = multi->stripes[i].dev->fd;
- multi->stripes[i].dev->total_ios++;
if (ebs[i]->start != raid_map[i]) {
ret = -EINVAL;
goto out_free_split;
@@ -2690,8 +2685,6 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
ret = -ENOMEM;
goto out_free_split;
}
- new_eb->dev_bytenr = multi->stripes[i].physical;
- new_eb->fd = multi->stripes[i].dev->fd;
multi->stripes[i].dev->total_ios++;
new_eb->len = stripe_len;
new_eb->fs_info = info;
@@ -2720,8 +2713,9 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
}
for (i = 0; i < multi->num_stripes; i++) {
- ret = btrfs_pwrite(ebs[i]->fd, ebs[i]->data, ebs[i]->len,
- ebs[i]->dev_bytenr, info->zoned);
+ multi->stripes[i].dev->total_ios++;
+ ret = btrfs_pwrite(multi->stripes[i].dev->fd, ebs[i]->data, ebs[i]->len,
+ multi->stripes[i].physical, info->zoned);
if (ret < 0)
goto out_free_split;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 7/8] btrfs-progs: allow read_data_from_disk() to rebuild RAID56 using P/Q
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (5 preceding siblings ...)
2022-04-05 12:48 ` [PATCH 6/8] btrfs-progs: remove extent_buffer::fd and extent_buffer::dev_bytes Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-05 12:48 ` [PATCH 8/8] btrfs-progs: tests/fsck: add test case for data csum check on raid5 Qu Wenruo
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
This new ability is added by:
- Allow btrfs_map_block() to return the chunk type
This makes later work much easier
- Only reset stripe offset inside btrfs_map_block() when needed
Currently if @raid_map is not NULL, btrfs_map_block() will consider
this call is for WRITE and will reset stripe offset.
This is no longer the case, as for RAID56 read with mirror_num 1/0,
we will still call btrfs_map_block() with non-NULL raid_map.
Add a small check to make sure we won't reset stripe offset for
mirror 1/0 read.
- Add new helper read_raid56() to handle rebuild
We will read the full stripe (including all data and P/Q stripes)
do the rebuild, then only copy the refered part to the caller.
There is a catch for RAID6, we have no way to exhaust all combination,
so the current repair will assume the mirror = 0 data is corrupted,
then try to find a missing device.
But if no missing device can be found, it will assume P is corrupted.
This is just a guess, and can to totally wrong, but we have no better
idea.
Now btrfs-progs have full read ability for RAID56.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/extent_io.c | 114 +++++++++++++++++++++++++++++++++++++-
kernel-shared/volumes.c | 27 +++++----
kernel-shared/volumes.h | 1 +
3 files changed, 128 insertions(+), 14 deletions(-)
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index b8ded5cf7373..ee92e0f847d6 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -26,6 +26,7 @@
#include "kerncompat.h"
#include "kernel-shared/extent_io.h"
#include "kernel-lib/list.h"
+#include "kernel-lib/raid56.h"
#include "kernel-shared/ctree.h"
#include "kernel-shared/volumes.h"
#include "kernel-shared/disk-io.h"
@@ -788,23 +789,131 @@ struct extent_buffer *alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
return ret;
}
+static int read_raid56(struct btrfs_fs_info *fs_info, void *buf, u64 logical,
+ u64 len, int mirror, struct btrfs_multi_bio *multi,
+ u64 *raid_map)
+{
+ const int num_stripes = multi->num_stripes;
+ const u64 full_stripe_start = raid_map[0];
+ void **pointers = NULL;
+ int failed_a = -1;
+ int failed_b = -1;
+ int i;
+ int ret;
+
+ /* Only read repair should go this path */
+ ASSERT(mirror > 1);
+ ASSERT(raid_map);
+
+ /* The read length should be inside one stripe */
+ ASSERT(len <= BTRFS_STRIPE_LEN);
+
+ pointers = calloc(num_stripes, sizeof(void *));
+ if (!pointers) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ /* Allocate memory for the full stripe */
+ for (i = 0; i < num_stripes; i++) {
+ pointers[i] = malloc(BTRFS_STRIPE_LEN);
+ if (!pointers[i]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
+ /*
+ * Read the full stripe.
+ *
+ * The stripes in @multi is not rotated, thus can be used to read from
+ * disk directly.
+ */
+ for (i = 0; i < num_stripes; i++) {
+ ret = btrfs_pread(multi->stripes[i].dev->fd, pointers[i],
+ BTRFS_STRIPE_LEN, multi->stripes[i].physical,
+ fs_info->zoned);
+ if (ret < BTRFS_STRIPE_LEN) {
+ ret = -EIO;
+ goto out;
+ }
+ }
+
+ /*
+ * Get the failed index.
+ *
+ * Since we're reading using mirror_num > 1 already, it means the data
+ * stripe where @logical lies in is definitely corrupted.
+ */
+ failed_a = (logical - full_stripe_start) / BTRFS_STRIPE_LEN;
+
+ /*
+ * For RAID6, we don't have good way to exhaust all the combinations,
+ * so here we can only go through the map to see if we have missing devices.
+ */
+ if (multi->type & BTRFS_BLOCK_GROUP_RAID6) {
+ for (i = 0; i < num_stripes; i++) {
+ /* Skip failed_a, as it's already marked failed */
+ if (i == failed_a)
+ continue;
+ /* Missing dev */
+ if (multi->stripes[i].dev->fd == -1) {
+ failed_b = i;
+ break;
+ }
+ }
+ /*
+ * No missing device, we have no better idea, default to P
+ * corruption
+ */
+ if (failed_b < 0)
+ failed_b = num_stripes - 2;
+ }
+
+ /* Rebuild the full stripe */
+ ret = raid56_recov(num_stripes, BTRFS_STRIPE_LEN, multi->type,
+ failed_a, failed_b, pointers);
+ ASSERT(ret == 0);
+
+ /* Now copy the data back to original buf */
+ memcpy(buf, pointers[failed_a] + (logical - full_stripe_start) %
+ BTRFS_STRIPE_LEN, len);
+ ret = 0;
+out:
+ for (i = 0; i < num_stripes; i++)
+ free(pointers[i]);
+ free(pointers);
+ return ret;
+}
+
int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 logical,
u64 *len, int mirror)
{
struct btrfs_multi_bio *multi = NULL;
struct btrfs_device *device;
u64 read_len = *len;
+ u64 *raid_map = NULL;
int ret;
ret = btrfs_map_block(info, READ, logical, &read_len, &multi, mirror,
- NULL);
+ &raid_map);
if (ret) {
fprintf(stderr, "Couldn't map the block %llu\n", logical);
return -EIO;
}
+ read_len = min(*len, read_len);
+
+ /* We need to rebuild from P/Q */
+ if (mirror > 1 && multi->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+ ret = read_raid56(info, buf, logical, read_len, mirror, multi,
+ raid_map);
+ free(multi);
+ free(raid_map);
+ *len = read_len;
+ return ret;
+ }
+ free(raid_map);
device = multi->stripes[0].dev;
- read_len = min(*len, read_len);
if (device->fd <= 0) {
kfree(multi);
return -EIO;
@@ -824,6 +933,7 @@ int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 logical,
logical, ret, read_len);
return -EIO;
}
+ *len = read_len;
return 0;
}
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index cb49609cc60c..f082fa9f898e 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -1805,6 +1805,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
int stripes_required = 1;
int stripe_index;
int i;
+ bool need_raid_map = false;
struct btrfs_multi_bio *multi = NULL;
if (multi_ret && rw == READ) {
@@ -1842,17 +1843,18 @@ again:
}
if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK
&& multi_ret && ((rw & WRITE) || mirror_num > 1) && raid_map_ret) {
- /* RAID[56] write or recovery. Return all stripes */
- stripes_required = map->num_stripes;
-
- /* Only allocate the map if we've already got a large enough multi_ret */
- if (stripes_allocated >= stripes_required) {
- raid_map = kmalloc(sizeof(u64) * map->num_stripes, GFP_NOFS);
- if (!raid_map) {
- kfree(multi);
- return -ENOMEM;
- }
- }
+ need_raid_map = true;
+ /* RAID[56] write or recovery. Return all stripes */
+ stripes_required = map->num_stripes;
+
+ /* Only allocate the map if we've already got a large enough multi_ret */
+ if (stripes_allocated >= stripes_required) {
+ raid_map = kmalloc(sizeof(u64) * map->num_stripes, GFP_NOFS);
+ if (!raid_map) {
+ kfree(multi);
+ return -ENOMEM;
+ }
+ }
}
/* if our multi bio struct is too small, back off and try again */
@@ -1890,6 +1892,7 @@ again:
goto out;
multi->num_stripes = 1;
+ multi->type = map->type;
stripe_index = 0;
if (map->type & BTRFS_BLOCK_GROUP_RAID1_MASK) {
if (rw == WRITE)
@@ -1916,7 +1919,7 @@ again:
else if (mirror_num)
stripe_index = mirror_num - 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
- if (raid_map) {
+ if (need_raid_map && raid_map) {
int rot;
u64 tmp;
u64 raid56_full_stripe_start;
diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
index 5cfe7e39f6b8..d90065b98a3e 100644
--- a/kernel-shared/volumes.h
+++ b/kernel-shared/volumes.h
@@ -106,6 +106,7 @@ struct btrfs_bio_stripe {
};
struct btrfs_multi_bio {
+ u64 type;
int error;
int num_stripes;
struct btrfs_bio_stripe stripes[];
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 8/8] btrfs-progs: tests/fsck: add test case for data csum check on raid5
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (6 preceding siblings ...)
2022-04-05 12:48 ` [PATCH 7/8] btrfs-progs: allow read_data_from_disk() to rebuild RAID56 using P/Q Qu Wenruo
@ 2022-04-05 12:48 ` Qu Wenruo
2022-04-08 21:16 ` [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time David Sterba
2022-04-11 15:01 ` David Sterba
9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-05 12:48 UTC (permalink / raw)
To: linux-btrfs
Previously 'btrfs check --check-data-csum' will report tons of false
alerts for RAID56.
Add a test case to make sure with the new RAID56 rebuild ability, there
should be no false alerts.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
.../056-raid56-false-alerts/test.sh | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)
create mode 100755 tests/fsck-tests/056-raid56-false-alerts/test.sh
diff --git a/tests/fsck-tests/056-raid56-false-alerts/test.sh b/tests/fsck-tests/056-raid56-false-alerts/test.sh
new file mode 100755
index 000000000000..b82e999c7740
--- /dev/null
+++ b/tests/fsck-tests/056-raid56-false-alerts/test.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+#
+# Make sure "btrfs check --check-data-csum" won't report false alerts on RAID56
+# data.
+#
+
+source "$TEST_TOP/common"
+
+check_prereq btrfs
+check_prereq mkfs.btrfs
+check_global_prereq losetup
+
+setup_loopdevs 3
+prepare_loopdevs
+dev1=${loopdevs[1]}
+TEST_DEV=$dev1
+
+setup_root_helper
+
+run_check $SUDO_HELPERS "$TOP/mkfs.btrfs" -f -m raid1 -d raid5 ${loopdevs[@]}
+run_check_mount_test_dev
+
+run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/file" bs=16K count=1 \
+ status=noxfer > /dev/null 2>&1
+
+run_check_umount_test_dev
+
+# Check data csum should not report false alerts
+run_check "$TOP/btrfs" check --check-data-csum "$dev1"
+
+cleanup_loopdevs
--
2.35.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (7 preceding siblings ...)
2022-04-05 12:48 ` [PATCH 8/8] btrfs-progs: tests/fsck: add test case for data csum check on raid5 Qu Wenruo
@ 2022-04-08 21:16 ` David Sterba
2022-04-11 15:01 ` David Sterba
9 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2022-04-08 21:16 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Tue, Apr 05, 2022 at 08:48:22PM +0800, Qu Wenruo wrote:
> This branch can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/raid56_rebuild
>
> Please note that, the last time I check devel branch, the RAID56 warning
> patch is not yet merged.
It was there but I had not the branch pushed.
> So this is based on the latest devel branch from github.
>
> And since we're adding proper RAID56 repair ability, there is no need
> for the patchset "btrfs-progs: check: avoid false alerts for --check-data-csum on RAID56"
OK so I'll replace it with this patchset.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time
2022-04-05 12:48 [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time Qu Wenruo
` (8 preceding siblings ...)
2022-04-08 21:16 ` [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time David Sterba
@ 2022-04-11 15:01 ` David Sterba
2022-04-25 16:29 ` David Sterba
9 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2022-04-11 15:01 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Tue, Apr 05, 2022 at 08:48:22PM +0800, Qu Wenruo wrote:
> This branch can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/raid56_rebuild
The mkfs tests fail in 001 with the following last command:
====== RUN CHECK root_helper .../mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
WARNING: RAID5/6 support has known problems is strongly discouraged
to be used besides testing or evaluation.
kernel-shared/transaction.c:155: __commit_transaction: BUG_ON `ret` triggered, value 65536
.../mkfs.btrfs(__commit_transaction+0x197)[0x43a107]
/...mkfs.btrfs(btrfs_commit_transaction+0x13b)[0x43a26b]
/...mkfs.btrfs(main+0x174b)[0x40f68d]
/lib64/libc.so.6(+0x2d540)[0x7f969b5ea540]
/lib64/libc.so.6(__libc_start_main+0x7c)[0x7f969b5ea5ec]
/...mkfs.btrfs(_start+0x25)[0x40d965]
/...tests/common: line 540: 29172 Aborted sudo -n "$@"
failed: root_helper .../mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
test failed for case 001-basic-profiles
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time
2022-04-11 15:01 ` David Sterba
@ 2022-04-25 16:29 ` David Sterba
2022-04-25 22:38 ` Qu Wenruo
0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2022-04-25 16:29 UTC (permalink / raw)
To: dsterba, Qu Wenruo, linux-btrfs
On Mon, Apr 11, 2022 at 05:01:49PM +0200, David Sterba wrote:
> On Tue, Apr 05, 2022 at 08:48:22PM +0800, Qu Wenruo wrote:
> > This branch can be fetched from github:
> > https://github.com/adam900710/btrfs-progs/tree/raid56_rebuild
>
> The mkfs tests fail in 001 with the following last command:
>
> ====== RUN CHECK root_helper .../mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> WARNING: RAID5/6 support has known problems is strongly discouraged
> to be used besides testing or evaluation.
>
> kernel-shared/transaction.c:155: __commit_transaction: BUG_ON `ret` triggered, value 65536
> .../mkfs.btrfs(__commit_transaction+0x197)[0x43a107]
> /...mkfs.btrfs(btrfs_commit_transaction+0x13b)[0x43a26b]
> /...mkfs.btrfs(main+0x174b)[0x40f68d]
> /lib64/libc.so.6(+0x2d540)[0x7f969b5ea540]
> /lib64/libc.so.6(__libc_start_main+0x7c)[0x7f969b5ea5ec]
> /...mkfs.btrfs(_start+0x25)[0x40d965]
> /...tests/common: line 540: 29172 Aborted sudo -n "$@"
> failed: root_helper .../mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> test failed for case 001-basic-profiles
It's caused by patch [PATCH 4/8] btrfs-progs: use write_data_to_disk()
to replace write_extent_to_disk()
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/8] btrfs-progs: add RAID56 rebuild ability at read time
2022-04-25 16:29 ` David Sterba
@ 2022-04-25 22:38 ` Qu Wenruo
0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-04-25 22:38 UTC (permalink / raw)
To: dsterba, Qu Wenruo, linux-btrfs
On 2022/4/26 00:29, David Sterba wrote:
> On Mon, Apr 11, 2022 at 05:01:49PM +0200, David Sterba wrote:
>> On Tue, Apr 05, 2022 at 08:48:22PM +0800, Qu Wenruo wrote:
>>> This branch can be fetched from github:
>>> https://github.com/adam900710/btrfs-progs/tree/raid56_rebuild
>>
>> The mkfs tests fail in 001 with the following last command:
>>
>> ====== RUN CHECK root_helper .../mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>> WARNING: RAID5/6 support has known problems is strongly discouraged
>> to be used besides testing or evaluation.
>>
>> kernel-shared/transaction.c:155: __commit_transaction: BUG_ON `ret` triggered, value 65536
>> .../mkfs.btrfs(__commit_transaction+0x197)[0x43a107]
>> /...mkfs.btrfs(btrfs_commit_transaction+0x13b)[0x43a26b]
>> /...mkfs.btrfs(main+0x174b)[0x40f68d]
>> /lib64/libc.so.6(+0x2d540)[0x7f969b5ea540]
>> /lib64/libc.so.6(__libc_start_main+0x7c)[0x7f969b5ea5ec]
>> /...mkfs.btrfs(_start+0x25)[0x40d965]
>> /...tests/common: line 540: 29172 Aborted sudo -n "$@"
>> failed: root_helper .../mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>> test failed for case 001-basic-profiles
>
> It's caused by patch [PATCH 4/8] btrfs-progs: use write_data_to_disk()
> to replace write_extent_to_disk()
That's a known one, I forgot to change the return value back to 0 for
write_extent_to_disk().
Do I need to update the whole series or just send an update based on this?
Thanks,
Qu
^ permalink raw reply [flat|nested] 13+ messages in thread