* [PATCH v2 1/5] btrfs-progs: backref: properly queue indirect refs
2022-01-14 0:51 [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations Qu Wenruo
@ 2022-01-14 0:51 ` Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 2/5] btrfs-progs: check: move csum tree population into mode-common.[ch] Qu Wenruo
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2022-01-14 0:51 UTC (permalink / raw)
To: linux-btrfs
[BUG]
When calling iterate_extent_inodes() on data extents with indirect ref
(with inline or keyed EXTENT_DATA_REF_KEY), it will fail to execute the
call back function at all.
[CAUSE]
In function find_parent_nodes(), we only add the target tree block if a
backref has @parent populated.
For indirect backref like EXTENT_DATA_REF_KEY, we rely on
__resolve_indirect_ref() to get the parent leaves.
However __resolve_indirect_ref() only grabs backrefs from
&prefstate->pending_indirect_refs.
Meaning callers should queue any indrect backref to
pending_indirect_refs.
But unfortunately in __add_prelim_ref() and __add_missing_keys(), none
of them properly queue the indirect backrefs to pending_indirect_refs,
but directly to pending.
Making all indirect backrefs never got resolved, thus no callback
function executed
[FIX]
Fix __add_prelim_ref() and __add_missing_keys() to properly queue
indirect backrefs to the correct list.
Currently there is no such direct user in btrfs-progs, but later csum
tree re-initialization code will rely this to do proper csum
re-calculate (to avoid preallocated/nodatasum extents).
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/backref.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel-shared/backref.c b/kernel-shared/backref.c
index 42832c481cae..f1a638ededa8 100644
--- a/kernel-shared/backref.c
+++ b/kernel-shared/backref.c
@@ -192,7 +192,10 @@ static int __add_prelim_ref(struct pref_state *prefstate, u64 root_id,
ref->root_id = root_id;
if (key) {
ref->key_for_search = *key;
- head = &prefstate->pending;
+ if (parent)
+ head = &prefstate->pending;
+ else
+ head = &prefstate->pending_indirect_refs;
} else if (parent) {
memset(&ref->key_for_search, 0, sizeof(ref->key_for_search));
head = &prefstate->pending;
@@ -467,7 +470,10 @@ static int __add_missing_keys(struct btrfs_fs_info *fs_info,
else
btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
free_extent_buffer(eb);
- list_move(&ref->list, &prefstate->pending);
+ if (ref->parent)
+ list_move(&ref->list, &prefstate->pending);
+ else
+ list_move(&ref->list, &prefstate->pending_indirect_refs);
}
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/5] btrfs-progs: check: move csum tree population into mode-common.[ch]
2022-01-14 0:51 [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 1/5] btrfs-progs: backref: properly queue indirect refs Qu Wenruo
@ 2022-01-14 0:51 ` Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 3/5] btrfs-progs: check: don't calculate csum for preallocated file extents Qu Wenruo
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2022-01-14 0:51 UTC (permalink / raw)
To: linux-btrfs
This part has no mode specific operations, just move them into
mode-common.[ch].
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
check/main.c | 244 --------------------------------------------
check/mode-common.c | 244 ++++++++++++++++++++++++++++++++++++++++++++
check/mode-common.h | 1 +
3 files changed, 245 insertions(+), 244 deletions(-)
diff --git a/check/main.c b/check/main.c
index 540130b8e223..ea2f2d7b16b8 100644
--- a/check/main.c
+++ b/check/main.c
@@ -9557,250 +9557,6 @@ static int zero_log_tree(struct btrfs_root *root)
return ret;
}
-static int populate_csum(struct btrfs_trans_handle *trans,
- struct btrfs_root *csum_root, char *buf, u64 start,
- u64 len)
-{
- u64 offset = 0;
- u64 sectorsize;
- int ret = 0;
-
- while (offset < len) {
- sectorsize = gfs_info->sectorsize;
- ret = read_extent_data(gfs_info, buf, start + offset,
- §orsize, 0);
- if (ret)
- break;
- ret = btrfs_csum_file_block(trans, start + len, start + offset,
- buf, sectorsize);
- if (ret)
- break;
- offset += sectorsize;
- }
- return ret;
-}
-
-static int fill_csum_tree_from_one_fs_root(struct btrfs_trans_handle *trans,
- struct btrfs_root *cur_root)
-{
- struct btrfs_root *csum_root;
- struct btrfs_path path;
- struct btrfs_key key;
- struct extent_buffer *node;
- struct btrfs_file_extent_item *fi;
- char *buf = NULL;
- u64 start = 0;
- u64 len = 0;
- int slot = 0;
- int ret = 0;
-
- buf = malloc(gfs_info->sectorsize);
- if (!buf)
- return -ENOMEM;
-
- btrfs_init_path(&path);
- key.objectid = 0;
- key.offset = 0;
- key.type = 0;
- ret = btrfs_search_slot(NULL, cur_root, &key, &path, 0, 0);
- if (ret < 0)
- goto out;
- /* Iterate all regular file extents and fill its csum */
- while (1) {
- btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
-
- if (key.type != BTRFS_EXTENT_DATA_KEY)
- goto next;
- node = path.nodes[0];
- slot = path.slots[0];
- fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item);
- if (btrfs_file_extent_type(node, fi) != BTRFS_FILE_EXTENT_REG)
- goto next;
- start = btrfs_file_extent_disk_bytenr(node, fi);
- len = btrfs_file_extent_disk_num_bytes(node, fi);
-
- csum_root = btrfs_csum_root(gfs_info, start);
- ret = populate_csum(trans, csum_root, buf, start, len);
- if (ret == -EEXIST)
- ret = 0;
- if (ret < 0)
- goto out;
-next:
- /*
- * TODO: if next leaf is corrupted, jump to nearest next valid
- * leaf.
- */
- ret = btrfs_next_item(cur_root, &path);
- if (ret < 0)
- goto out;
- if (ret > 0) {
- ret = 0;
- goto out;
- }
- }
-
-out:
- btrfs_release_path(&path);
- free(buf);
- return ret;
-}
-
-static int fill_csum_tree_from_fs(struct btrfs_trans_handle *trans)
-{
- struct btrfs_path path;
- struct btrfs_root *tree_root = gfs_info->tree_root;
- struct btrfs_root *cur_root;
- struct extent_buffer *node;
- struct btrfs_key key;
- int slot = 0;
- int ret = 0;
-
- btrfs_init_path(&path);
- key.objectid = BTRFS_FS_TREE_OBJECTID;
- key.offset = 0;
- key.type = BTRFS_ROOT_ITEM_KEY;
- ret = btrfs_search_slot(NULL, tree_root, &key, &path, 0, 0);
- if (ret < 0)
- goto out;
- if (ret > 0) {
- ret = -ENOENT;
- goto out;
- }
-
- while (1) {
- node = path.nodes[0];
- slot = path.slots[0];
- btrfs_item_key_to_cpu(node, &key, slot);
- if (key.objectid > BTRFS_LAST_FREE_OBJECTID)
- goto out;
- if (key.type != BTRFS_ROOT_ITEM_KEY)
- goto next;
- if (!is_fstree(key.objectid))
- goto next;
- key.offset = (u64)-1;
-
- cur_root = btrfs_read_fs_root(gfs_info, &key);
- if (IS_ERR(cur_root) || !cur_root) {
- fprintf(stderr, "Fail to read fs/subvol tree: %lld\n",
- key.objectid);
- goto out;
- }
- ret = fill_csum_tree_from_one_fs_root(trans, cur_root);
- if (ret < 0)
- goto out;
-next:
- ret = btrfs_next_item(tree_root, &path);
- if (ret > 0) {
- ret = 0;
- goto out;
- }
- if (ret < 0)
- goto out;
- }
-
-out:
- btrfs_release_path(&path);
- return ret;
-}
-
-static int fill_csum_tree_from_extent(struct btrfs_trans_handle *trans,
- struct btrfs_root *extent_root)
-{
- struct btrfs_root *csum_root;
- struct btrfs_path path;
- struct btrfs_extent_item *ei;
- struct extent_buffer *leaf;
- char *buf;
- struct btrfs_key key;
- int ret;
-
- btrfs_init_path(&path);
- key.objectid = 0;
- key.type = BTRFS_EXTENT_ITEM_KEY;
- key.offset = 0;
- ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
- if (ret < 0) {
- btrfs_release_path(&path);
- return ret;
- }
-
- buf = malloc(gfs_info->sectorsize);
- if (!buf) {
- btrfs_release_path(&path);
- return -ENOMEM;
- }
-
- while (1) {
- if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
- ret = btrfs_next_leaf(extent_root, &path);
- if (ret < 0)
- break;
- if (ret) {
- ret = 0;
- break;
- }
- }
- leaf = path.nodes[0];
-
- btrfs_item_key_to_cpu(leaf, &key, path.slots[0]);
- if (key.type != BTRFS_EXTENT_ITEM_KEY) {
- path.slots[0]++;
- continue;
- }
-
- ei = btrfs_item_ptr(leaf, path.slots[0],
- struct btrfs_extent_item);
- if (!(btrfs_extent_flags(leaf, ei) &
- BTRFS_EXTENT_FLAG_DATA)) {
- path.slots[0]++;
- continue;
- }
-
- csum_root = btrfs_csum_root(gfs_info, key.objectid);
- ret = populate_csum(trans, csum_root, buf, key.objectid,
- key.offset);
- if (ret)
- break;
- path.slots[0]++;
- }
-
- btrfs_release_path(&path);
- free(buf);
- return ret;
-}
-
-/*
- * Recalculate the csum and put it into the csum tree.
- *
- * Extent tree init will wipe out all the extent info, so in that case, we
- * can't depend on extent tree, but use fs tree. If search_fs_tree is set, we
- * will use fs/subvol trees to init the csum tree.
- */
-static int fill_csum_tree(struct btrfs_trans_handle *trans,
- int search_fs_tree)
-{
- struct btrfs_root *root;
- struct rb_node *n;
- int ret;
-
- if (search_fs_tree)
- return fill_csum_tree_from_fs(trans);
-
- root = btrfs_extent_root(gfs_info, 0);
- while (1) {
- ret = fill_csum_tree_from_extent(trans, root);
- if (ret)
- break;
- n = rb_next(&root->rb_node);
- if (!n)
- break;
- root = rb_entry(n, struct btrfs_root, rb_node);
- if (root->root_key.objectid != BTRFS_EXTENT_TREE_OBJECTID)
- break;
- }
- return ret;
-}
-
static void free_roots_info_cache(void)
{
if (!roots_info_cache)
diff --git a/check/mode-common.c b/check/mode-common.c
index 62a9837bde5e..1608dbdafa4c 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -1191,3 +1191,247 @@ error:
btrfs_abort_transaction(trans, ret);
return ret;
}
+
+static int populate_csum(struct btrfs_trans_handle *trans,
+ struct btrfs_root *csum_root, char *buf, u64 start,
+ u64 len)
+{
+ u64 offset = 0;
+ u64 sectorsize;
+ int ret = 0;
+
+ while (offset < len) {
+ sectorsize = gfs_info->sectorsize;
+ ret = read_extent_data(gfs_info, buf, start + offset,
+ §orsize, 0);
+ if (ret)
+ break;
+ ret = btrfs_csum_file_block(trans, start + len, start + offset,
+ buf, sectorsize);
+ if (ret)
+ break;
+ offset += sectorsize;
+ }
+ return ret;
+}
+
+static int fill_csum_tree_from_one_fs_root(struct btrfs_trans_handle *trans,
+ struct btrfs_root *cur_root)
+{
+ struct btrfs_root *csum_root;
+ struct btrfs_path path;
+ struct btrfs_key key;
+ struct extent_buffer *node;
+ struct btrfs_file_extent_item *fi;
+ char *buf = NULL;
+ u64 start = 0;
+ u64 len = 0;
+ int slot = 0;
+ int ret = 0;
+
+ buf = malloc(gfs_info->sectorsize);
+ if (!buf)
+ return -ENOMEM;
+
+ btrfs_init_path(&path);
+ key.objectid = 0;
+ key.offset = 0;
+ key.type = 0;
+ ret = btrfs_search_slot(NULL, cur_root, &key, &path, 0, 0);
+ if (ret < 0)
+ goto out;
+ /* Iterate all regular file extents and fill its csum */
+ while (1) {
+ btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+
+ if (key.type != BTRFS_EXTENT_DATA_KEY)
+ goto next;
+ node = path.nodes[0];
+ slot = path.slots[0];
+ fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item);
+ if (btrfs_file_extent_type(node, fi) != BTRFS_FILE_EXTENT_REG)
+ goto next;
+ start = btrfs_file_extent_disk_bytenr(node, fi);
+ len = btrfs_file_extent_disk_num_bytes(node, fi);
+
+ csum_root = btrfs_csum_root(gfs_info, start);
+ ret = populate_csum(trans, csum_root, buf, start, len);
+ if (ret == -EEXIST)
+ ret = 0;
+ if (ret < 0)
+ goto out;
+next:
+ /*
+ * TODO: if next leaf is corrupted, jump to nearest next valid
+ * leaf.
+ */
+ ret = btrfs_next_item(cur_root, &path);
+ if (ret < 0)
+ goto out;
+ if (ret > 0) {
+ ret = 0;
+ goto out;
+ }
+ }
+
+out:
+ btrfs_release_path(&path);
+ free(buf);
+ return ret;
+}
+
+static int fill_csum_tree_from_fs(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_path path;
+ struct btrfs_root *tree_root = gfs_info->tree_root;
+ struct btrfs_root *cur_root;
+ struct extent_buffer *node;
+ struct btrfs_key key;
+ int slot = 0;
+ int ret = 0;
+
+ btrfs_init_path(&path);
+ key.objectid = BTRFS_FS_TREE_OBJECTID;
+ key.offset = 0;
+ key.type = BTRFS_ROOT_ITEM_KEY;
+ ret = btrfs_search_slot(NULL, tree_root, &key, &path, 0, 0);
+ if (ret < 0)
+ goto out;
+ if (ret > 0) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ while (1) {
+ node = path.nodes[0];
+ slot = path.slots[0];
+ btrfs_item_key_to_cpu(node, &key, slot);
+ if (key.objectid > BTRFS_LAST_FREE_OBJECTID)
+ goto out;
+ if (key.type != BTRFS_ROOT_ITEM_KEY)
+ goto next;
+ if (!is_fstree(key.objectid))
+ goto next;
+ key.offset = (u64)-1;
+
+ cur_root = btrfs_read_fs_root(gfs_info, &key);
+ if (IS_ERR(cur_root) || !cur_root) {
+ fprintf(stderr, "Fail to read fs/subvol tree: %lld\n",
+ key.objectid);
+ goto out;
+ }
+ ret = fill_csum_tree_from_one_fs_root(trans, cur_root);
+ if (ret < 0)
+ goto out;
+next:
+ ret = btrfs_next_item(tree_root, &path);
+ if (ret > 0) {
+ ret = 0;
+ goto out;
+ }
+ if (ret < 0)
+ goto out;
+ }
+
+out:
+ btrfs_release_path(&path);
+ return ret;
+}
+
+static int fill_csum_tree_from_extent(struct btrfs_trans_handle *trans,
+ struct btrfs_root *extent_root)
+{
+ struct btrfs_root *csum_root;
+ struct btrfs_path path;
+ struct btrfs_extent_item *ei;
+ struct extent_buffer *leaf;
+ char *buf;
+ struct btrfs_key key;
+ int ret;
+
+ btrfs_init_path(&path);
+ key.objectid = 0;
+ key.type = BTRFS_EXTENT_ITEM_KEY;
+ key.offset = 0;
+ ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+ if (ret < 0) {
+ btrfs_release_path(&path);
+ return ret;
+ }
+
+ buf = malloc(gfs_info->sectorsize);
+ if (!buf) {
+ btrfs_release_path(&path);
+ return -ENOMEM;
+ }
+
+ while (1) {
+ if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
+ ret = btrfs_next_leaf(extent_root, &path);
+ if (ret < 0)
+ break;
+ if (ret) {
+ ret = 0;
+ break;
+ }
+ }
+ leaf = path.nodes[0];
+
+ btrfs_item_key_to_cpu(leaf, &key, path.slots[0]);
+ if (key.type != BTRFS_EXTENT_ITEM_KEY) {
+ path.slots[0]++;
+ continue;
+ }
+
+ ei = btrfs_item_ptr(leaf, path.slots[0],
+ struct btrfs_extent_item);
+ if (!(btrfs_extent_flags(leaf, ei) &
+ BTRFS_EXTENT_FLAG_DATA)) {
+ path.slots[0]++;
+ continue;
+ }
+
+ csum_root = btrfs_csum_root(gfs_info, key.objectid);
+ ret = populate_csum(trans, csum_root, buf, key.objectid,
+ key.offset);
+ if (ret)
+ break;
+ path.slots[0]++;
+ }
+
+ btrfs_release_path(&path);
+ free(buf);
+ return ret;
+}
+
+/*
+ * Recalculate the csum and put it into the csum tree.
+ *
+ * @search_fs_tree: How to get the data extent item.
+ * If true, iterate all fs roots to get all
+ * extent data (which can be slow).
+ * Otherwise, search extent tree for extent data.
+ */
+int fill_csum_tree(struct btrfs_trans_handle *trans, bool search_fs_tree)
+{
+ struct btrfs_root *root;
+ struct rb_node *n;
+ int ret;
+
+ if (search_fs_tree)
+ return fill_csum_tree_from_fs(trans);
+
+ root = btrfs_extent_root(gfs_info, 0);
+ while (1) {
+ ret = fill_csum_tree_from_extent(trans, root);
+ if (ret)
+ break;
+ n = rb_next(&root->rb_node);
+ if (!n)
+ break;
+ root = rb_entry(n, struct btrfs_root, rb_node);
+ if (root->root_key.objectid != BTRFS_EXTENT_TREE_OBJECTID)
+ break;
+ }
+ return ret;
+}
diff --git a/check/mode-common.h b/check/mode-common.h
index 8cd135b9bfc2..b894c2be4f97 100644
--- a/check/mode-common.h
+++ b/check/mode-common.h
@@ -199,4 +199,5 @@ static inline void btrfs_check_subpage_eb_alignment(struct btrfs_fs_info *info,
int repair_dev_item_bytes_used(struct btrfs_fs_info *fs_info,
u64 devid, u64 bytes_used_expected);
+int fill_csum_tree(struct btrfs_trans_handle *trans, bool search_fs_tree);
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 3/5] btrfs-progs: check: don't calculate csum for preallocated file extents
2022-01-14 0:51 [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 1/5] btrfs-progs: backref: properly queue indirect refs Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 2/5] btrfs-progs: check: move csum tree population into mode-common.[ch] Qu Wenruo
@ 2022-01-14 0:51 ` Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 4/5] btrfs-progs: check: handle csum generation properly for `--init-csum-tree --init-extent-tree` Qu Wenruo
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2022-01-14 0:51 UTC (permalink / raw)
To: linux-btrfs
[BUG]
If a btrfs filesystem has preallocated file extents, `btrfs check
--init-csum-tree` will create csum item for preallocated extents, and
cause error:
# mkfs.btrfs -f test.img
# mount test.img /mnt/btrfs
# fallocate -l 32K /mnt/btrfs/file
# umount /mnt/btrfs
# btrfs check --init-csum-tree --force test.img
...
[4/7] checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 376832 bytes used, error(s) found
And the csum tree is not empty, containing csum for the preallocated
extent:
$ btrfs ins dump-tree -t csum test.img
btrfs-progs v5.15.1
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 30408704 items 1 free space 16226 generation 9 owner CSUM_TREE
leaf 30408704 flags 0x1(WRITTEN) backref revision 1
fs uuid ecc79835-5611-4609-b985-e4ccd6f15b54
chunk uuid b1c75553-5b82-4aa6-bbbe-e7f50643b1a8
item 0 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16251 itemsize 32
range start 13631488 end 13664256 length 32768
[CAUSE]
For `--init-csum-tree` alone, we will use extent tree to iterate each
data extent, and calcluate csum for them.
But extent items alone can not tell us if the file extent belongs to a
NODATASUM inode, nor if it's preallocated.
Thus we create csums for those data extents, and cause the problem.
[FIX]
However the fix is not that simple, we can not just generate csum for
non-preallocated range.
As the following case we still need csum for the un-referred part:
xfs_io -f -c "pwrite 0 8K" -c "sync" -c "punch 0 4K"
So here we have to go another direction by:
- Always generate csum for the whole data extent
This is the same as the old code
- Iterate the file extents, and delete csum for preallocated range
or NODATASUM inodes
Issue: #430
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
check/mode-common.c | 106 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 105 insertions(+), 1 deletion(-)
diff --git a/check/mode-common.c b/check/mode-common.c
index 1608dbdafa4c..4c6dca2d5973 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -22,6 +22,7 @@
#include "common/utils.h"
#include "kernel-shared/disk-io.h"
#include "kernel-shared/volumes.h"
+#include "kernel-shared/backref.h"
#include "common/repair.h"
#include "check/mode-common.h"
@@ -1338,6 +1339,92 @@ out:
return ret;
}
+static int remove_csum_for_file_extent(u64 ino, u64 offset, u64 rootid, void *ctx)
+{
+ struct btrfs_trans_handle *trans = (struct btrfs_trans_handle *)ctx;
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_file_extent_item *fi;
+ struct btrfs_inode_item *ii;
+ struct btrfs_path path = {};
+ struct btrfs_key key;
+ struct btrfs_root *root;
+ bool nocsum = false;
+ u8 type;
+ u64 disk_bytenr;
+ u64 disk_len;
+ int ret = 0;
+
+ key.objectid = rootid;
+ key.type = BTRFS_ROOT_ITEM_KEY;
+ key.offset = (u64)-1;
+ root = btrfs_read_fs_root(fs_info, &key);
+ if (IS_ERR(root)) {
+ ret = PTR_ERR(root);
+ goto out;
+ }
+
+ /* Check if the inode has NODATASUM flag */
+ key.objectid = ino;
+ key.type = BTRFS_INODE_ITEM_KEY;
+ key.offset = 0;
+ ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+ if (ret > 0)
+ ret = -ENOENT;
+ if (ret < 0)
+ goto out;
+
+ ii = btrfs_item_ptr(path.nodes[0], path.slots[0],
+ struct btrfs_inode_item);
+ if (btrfs_inode_flags(path.nodes[0], ii) & BTRFS_INODE_NODATASUM)
+ nocsum = true;
+
+ btrfs_release_path(&path);
+
+ /* Check the file extent item and delete csum if needed */
+ key.objectid = ino;
+ key.type = BTRFS_EXTENT_DATA_KEY;
+ key.offset = offset;
+ ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+ if (ret > 0)
+ ret = -ENOENT;
+ if (ret < 0)
+ goto out;
+ fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
+ struct btrfs_file_extent_item);
+ type = btrfs_file_extent_type(path.nodes[0], fi);
+
+ if (btrfs_file_extent_disk_bytenr(path.nodes[0], fi) == 0)
+ goto out;
+
+ /* Compressed extent should have csum, skip it */
+ if (btrfs_file_extent_compression(path.nodes[0], fi) !=
+ BTRFS_COMPRESS_NONE)
+ goto out;
+ /*
+ * We only want to delete the csum range if the inode has NODATASUM
+ * flag or it's a preallocated extent.
+ */
+ if (!(nocsum || type == BTRFS_FILE_EXTENT_PREALLOC))
+ goto out;
+
+ /* If NODATASUM, we need to remove all csum for the extent */
+ if (nocsum) {
+ disk_bytenr = btrfs_file_extent_disk_bytenr(path.nodes[0], fi);
+ disk_len = btrfs_file_extent_disk_num_bytes(path.nodes[0], fi);
+ } else {
+ disk_bytenr = btrfs_file_extent_disk_bytenr(path.nodes[0], fi) +
+ btrfs_file_extent_offset(path.nodes[0], fi);
+ disk_len = btrfs_file_extent_num_bytes(path.nodes[0], fi);
+ }
+ btrfs_release_path(&path);
+
+ /* Now delete the csum for the preallocated or nodatasum range */
+ ret = btrfs_del_csums(trans, disk_bytenr, disk_len);
+out:
+ btrfs_release_path(&path);
+ return ret;
+}
+
static int fill_csum_tree_from_extent(struct btrfs_trans_handle *trans,
struct btrfs_root *extent_root)
{
@@ -1390,10 +1477,27 @@ static int fill_csum_tree_from_extent(struct btrfs_trans_handle *trans,
path.slots[0]++;
continue;
}
-
+ /*
+ * Generate the datasum unconditionally first.
+ *
+ * This will generate csum for preallocated extents, but that
+ * will be later deleted.
+ *
+ * This is to address cases like this:
+ * fallocate 0 8K
+ * pwrite 0 4k
+ * sync
+ * punch 0 4k
+ *
+ * Above case we will have csum for [0, 4K) and that's valid.
+ */
csum_root = btrfs_csum_root(gfs_info, key.objectid);
ret = populate_csum(trans, csum_root, buf, key.objectid,
key.offset);
+ if (ret < 0)
+ break;
+ ret = iterate_extent_inodes(trans->fs_info, key.objectid, 0, 0,
+ remove_csum_for_file_extent, trans);
if (ret)
break;
path.slots[0]++;
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 4/5] btrfs-progs: check: handle csum generation properly for `--init-csum-tree --init-extent-tree`
2022-01-14 0:51 [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations Qu Wenruo
` (2 preceding siblings ...)
2022-01-14 0:51 ` [PATCH v2 3/5] btrfs-progs: check: don't calculate csum for preallocated file extents Qu Wenruo
@ 2022-01-14 0:51 ` Qu Wenruo
2022-01-14 0:51 ` [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree Qu Wenruo
2022-01-14 16:50 ` [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations David Sterba
5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2022-01-14 0:51 UTC (permalink / raw)
To: linux-btrfs
When using `--init-csum-tree` with `--init-extent-tree`, csum tree
population will be done by iterating all file extent items.
This allow us to skip preallocated extents, but it still has the
folllwing problems:
- Inodes with NODATASUM
- Hole file extents
- Written preallocated extents
We will generate csum for the whole extent, while other part may still
be unwritten.
Make it to follow the same behavior of recently introduced
fill_csum_for_file_extent(), so we can generate correct csum.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
check/mode-common.c | 53 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 49 insertions(+), 4 deletions(-)
diff --git a/check/mode-common.c b/check/mode-common.c
index 4c6dca2d5973..e487f2f0bb8e 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -1225,6 +1225,7 @@ static int fill_csum_tree_from_one_fs_root(struct btrfs_trans_handle *trans,
struct extent_buffer *node;
struct btrfs_file_extent_item *fi;
char *buf = NULL;
+ u64 skip_ino = 0;
u64 start = 0;
u64 len = 0;
int slot = 0;
@@ -1243,24 +1244,68 @@ static int fill_csum_tree_from_one_fs_root(struct btrfs_trans_handle *trans,
goto out;
/* Iterate all regular file extents and fill its csum */
while (1) {
+ u8 type;
+
btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
- if (key.type != BTRFS_EXTENT_DATA_KEY)
+ if (key.type != BTRFS_EXTENT_DATA_KEY &&
+ key.type != BTRFS_INODE_ITEM_KEY)
+ goto next;
+
+ /* This item belongs to an inode with NODATASUM, skip it */
+ if (key.objectid == skip_ino)
goto next;
+
+ if (key.type == BTRFS_INODE_ITEM_KEY) {
+ struct btrfs_inode_item *ii;
+
+ ii = btrfs_item_ptr(path.nodes[0], path.slots[0],
+ struct btrfs_inode_item);
+ /* Check if the inode has NODATASUM flag */
+ if (btrfs_inode_flags(path.nodes[0], ii) &
+ BTRFS_INODE_NODATASUM)
+ skip_ino = key.objectid;
+ goto next;
+ }
node = path.nodes[0];
slot = path.slots[0];
fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item);
- if (btrfs_file_extent_type(node, fi) != BTRFS_FILE_EXTENT_REG)
+ type = btrfs_file_extent_type(node, fi);
+
+ /* Skip inline extents */
+ if (type == BTRFS_FILE_EXTENT_INLINE)
goto next;
- start = btrfs_file_extent_disk_bytenr(node, fi);
- len = btrfs_file_extent_disk_num_bytes(node, fi);
+ start = btrfs_file_extent_disk_bytenr(node, fi);
+ /* Skip holes */
+ if (start == 0)
+ goto next;
+ /*
+ * Always generate the csum for the whole preallocated/regular
+ * first, then remove the csum for preallocated range.
+ *
+ * This is to handle holes on regular extents like:
+ * xfs_io -f -c "pwrite 0 8k" -c "sync" -c "punch 0 4k".
+ *
+ * This behavior will cost extra IO/CPU time, but there is
+ * not other way to ensure the correctness.
+ */
csum_root = btrfs_csum_root(gfs_info, start);
+ len = btrfs_file_extent_disk_num_bytes(node, fi);
ret = populate_csum(trans, csum_root, buf, start, len);
if (ret == -EEXIST)
ret = 0;
if (ret < 0)
goto out;
+
+ /* Delete the csum for the preallocated range */
+ if (type == BTRFS_FILE_EXTENT_PREALLOC) {
+ start += btrfs_file_extent_offset(node, fi);
+ len = btrfs_file_extent_num_bytes(node, fi);
+ ret = btrfs_del_csums(trans, start, len);
+ if (ret < 0)
+ goto out;
+ }
next:
/*
* TODO: if next leaf is corrupted, jump to nearest next valid
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree
2022-01-14 0:51 [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations Qu Wenruo
` (3 preceding siblings ...)
2022-01-14 0:51 ` [PATCH v2 4/5] btrfs-progs: check: handle csum generation properly for `--init-csum-tree --init-extent-tree` Qu Wenruo
@ 2022-01-14 0:51 ` Qu Wenruo
2022-01-14 16:40 ` David Sterba
2022-01-14 16:50 ` [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations David Sterba
5 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2022-01-14 0:51 UTC (permalink / raw)
To: linux-btrfs
This new test script will create a fs with the following situations:
- Preallocated extents (no data csum)
- Nodatasum inodes (no data csum)
- Partially written preallocated extents (no data csum for part of the
extent)
- Regular data extents (with data csum)
- Regular data extents then hole punched (with data csum)
- Preallocated data, then written, then hole punched (with data csum)
- Compressed extents (with data csum)
And make sure after --init-csum-tree (with or without
--init-extent-tree) the result fs can still pass fsck.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tests/fsck-tests/052-init-csum-tree/test.sh | 72 +++++++++++++++++++++
1 file changed, 72 insertions(+)
create mode 100755 tests/fsck-tests/052-init-csum-tree/test.sh
diff --git a/tests/fsck-tests/052-init-csum-tree/test.sh b/tests/fsck-tests/052-init-csum-tree/test.sh
new file mode 100755
index 000000000000..34ca50e5c85f
--- /dev/null
+++ b/tests/fsck-tests/052-init-csum-tree/test.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+#
+# Verify that `btrfs check --init-csum-tree` can handle various nodatasum
+# cases.
+
+source "$TEST_TOP/common"
+
+check_prereq btrfs
+check_global_prereq fallocate
+check_global_prereq dd
+setup_root_helper
+prepare_test_dev
+
+run_check_mkfs_test_dev
+
+# Create an inode with nodatasum and some content
+run_check_mount_test_dev -o nodatasum
+
+run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/nodatasum_file" \
+ bs=16k count=1 status=noxfer > /dev/null 2>&1
+
+# Revert to default datasum
+run_check $SUDO_HELPER mount -o remount,datasum "$TEST_MNT"
+
+# Then create an inode with datasum, but all preallocated extents
+run_check fallocate -l 32k "$TEST_MNT/prealloc1"
+
+# Create preallocated extent but partially written
+run_check fallocate -l 32k "$TEST_MNT/prealloc2"
+run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/prealloc2" \
+ bs=16k count=1 conv=notrunc status=noxfer> /dev/null 2>&1
+
+# Then some regular files
+run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/regular" \
+ bs=16k count=1 status=noxfer > /dev/null 2>&1
+
+# And some regular files with holes
+run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/regular_with_holes" \
+ bs=16k count=1 status=noxfer > /dev/null 2>&1
+run_check $SUDO_HELPER fallocate -p -o 4096 -l 4096 "$TEST_MNT/regular_with_holes"
+
+# And the most complex one, preallocated, written, then hole
+run_check $SUDO_HELPER fallocate -l 8192 "$TEST_MNT/complex"
+run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/compex" \
+ bs=4k count=1 conv=notrunc status=noxfer > /dev/null 2>&1
+sync
+run_check $SUDO_HELPER fallocate -p -l 4096 "$TEST_MNT/regular_with_holes"
+
+# Create some compressed file
+run_check $SUDO_HELPER mount -o remount,compress "$TEST_MNT"
+run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/compressed" \
+ bs=16k count=4 conv=notrunc status=noxfer> /dev/null 2>&1
+
+# And create a snapshot, every data extent is at least shared twice
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume snapshot "$TEST_MNT" \
+ "$TEST_MNT/snapshot"
+run_check_umount_test_dev
+
+# --init-csum-tree should not fail
+run_check $SUDO_HELPER "$TOP/btrfs" check --force \
+ --init-csum-tree "$TEST_DEV"
+
+# No error should be found
+run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+btrfs ins dump-tree "$TEST_DEV" > /tmp/dump
+
+# --init-csum-tree with --init-extent-tree should not fail
+#run_check $SUDO_HELPER "$TOP/btrfs" check --force \
+# --init-csum-tree --init-extent-tree "$TEST_DEV"
+
+# No error should be found
+#run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree
2022-01-14 0:51 ` [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree Qu Wenruo
@ 2022-01-14 16:40 ` David Sterba
2022-01-14 23:40 ` Qu Wenruo
0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2022-01-14 16:40 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Jan 14, 2022 at 08:51:23AM +0800, Qu Wenruo wrote:
> +run_check $SUDO_HELPER "$TOP/btrfs" check --force \
> + --init-csum-tree "$TEST_DEV"
> +
> +# No error should be found
> +run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
> +btrfs ins dump-tree "$TEST_DEV" > /tmp/dump
Is this some leftover from debugging?
> +
> +# --init-csum-tree with --init-extent-tree should not fail
> +#run_check $SUDO_HELPER "$TOP/btrfs" check --force \
> +# --init-csum-tree --init-extent-tree "$TEST_DEV"
> +
> +# No error should be found
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree
2022-01-14 16:40 ` David Sterba
@ 2022-01-14 23:40 ` Qu Wenruo
2022-01-17 13:50 ` David Sterba
0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2022-01-14 23:40 UTC (permalink / raw)
To: dsterba, Qu Wenruo, linux-btrfs
On 2022/1/15 00:40, David Sterba wrote:
> On Fri, Jan 14, 2022 at 08:51:23AM +0800, Qu Wenruo wrote:
>> +run_check $SUDO_HELPER "$TOP/btrfs" check --force \
>> + --init-csum-tree "$TEST_DEV"
>> +
>> +# No error should be found
>> +run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
>> +btrfs ins dump-tree "$TEST_DEV" > /tmp/dump
>
> Is this some leftover from debugging?
Oh, sorry.
Mind to remove it when merging?
Thanks,
Qu
>
>> +
>> +# --init-csum-tree with --init-extent-tree should not fail
>> +#run_check $SUDO_HELPER "$TOP/btrfs" check --force \
>> +# --init-csum-tree --init-extent-tree "$TEST_DEV"
>> +
>> +# No error should be found
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree
2022-01-14 23:40 ` Qu Wenruo
@ 2022-01-17 13:50 ` David Sterba
0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2022-01-17 13:50 UTC (permalink / raw)
To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs
On Sat, Jan 15, 2022 at 07:40:16AM +0800, Qu Wenruo wrote:
>
>
> On 2022/1/15 00:40, David Sterba wrote:
> > On Fri, Jan 14, 2022 at 08:51:23AM +0800, Qu Wenruo wrote:
> >> +run_check $SUDO_HELPER "$TOP/btrfs" check --force \
> >> + --init-csum-tree "$TEST_DEV"
> >> +
> >> +# No error should be found
> >> +run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
> >> +btrfs ins dump-tree "$TEST_DEV" > /tmp/dump
> >
> > Is this some leftover from debugging?
>
> Oh, sorry.
>
> Mind to remove it when merging?
No.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations
2022-01-14 0:51 [PATCH v2 0/5] btrfs-progs: check: properly generate csum for various complex combinations Qu Wenruo
` (4 preceding siblings ...)
2022-01-14 0:51 ` [PATCH v2 5/5] btrfs-progs: fsck-tests: add test case for init-csum-tree Qu Wenruo
@ 2022-01-14 16:50 ` David Sterba
5 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2022-01-14 16:50 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Jan 14, 2022 at 08:51:18AM +0800, Qu Wenruo wrote:
> [BUG]
> Issue #420 mentions that --init-csum-tree creates extra csum for
> preallocated extents.
>
> Causing extra errors after --init-csum-tree.
>
> [CAUSE]
> Csum re-calculation code doesn't take the following factors into
> consideration:
>
> - NODATASUM inodes
> - Preallocated extents
> No csum should be calculated for those data extents
>
> - Partically written preallocated csum
> Csum should only be created for the referred part
>
> So currently csum recalculation will just create csum for any data
> extents it found, causing the mentioned errors.
>
> [FIX]
> - Bug fix for backref iteration
> Firstly fix a bug in backref code where indirect ref is not properly
> resolved.
>
> This allows us to use iterate_extent_inodes() to make our lives much
> easier checking the file extents.
>
> - Code move for --init-csum-tree
> Move it to mode independent mode-common.[ch]
>
> - Fix for --init-csum-tree
> There are some extreme corner cases to consider, in fact we can not
> really determine a range should have csum or not:
> * Preallocated, written, then hole punched range
> Should still have csum for the written (now hole) part.
>
> * Preallocated, then hole punched range.
> Should not have csum for the now hole part.
>
> * Regular written extent, then hole punched range
> Should have csum for the now hole part.
>
> But there is still one thing to follow, if a range is preallocated,
> it should not have csum.
>
> So here we go a different route, by always generate csum for the whole
> extent (as long as it's not belonging to NODATASUM inode), then remove
> csums for preallocated range.
>
> - Fix for --init-csum-tree --init-extent-tree
> The same fix, just into a different context
>
> - New test case
>
> [CHANGELOG]
> v2:
> - Handle written extents then hole punched cases
> Now we always generate csum for a data extent as long as it doesn't
> belong to a NODATASUM inode, then remove the csum for preallocated
> range.
>
> - Enhance test case to include hole punching and compressed extents
Added to devel, thanks, I guess this fixes some of the problems that I
saw when trying to implement the checksum algo switch.
^ permalink raw reply [flat|nested] 10+ messages in thread