* [PATCH v3] btrfs: LOGICAL_INO enhancements @ 2017-09-22 17:58 Zygo Blaxell 2017-09-22 17:58 ` [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell ` (3 more replies) 0 siblings, 4 replies; 14+ messages in thread From: Zygo Blaxell @ 2017-09-22 17:58 UTC (permalink / raw) To: linux-btrfs Changelog: v3-v2: - Stricter check on reserved[] field - now must be all zero, or userspace gets EINVAL. This prevents userspace from setting any of the reserved bits without the kernel providing an unambiguous interpretation of them, and doesn't require us to burn a flag bit for each one. - Moved 'flags' to the end of the reserved[] array. This allows existing source code using version 1 of the ioctl to behave the same way when using version 2 of the btrfs_ioctl_logical_ino_args struct definition (i.e. reserved[3] becomes an alias for 'flags', and the addresses of reserved[0-2] don't change). - Clarified the reasoning in the commit message for patch 2, "btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2". v2: - added patch series intro text - rebased on 4.14-rc1. v1: This patch series fixes some weaknesses in the btrfs LOGICAL_INO ioctl. Background: Suppose we have a file with one extent: root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a root@tester:~# sync Split the extent by overwriting it in the middle: root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a We should now have 3 extent refs to 2 extents, with one block unreachable. The extent tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 2 [...] item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53 extent refs 2 gen 29 flags DATA extent data backref root 5 objectid 261 offset 0 count 2 [...] item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53 extent refs 1 gen 30 flags DATA extent data backref root 5 objectid 261 offset 8192 count 1 [...] and the ref tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 5 [...] item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 0 nr 8192 ram 73728 extent compression(none) item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53 extent data disk byte 1103175680 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression(none) item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 12288 nr 61440 ram 73728 extent compression(none) [...] There are two references to the same extent with different, non-overlapping byte offsets: [------------------72K extent at 1103101952----------------------] [--8K----------------|--4K unreachable----|--60K-----------------] ^ ^ | | [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--] | v [-----4K extent-----] at 1103175680 We want to find all of the references to extent bytenr 1103101952. Without the patch (and without running btrfs-debug-tree), we have to do it with 18 LOGICAL_INO calls: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO inode 261 offset 0 root 5 root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode inode 261 offset 0 root 5 inode 261 offset 4096 root 5 <- same extent ref as offset 0 (offset 8192 returns empty set, not reachable) inode 261 offset 12288 root 5 inode 261 offset 16384 root 5 \ inode 261 offset 20480 root 5 | inode 261 offset 24576 root 5 | inode 261 offset 28672 root 5 | inode 261 offset 32768 root 5 | inode 261 offset 36864 root 5 \ inode 261 offset 40960 root 5 > all the same extent ref as offset 12288. inode 261 offset 45056 root 5 / More processing required in userspace inode 261 offset 49152 root 5 | to figure out these are all duplicates. inode 261 offset 53248 root 5 | inode 261 offset 57344 root 5 | inode 261 offset 61440 root 5 | inode 261 offset 65536 root 5 | inode 261 offset 69632 root 5 / In the worst case the extents are 128MB long, and we have to do 32768 iterations of the loop to find one 4K extent ref. With the patch, we just use one call to map all refs to the extent at once: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO_V2 inode 261 offset 0 root 5 inode 261 offset 12288 root 5 The TREE_SEARCH ioctl allows userspace to retrieve the offset and extent bytenr fields easily once the root, inode and offset are known. This is sufficient information to build a complete map of the extent and all of its references. Userspace can use this information to make better choices to dedup or defrag. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-09-22 17:58 [PATCH v3] btrfs: LOGICAL_INO enhancements Zygo Blaxell @ 2017-09-22 17:58 ` Zygo Blaxell 2017-10-19 17:03 ` David Sterba 2017-09-22 17:58 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell ` (2 subsequent siblings) 3 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-22 17:58 UTC (permalink / raw) To: linux-btrfs The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and offset (encoded as a single logical address) to a list of extent refs. LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping (extent ref -> extent bytenr and offset, or logical address). These are useful capabilities for programs that manipulate extents and extent references from userspace (e.g. dedup and defrag utilities). When the extents are uncompressed (and not encrypted and not other), check_extent_in_eb performs filtering of the extent refs to remove any extent refs which do not contain the same extent offset as the 'logical' parameter's extent offset. This prevents LOGICAL_INO from returning references to more than a single block. To find the set of extent references to an uncompressed extent from [a, b), userspace has to run a loop like this pseudocode: for (i = a; i < b; ++i) extent_ref_set += LOGICAL_INO(i); At each iteration of the loop (up to 32768 iterations for a 128M extent), data we are interested in is collected in the kernel, then deleted by the filter in check_extent_in_eb. When the extents are compressed (or encrypted or other), the 'logical' parameter must be an extent bytenr (the 'a' parameter in the loop). No filtering by extent offset is done (or possible?) so the result is the complete set of extent refs for the entire extent. This removes the need for the loop, since we get all the extent refs in one call. Add an 'ignore_offset' argument to iterate_inodes_from_logical, [...several levels of function call graph...], and check_extent_in_eb, so that we can disable the extent offset filtering for uncompressed extents. This flag can be set by an improved version of the LOGICAL_INO ioctl to get either behavior as desired. There is no functional change in this patch. The new flag is always false. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/backref.c | 63 ++++++++++++++++++++++++++----------------- fs/btrfs/backref.h | 8 +++--- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 8 +++--- fs/btrfs/scrub.c | 6 ++--- fs/btrfs/send.c | 2 +- fs/btrfs/tests/qgroup-tests.c | 20 +++++++------- 8 files changed, 63 insertions(+), 48 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index b517ef1477ea..a2609786cd86 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -40,12 +40,14 @@ static int check_extent_in_eb(const struct btrfs_key *key, const struct extent_buffer *eb, const struct btrfs_file_extent_item *fi, u64 extent_item_pos, - struct extent_inode_elem **eie) + struct extent_inode_elem **eie, + bool ignore_offset) { u64 offset = 0; struct extent_inode_elem *e; - if (!btrfs_file_extent_compression(eb, fi) && + if (!ignore_offset && + !btrfs_file_extent_compression(eb, fi) && !btrfs_file_extent_encryption(eb, fi) && !btrfs_file_extent_other_encoding(eb, fi)) { u64 data_offset; @@ -84,7 +86,8 @@ static void free_inode_elem_list(struct extent_inode_elem *eie) static int find_extent_in_eb(const struct extent_buffer *eb, u64 wanted_disk_byte, u64 extent_item_pos, - struct extent_inode_elem **eie) + struct extent_inode_elem **eie, + bool ignore_offset) { u64 disk_byte; struct btrfs_key key; @@ -113,7 +116,7 @@ static int find_extent_in_eb(const struct extent_buffer *eb, if (disk_byte != wanted_disk_byte) continue; - ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie); + ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie, ignore_offset); if (ret < 0) return ret; } @@ -419,7 +422,7 @@ static int add_indirect_ref(const struct btrfs_fs_info *fs_info, static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, struct ulist *parents, struct prelim_ref *ref, int level, u64 time_seq, const u64 *extent_item_pos, - u64 total_refs) + u64 total_refs, bool ignore_offset) { int ret = 0; int slot; @@ -472,7 +475,7 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, if (extent_item_pos) { ret = check_extent_in_eb(&key, eb, fi, *extent_item_pos, - &eie); + &eie, ignore_offset); if (ret < 0) break; } @@ -510,7 +513,8 @@ next: static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 time_seq, struct prelim_ref *ref, struct ulist *parents, - const u64 *extent_item_pos, u64 total_refs) + const u64 *extent_item_pos, u64 total_refs, + bool ignore_offset) { struct btrfs_root *root; struct btrfs_key root_key; @@ -581,7 +585,7 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, } ret = add_all_parents(root, path, parents, ref, level, time_seq, - extent_item_pos, total_refs); + extent_item_pos, total_refs, ignore_offset); out: path->lowest_level = 0; btrfs_release_path(path); @@ -616,7 +620,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 time_seq, struct preftrees *preftrees, const u64 *extent_item_pos, u64 total_refs, - struct share_check *sc) + struct share_check *sc, bool ignore_offset) { int err; int ret = 0; @@ -661,7 +665,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, } err = resolve_indirect_ref(fs_info, path, time_seq, ref, parents, extent_item_pos, - total_refs); + total_refs, ignore_offset); /* * we can only tolerate ENOENT,otherwise,we should catch error * and return directly. @@ -1107,13 +1111,17 @@ static int add_keyed_refs(struct btrfs_fs_info *fs_info, * * Otherwise this returns 0 for success and <0 for an error. * + * If ignore_offset is set to false, only extent refs whose offsets match + * extent_item_pos are returned. If true, every extent ref is returned + * and extent_item_pos is ignored. + * * FIXME some caching might speed things up */ static int find_parent_nodes(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist *refs, struct ulist *roots, const u64 *extent_item_pos, - struct share_check *sc) + struct share_check *sc, bool ignore_offset) { struct btrfs_key key; struct btrfs_path *path; @@ -1235,7 +1243,7 @@ again: WARN_ON(!RB_EMPTY_ROOT(&preftrees.indirect_missing_keys.root)); ret = resolve_indirect_refs(fs_info, path, time_seq, &preftrees, - extent_item_pos, total_refs, sc); + extent_item_pos, total_refs, sc, ignore_offset); if (ret) goto out; @@ -1282,7 +1290,7 @@ again: btrfs_tree_read_lock(eb); btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); ret = find_extent_in_eb(eb, bytenr, - *extent_item_pos, &eie); + *extent_item_pos, &eie, ignore_offset); btrfs_tree_read_unlock_blocking(eb); free_extent_buffer(eb); if (ret < 0) @@ -1350,7 +1358,7 @@ static void free_leaf_list(struct ulist *blocks) static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist **leafs, - const u64 *extent_item_pos) + const u64 *extent_item_pos, bool ignore_offset) { int ret; @@ -1359,7 +1367,7 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, return -ENOMEM; ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - *leafs, NULL, extent_item_pos, NULL); + *leafs, NULL, extent_item_pos, NULL, ignore_offset); if (ret < 0 && ret != -ENOENT) { free_leaf_list(*leafs); return ret; @@ -1383,7 +1391,8 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, */ static int btrfs_find_all_roots_safe(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots) + u64 time_seq, struct ulist **roots, + bool ignore_offset) { struct ulist *tmp; struct ulist_node *node = NULL; @@ -1402,7 +1411,7 @@ static int btrfs_find_all_roots_safe(struct btrfs_trans_handle *trans, ULIST_ITER_INIT(&uiter); while (1) { ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - tmp, *roots, NULL, NULL); + tmp, *roots, NULL, NULL, ignore_offset); if (ret < 0 && ret != -ENOENT) { ulist_free(tmp); ulist_free(*roots); @@ -1421,14 +1430,15 @@ static int btrfs_find_all_roots_safe(struct btrfs_trans_handle *trans, int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots) + u64 time_seq, struct ulist **roots, + bool ignore_offset) { int ret; if (!trans) down_read(&fs_info->commit_root_sem); ret = btrfs_find_all_roots_safe(trans, fs_info, bytenr, - time_seq, roots); + time_seq, roots, ignore_offset); if (!trans) up_read(&fs_info->commit_root_sem); return ret; @@ -1483,7 +1493,7 @@ int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr) ULIST_ITER_INIT(&uiter); while (1) { ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp, - roots, NULL, &shared); + roots, NULL, &shared, false); if (ret == BACKREF_FOUND_SHARED) { /* this is the only condition under which we return 1 */ ret = 1; @@ -1877,7 +1887,8 @@ static int iterate_leaf_refs(struct btrfs_fs_info *fs_info, int iterate_extent_inodes(struct btrfs_fs_info *fs_info, u64 extent_item_objectid, u64 extent_item_pos, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx) + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset) { int ret; struct btrfs_trans_handle *trans = NULL; @@ -1903,14 +1914,15 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, tree_mod_seq_elem.seq, &refs, - &extent_item_pos); + &extent_item_pos, ignore_offset); if (ret) goto out; ULIST_ITER_INIT(&ref_uiter); while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) { ret = btrfs_find_all_roots_safe(trans, fs_info, ref_node->val, - tree_mod_seq_elem.seq, &roots); + tree_mod_seq_elem.seq, &roots, + ignore_offset); if (ret) break; ULIST_ITER_INIT(&root_uiter); @@ -1943,7 +1955,8 @@ out: int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, - iterate_extent_inodes_t *iterate, void *ctx) + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset) { int ret; u64 extent_item_pos; @@ -1961,7 +1974,7 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, extent_item_pos = logical - found_key.objectid; ret = iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, search_commit_root, - iterate, ctx); + iterate, ctx, ignore_offset); return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index e410335841aa..0c2fab8514ff 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -43,17 +43,19 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, int iterate_extent_inodes(struct btrfs_fs_info *fs_info, u64 extent_item_objectid, u64 extent_offset, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx); + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset); int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, - iterate_extent_inodes_t *iterate, void *ctx); + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset); int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots); + u64 time_seq, struct ulist **roots, bool ignore_offset); char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, u32 name_len, unsigned long name_off, struct extent_buffer *eb_in, u64 parent, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 128f3e58634f..d9aef54e2186 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2452,7 +2452,7 @@ static noinline bool record_extent_backrefs(struct btrfs_path *path, ret = iterate_inodes_from_logical(old->bytenr + old->extent_offset, fs_info, path, record_one_backref, - old); + old, false); if (ret < 0 && ret != -ENOENT) return false; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d6715c2bcdc4..b7de32568082 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4566,7 +4566,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, } ret = iterate_inodes_from_logical(loi->logical, fs_info, path, - build_ino_list, inodes); + build_ino_list, inodes, false); if (ret == -EINVAL) ret = -ENOENT; if (ret < 0) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 5c8b61c86e61..6408af8c8fd6 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1441,7 +1441,7 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info, u64 bytenr = qrecord->bytenr; int ret; - ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root); + ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root, false); if (ret < 0) return ret; @@ -2031,7 +2031,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans, /* Search commit root to find old_roots */ ret = btrfs_find_all_roots(NULL, fs_info, record->bytenr, 0, - &record->old_roots); + &record->old_roots, false); if (ret < 0) goto cleanup; } @@ -2042,7 +2042,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans, * root. It's safe inside commit_transaction(). */ ret = btrfs_find_all_roots(trans, fs_info, - record->bytenr, SEQ_LAST, &new_roots); + record->bytenr, SEQ_LAST, &new_roots, false); if (ret < 0) goto cleanup; if (qgroup_to_skip) { @@ -2572,7 +2572,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, num_bytes = found.offset; ret = btrfs_find_all_roots(NULL, fs_info, found.objectid, 0, - &roots); + &roots, false); if (ret < 0) goto out; /* For rescan, just pass old_roots as NULL */ diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index e3f6c49e5c4d..40153d83ad9a 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -883,7 +883,7 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) swarn.dev = dev; iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, 1, - scrub_print_warning_inode, &swarn); + scrub_print_warning_inode, &swarn, false); } out: @@ -1047,7 +1047,7 @@ static void scrub_fixup_nodatasum(struct btrfs_work *work) * can be found. */ ret = iterate_inodes_from_logical(fixup->logical, fs_info, path, - scrub_fixup_readpage, fixup); + scrub_fixup_readpage, fixup, false); if (ret < 0) { uncorrectable = 1; goto out; @@ -4390,7 +4390,7 @@ static void copy_nocow_pages_worker(struct btrfs_work *work) } ret = iterate_inodes_from_logical(logical, fs_info, path, - record_inode_for_nocow, nocow_ctx); + record_inode_for_nocow, nocow_ctx, false); if (ret != 0 && ret != -ENOENT) { btrfs_warn(fs_info, "iterate_inodes_from_logical() failed: log %llu, phys %llu, len %llu, mir %u, ret %d", diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 32b043ef8ac9..1e211255b1c6 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1429,7 +1429,7 @@ static int find_extent_clone(struct send_ctx *sctx, extent_item_pos = 0; ret = iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, 1, __iterate_backrefs, - backref_ctx); + backref_ctx, false); if (ret < 0) goto out; diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c index 0f4ce970d195..49f010f42bb7 100644 --- a/fs/btrfs/tests/qgroup-tests.c +++ b/fs/btrfs/tests/qgroup-tests.c @@ -240,7 +240,7 @@ static int test_no_shared_qgroup(struct btrfs_root *root, * we can only call btrfs_qgroup_account_extent() directly to test * quota. */ - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { ulist_free(old_roots); test_msg("Couldn't find old roots: %d\n", ret); @@ -252,7 +252,7 @@ static int test_no_shared_qgroup(struct btrfs_root *root, if (ret) return ret; - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); ulist_free(new_roots); @@ -275,7 +275,7 @@ static int test_no_shared_qgroup(struct btrfs_root *root, old_roots = NULL; new_roots = NULL; - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { ulist_free(old_roots); test_msg("Couldn't find old roots: %d\n", ret); @@ -286,7 +286,7 @@ static int test_no_shared_qgroup(struct btrfs_root *root, if (ret) return -EINVAL; - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); ulist_free(new_roots); @@ -337,7 +337,7 @@ static int test_multiple_refs(struct btrfs_root *root, return ret; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { ulist_free(old_roots); test_msg("Couldn't find old roots: %d\n", ret); @@ -349,7 +349,7 @@ static int test_multiple_refs(struct btrfs_root *root, if (ret) return ret; - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); ulist_free(new_roots); @@ -370,7 +370,7 @@ static int test_multiple_refs(struct btrfs_root *root, return -EINVAL; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { ulist_free(old_roots); test_msg("Couldn't find old roots: %d\n", ret); @@ -382,7 +382,7 @@ static int test_multiple_refs(struct btrfs_root *root, if (ret) return ret; - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); ulist_free(new_roots); @@ -409,7 +409,7 @@ static int test_multiple_refs(struct btrfs_root *root, return -EINVAL; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { ulist_free(old_roots); test_msg("Couldn't find old roots: %d\n", ret); @@ -421,7 +421,7 @@ static int test_multiple_refs(struct btrfs_root *root, if (ret) return ret; - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots); + ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); ulist_free(new_roots); -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-09-22 17:58 ` [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell @ 2017-10-19 17:03 ` David Sterba 0 siblings, 0 replies; 14+ messages in thread From: David Sterba @ 2017-10-19 17:03 UTC (permalink / raw) To: Zygo Blaxell; +Cc: linux-btrfs On Fri, Sep 22, 2017 at 01:58:45PM -0400, Zygo Blaxell wrote: > The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and > offset (encoded as a single logical address) to a list of extent refs. > LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping > (extent ref -> extent bytenr and offset, or logical address). These are > useful capabilities for programs that manipulate extents and extent > references from userspace (e.g. dedup and defrag utilities). > > When the extents are uncompressed (and not encrypted and not other), > check_extent_in_eb performs filtering of the extent refs to remove any > extent refs which do not contain the same extent offset as the 'logical' > parameter's extent offset. This prevents LOGICAL_INO from returning > references to more than a single block. > > To find the set of extent references to an uncompressed extent from [a, > b), userspace has to run a loop like this pseudocode: > > for (i = a; i < b; ++i) > extent_ref_set += LOGICAL_INO(i); > > At each iteration of the loop (up to 32768 iterations for a 128M extent), > data we are interested in is collected in the kernel, then deleted by > the filter in check_extent_in_eb. > > When the extents are compressed (or encrypted or other), the 'logical' > parameter must be an extent bytenr (the 'a' parameter in the loop). > No filtering by extent offset is done (or possible?) so the result is > the complete set of extent refs for the entire extent. This removes > the need for the loop, since we get all the extent refs in one call. > > Add an 'ignore_offset' argument to iterate_inodes_from_logical, > [...several levels of function call graph...], and check_extent_in_eb, so > that we can disable the extent offset filtering for uncompressed extents. > This flag can be set by an improved version of the LOGICAL_INO ioctl to > get either behavior as desired. > > There is no functional change in this patch. The new flag is always > false. > > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: David Sterba <dsterba@suse.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-22 17:58 [PATCH v3] btrfs: LOGICAL_INO enhancements Zygo Blaxell 2017-09-22 17:58 ` [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell @ 2017-09-22 17:58 ` Zygo Blaxell 2017-09-23 20:38 ` Hans van Kranenburg 2017-09-22 17:58 ` [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl Zygo Blaxell 2019-08-30 7:55 ` Fwd: [PATCH v3] btrfs: LOGICAL_INO enhancements Anand Jain 3 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-22 17:58 UTC (permalink / raw) To: linux-btrfs Now that check_extent_in_eb()'s extent offset filter can be turned off, we need a way to do it from userspace. Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent offset filtering, taking the place of one of the existing reserved[] fields. Previous versions of LOGICAL_INO neglected to check whether any of the reserved fields have non-zero values. Assigning meaning to those fields now may change the behavior of existing programs that left these fields uninitialized. The lack of a zero check also means that new programs have no way to know whether the kernel is honoring the flags field. To avoid these problems, define a new ioctl LOGICAL_INO_V2. We can use the same argument layout as LOGICAL_INO, but shorten the reserved[] array by one element and turn it into the 'flags' field. The V2 ioctl explicitly checks that reserved fields and unsupported flag bits are zero so that userspace can negotiate future feature bits as they are defined. Since the memory layouts of the two ioctls' arguments are compatible, there is no need for a separate function for logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search where the layout and code are quite different). A version parameter and an 'if' statement will suffice. Now that we have a flags field in logical_ino_args, add a flag BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, and pass it down the stack to iterate_inodes_from_logical. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/ioctl.c | 26 +++++++++++++++++++++++--- include/uapi/linux/btrfs.h | 8 +++++++- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index b7de32568082..f4281ffd1833 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) } static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, - void __user *arg) + void __user *arg, int version) { int ret = 0; int size; struct btrfs_ioctl_logical_ino_args *loi; struct btrfs_data_container *inodes = NULL; struct btrfs_path *path = NULL; + bool ignore_offset; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -4551,6 +4552,22 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, if (IS_ERR(loi)) return PTR_ERR(loi); + if (version == 1) { + ignore_offset = false; + } else { + /* All reserved bits must be 0 for now */ + if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) { + ret = -EINVAL; + goto out_loi; + } + /* Only accept flags we have defined so far */ + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { + ret = -EINVAL; + goto out_loi; + } + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; + } + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; @@ -4566,7 +4583,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, } ret = iterate_inodes_from_logical(loi->logical, fs_info, path, - build_ino_list, inodes, false); + build_ino_list, inodes, ignore_offset); if (ret == -EINVAL) ret = -ENOENT; if (ret < 0) @@ -4580,6 +4597,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, out: btrfs_free_path(path); kvfree(inodes); +out_loi: kfree(loi); return ret; @@ -5550,7 +5568,9 @@ long btrfs_ioctl(struct file *file, unsigned int case BTRFS_IOC_INO_PATHS: return btrfs_ioctl_ino_to_path(root, argp); case BTRFS_IOC_LOGICAL_INO: - return btrfs_ioctl_logical_to_ino(fs_info, argp); + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); + case BTRFS_IOC_LOGICAL_INO_V2: + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); case BTRFS_IOC_SPACE_INFO: return btrfs_ioctl_space_info(fs_info, argp); case BTRFS_IOC_SYNC: { diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 378230c163d5..99bb7988e6fe 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args { struct btrfs_ioctl_logical_ino_args { __u64 logical; /* in */ __u64 size; /* in */ - __u64 reserved[4]; + __u64 reserved[3]; /* must be 0 for now */ + __u64 flags; /* in, v2 only */ /* struct btrfs_data_container *inodes; out */ __u64 inodes; }; +/* Return every ref to the extent, not just those containing logical block. + * Requires logical == extent bytenr. */ +#define BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0) enum btrfs_dev_stat_values { /* disk I/O failure stats */ @@ -835,5 +839,7 @@ enum btrfs_err_code { struct btrfs_ioctl_feature_flags[3]) #define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 58, \ struct btrfs_ioctl_vol_args_v2) +#define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ + struct btrfs_ioctl_logical_ino_args) #endif /* _UAPI_LINUX_BTRFS_H */ -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-22 17:58 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell @ 2017-09-23 20:38 ` Hans van Kranenburg 0 siblings, 0 replies; 14+ messages in thread From: Hans van Kranenburg @ 2017-09-23 20:38 UTC (permalink / raw) To: Zygo Blaxell, linux-btrfs On 09/22/2017 07:58 PM, Zygo Blaxell wrote: > Now that check_extent_in_eb()'s extent offset filter can be turned off, > we need a way to do it from userspace. > > Add a 'flags' field to the btrfs_logical_ino_args structure to disable > extent offset filtering, taking the place of one of the existing > reserved[] fields. > > Previous versions of LOGICAL_INO neglected to check whether any of the > reserved fields have non-zero values. Assigning meaning to those fields > now may change the behavior of existing programs that left these fields > uninitialized. The lack of a zero check also means that new programs > have no way to know whether the kernel is honoring the flags field. > > To avoid these problems, define a new ioctl LOGICAL_INO_V2. We can > use the same argument layout as LOGICAL_INO, but shorten the reserved[] > array by one element and turn it into the 'flags' field. The V2 ioctl > explicitly checks that reserved fields and unsupported flag bits are zero > so that userspace can negotiate future feature bits as they are defined. > > Since the memory layouts of the two ioctls' arguments are compatible, > there is no need for a separate function for logical_to_ino_v2 (contrast > with tree_search_v2 vs tree_search where the layout and code are quite > different). A version parameter and an 'if' statement will suffice. > > Now that we have a flags field in logical_ino_args, add a flag > BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, > and pass it down the stack to iterate_inodes_from_logical. Nice! This is a very welcome addition for userspace. I have seen the question: "ok, but which files are using any part of this extent" regularly in the past, and it caused some disappointment when I was first implementing that ioctl, since I was hoping I could do that easily. :) Today I built a 4.14-rc1 with the 3 patches on top. I have been testing with python-btrfs (of course :D) so I can quickly try out things. https://github.com/knorrie/python-btrfs/commits/logical_ino_v2 > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > --- > fs/btrfs/ioctl.c | 26 +++++++++++++++++++++++--- > include/uapi/linux/btrfs.h | 8 +++++++- > 2 files changed, 30 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index b7de32568082..f4281ffd1833 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) > } > > static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > - void __user *arg) > + void __user *arg, int version) > { > int ret = 0; > int size; > struct btrfs_ioctl_logical_ino_args *loi; > struct btrfs_data_container *inodes = NULL; > struct btrfs_path *path = NULL; > + bool ignore_offset; > > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > @@ -4551,6 +4552,22 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > if (IS_ERR(loi)) > return PTR_ERR(loi); > > + if (version == 1) { > + ignore_offset = false; > + } else { > + /* All reserved bits must be 0 for now */ > + if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) { > + ret = -EINVAL; > + goto out_loi; > + } If I put a non-zero bit in the reserved field, it explodes with OSError: [Errno 22] Invalid argument, check. > + /* Only accept flags we have defined so far */ > + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { > + ret = -EINVAL; > + goto out_loi; > + } If I put any other flag in than nothing or only the first set, it explodes with OSError: [Errno 22] Invalid argument, check. > + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; > + } > + > path = btrfs_alloc_path(); > if (!path) { > [...] Testing: -# cp /boot/vmlinuz-4.14.0-rc1-zygo1 /btrfs -# ./show_block_groups.py /btrfs block group vaddr 0 length 4194304 flags SYSTEM used 16384 used_pct 0 block group vaddr 4194304 length 8388608 flags METADATA used 131072 used_pct 2 block group vaddr 12582912 length 8388608 flags DATA used 0 used_pct 0 block group vaddr 20971520 length 268435456 flags METADATA used 0 used_pct 0 Using 'v1': -# ./show_block_group_data_extent_filenames.py 12582912 /btrfs block group vaddr 12582912 length 8388608 flags DATA used 4198400 used_pct 50 extent vaddr 12582912 length 4198400 refs 1 gen 17 flags DATA root 5 inode 258 offset 0 path utf-8 vmlinuz-4.14.0-rc1-zygo1 Let's overwrite the first few blocks: -# dd if=/dev/urandom bs=16K conv=notrunc of=/btrfs/vmlinuz-4.14.0-rc1-zygo1 count=1 1+0 records in 1+0 records out 16384 bytes (16 kB, 16 KiB) copied, 0.000191925 s, 85.4 MB/s Here we see the limitation of the 'v1' ioctl. I get no name back for the first extent any more: -# ./show_block_group_data_extent_filenames.py 12582912 /btrfs block group vaddr 12582912 length 8388608 flags DATA used 4214784 used_pct 50 extent vaddr 12582912 length 4198400 refs 1 gen 17 flags DATA extent vaddr 16781312 length 16384 refs 1 gen 19 flags DATA root 5 inode 258 offset 0 path utf-8 vmlinuz-4.14.0-rc1-zygo1 (And to test that the 'v1' also still works...) When I change it to use LOGICAL_INO_V2, it shows the same: -# ./show_block_group_data_extent_filenames2.py 12582912 /btrfs block group vaddr 12582912 length 8388608 flags DATA used 4214784 used_pct 50 extent vaddr 12582912 length 4198400 refs 1 gen 17 flags DATA extent vaddr 16781312 length 16384 refs 1 gen 19 flags DATA root 5 inode 258 offset 0 path utf-8 vmlinuz-4.14.0-rc1-zygo1 When I also set the ignore_offset flag: -# ./show_block_group_data_extent_filenames2.py 12582912 /btrfs block group vaddr 12582912 length 8388608 flags DATA used 4214784 used_pct 50 extent vaddr 12582912 length 4198400 refs 1 gen 17 flags DATA root 5 inode 258 offset 16384 path utf-8 vmlinuz-4.14.0-rc1-zygo1 extent vaddr 16781312 length 16384 refs 1 gen 19 flags DATA root 5 inode 258 offset 0 path utf-8 vmlinuz-4.14.0-rc1-zygo1 Moo! No more brute forcing needed! Reviewed-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> -- Hans van Kranenburg ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl 2017-09-22 17:58 [PATCH v3] btrfs: LOGICAL_INO enhancements Zygo Blaxell 2017-09-22 17:58 ` [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell 2017-09-22 17:58 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell @ 2017-09-22 17:58 ` Zygo Blaxell 2017-09-23 21:06 ` Hans van Kranenburg 2019-08-30 7:55 ` Fwd: [PATCH v3] btrfs: LOGICAL_INO enhancements Anand Jain 3 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-22 17:58 UTC (permalink / raw) To: linux-btrfs Build-server workloads have hundreds of references per file after dedup. Multiply by a few snapshots and we quickly exhaust the limit of 2730 references per extent that can fit into a 64K buffer. Raise the limit to 16M to be consistent with other btrfs ioctls (e.g. TREE_SEARCH_V2, FILE_EXTENT_SAME). To minimize surprising userspace behavior, apply this change only to the LOGICAL_INO_V2 ioctl. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/ioctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f4281ffd1833..1940678fc440 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4554,6 +4554,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, if (version == 1) { ignore_offset = false; + size = min_t(u32, loi->size, SZ_64K); } else { /* All reserved bits must be 0 for now */ if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) { @@ -4566,6 +4567,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, goto out_loi; } ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; + size = min_t(u32, loi->size, SZ_16M); } path = btrfs_alloc_path(); @@ -4574,7 +4576,6 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, goto out; } - size = min_t(u32, loi->size, SZ_64K); inodes = init_data_container(size); if (IS_ERR(inodes)) { ret = PTR_ERR(inodes); -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl 2017-09-22 17:58 ` [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl Zygo Blaxell @ 2017-09-23 21:06 ` Hans van Kranenburg 2017-10-19 17:07 ` David Sterba 0 siblings, 1 reply; 14+ messages in thread From: Hans van Kranenburg @ 2017-09-23 21:06 UTC (permalink / raw) To: Zygo Blaxell, linux-btrfs On 09/22/2017 07:58 PM, Zygo Blaxell wrote: > Build-server workloads have hundreds of references per file after dedup. > Multiply by a few snapshots and we quickly exhaust the limit of 2730 > references per extent that can fit into a 64K buffer. Simulating this scenario: /btrfs 2-# btrfs sub create 0 Create subvolume './0' /btrfs 2-# cp /boot/vmlinuz-4.14.0-rc1-zygo1 0/0 /btrfs 2-# for i in $(seq 1 499); do cp --reflink 0/0 0/$i; done /btrfs 2-# for i in $(seq 1 5); do btrfs sub snap 0 $i; done Create a snapshot of '0' in './1' Create a snapshot of '0' in './2' Create a snapshot of '0' in './3' Create a snapshot of '0' in './4' Create a snapshot of '0' in './5' -# ./show_block_groups.py /btrfs block group vaddr 0 length 4194304 flags SYSTEM used 16384 used_pct 0 block group vaddr 4194304 length 8388608 flags METADATA used 507904 used_pct 6 block group vaddr 12582912 length 8388608 flags DATA used 4198400 used_pct 50 block group vaddr 20971520 length 268435456 flags METADATA used 0 used_pct 0 -# ./show_block_group_contents.py 12582912 /btrfs block group vaddr 12582912 length 8388608 flags DATA used 4198400 used_pct 50 extent vaddr 12582912 length 4198400 refs 500 gen 25 flags DATA inline extent data backref root 257 objectid 262 offset 0 count 1 inline extent data backref root 257 objectid 277 offset 0 count 1 inline extent data backref root 257 objectid 288 offset 0 count 1 [...] extent data backref root 257 objectid 663 offset 0 count 1 extent data backref root 257 objectid 366 offset 0 count 1 extent data backref root 257 objectid 715 offset 0 count 1 extent data backref root 257 objectid 306 offset 0 count 1 extent data backref root 257 objectid 470 offset 0 count 1 [...] Total 500 lines, the extra 2500 files in the snapshots are hidden behind the shared metadata refs now... > > Raise the limit to 16M to be consistent with other btrfs ioctls > (e.g. TREE_SEARCH_V2, FILE_EXTENT_SAME). > > To minimize surprising userspace behavior, apply this change only to > the LOGICAL_INO_V2 ioctl. > > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > --- > fs/btrfs/ioctl.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index f4281ffd1833..1940678fc440 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -4554,6 +4554,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > if (version == 1) { > ignore_offset = false; > + size = min_t(u32, loi->size, SZ_64K); > } else { > /* All reserved bits must be 0 for now */ > if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) { > @@ -4566,6 +4567,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > goto out_loi; > } > ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; > + size = min_t(u32, loi->size, SZ_16M); > } > > path = btrfs_alloc_path(); > @@ -4574,7 +4576,6 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > goto out; > } > > - size = min_t(u32, loi->size, SZ_64K); > inodes = init_data_container(size); > if (IS_ERR(inodes)) { > ret = PTR_ERR(inodes); > >>> import btrfs >>> fs = btrfs.FileSystem('/btrfs') Checking that 'v1' still works: >>> inodes, bytes_missed = btrfs.ioctl.logical_to_ino(fs.fd, 12582912, 65536) >>> len(inodes) 2730 >>> bytes_missed 6480 Yes, we only get 2730, as expected with a 64k buffer. v2 can do the same: >>> inodes, bytes_missed = btrfs.ioctl.logical_to_ino_v2(fs.fd, 12582912, 65536) >>> len(inodes) 2730 >>> bytes_missed 6480 The bytes_missed is really useful, because it tells us the exact size of the buf we need instead :) >>> inodes, bytes_missed = btrfs.ioctl.logical_to_ino_v2(fs.fd, 12582912, 65536 + 6480) >>> len(inodes) 3000 >>> bytes_missed 0 Yay! If I remove the buffer size sanity check inside the python-btrfs ioctl code: >>> import btrfs >>> fs = btrfs.FileSystem('/btrfs') >>> inodes, bytes_missed = btrfs.ioctl.logical_to_ino(fs.fd, 12582912, 65536 + 6480) >>> len(inodes) 2730 >>> bytes_missed 6480 Yes, buffer still gets truncated to 64k in the v1 code. Reviewed-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> -- Hans van Kranenburg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl 2017-09-23 21:06 ` Hans van Kranenburg @ 2017-10-19 17:07 ` David Sterba 0 siblings, 0 replies; 14+ messages in thread From: David Sterba @ 2017-10-19 17:07 UTC (permalink / raw) To: Hans van Kranenburg, Zygo Blaxell; +Cc: linux-btrfs Hi, On Sat, Sep 23, 2017 at 11:06:42PM +0200, Hans van Kranenburg wrote: > Reviewed-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> > Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> the patches look good to me and the usecase and testing coverage seem sufficient to take the patches to 4.15, though we're close to the informal merging deadline. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Fwd: [PATCH v3] btrfs: LOGICAL_INO enhancements 2017-09-22 17:58 [PATCH v3] btrfs: LOGICAL_INO enhancements Zygo Blaxell ` (2 preceding siblings ...) 2017-09-22 17:58 ` [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl Zygo Blaxell @ 2019-08-30 7:55 ` Anand Jain 3 siblings, 0 replies; 14+ messages in thread From: Anand Jain @ 2019-08-30 7:55 UTC (permalink / raw) To: linux-btrfs I wonder what happened to the btrfs-progs part of this new ioctl #define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ struct btrfs_ioctl_logical_ino_args) test result shows using it but I can't find it the btrfs-progs nor in the mailing list. Thanks, Anand -------- Forwarded Message -------- Subject: [PATCH v3] btrfs: LOGICAL_INO enhancements Date: Fri, 22 Sep 2017 13:58:44 -0400 From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> To: linux-btrfs@vger.kernel.org Newsgroups: org.kernel.vger.linux-btrfs Changelog: v3-v2: - Stricter check on reserved[] field - now must be all zero, or userspace gets EINVAL. This prevents userspace from setting any of the reserved bits without the kernel providing an unambiguous interpretation of them, and doesn't require us to burn a flag bit for each one. - Moved 'flags' to the end of the reserved[] array. This allows existing source code using version 1 of the ioctl to behave the same way when using version 2 of the btrfs_ioctl_logical_ino_args struct definition (i.e. reserved[3] becomes an alias for 'flags', and the addresses of reserved[0-2] don't change). - Clarified the reasoning in the commit message for patch 2, "btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2". v2: - added patch series intro text - rebased on 4.14-rc1. v1: This patch series fixes some weaknesses in the btrfs LOGICAL_INO ioctl. Background: Suppose we have a file with one extent: root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a root@tester:~# sync Split the extent by overwriting it in the middle: root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a We should now have 3 extent refs to 2 extents, with one block unreachable. The extent tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 2 [...] item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53 extent refs 2 gen 29 flags DATA extent data backref root 5 objectid 261 offset 0 count 2 [...] item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53 extent refs 1 gen 30 flags DATA extent data backref root 5 objectid 261 offset 8192 count 1 [...] and the ref tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 5 [...] item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 0 nr 8192 ram 73728 extent compression(none) item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53 extent data disk byte 1103175680 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression(none) item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 12288 nr 61440 ram 73728 extent compression(none) [...] There are two references to the same extent with different, non-overlapping byte offsets: [------------------72K extent at 1103101952----------------------] [--8K----------------|--4K unreachable----|--60K-----------------] ^ ^ | | [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--] | v [-----4K extent-----] at 1103175680 We want to find all of the references to extent bytenr 1103101952. Without the patch (and without running btrfs-debug-tree), we have to do it with 18 LOGICAL_INO calls: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO inode 261 offset 0 root 5 root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode inode 261 offset 0 root 5 inode 261 offset 4096 root 5 <- same extent ref as offset 0 (offset 8192 returns empty set, not reachable) inode 261 offset 12288 root 5 inode 261 offset 16384 root 5 \ inode 261 offset 20480 root 5 | inode 261 offset 24576 root 5 | inode 261 offset 28672 root 5 | inode 261 offset 32768 root 5 | inode 261 offset 36864 root 5 \ inode 261 offset 40960 root 5 > all the same extent ref as offset 12288. inode 261 offset 45056 root 5 / More processing required in userspace inode 261 offset 49152 root 5 | to figure out these are all duplicates. inode 261 offset 53248 root 5 | inode 261 offset 57344 root 5 | inode 261 offset 61440 root 5 | inode 261 offset 65536 root 5 | inode 261 offset 69632 root 5 / In the worst case the extents are 128MB long, and we have to do 32768 iterations of the loop to find one 4K extent ref. With the patch, we just use one call to map all refs to the extent at once: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO_V2 inode 261 offset 0 root 5 inode 261 offset 12288 root 5 The TREE_SEARCH ioctl allows userspace to retrieve the offset and extent bytenr fields easily once the root, inode and offset are known. This is sufficient information to build a complete map of the extent and all of its references. Userspace can use this information to make better choices to dedup or defrag. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2] btrfs: LOGICAL_INO enhancements (this time based on 4.14-rc1) @ 2017-09-21 4:10 Zygo Blaxell 2017-09-21 4:10 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell 0 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-21 4:10 UTC (permalink / raw) To: linux-btrfs The previous patch series was based on v4.12.14, and this introductory text was missing. This patch series fixes some weaknesses in the btrfs LOGICAL_INO ioctl. Background: Suppose we have a file with one extent: root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a root@tester:~# sync Split the extent by overwriting it in the middle: root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a We should now have 3 extent refs to 2 extents, with one block unreachable. The extent tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 2 [...] item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53 extent refs 2 gen 29 flags DATA extent data backref root 5 objectid 261 offset 0 count 2 [...] item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53 extent refs 1 gen 30 flags DATA extent data backref root 5 objectid 261 offset 8192 count 1 [...] and the ref tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 5 [...] item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 0 nr 8192 ram 73728 extent compression(none) item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53 extent data disk byte 1103175680 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression(none) item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 12288 nr 61440 ram 73728 extent compression(none) [...] There are two references to the same extent with different, non-overlapping byte offsets: [------------------72K extent at 1103101952----------------------] [--8K----------------|--4K unreachable----|--60K-----------------] ^ ^ | | [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--] | v [-----4K extent-----] at 1103175680 We now want to find all of the references to extent bytenr 1103101952. Without the patch (and without running btrfs-debug-tree), we have to do it with 18 LOGICAL_INO calls: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO inode 261 offset 0 root 5 root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode inode 261 offset 0 root 5 inode 261 offset 4096 root 5 <- same extent ref as offset 0 (offset 8192 returns empty set, not reachable) inode 261 offset 12288 root 5 inode 261 offset 16384 root 5 \ inode 261 offset 20480 root 5 | inode 261 offset 24576 root 5 | inode 261 offset 28672 root 5 | inode 261 offset 32768 root 5 | inode 261 offset 36864 root 5 \ inode 261 offset 40960 root 5 > all the same extent ref as offset 12288. inode 261 offset 45056 root 5 / More processing required in userspace inode 261 offset 49152 root 5 | to figure out these are all duplicates. inode 261 offset 53248 root 5 | inode 261 offset 57344 root 5 | inode 261 offset 61440 root 5 | inode 261 offset 65536 root 5 | inode 261 offset 69632 root 5 / In the worst case the extents are 128MB long, and we have to do 32768 iterations of the loop to find one 4K extent ref. With the patch, we just use one call to map all refs to the extent at once: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO_V2 inode 261 offset 0 root 5 inode 261 offset 12288 root 5 The TREE_SEARCH ioctl allows userspace to retrieve the offset and extent bytenr fields easily once the root, inode and offset are known. This is sufficient information to build a complete map of the extent and all of its references. Userspace can use this information to make better choices to dedup or defrag. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-21 4:10 [PATCH v2] btrfs: LOGICAL_INO enhancements (this time based on 4.14-rc1) Zygo Blaxell @ 2017-09-21 4:10 ` Zygo Blaxell 2017-09-21 19:59 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-21 4:10 UTC (permalink / raw) To: linux-btrfs Now that check_extent_in_eb()'s extent offset filter can be turned off, we need a way to do it from userspace. Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent offset filtering, taking the place of one of the reserved[] fields. Previous versions of LOGICAL_INO neglected to check whether any of the reserved fields have non-zero values. Assigning meaning to those fields now may change the behavior of existing programs that left these fields uninitialized. To avoid any surprises, define a new ioctl LOGICAL_INO_V2 which uses the same argument layout as LOGICAL_INO, but uses one of the reserved fields for flags. The V2 ioctl explicitly checks that unsupported flag bits are zero so that userspace can probe for future feature bits as they are defined. If the other reserved fields are used in the future, one of the remaining flag bits could specify that the other reserved fields are valid, so we don't need to check those for now. Since the memory layouts and behavior of the two ioctls' arguments are almost identical, there is no need for a separate function for logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search). A version parameter and an 'if' statement will suffice. Now that we have a flags field in logical_ino_args, add a flag BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, and pass it down the stack to iterate_inodes_from_logical. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/ioctl.c | 21 ++++++++++++++++++--- include/uapi/linux/btrfs.h | 8 +++++++- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index b7de32568082..2bc3a9588d1d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) } static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, - void __user *arg) + void __user *arg, int version) { int ret = 0; int size; struct btrfs_ioctl_logical_ino_args *loi; struct btrfs_data_container *inodes = NULL; struct btrfs_path *path = NULL; + bool ignore_offset; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -4551,6 +4552,17 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, if (IS_ERR(loi)) return PTR_ERR(loi); + if (version == 1) { + ignore_offset = false; + } else { + /* Only accept flags we have defined so far */ + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { + ret = -EINVAL; + goto out_loi; + } + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; + } + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; @@ -4566,7 +4578,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, } ret = iterate_inodes_from_logical(loi->logical, fs_info, path, - build_ino_list, inodes, false); + build_ino_list, inodes, ignore_offset); if (ret == -EINVAL) ret = -ENOENT; if (ret < 0) @@ -4580,6 +4592,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, out: btrfs_free_path(path); kvfree(inodes); +out_loi: kfree(loi); return ret; @@ -5550,7 +5563,9 @@ long btrfs_ioctl(struct file *file, unsigned int case BTRFS_IOC_INO_PATHS: return btrfs_ioctl_ino_to_path(root, argp); case BTRFS_IOC_LOGICAL_INO: - return btrfs_ioctl_logical_to_ino(fs_info, argp); + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); + case BTRFS_IOC_LOGICAL_INO_V2: + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); case BTRFS_IOC_SPACE_INFO: return btrfs_ioctl_space_info(fs_info, argp); case BTRFS_IOC_SYNC: { diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 378230c163d5..0b3de597e04f 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args { struct btrfs_ioctl_logical_ino_args { __u64 logical; /* in */ __u64 size; /* in */ - __u64 reserved[4]; + __u64 flags; /* in, v2 only */ + __u64 reserved[3]; /* struct btrfs_data_container *inodes; out */ __u64 inodes; }; +/* Return every ref to the extent, not just those containing logical block. + * Requires logical == extent bytenr. */ +#define BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0) enum btrfs_dev_stat_values { /* disk I/O failure stats */ @@ -835,5 +839,7 @@ enum btrfs_err_code { struct btrfs_ioctl_feature_flags[3]) #define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 58, \ struct btrfs_ioctl_vol_args_v2) +#define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ + struct btrfs_ioctl_logical_ino_args) #endif /* _UAPI_LINUX_BTRFS_H */ -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-21 4:10 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell @ 2017-09-21 19:59 ` Darrick J. Wong 2017-09-21 20:16 ` Zygo Blaxell 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2017-09-21 19:59 UTC (permalink / raw) To: Zygo Blaxell; +Cc: linux-btrfs On Thu, Sep 21, 2017 at 12:10:15AM -0400, Zygo Blaxell wrote: > Now that check_extent_in_eb()'s extent offset filter can be turned off, > we need a way to do it from userspace. > > Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent > offset filtering, taking the place of one of the reserved[] fields. > > Previous versions of LOGICAL_INO neglected to check whether any of the > reserved fields have non-zero values. Assigning meaning to those fields > now may change the behavior of existing programs that left these fields > uninitialized. > > To avoid any surprises, define a new ioctl LOGICAL_INO_V2 which uses > the same argument layout as LOGICAL_INO, but uses one of the reserved > fields for flags. The V2 ioctl explicitly checks that unsupported flag > bits are zero so that userspace can probe for future feature bits as > they are defined. If the other reserved fields are used in the future, > one of the remaining flag bits could specify that the other reserved > fields are valid, so we don't need to check those for now. > > Since the memory layouts and behavior of the two ioctls' arguments > are almost identical, there is no need for a separate function for > logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search). > A version parameter and an 'if' statement will suffice. > > Now that we have a flags field in logical_ino_args, add a flag > BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, > and pass it down the stack to iterate_inodes_from_logical. > > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > --- > fs/btrfs/ioctl.c | 21 ++++++++++++++++++--- > include/uapi/linux/btrfs.h | 8 +++++++- > 2 files changed, 25 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index b7de32568082..2bc3a9588d1d 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) > } > > static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > - void __user *arg) > + void __user *arg, int version) > { > int ret = 0; > int size; > struct btrfs_ioctl_logical_ino_args *loi; > struct btrfs_data_container *inodes = NULL; > struct btrfs_path *path = NULL; > + bool ignore_offset; > > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > @@ -4551,6 +4552,17 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > if (IS_ERR(loi)) > return PTR_ERR(loi); > > + if (version == 1) { > + ignore_offset = false; > + } else { > + /* Only accept flags we have defined so far */ > + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { > + ret = -EINVAL; > + goto out_loi; > + } > + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; Please check loi->reserved[3] for zeroness so that the next person who wants to add a field to btrfs_ioctl_logical_ino_args doesn't have to create LOGICAL_INO_V3 for the same reason you're creating V2. --D > + } > + > path = btrfs_alloc_path(); > if (!path) { > ret = -ENOMEM; > @@ -4566,7 +4578,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > } > > ret = iterate_inodes_from_logical(loi->logical, fs_info, path, > - build_ino_list, inodes, false); > + build_ino_list, inodes, ignore_offset); > if (ret == -EINVAL) > ret = -ENOENT; > if (ret < 0) > @@ -4580,6 +4592,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > out: > btrfs_free_path(path); > kvfree(inodes); > +out_loi: > kfree(loi); > > return ret; > @@ -5550,7 +5563,9 @@ long btrfs_ioctl(struct file *file, unsigned int > case BTRFS_IOC_INO_PATHS: > return btrfs_ioctl_ino_to_path(root, argp); > case BTRFS_IOC_LOGICAL_INO: > - return btrfs_ioctl_logical_to_ino(fs_info, argp); > + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); > + case BTRFS_IOC_LOGICAL_INO_V2: > + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); > case BTRFS_IOC_SPACE_INFO: > return btrfs_ioctl_space_info(fs_info, argp); > case BTRFS_IOC_SYNC: { > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h > index 378230c163d5..0b3de597e04f 100644 > --- a/include/uapi/linux/btrfs.h > +++ b/include/uapi/linux/btrfs.h > @@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args { > struct btrfs_ioctl_logical_ino_args { > __u64 logical; /* in */ > __u64 size; /* in */ > - __u64 reserved[4]; > + __u64 flags; /* in, v2 only */ > + __u64 reserved[3]; > /* struct btrfs_data_container *inodes; out */ > __u64 inodes; > }; > +/* Return every ref to the extent, not just those containing logical block. > + * Requires logical == extent bytenr. */ > +#define BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0) > > enum btrfs_dev_stat_values { > /* disk I/O failure stats */ > @@ -835,5 +839,7 @@ enum btrfs_err_code { > struct btrfs_ioctl_feature_flags[3]) > #define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 58, \ > struct btrfs_ioctl_vol_args_v2) > +#define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ > + struct btrfs_ioctl_logical_ino_args) > > #endif /* _UAPI_LINUX_BTRFS_H */ > -- > 2.11.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-21 19:59 ` Darrick J. Wong @ 2017-09-21 20:16 ` Zygo Blaxell 2017-09-21 20:37 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-21 20:16 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 7644 bytes --] On Thu, Sep 21, 2017 at 12:59:42PM -0700, Darrick J. Wong wrote: > On Thu, Sep 21, 2017 at 12:10:15AM -0400, Zygo Blaxell wrote: > > Now that check_extent_in_eb()'s extent offset filter can be turned off, > > we need a way to do it from userspace. > > > > Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent > > offset filtering, taking the place of one of the reserved[] fields. > > > > Previous versions of LOGICAL_INO neglected to check whether any of the > > reserved fields have non-zero values. Assigning meaning to those fields > > now may change the behavior of existing programs that left these fields > > uninitialized. > > > > To avoid any surprises, define a new ioctl LOGICAL_INO_V2 which uses > > the same argument layout as LOGICAL_INO, but uses one of the reserved > > fields for flags. The V2 ioctl explicitly checks that unsupported flag > > bits are zero so that userspace can probe for future feature bits as > > they are defined. If the other reserved fields are used in the future, > > one of the remaining flag bits could specify that the other reserved > > fields are valid, so we don't need to check those for now. > > > > Since the memory layouts and behavior of the two ioctls' arguments > > are almost identical, there is no need for a separate function for > > logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search). > > A version parameter and an 'if' statement will suffice. > > > > Now that we have a flags field in logical_ino_args, add a flag > > BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, > > and pass it down the stack to iterate_inodes_from_logical. > > > > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > > --- > > fs/btrfs/ioctl.c | 21 ++++++++++++++++++--- > > include/uapi/linux/btrfs.h | 8 +++++++- > > 2 files changed, 25 insertions(+), 4 deletions(-) > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > > index b7de32568082..2bc3a9588d1d 100644 > > --- a/fs/btrfs/ioctl.c > > +++ b/fs/btrfs/ioctl.c > > @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) > > } > > > > static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > - void __user *arg) > > + void __user *arg, int version) > > { > > int ret = 0; > > int size; > > struct btrfs_ioctl_logical_ino_args *loi; > > struct btrfs_data_container *inodes = NULL; > > struct btrfs_path *path = NULL; > > + bool ignore_offset; > > > > if (!capable(CAP_SYS_ADMIN)) > > return -EPERM; > > @@ -4551,6 +4552,17 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > if (IS_ERR(loi)) > > return PTR_ERR(loi); > > > > + if (version == 1) { > > + ignore_offset = false; > > + } else { > > + /* Only accept flags we have defined so far */ > > + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { > > + ret = -EINVAL; > > + goto out_loi; > > + } > > + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; > > Please check loi->reserved[3] for zeroness so that the next person who > wants to add a field to btrfs_ioctl_logical_ino_args doesn't have to > create LOGICAL_INO_V3 for the same reason you're creating V2. OK now I'm confused, in several distinct ways. I wonder if you meant reserved[1] and reserved[2] there, since I'm not checking them (for reasons stated in the commit log--we can use flags to indicate whether and what values are present there). But that's not the bigger problem. Maybe you did mean reserved[3], but there's no "reserved[3]" any more. I shortened the reserved array from 4 elements to 3, so "reserved[3]" is no longer a valid memory reference. Also "reserved[0]" no longer refers to the same thing it once did. > --D > > > + } > > + > > path = btrfs_alloc_path(); > > if (!path) { > > ret = -ENOMEM; > > @@ -4566,7 +4578,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > } > > > > ret = iterate_inodes_from_logical(loi->logical, fs_info, path, > > - build_ino_list, inodes, false); > > + build_ino_list, inodes, ignore_offset); > > if (ret == -EINVAL) > > ret = -ENOENT; > > if (ret < 0) > > @@ -4580,6 +4592,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > out: > > btrfs_free_path(path); > > kvfree(inodes); > > +out_loi: > > kfree(loi); > > > > return ret; > > @@ -5550,7 +5563,9 @@ long btrfs_ioctl(struct file *file, unsigned int > > case BTRFS_IOC_INO_PATHS: > > return btrfs_ioctl_ino_to_path(root, argp); > > case BTRFS_IOC_LOGICAL_INO: > > - return btrfs_ioctl_logical_to_ino(fs_info, argp); > > + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); > > + case BTRFS_IOC_LOGICAL_INO_V2: > > + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); > > case BTRFS_IOC_SPACE_INFO: > > return btrfs_ioctl_space_info(fs_info, argp); > > case BTRFS_IOC_SYNC: { > > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h > > index 378230c163d5..0b3de597e04f 100644 > > --- a/include/uapi/linux/btrfs.h > > +++ b/include/uapi/linux/btrfs.h > > @@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args { > > struct btrfs_ioctl_logical_ino_args { > > __u64 logical; /* in */ > > __u64 size; /* in */ > > - __u64 reserved[4]; > > + __u64 flags; /* in, v2 only */ > > + __u64 reserved[3]; Should I rename 'reserved' here, e.g. __u64 logical; /* in */ __u64 size; /* in */ __u64 flags; /* in, v2 only */ __u64 reserved1; __u64 reserved2; __u64 reserved3; /* struct btrfs_data_container *inodes; out */ __u64 inodes; That way any existing code that happened to use reserved[3] now fails to compile instead of clobbering the first u64 of inodes, and there's no silent rearrangement of reserved[0..2]. Another option is I just put flags where reserved[3] is: __u64 logical; /* in */ __u64 size; /* in */ __u64 reserved[3]; __u64 flags; /* in, v2 only */ /* struct btrfs_data_container *inodes; out */ __u64 inodes; This means code that has reserved[3] still works (maybe it just prints the value for a human to read), and it just happens to land where the new flags field is. > > /* struct btrfs_data_container *inodes; out */ > > __u64 inodes; > > }; > > +/* Return every ref to the extent, not just those containing logical block. > > + * Requires logical == extent bytenr. */ > > +#define BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0) > > > > enum btrfs_dev_stat_values { > > /* disk I/O failure stats */ > > @@ -835,5 +839,7 @@ enum btrfs_err_code { > > struct btrfs_ioctl_feature_flags[3]) > > #define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 58, \ > > struct btrfs_ioctl_vol_args_v2) > > +#define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ > > + struct btrfs_ioctl_logical_ino_args) > > > > #endif /* _UAPI_LINUX_BTRFS_H */ > > -- > > 2.11.0 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-21 20:16 ` Zygo Blaxell @ 2017-09-21 20:37 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2017-09-21 20:37 UTC (permalink / raw) To: Zygo Blaxell; +Cc: linux-btrfs On Thu, Sep 21, 2017 at 04:16:35PM -0400, Zygo Blaxell wrote: > On Thu, Sep 21, 2017 at 12:59:42PM -0700, Darrick J. Wong wrote: > > On Thu, Sep 21, 2017 at 12:10:15AM -0400, Zygo Blaxell wrote: > > > Now that check_extent_in_eb()'s extent offset filter can be turned off, > > > we need a way to do it from userspace. > > > > > > Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent > > > offset filtering, taking the place of one of the reserved[] fields. > > > > > > Previous versions of LOGICAL_INO neglected to check whether any of the > > > reserved fields have non-zero values. Assigning meaning to those fields > > > now may change the behavior of existing programs that left these fields > > > uninitialized. > > > > > > To avoid any surprises, define a new ioctl LOGICAL_INO_V2 which uses > > > the same argument layout as LOGICAL_INO, but uses one of the reserved > > > fields for flags. The V2 ioctl explicitly checks that unsupported flag > > > bits are zero so that userspace can probe for future feature bits as > > > they are defined. If the other reserved fields are used in the future, > > > one of the remaining flag bits could specify that the other reserved > > > fields are valid, so we don't need to check those for now. > > > > > > Since the memory layouts and behavior of the two ioctls' arguments > > > are almost identical, there is no need for a separate function for > > > logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search). > > > A version parameter and an 'if' statement will suffice. > > > > > > Now that we have a flags field in logical_ino_args, add a flag > > > BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, > > > and pass it down the stack to iterate_inodes_from_logical. > > > > > > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > > > --- > > > fs/btrfs/ioctl.c | 21 ++++++++++++++++++--- > > > include/uapi/linux/btrfs.h | 8 +++++++- > > > 2 files changed, 25 insertions(+), 4 deletions(-) > > > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > > > index b7de32568082..2bc3a9588d1d 100644 > > > --- a/fs/btrfs/ioctl.c > > > +++ b/fs/btrfs/ioctl.c > > > @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) > > > } > > > > > > static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > > - void __user *arg) > > > + void __user *arg, int version) > > > { > > > int ret = 0; > > > int size; > > > struct btrfs_ioctl_logical_ino_args *loi; > > > struct btrfs_data_container *inodes = NULL; > > > struct btrfs_path *path = NULL; > > > + bool ignore_offset; > > > > > > if (!capable(CAP_SYS_ADMIN)) > > > return -EPERM; > > > @@ -4551,6 +4552,17 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > > if (IS_ERR(loi)) > > > return PTR_ERR(loi); > > > > > > + if (version == 1) { > > > + ignore_offset = false; > > > + } else { > > > + /* Only accept flags we have defined so far */ > > > + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { > > > + ret = -EINVAL; > > > + goto out_loi; > > > + } > > > + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; > > > > Please check loi->reserved[3] for zeroness so that the next person who > > wants to add a field to btrfs_ioctl_logical_ino_args doesn't have to > > create LOGICAL_INO_V3 for the same reason you're creating V2. > > OK now I'm confused, in several distinct ways. > > I wonder if you meant reserved[1] and reserved[2] there, since I'm not > checking them (for reasons stated in the commit log--we can use flags > to indicate whether and what values are present there). You can do that, though that means you have to burn flag bits to light up the remaining reserved area, which means you can't in the future decide that a non-zero field value will turn on some new feature. You retain the ability to use flag bits to turn on the new field, if it's the case that zero has a meaning. > But that's not the bigger problem. Maybe you did mean reserved[3], but > there's no "reserved[3]" any more. I shortened the reserved array from > 4 elements to 3, so "reserved[3]" is no longer a valid memory reference. > Also "reserved[0]" no longer refers to the same thing it once did. Oops, sorry, that was a typo, I meant reserved[], as in 'check the whole array via memchr_inv'. --D > > > --D > > > > > + } > > > + > > > path = btrfs_alloc_path(); > > > if (!path) { > > > ret = -ENOMEM; > > > @@ -4566,7 +4578,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > > } > > > > > > ret = iterate_inodes_from_logical(loi->logical, fs_info, path, > > > - build_ino_list, inodes, false); > > > + build_ino_list, inodes, ignore_offset); > > > if (ret == -EINVAL) > > > ret = -ENOENT; > > > if (ret < 0) > > > @@ -4580,6 +4592,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, > > > out: > > > btrfs_free_path(path); > > > kvfree(inodes); > > > +out_loi: > > > kfree(loi); > > > > > > return ret; > > > @@ -5550,7 +5563,9 @@ long btrfs_ioctl(struct file *file, unsigned int > > > case BTRFS_IOC_INO_PATHS: > > > return btrfs_ioctl_ino_to_path(root, argp); > > > case BTRFS_IOC_LOGICAL_INO: > > > - return btrfs_ioctl_logical_to_ino(fs_info, argp); > > > + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); > > > + case BTRFS_IOC_LOGICAL_INO_V2: > > > + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); > > > case BTRFS_IOC_SPACE_INFO: > > > return btrfs_ioctl_space_info(fs_info, argp); > > > case BTRFS_IOC_SYNC: { > > > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h > > > index 378230c163d5..0b3de597e04f 100644 > > > --- a/include/uapi/linux/btrfs.h > > > +++ b/include/uapi/linux/btrfs.h > > > @@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args { > > > struct btrfs_ioctl_logical_ino_args { > > > __u64 logical; /* in */ > > > __u64 size; /* in */ > > > - __u64 reserved[4]; > > > + __u64 flags; /* in, v2 only */ > > > + __u64 reserved[3]; > > Should I rename 'reserved' here, e.g. > > __u64 logical; /* in */ > __u64 size; /* in */ > __u64 flags; /* in, v2 only */ > __u64 reserved1; > __u64 reserved2; > __u64 reserved3; > /* struct btrfs_data_container *inodes; out */ > __u64 inodes; > > That way any existing code that happened to use reserved[3] now > fails to compile instead of clobbering the first u64 of inodes, > and there's no silent rearrangement of reserved[0..2]. > > Another option is I just put flags where reserved[3] is: > > __u64 logical; /* in */ > __u64 size; /* in */ > __u64 reserved[3]; > __u64 flags; /* in, v2 only */ > /* struct btrfs_data_container *inodes; out */ > __u64 inodes; > > This means code that has reserved[3] still works (maybe it just prints > the value for a human to read), and it just happens to land where the > new flags field is. > > > > /* struct btrfs_data_container *inodes; out */ > > > __u64 inodes; > > > }; > > > +/* Return every ref to the extent, not just those containing logical block. > > > + * Requires logical == extent bytenr. */ > > > +#define BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0) > > > > > > enum btrfs_dev_stat_values { > > > /* disk I/O failure stats */ > > > @@ -835,5 +839,7 @@ enum btrfs_err_code { > > > struct btrfs_ioctl_feature_flags[3]) > > > #define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 58, \ > > > struct btrfs_ioctl_vol_args_v2) > > > +#define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ > > > + struct btrfs_ioctl_logical_ino_args) > > > > > > #endif /* _UAPI_LINUX_BTRFS_H */ > > > -- > > > 2.11.0 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents @ 2017-09-21 0:33 Zygo Blaxell 2017-09-21 0:33 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell 0 siblings, 1 reply; 14+ messages in thread From: Zygo Blaxell @ 2017-09-21 0:33 UTC (permalink / raw) To: linux-btrfs The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and offset (encoded as a single logical address) to a list of extent refs. LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping (extent ref -> extent bytenr and offset, or logical address). These are useful capabilities for programs that manipulate extents and extent references from userspace (e.g. dedup and defrag utilities). When the extents are uncompressed (and not encrypted and not other), check_extent_in_eb performs filtering of the extent refs to remove any extent refs which do not contain the same extent offset as the 'logical' parameter's extent offset. This prevents LOGICAL_INO from returning references to more than a single block. To find the set of extent references to an uncompressed extent from [a, b), userspace has to run a loop like this pseudocode: for (i = a; i < b; ++i) extent_ref_set += LOGICAL_INO(i); At each iteration of the loop (up to 32768 iterations for a 128M extent), data we are interested in is collected in the kernel, then deleted by the filter in check_extent_in_eb. When the extents are compressed (or encrypted or other), the 'logical' parameter must be an extent bytenr (the 'a' parameter in the loop). No filtering by extent offset is done (or possible?) so the result is the complete set of extent refs for the entire extent. This removes the need for the loop, since we get all the extent refs in one call. Add an 'ignore_offset' argument to iterate_inodes_from_logical, [...several levels of function call graph...], and check_extent_in_eb, so that we can disable the extent offset filtering for uncompressed extents. This flag can be set by an improved version of the LOGICAL_INO ioctl to get either behavior as desired. There is no functional change in this patch. The new flag is always false. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/backref.c | 62 ++++++++++++++++++++++++++++++++---------------------- fs/btrfs/backref.h | 8 ++++--- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 8 +++---- fs/btrfs/scrub.c | 6 +++--- fs/btrfs/send.c | 2 +- 7 files changed, 52 insertions(+), 38 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 1d71a5a4b1b9..3bffd36c6897 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -302,12 +302,14 @@ static int ref_tree_add(struct ref_root *ref_tree, u64 root_id, u64 object_id, static int check_extent_in_eb(struct btrfs_key *key, struct extent_buffer *eb, struct btrfs_file_extent_item *fi, u64 extent_item_pos, - struct extent_inode_elem **eie) + struct extent_inode_elem **eie, + bool ignore_offset) { u64 offset = 0; struct extent_inode_elem *e; - if (!btrfs_file_extent_compression(eb, fi) && + if (!ignore_offset && + !btrfs_file_extent_compression(eb, fi) && !btrfs_file_extent_encryption(eb, fi) && !btrfs_file_extent_other_encoding(eb, fi)) { u64 data_offset; @@ -346,7 +348,8 @@ static void free_inode_elem_list(struct extent_inode_elem *eie) static int find_extent_in_eb(struct extent_buffer *eb, u64 wanted_disk_byte, u64 extent_item_pos, - struct extent_inode_elem **eie) + struct extent_inode_elem **eie, + bool ignore_offset) { u64 disk_byte; struct btrfs_key key; @@ -375,7 +378,7 @@ static int find_extent_in_eb(struct extent_buffer *eb, u64 wanted_disk_byte, if (disk_byte != wanted_disk_byte) continue; - ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie); + ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie, ignore_offset); if (ret < 0) return ret; } @@ -511,7 +514,7 @@ static int __add_prelim_ref(struct list_head *head, u64 root_id, static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, struct ulist *parents, struct __prelim_ref *ref, int level, u64 time_seq, const u64 *extent_item_pos, - u64 total_refs) + u64 total_refs, bool ignore_offset) { int ret = 0; int slot; @@ -564,7 +567,7 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, if (extent_item_pos) { ret = check_extent_in_eb(&key, eb, fi, *extent_item_pos, - &eie); + &eie, ignore_offset); if (ret < 0) break; } @@ -603,7 +606,8 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 time_seq, struct __prelim_ref *ref, struct ulist *parents, - const u64 *extent_item_pos, u64 total_refs) + const u64 *extent_item_pos, u64 total_refs, + bool ignore_offset) { struct btrfs_root *root; struct btrfs_key root_key; @@ -674,7 +678,7 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info, } ret = add_all_parents(root, path, parents, ref, level, time_seq, - extent_item_pos, total_refs); + extent_item_pos, total_refs, ignore_offset); out: path->lowest_level = 0; btrfs_release_path(path); @@ -688,7 +692,7 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 time_seq, struct list_head *head, const u64 *extent_item_pos, u64 total_refs, - u64 root_objectid) + u64 root_objectid, bool ignore_offset) { int err; int ret = 0; @@ -719,7 +723,7 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info, } err = __resolve_indirect_ref(fs_info, path, time_seq, ref, parents, extent_item_pos, - total_refs); + total_refs, ignore_offset); /* * we can only tolerate ENOENT,otherwise,we should catch error * and return directly. @@ -1209,13 +1213,18 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info, * If check_shared is set to 1, any extent has more than one ref item, will * be returned BACKREF_FOUND_SHARED immediately. * + * If ignore_offset is set to false, only extent refs whose offsets match + * extent_item_pos are returned. If true, every extent ref is returned + * and extent_item_pos is ignored. + * * FIXME some caching might speed things up */ static int find_parent_nodes(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist *refs, struct ulist *roots, const u64 *extent_item_pos, - u64 root_objectid, u64 inum, int check_shared) + u64 root_objectid, u64 inum, int check_shared, + bool ignore_offset) { struct btrfs_key key; struct btrfs_path *path; @@ -1383,7 +1392,7 @@ again: ret = __resolve_indirect_refs(fs_info, path, time_seq, &prefs, extent_item_pos, total_refs, - root_objectid); + root_objectid, ignore_offset); if (ret) goto out; @@ -1420,7 +1429,7 @@ again: btrfs_tree_read_lock(eb); btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); ret = find_extent_in_eb(eb, bytenr, - *extent_item_pos, &eie); + *extent_item_pos, &eie, ignore_offset); btrfs_tree_read_unlock_blocking(eb); free_extent_buffer(eb); if (ret < 0) @@ -1497,7 +1506,7 @@ static void free_leaf_list(struct ulist *blocks) static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist **leafs, - const u64 *extent_item_pos) + const u64 *extent_item_pos, bool ignore_offset) { int ret; @@ -1506,7 +1515,7 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, return -ENOMEM; ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - *leafs, NULL, extent_item_pos, 0, 0, 0); + *leafs, NULL, extent_item_pos, 0, 0, 0, ignore_offset); if (ret < 0 && ret != -ENOENT) { free_leaf_list(*leafs); return ret; @@ -1530,7 +1539,7 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, */ static int __btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots) + u64 time_seq, struct ulist **roots, bool ignore_offset) { struct ulist *tmp; struct ulist_node *node = NULL; @@ -1549,7 +1558,7 @@ static int __btrfs_find_all_roots(struct btrfs_trans_handle *trans, ULIST_ITER_INIT(&uiter); while (1) { ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - tmp, *roots, NULL, 0, 0, 0); + tmp, *roots, NULL, 0, 0, 0, ignore_offset); if (ret < 0 && ret != -ENOENT) { ulist_free(tmp); ulist_free(*roots); @@ -1568,13 +1577,14 @@ static int __btrfs_find_all_roots(struct btrfs_trans_handle *trans, int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots) + u64 time_seq, struct ulist **roots, + bool ignore_offset) { int ret; if (!trans) down_read(&fs_info->commit_root_sem); - ret = __btrfs_find_all_roots(trans, fs_info, bytenr, time_seq, roots); + ret = __btrfs_find_all_roots(trans, fs_info, bytenr, time_seq, roots, ignore_offset); if (!trans) up_read(&fs_info->commit_root_sem); return ret; @@ -1619,7 +1629,7 @@ int btrfs_check_shared(struct btrfs_trans_handle *trans, ULIST_ITER_INIT(&uiter); while (1) { ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp, - roots, NULL, root_objectid, inum, 1); + roots, NULL, root_objectid, inum, 1, false); if (ret == BACKREF_FOUND_SHARED) { /* this is the only condition under which we return 1 */ ret = 1; @@ -2005,7 +2015,8 @@ static int iterate_leaf_refs(struct btrfs_fs_info *fs_info, int iterate_extent_inodes(struct btrfs_fs_info *fs_info, u64 extent_item_objectid, u64 extent_item_pos, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx) + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset) { int ret; struct btrfs_trans_handle *trans = NULL; @@ -2031,14 +2042,14 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, tree_mod_seq_elem.seq, &refs, - &extent_item_pos); + &extent_item_pos, ignore_offset); if (ret) goto out; ULIST_ITER_INIT(&ref_uiter); while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) { ret = __btrfs_find_all_roots(trans, fs_info, ref_node->val, - tree_mod_seq_elem.seq, &roots); + tree_mod_seq_elem.seq, &roots, ignore_offset); if (ret) break; ULIST_ITER_INIT(&root_uiter); @@ -2071,7 +2082,8 @@ out: int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, - iterate_extent_inodes_t *iterate, void *ctx) + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset) { int ret; u64 extent_item_pos; @@ -2089,7 +2101,7 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, extent_item_pos = logical - found_key.objectid; ret = iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, search_commit_root, - iterate, ctx); + iterate, ctx, ignore_offset); return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 9c41fbac3009..9c9f5e9d7a89 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -43,17 +43,19 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, int iterate_extent_inodes(struct btrfs_fs_info *fs_info, u64 extent_item_objectid, u64 extent_offset, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx); + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset); int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, - iterate_extent_inodes_t *iterate, void *ctx); + iterate_extent_inodes_t *iterate, void *ctx, + bool ignore_offset); int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots); + u64 time_seq, struct ulist **roots, bool ignore_offset); char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, u32 name_len, unsigned long name_off, struct extent_buffer *eb_in, u64 parent, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 275366621ac6..9dcb5a2451bf 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2438,7 +2438,7 @@ static noinline bool record_extent_backrefs(struct btrfs_path *path, ret = iterate_inodes_from_logical(old->bytenr + old->extent_offset, fs_info, path, record_one_backref, - old); + old, false); if (ret < 0 && ret != -ENOENT) return false; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 8825ae54c968..c6787660d91f 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4572,7 +4572,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, } ret = iterate_inodes_from_logical(loi->logical, fs_info, path, - build_ino_list, inodes); + build_ino_list, inodes, false); if (ret == -EINVAL) ret = -ENOENT; if (ret < 0) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index deffbeb74a0b..8c6c423f76c0 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1428,7 +1428,7 @@ int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans, node); if (WARN_ON(!record->old_roots)) ret = btrfs_find_all_roots(NULL, fs_info, - record->bytenr, 0, &record->old_roots); + record->bytenr, 0, &record->old_roots, false); if (ret < 0) break; if (qgroup_to_skip) @@ -1474,7 +1474,7 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info, u64 bytenr = qrecord->bytenr; int ret; - ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root); + ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root, false); if (ret < 0) return ret; @@ -2022,7 +2022,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans, * root. It's safe inside commit_transaction(). */ ret = btrfs_find_all_roots(trans, fs_info, - record->bytenr, SEQ_LAST, &new_roots); + record->bytenr, SEQ_LAST, &new_roots, false); if (ret < 0) goto cleanup; if (qgroup_to_skip) @@ -2544,7 +2544,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, num_bytes = found.offset; ret = btrfs_find_all_roots(NULL, fs_info, found.objectid, 0, - &roots); + &roots, false); if (ret < 0) goto out; /* For rescan, just pass old_roots as NULL */ diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index c7b45eb2403d..cc1fc17b7a50 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -883,7 +883,7 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) swarn.dev = dev; iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, 1, - scrub_print_warning_inode, &swarn); + scrub_print_warning_inode, &swarn, false); } out: @@ -1047,7 +1047,7 @@ static void scrub_fixup_nodatasum(struct btrfs_work *work) * can be found. */ ret = iterate_inodes_from_logical(fixup->logical, fs_info, path, - scrub_fixup_readpage, fixup); + scrub_fixup_readpage, fixup, false); if (ret < 0) { uncorrectable = 1; goto out; @@ -4442,7 +4442,7 @@ static void copy_nocow_pages_worker(struct btrfs_work *work) } ret = iterate_inodes_from_logical(logical, fs_info, path, - record_inode_for_nocow, nocow_ctx); + record_inode_for_nocow, nocow_ctx, false); if (ret != 0 && ret != -ENOENT) { btrfs_warn(fs_info, "iterate_inodes_from_logical() failed: log %llu, phys %llu, len %llu, mir %u, ret %d", diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 641017b68d02..85e23a06c1a3 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1433,7 +1433,7 @@ static int find_extent_clone(struct send_ctx *sctx, extent_item_pos = 0; ret = iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, 1, __iterate_backrefs, - backref_ctx); + backref_ctx, false); if (ret < 0) goto out; -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-09-21 0:33 [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell @ 2017-09-21 0:33 ` Zygo Blaxell 0 siblings, 0 replies; 14+ messages in thread From: Zygo Blaxell @ 2017-09-21 0:33 UTC (permalink / raw) To: linux-btrfs Now that check_extent_in_eb()'s extent offset filter can be turned off, we need a way to do it from userspace. Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent offset filtering, taking the place of one of the reserved[] fields. Previous versions of LOGICAL_INO neglected to check whether any of the reserved fields have non-zero values. Assigning meaning to those fields now may change the behavior of existing programs that left these fields uninitialized. To avoid any surprises, define a new ioctl LOGICAL_INO_V2 which uses the same argument layout as LOGICAL_INO, but uses one of the reserved fields for flags. The V2 ioctl explicitly checks that unsupported flag bits are zero so that userspace can probe for future feature bits as they are defined. If the other reserved fields are used in the future, one of the remaining flag bits could specify that the other reserved fields are valid, so we don't need to check those for now. Since the memory layouts and behavior of the two ioctls' arguments are almost identical, there is no need for a separate function for logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search). A version parameter and an 'if' statement will suffice. Now that we have a flags field in logical_ino_args, add a flag BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, and pass it down the stack to iterate_inodes_from_logical. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/ioctl.c | 21 ++++++++++++++++++--- include/uapi/linux/btrfs.h | 8 +++++++- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index c6787660d91f..def0ab85134a 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4542,13 +4542,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) } static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, - void __user *arg) + void __user *arg, int version) { int ret = 0; int size; struct btrfs_ioctl_logical_ino_args *loi; struct btrfs_data_container *inodes = NULL; struct btrfs_path *path = NULL; + bool ignore_offset; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -4557,6 +4558,17 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, if (IS_ERR(loi)) return PTR_ERR(loi); + if (version == 1) { + ignore_offset = false; + } else { + /* Only accept flags we have defined so far */ + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { + ret = -EINVAL; + goto out_loi; + } + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; + } + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; @@ -4572,7 +4584,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, } ret = iterate_inodes_from_logical(loi->logical, fs_info, path, - build_ino_list, inodes, false); + build_ino_list, inodes, ignore_offset); if (ret == -EINVAL) ret = -ENOENT; if (ret < 0) @@ -4586,6 +4598,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, out: btrfs_free_path(path); vfree(inodes); +out_loi: kfree(loi); return ret; @@ -5559,7 +5572,9 @@ long btrfs_ioctl(struct file *file, unsigned int case BTRFS_IOC_INO_PATHS: return btrfs_ioctl_ino_to_path(root, argp); case BTRFS_IOC_LOGICAL_INO: - return btrfs_ioctl_logical_to_ino(fs_info, argp); + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); + case BTRFS_IOC_LOGICAL_INO_V2: + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); case BTRFS_IOC_SPACE_INFO: return btrfs_ioctl_space_info(fs_info, argp); case BTRFS_IOC_SYNC: { diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index a456e5309238..a23555026994 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -591,10 +591,14 @@ struct btrfs_ioctl_ino_path_args { struct btrfs_ioctl_logical_ino_args { __u64 logical; /* in */ __u64 size; /* in */ - __u64 reserved[4]; + __u64 flags; /* in, v2 only */ + __u64 reserved[3]; /* struct btrfs_data_container *inodes; out */ __u64 inodes; }; +/* Return every ref to the extent, not just those containing logical block. + * Requires logical == extent bytenr. */ +#define BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0) enum btrfs_dev_stat_values { /* disk I/O failure stats */ @@ -818,5 +822,7 @@ enum btrfs_err_code { struct btrfs_ioctl_feature_flags[3]) #define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 58, \ struct btrfs_ioctl_vol_args_v2) +#define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \ + struct btrfs_ioctl_logical_ino_args) #endif /* _UAPI_LINUX_BTRFS_H */ -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-08-30 7:56 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-09-22 17:58 [PATCH v3] btrfs: LOGICAL_INO enhancements Zygo Blaxell 2017-09-22 17:58 ` [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell 2017-10-19 17:03 ` David Sterba 2017-09-22 17:58 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell 2017-09-23 20:38 ` Hans van Kranenburg 2017-09-22 17:58 ` [PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl Zygo Blaxell 2017-09-23 21:06 ` Hans van Kranenburg 2017-10-19 17:07 ` David Sterba 2019-08-30 7:55 ` Fwd: [PATCH v3] btrfs: LOGICAL_INO enhancements Anand Jain -- strict thread matches above, loose matches on Subject: below -- 2017-09-21 4:10 [PATCH v2] btrfs: LOGICAL_INO enhancements (this time based on 4.14-rc1) Zygo Blaxell 2017-09-21 4:10 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell 2017-09-21 19:59 ` Darrick J. Wong 2017-09-21 20:16 ` Zygo Blaxell 2017-09-21 20:37 ` Darrick J. Wong 2017-09-21 0:33 [PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents Zygo Blaxell 2017-09-21 0:33 ` [PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 Zygo Blaxell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.