From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: remove some duplicated checks in btrfs_previous_*_item()
Date: Tue, 14 Dec 2021 10:27:42 +0000 [thread overview]
Message-ID: <Ybhxngswi6vN+vH4@debian9.Home> (raw)
In-Reply-To: <20211214071411.48324-1-wqu@suse.com>
On Tue, Dec 14, 2021 at 03:14:11PM +0800, Qu Wenruo wrote:
> In btrfs_previous_item() and btrfs_previous_extent_item() we check
> btrfs_header_nritems() in a loop.
>
> But in fact we don't need to do it in a loop at all.
>
> Firstly, if a tree block is empty, the whole tree is empty and nodes[0]
> is the tree root.
> We don't need to do anything and can exit immediately.
>
> Secondly, the only timing we could get a slots[0] >= nritems is when the
> function get called. After the first slots[0]--, the slot should always
> be <= nritems.
>
> So this patch will move all the nritems related checks out of the loop
> by:
>
> - Check nritems of nodes[0] to do a quick exit
>
> - Sanitize path->slots[0] before entering the loop
> I doubt if there is any caller setting path->slots[0] beyond
> nritems + 1 (setting to nritems is possible when item is not found).
> Sanitize path->slots[0] to nritems won't hurt anyway.
>
> - Unconditionally reduce path->slots[0]
> Since we're sure all tree blocks should not be empty, and
> btrfs_prev_leaf() will return with path->slots[0] == nritems, we
> can safely reduce slots[0] unconditionally.
> Just keep an extra ASSERT() to make sure no tree block is empty.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/ctree.c | 52 +++++++++++++++++++++++++++++++++---------------
> 1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 781537692a4a..555345aed84d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -4704,23 +4704,39 @@ int btrfs_previous_item(struct btrfs_root *root,
> {
> struct btrfs_key found_key;
> struct extent_buffer *leaf;
> - u32 nritems;
> + const u32 nritems = btrfs_header_nritems(path->nodes[0]);
> int ret;
>
> + /*
> + * Check nritems first, if the tree is empty we exit immediately.
> + * And if this leave is not empty, none of the tree blocks of this root
> + * should be empty.
> + */
> + if (nritems == 0)
> + return 1;
> +
> + /*
> + * If we're several slots beyond nritems, we reset slot to nritems,
> + * and it will be handled properly inside the loop.
> + */
> + if (unlikely(path->slots[0] > nritems))
> + path->slots[0] = nritems;
> +
> while (1) {
> if (path->slots[0] == 0) {
> ret = btrfs_prev_leaf(root, path);
> if (ret != 0)
> return ret;
> - } else {
> - path->slots[0]--;
> }
> leaf = path->nodes[0];
> - nritems = btrfs_header_nritems(leaf);
> - if (nritems == 0)
> - return 1;
> - if (path->slots[0] == nritems)
> - path->slots[0]--;
> + ASSERT(btrfs_header_nritems(leaf));
> + /*
> + * This is for both regular case and above btrfs_prev_leaf() case.
> + * As btrfs_prev_leaf() will return with path->slots[0] == nritems,
> + * and we're sure no tree block is empty, we can go safely
> + * reduce slots[0] here.
> + */
> + path->slots[0]--;
No, this is wrong.
btrfs_prev_leaf() computes the previous key and does a search_slot() for it.
With this unconditional decrement we can miss the previous key in 2 ways:
1) The previous key exists, and btrfs_prev_leaf() leaves us in a leaf that has it
and the slot is btrfs_header_nritems(prev_leaf) - 1 -> the last key on a leaf;
2) The previous key exists, but after btrfs_prev_leaf() released the path and
before it called search_slot(), there was a balance operation and it pushed the
previous key in the middle of the leaf we had, or some other leaf, and the slot
will be something less than btrfs_header_nritems(), it can even be 0.
That's why we have the call to header_nritems() in the loop, and check if slots[0]
is == to nritems before decrementing...
Thanks.
>
> btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> if (found_key.objectid < min_objectid)
> @@ -4745,23 +4761,27 @@ int btrfs_previous_extent_item(struct btrfs_root *root,
> {
> struct btrfs_key found_key;
> struct extent_buffer *leaf;
> - u32 nritems;
> + const u32 nritems = btrfs_header_nritems(path->nodes[0]);
> int ret;
>
> + /*
> + * Refer to btrfs_previous_item() for the reason of all nritems related
> + * checks/modifications.
> + */
> + if (nritems == 0)
> + return 1;
> + if (unlikely(path->slots[0] > nritems))
> + path->slots[0] = nritems;
> +
> while (1) {
> if (path->slots[0] == 0) {
> ret = btrfs_prev_leaf(root, path);
> if (ret != 0)
> return ret;
> - } else {
> - path->slots[0]--;
> }
> leaf = path->nodes[0];
> - nritems = btrfs_header_nritems(leaf);
> - if (nritems == 0)
> - return 1;
> - if (path->slots[0] == nritems)
> - path->slots[0]--;
> + ASSERT(btrfs_header_nritems(leaf));
> + path->slots[0]--;
>
> btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> if (found_key.objectid < min_objectid)
> --
> 2.34.1
>
next prev parent reply other threads:[~2021-12-14 10:28 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-14 7:14 [PATCH] btrfs: remove some duplicated checks in btrfs_previous_*_item() Qu Wenruo
2021-12-14 10:27 ` Filipe Manana [this message]
2021-12-14 10:50 ` Qu Wenruo
2021-12-14 14:37 ` Josef Bacik
2021-12-14 23:33 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ybhxngswi6vN+vH4@debian9.Home \
--to=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).