All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: ethanwu <ethanwu@synology.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: add extra ending condition for indirect data backref resolution
Date: Fri, 3 Jan 2020 11:31:04 -0500	[thread overview]
Message-ID: <017fb679-ca13-f38e-e67b-6f1d42c1fbbd@toxicpanda.com> (raw)
In-Reply-To: <1578044681-25562-1-git-send-email-ethanwu@synology.com>

On 1/3/20 4:44 AM, ethanwu wrote:
> Btrfs has two types of data backref.
> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
> exact block number. Therefore, we need to call resolve_indirect_refs
> which uses btrfs_search_slot to locate the leaf block. After that,
> we need to walk through the leafs to search for the EXTENT_DATA items
> that have disk bytenr matching the extent item(add_all_parents).
> 
> The only conditions we'll stop searching are
> 1. We find different object id or type is not EXTENT_DATA
> 2. We've already got all the refs we want(total_refs)
> 
> Take the following EXTENT_ITEM as example:
> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
>      extent refs 24 gen 7302 flags DATA
>      extent data backref root 257 objectid 260 offset 65536 count 5 #backref entry 1
>      extent data backref root 258 objectid 265 offset 0 count 9 #backref entry 2
>      shared data backref parent 394985472 count 10 #backref entry 3
> 
> If we want to search for backref entry 1, total_refs here would be 24 rather
> than its count 5.
> 
> The reason to use 24 is because some EXTENT_DATA in backref entry 3 block
> 394985472 also points to EXTENT_ITEM 40831553536, if this block also belongs to
> root 257 and lies between these 5 items of backref entry 1,
> and we use total_refs = 5, we'll end up missing some refs from backref
> entry 1.
> 

This seems like the crux of the problem here.  The backref stuff is just blindly 
looking for counts, without keeping track of which counts matter.  So for full 
refs we should only be looking down paths where generation > the snapshot 
generation.  And then for the shared refs it should be anything that comes from 
that shared block.  That would be the proper way to fix the problem, not put 
some arbitrary limit on how far into the inode we can search.

That's not to say what you are doing here is wrong, we really won't have 
anything past the given extent size so we can definitely break out earlier.  But 
what I worry about is say 394985472 _was_ in between the leaves while searching 
down for backref entry #1, we'd end up with duplicate entries and not catch some 
of the other entries.  This feels like we need to fix the backref logic to know 
if it's looking for direct refs, and thus only go down paths with generation > 
snapshot generation, or shared refs and thus only count things that directly 
point to the parent block.  Thanks,

Josef

  parent reply	other threads:[~2020-01-03 16:31 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-03  9:44 [PATCH] btrfs: add extra ending condition for indirect data backref resolution ethanwu
2020-01-03 10:15 ` Qu Wenruo
2020-01-03 11:37   ` ethanwu
2020-01-03 12:32     ` Qu Wenruo
2020-01-03 16:31 ` Josef Bacik [this message]
2020-01-06  3:45   ` ethanwu
2020-01-06 16:05     ` Josef Bacik
2020-01-17 10:44       ` ethanwu
2020-01-17 14:21         ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=017fb679-ca13-f38e-e67b-6f1d42c1fbbd@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=ethanwu@synology.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.