Re: [PATCH v2 03/11] btrfs: introduce new members for extent_map

From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org,  David Sterba <dsterba@suse.com>
Subject: Re: [PATCH v2 03/11] btrfs: introduce new members for extent_map
Date: Fri, 10 May 2024 12:26:46 +0100	[thread overview]
Message-ID: <CAL3q7H6rz9Z8xCbsjvqaEQC34m7uiM_FH1ecW+PQC5kARWKxrA@mail.gmail.com> (raw)
In-Reply-To: <1fe28d75-a4a3-4304-9998-a88a5fee4919@gmx.com>

On Thu, May 9, 2024 at 11:12 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2024/5/10 02:35, Filipe Manana 写道:
> > On Fri, May 3, 2024 at 7:02 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> Introduce two new members for extent_map:
> >>
> >> - disk_bytenr
> >> - offset
> >>
> >> Both are matching the members with the same name inside
> >> btrfs_file_extent_items.
> >>
> >> For now this patch only touches those members when:
> >>
> >> - Reading btrfs_file_extent_items from disk
> >> - Inserting new holes
> >> - Merging two extent maps
> >>    With the new disk_bytenr and disk_num_bytes, doing merging would be a
> >>    little complex, as we have 3 different cases:
> >>
> >>    * Both extent maps are referring to the same data extent
> >>    * Both extent maps are referring to different data extents, but
> >>      those data extents are adjacent, and extent maps are at head/tail
> >>      of each data extents
> >
> > I have no idea what this last part of the sentence means:
> >
> > "and extent maps are at head/tail of each data extents"
>
> For this case:
>
>          |<- data extent 1 ->|<- data extent 2 ->|
>         |          |////////|////////|          |
>                       FE A      FE B
>
> In above case, FE A is only referring the tailing part of data extent 1,
> and FE B is referring to the heading part of data extent 2.
>
> In that case, FE A and FE B can be merged into something like this:
>
>          |<------ merged extent 1 + 2 --------->|
>         |          |/////////////////|         |
>                         FE A + B
>
> In that case, merged file extent would have:
>
> - disk_bytenr = fe_a->disk_bytenr
> - disk_num_bytes = fe_a->disk_num_bytes + fe_b->disk_num_bytes
> - ram_bytes = fe_a->ram_bytes + fe_b->ram_bytes
> - offset = fe_a->offset
> - num_bytes = fe_a->num_bytes + fe_b->num_bytes
>
> >
> >>    * One of the extent map is referring to an merged and larger data
> >
> > map -> maps
> > an -> a
> >
> >>      extent that covers both extent maps
> >
> > Not sure if I can understand this. How can one of the extent maps
> > already cover the range of the other extent map?
> > This suggests some overlapping, or most likely I misunderstood what
> > this is trying to say.
>
> That's the for impossible test case 3, but as you mentioned, it's more
> sane just to remove it.
>
> [...]
> >
> >> +        *
> >> +        * The calculation here always merge the data extents first, then update
> >> +        * @offset using the new data extents.
> >> +        *
> >> +        * For case 1), the merged data extent would be the same.
> >> +        * For case 2), we just merge the two data extents into one.
> >> +        * For case 3), we just got the larger data extent.
> >> +        */
> >> +       new_disk_bytenr = min(prev->disk_bytenr, next->disk_bytenr);
> >> +       new_disk_num_bytes = max(prev->disk_bytenr + prev->disk_num_bytes,
> >> +                                next->disk_bytenr + next->disk_num_bytes) -
> >> +                            new_disk_bytenr;
> >
> > So this is confusing, disk_num_bytes being a max between the two
> > extent maps and not their sum.
>
> Check case 1).
>
> Both file extents are referring to the same data extent.
>
> Like this:
>
> FE 1, file pos 0, num_bytes 4K, disk_bytenr X, disk_num_bytes 16K,
> offset 0, ram_bytes 16K, compression none
>
> FE 2, file pos 4k, num_bytes 4K, disk_bytenr X, disk_num_bytes 16K,
> offset 4k, ram_bytes 16K, compression none.
>
> In that case we should not just sum up the disk_num_bytes at all.
> Remember disk_num_bytes are for the data extent.

Yes, but for cases where they refer to different file extent items
that are contiguous on disk, the max is confusing, and a sum is what
makes sense. Example:

FE 1, file pos 0, num_bytes 4K, disk_bytenr X, disk_num_bytes 4K,
offset 0, ram_bytes 4K, no compression

FE 2, file pos 4K, num_bytes 4K, disk_bytenr X + 4K, disk_num_bytes
4K, offset 0, ram_bytes 4K, no compression

When merging the corresponding extent maps it's natural to think
disk_num_bytes is 8K and not 4K.

But as I mentioned before, after merging we really don't use
disk_num_bytes anywhere.
It's only used during extent logging, which only happens for extents
maps that were not merged and can not be before they are logged.

Before this patchset, when disk_num_bytes was named orig_block_len,
that was never updated during merging, exactly because we don't use
cases for it.

>
> > I gather this is modelled after what we already do at
> > btrfs_drop_extent_map_range() when splitting.
> >
> > But the truth is we never used the disk_num_bytes of an extent map
> > that was merged - we also didn't do it before this patch, for that
> > reason.
> > It's only used for logging new extents - which can't be merged - they
> > can be merged only after being logged.
> >
> > We can set this to the sum, or leave with some value to signal it's invalid.
> > Just leave a comment saying disk_num_bytes it's not used anywhere for
> > merged extent maps.
> >
> > In the splitting case at btrfs_drop_extent_map_range() it's what we
> > need since in the case the extent is new and not logged (in the
> > modified list), disk_num_bytes always represents the size of the
> > original, before split, extent.
>
> [...]
>
> >> +       file_extent.disk_bytenr = ins.objectid;
> >> +       file_extent.disk_num_bytes = ins.offset;
> >> +       file_extent.num_bytes = ins.offset;
> >> +       file_extent.ram_bytes = ins.offset;
> >> +       file_extent.offset = 0;
> >> +       file_extent.compression = BTRFS_COMPRESS_NONE;
> >
> > Same as before:
> >
> > "If we always have to initialize all the members of the structure,
> > it's pointless to have it initialized to zeros when we declared it."
> >
> [...]
> >> +       file_extent.disk_bytenr = ins.objectid;
> >> +       file_extent.disk_num_bytes = ins.offset;
> >> +       file_extent.num_bytes = num_bytes;
> >> +       file_extent.ram_bytes = ram_bytes;
> >> +       file_extent.offset = encoded->unencoded_offset;
> >> +       file_extent.compression = compression;
> >
> > Same as before:
> >
> > "If we always have to initialize all the members of the structure,
> > it's pointless to have it initialized to zeros when we declared it."
>
> Fair enough, and an uninitilized structure member can also make compiler
> to warn us.
>
> Thanks,
> Qu
> >
> > The rest I suppose seems fine, but I have to look at the rest of the patchset.
> >
> > Thanks.
> >
> >>          em = create_io_em(inode, start, num_bytes,
> >>                            start - encoded->unencoded_offset, ins.objectid,
> >>                            ins.offset, ins.offset, ram_bytes, compression,
> >> -                         BTRFS_ORDERED_COMPRESSED);
> >> +                         &file_extent, BTRFS_ORDERED_COMPRESSED);
> >>          if (IS_ERR(em)) {
> >>                  ret = PTR_ERR(em);
> >>                  goto out_free_reserved;
> >> --
> >> 2.45.0
> >>
> >>
> >