From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Qu Wenruo <wqu@suse.com>, Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Why btrfs no longer allocate the extent at the beginning of an empty chunk (was: Re: [PATCH 12/15] btrfs: add new read repair infrastructure)
Date: Fri, 20 May 2022 14:43:55 +0800
Message-ID: <e636ebd2-2e67-0e94-9758-925df5a89557@gmx.com>
In-Reply-To: <d99b2ba3-23d2-0ea1-9aa4-13a898e77ab6@suse.com>

>> I tried to write a test case for this by copying btrfs/140 and then
>> as a first step extending it to three mirrors using the raid1c3
>> profile.  But it seems that the tricks used there don't work,
>> as the code in btrfs/140 relies on the fact that the btrfs logical
>> address reported by filefrag is reported by dump-tree as the item
>> "index" in this line:
>>
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsiz
>>
>> but for the raid1c3 profile that line reports something entirely
>> different.
>>
>> for raid1:
>>
>> logical: 137756672
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751
>> itemsize 112
>>
>> for raid1c3:
>>
>> logical: 343998464
>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15621
>> itemsize 144
>>
>> any idea how to find physical sectors to corrupt for raid1c3?
>>
>
> I also recently hit weird cases where the extent allocator no longer puts
> the first data extent at the beginning of a chunk.

Thankfully, this is not a bug, but a combination of several individually
straightforward behaviors, which together produce a weird result.

It took me a short adventure into the free space handling code to find
the cause.

For my example, I'm using three 10GiB disks, with RAID0 for data and
RAID1 for metadata:

    Label:              (null)
    UUID:               bb10a539-0344-445a-9e77-bbda65d79366
    Node size:          16384
    Sector size:        4096
    Filesystem size:    30.00GiB
    Block group profiles:
      Data:             RAID0             3.00GiB
      Metadata:         RAID1           256.00MiB
      System:           RAID1             8.00MiB
    SSD detected:       no
    Zoned device:       no
    Incompat features:  extref, skinny-metadata, no-holes
    Runtime features:   free-space-tree
    Checksum:           crc32c
    Number of devices:  3
    Devices:
       ID        SIZE  PATH
        1    10.00GiB  /dev/test/scratch1
        2    10.00GiB  /dev/test/scratch2
        3    10.00GiB  /dev/test/scratch3

The 3GiB data chunk (logical 298844160, length 3GiB) is completely
empty, but note that btrfs must not allocate extents in the ranges
reserved for superblocks.

Inside that chunk, logical bytenr 434110464 is at the superblock
location of /dev/test/scratch1.

So the free space of that 3GiB chunk is split into two parts:

[298844160, +135266304)
[434176000, +3085893632)

Notice the latter part is much larger.

So far so good, but another behavior is involved: how the cached,
size-indexed free space tree is searched.

In find_free_space(), when searching from the beginning of a block
group, we start from `rb_first_cached(&ctl->free_space_bytes)`.

But the free_space_bytes rbtree is not sorted by logical bytenr; it is
sorted by the size of each free space entry, and the leftmost entry is
the one with the most free space.
So instead of choosing [298844160, +135266304), we choose [434176000,
+3085893632), which is much larger.


Thus we got the seemingly weird bytenr, 434176000, for our first data
extent.


Each behavior on its own is completely sane and straightforward:
we cannot use space reserved for superblocks, and we should prefer the
free space entry with the most free space.

But combined, they mean the allocator does not return the beginning of a
seemingly empty chunk.

So in short, we should not rely on the dirty dump-tree hack, but on a
better version of btrfs-map-logical to get the real physical offset of
a logical bytenr.

Thanks,
Qu





Thread overview: 55+ messages
2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
2022-05-17 14:59   ` Johannes Thumshirn
2022-05-18  8:44     ` Christoph Hellwig
2022-05-20  8:45   ` Nikolay Borisov
2022-05-20 16:24     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
2022-05-17 15:00   ` Johannes Thumshirn
2022-05-18 17:07   ` Anand Jain
2022-05-20  8:47   ` Nikolay Borisov
2022-05-20 16:25     ` Christoph Hellwig
2022-05-20 22:36       ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 03/15] btrfs: save the original bi_iter into btrfs_bio for buffered read Christoph Hellwig
2022-05-17 14:50 ` [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
2022-05-17 15:35   ` Johannes Thumshirn
2022-05-20 10:05   ` Nikolay Borisov
2022-05-17 14:50 ` [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
2022-05-17 15:27   ` Johannes Thumshirn
2022-05-18  8:46     ` Christoph Hellwig
2022-05-18 10:07       ` Qu Wenruo
2022-05-20 16:27         ` Christoph Hellwig
2022-05-21  1:16           ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c Christoph Hellwig
2022-05-17 15:18   ` Johannes Thumshirn
2022-05-17 14:50 ` [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair Christoph Hellwig
2022-05-17 15:18   ` Johannes Thumshirn
2022-05-17 22:17   ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 08/15] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
2022-05-17 22:22   ` Qu Wenruo
2022-05-18  8:48     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
2022-05-17 15:24   ` Johannes Thumshirn
2022-05-18  8:45     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper Christoph Hellwig
2022-05-17 15:37   ` Johannes Thumshirn
2022-05-17 22:26   ` Qu Wenruo
2022-05-18  8:47     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage Christoph Hellwig
2022-05-17 22:47   ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
2022-05-17 23:04   ` Qu Wenruo
2022-05-18  8:54     ` Christoph Hellwig
2022-05-18 10:20       ` Qu Wenruo
2022-05-18 12:48         ` Christoph Hellwig
2022-05-19  9:36     ` Christoph Hellwig
2022-05-19 10:41       ` Qu Wenruo
2022-05-19 10:45         ` Nikolay Borisov
2022-05-19 10:46           ` Qu Wenruo
2022-05-19 10:50         ` Christoph Hellwig
2022-05-19 11:27           ` Qu Wenruo
2022-05-20  6:43         ` Qu Wenruo [this message]
2022-05-20 15:25     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 13/15] btrfs: use the new read repair code for direct I/O Christoph Hellwig
2022-05-17 14:50 ` [PATCH 14/15] btrfs: use the new read repair code for buffered reads Christoph Hellwig
2022-05-17 14:50 ` [PATCH 15/15] btrfs: remove io_failure_record infrastructure completely Christoph Hellwig
