From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Balance loops: what we know so far
Date: Wed, 13 May 2020 01:02:05 -0400	[thread overview]
Message-ID: <20200513050204.GX10769@hungrycats.org> (raw)
In-Reply-To: <b7b8bbf8-119b-02ea-5fad-0f7c3abab07d@gmx.com>

On Wed, May 13, 2020 at 10:28:37AM +0800, Qu Wenruo wrote:
> 
> 
> > On 2020/5/12 10:11 PM, Zygo Blaxell wrote:
> > On Tue, May 12, 2020 at 09:43:06AM -0400, Zygo Blaxell wrote:
> >> On Mon, May 11, 2020 at 04:31:32PM +0800, Qu Wenruo wrote:
> >>> Hi Zygo,
> >>>
> >>> Would you like to test this diff?
> >>>
> >>> Although I haven't found a solid reason yet, there is another report,
> >>> and with help from the reporter, it turns out that balance hangs while
> >>> relocating a DATA_RELOC tree block.
> >>>
> >>> After some more digging, the DATA_RELOC tree doesn't need the REF_COW
> >>> bit at all, since we can't create a snapshot of the data reloc tree.
> >>>
> >>> By removing the REF_COW bit, we could ensure that the data reloc tree
> >>> always gets COWed for relocation (just like the extent tree), which
> >>> would hugely reduce the complexity of the data reloc tree.
> >>>
> >>> Not sure if this would help, but it passes my local balance run.
> >>
> >> I ran it last night.  It did 30804 loops during a metadata block group
> >> balance, and is now looping on a data block group as I write this.
> 
> OK, I'm not that surprised the patch doesn't help.
> But the patch itself could still make sense, since it removes the
> REF_COW bit from the data reloc tree.
> 
> But I'm interested in this: after those 30804 loops, it found its way to
> the next block group?!

No, it got cancelled.  My test scripts start a balance every (hour +/-
random(1 hour)) and cancel a balance on a similar schedule.
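
For reference, the harness is roughly this (a minimal sketch, not the
exact scripts; the mount point and exact jitter are illustrative):

# Rough sketch of the start/cancel test harness: start a data balance
# every (hour +/- random(1 hour)), and cancel any running balance on a
# similar schedule.
import random
import subprocess
import threading
import time

FS = '/media/testfs'  # illustrative mount point

def jittered_sleep():
    time.sleep(3600 + random.uniform(-3600, 3600))

def starter():
    while True:
        jittered_sleep()
        # blocks until the balance finishes or is cancelled
        subprocess.run(['btrfs', 'balance', 'start', '-d', FS], check=False)

def canceller():
    while True:
        jittered_sleep()
        # harmlessly fails if no balance is running
        subprocess.run(['btrfs', 'balance', 'cancel', FS], check=False)

threading.Thread(target=canceller, daemon=True).start()
starter()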

> > Here's the block group that is failing, and some poking around with it:
> 
> In fact, such poking would be the most valuable part.
> 
> > 
> > 	root@tester:~# ~/share/python-btrfs/examples/show_block_group_contents.py 4368594108416 /media/testfs/
> > 	block group vaddr 4368594108416 length 1073741824 flags DATA used 530509824 used_pct 49
> > 	extent vaddr 4368594108416 length 121053184 refs 1 gen 1318394 flags DATA
> > 	    inline shared data backref parent 4374646833152 count 1
> > 	extent vaddr 4368715161600 length 120168448 refs 1 gen 1318394 flags DATA
> > 	    inline shared data backref parent 4374801383424 count 1
> > 	extent vaddr 4368835330048 length 127623168 refs 1 gen 1318394 flags DATA
> > 	    inline shared data backref parent 4374801383424 count 1
> > 	extent vaddr 4368962953216 length 124964864 refs 1 gen 1318394 flags DATA
> > 	    inline shared data backref parent 4374801383424 count 1
> > 	extent vaddr 4369182420992 length 36700160 refs 1 gen 1321064 flags DATA
> > 	    inline extent data backref root 257 objectid 257 offset 822607872 count 1
> 
> One interesting thing is, there are 5 extents during the loop.
> The first 4 look like they belong to the data reloc tree, which means
> they have been swapped and are waiting to be cleaned up.
> 
> The last one belongs to root 257, and it looks like it hadn't been
> relocated yet.
> 
> > 
> > The extent data backref is unusual--during loops, I don't usually see those.
> > And...as I write this, it disappeared (it was part of the bees hash table, and
> > was overwritten).  Now there are 4 extents reported in the balance loop (note:
> > I added a loop counter to the log message):
> 
> Then it means the last one got properly relocated.
> 
> The cleanup for the first 4 doesn't happen properly.
> > 
> > 	[Tue May 12 09:44:22 2020] BTRFS info (device dm-0): found 5 extents, loops 378, stage: update data pointers
> > 	[Tue May 12 09:44:23 2020] BTRFS info (device dm-0): found 5 extents, loops 379, stage: update data pointers
> > 	[Tue May 12 09:44:24 2020] BTRFS info (device dm-0): found 5 extents, loops 380, stage: update data pointers
> > 	[Tue May 12 09:44:26 2020] BTRFS info (device dm-0): found 5 extents, loops 381, stage: update data pointers
> > 	[Tue May 12 09:44:27 2020] BTRFS info (device dm-0): found 5 extents, loops 382, stage: update data pointers
> > 	[Tue May 12 10:04:49 2020] BTRFS info (device dm-0): found 5 extents, loops 383, stage: update data pointers
> > 	[Tue May 12 10:04:53 2020] BTRFS info (device dm-0): found 4 extents, loops 384, stage: update data pointers
> > 	[Tue May 12 10:04:58 2020] BTRFS info (device dm-0): found 4 extents, loops 385, stage: update data pointers
> > 	[Tue May 12 10:05:00 2020] BTRFS info (device dm-0): found 4 extents, loops 386, stage: update data pointers
> > 	[Tue May 12 10:05:00 2020] BTRFS info (device dm-0): found 4 extents, loops 387, stage: update data pointers
> > 	[Tue May 12 10:05:01 2020] BTRFS info (device dm-0): found 4 extents, loops 388, stage: update data pointers
> > 
> > Some of the extents that remain are confusing python-btrfs a little:
> > 
> > 	root@tester:~# ~/share/python-btrfs/examples/show_block_group_data_extent_filenames.py 4368594108416 /media/testfs/
> > 	block group vaddr 4368594108416 length 1073741824 flags DATA used 530509824 used_pct 49
> > 	extent vaddr 4368594108416 length 121053184 refs 1 gen 1318394 flags DATA
> > 	Traceback (most recent call last):
> > 	  File "/root/share/python-btrfs/examples/show_block_group_data_extent_filenames.py", line 52, in <module>
> > 	    inodes, bytes_missed = logical_to_ino_fn(fs.fd, extent.vaddr)
> > 	  File "/root/share/python-btrfs/examples/show_block_group_data_extent_filenames.py", line 28, in find_out_about_v1_or_v2
> > 	    inodes, bytes_missed = using_v2(fd, vaddr)
> > 	  File "/root/share/python-btrfs/examples/show_block_group_data_extent_filenames.py", line 17, in using_v2
> > 	    inodes, bytes_missed = btrfs.ioctl.logical_to_ino_v2(fd, vaddr, ignore_offset=True)
> > 	  File "/media/share/python-btrfs/examples/btrfs/ioctl.py", line 565, in logical_to_ino_v2
> > 	    return _logical_to_ino(fd, vaddr, bufsize, ignore_offset, _v2=True)
> > 	  File "/media/share/python-btrfs/examples/btrfs/ioctl.py", line 581, in _logical_to_ino
> > 	    fcntl.ioctl(fd, IOC_LOGICAL_INO_V2, args)
> 
> I'm a little surprised that it's using the LOGICAL_INO ioctl, not just
> TREE_SEARCH.

Tree search can't read shared backrefs because they refer directly to
disk blocks, not to object/type/offset tuples.  It would be nice to have
an ioctl that can read a metadata block (or even a data block) by bytenr.

Or, even better, an fd obtained through some ioctl that would allow
access to the btrfs virtual address space with pread().
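
To make that wish concrete (purely hypothetical; no such ioctl exists
today), usage would look something like:

# Hypothetical sketch: some ioctl hands back an fd onto the filesystem's
# logical (virtual) address space, so a shared backref's parent bytenr
# could be dereferenced directly with pread().
import os

NODESIZE = 16384  # typical btrfs nodesize

def read_tree_block(vspace_fd, parent_bytenr):
    # read the raw metadata block behind a shared backref's parent pointer
    return os.pread(vspace_fd, NODESIZE, parent_bytenr)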

> I guess if we could get a plain tree-search-based one (it only searches
> the commit root, which is exactly what balance is based on), it would be
> easier to do the digging.

That would be nice.  I have an application for it.  ;)
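
The forward half already works with plain tree search, for what it's
worth.  A minimal sketch using python-btrfs (assuming FileSystem.extents()
takes a vaddr range, the same way show_block_group_contents.py above
uses it):

# Keyed lookup of the EXTENT_ITEM for a bytenr via TREE_SEARCH.  What it
# can't do is follow a shared backref's parent bytenr any further.
import btrfs

def extent_item_at(path, bytenr):
    fs = btrfs.FileSystem(path)
    for extent in fs.extents(bytenr, bytenr):
        return extent  # the EXTENT_ITEM whose key starts at bytenr
    return None

print(extent_item_at('/media/testfs', 4368594108416))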

> > 	OSError: [Errno 22] Invalid argument
> > 
> > 	root@tester:~# btrfs ins log 4368594108416 /media/testfs/
> > 	/media/testfs//snap-1589258042/testhost/var/log/messages.6.lzma
> > 	/media/testfs//current/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589249822/testhost/var/log/messages.6.lzma
> > 	ERROR: ino paths ioctl: No such file or directory
> > 	/media/testfs//snap-1589249547/testhost/var/log/messages.6.lzma
> > 	ERROR: ino paths ioctl: No such file or directory
> > 	/media/testfs//snap-1589248407/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589256422/testhost/var/log/messages.6.lzma
> > 	ERROR: ino paths ioctl: No such file or directory
> > 	/media/testfs//snap-1589251322/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589251682/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589253842/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589246727/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589258582/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589244027/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589245227/testhost/var/log/messages.6.lzma
> > 	ERROR: ino paths ioctl: No such file or directory
> > 	ERROR: ino paths ioctl: No such file or directory
> > 	/media/testfs//snap-1589246127/testhost/var/log/messages.6.lzma
> > 	/media/testfs//snap-1589247327/testhost/var/log/messages.6.lzma
> > 	ERROR: ino paths ioctl: No such file or directory
> > 
> > Hmmm, I wonder if there's a problem with deleted snapshots?
> 
> Yes, that's also what I'm guessing.
> 
> The cleanup of the data reloc tree doesn't look correct to me.
> 
> Thanks for the new clues,
> Qu

Here's a fun one:

1.  Delete all the files on a filesystem where balance loops
have occurred.

2.  Verify there are no data blocks (one data block group
with used = 0):

# show_block_groups.py /testfs/
block group vaddr 435969589248 length 1073741824 flags METADATA|RAID1 used 180224 used_pct 0
block group vaddr 4382686969856 length 33554432 flags SYSTEM|RAID1 used 16384 used_pct 0
block group vaddr 4383794266112 length 1073741824 flags DATA used 0 used_pct 0

3.  Create a new file with a single reference in the only (root) subvol:
# head -c 1024m > file
# sync
# show_block_groups.py .
block group vaddr 435969589248 length 1073741824 flags METADATA|RAID1 used 1245184 used_pct 0
block group vaddr 4382686969856 length 33554432 flags SYSTEM|RAID1 used 16384 used_pct 0
block group vaddr 4384868007936 length 1073741824 flags DATA used 961708032 used_pct 90
block group vaddr 4385941749760 length 1073741824 flags DATA used 112033792 used_pct 10

4.  Run balance, and it immediately loops on a single extent:
# btrfs balance start -d .
[Wed May 13 00:41:58 2020] BTRFS info (device dm-0): balance: start -d
[Wed May 13 00:41:58 2020] BTRFS info (device dm-0): relocating block group 4385941749760 flags data
[Wed May 13 00:42:00 2020] BTRFS info (device dm-0): found 1 extents, loops 1, stage: move data extents
[Wed May 13 00:42:00 2020] BTRFS info (device dm-0): found 1 extents, loops 2, stage: update data pointers
[Wed May 13 00:42:01 2020] BTRFS info (device dm-0): found 1 extents, loops 3, stage: update data pointers
[Wed May 13 00:42:01 2020] BTRFS info (device dm-0): found 1 extents, loops 4, stage: update data pointers
[Wed May 13 00:42:01 2020] BTRFS info (device dm-0): found 1 extents, loops 5, stage: update data pointers
[Wed May 13 00:42:01 2020] BTRFS info (device dm-0): found 1 extents, loops 6, stage: update data pointers
[Wed May 13 00:42:01 2020] BTRFS info (device dm-0): found 1 extents, loops 7, stage: update data pointers
[Wed May 13 00:42:02 2020] BTRFS info (device dm-0): found 1 extents, loops 8, stage: update data pointers
[Wed May 13 00:42:02 2020] BTRFS info (device dm-0): found 1 extents, loops 9, stage: update data pointers
[Wed May 13 00:42:02 2020] BTRFS info (device dm-0): found 1 extents, loops 10, stage: update data pointers
[Wed May 13 00:42:02 2020] BTRFS info (device dm-0): found 1 extents, loops 11, stage: update data pointers
[etc...]

I tried it 3 more times and there was no loop.  The 5th try looped again.

There might be a correlation with cancels.  After a fresh boot, I can
often balance a few dozen block groups before there's a loop, but if I
cancel a balance, the next balance almost always loops.
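
For anyone trying to reproduce this: a rough sketch of how I spot a
looper from dmesg (it keys off the loop counter I patched into the
relocation log message above; the threshold is arbitrary):

# Scan dmesg for the patched "found N extents, loops N" message and
# cancel the balance once it has clearly gotten stuck.
import re
import subprocess

FS = '/media/testfs'  # illustrative mount point

def balance_is_looping(threshold=100):
    dmesg = subprocess.run(['dmesg'], capture_output=True,
                           text=True).stdout
    loops = re.findall(r'found \d+ extents, loops (\d+),', dmesg)
    return bool(loops) and int(loops[-1]) >= threshold

if balance_is_looping():
    subprocess.run(['btrfs', 'balance', 'cancel', FS], check=False)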
