Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: General Zed <general-zed@zedlx.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Chris Murphy <lists@colorremedies.com>,
	"Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Fri, 13 Sep 2019 02:58:32 -0400
Message-ID: <20190913025832.Horde.Bwn_M-5buBYcgGbqhc_wDkU@server53.web-hosting.com> (raw)
In-Reply-To: <20190913031242.GF22121@hungrycats.org>


Quoting Zygo Blaxell <ce3g8jdj@umail.furryterror.org>:

> On Thu, Sep 12, 2019 at 08:26:04PM -0400, General Zed wrote:
>>
>> Quoting Zygo Blaxell <ce3g8jdj@umail.furryterror.org>:
>>
>> > On Thu, Sep 12, 2019 at 06:57:26PM -0400, General Zed wrote:
>> > >
>> > > At worst, it just has to completely write-out "all metadata",  
>> all the way up
>> > > to the super. It needs to be done just once, because what's the point of
>> > > writing it 10 times over? Then, the super is updated as the  
>> final commit.
>> >
>> > This is kind of a silly discussion.  The biggest extent possible on
>> > btrfs is 128MB, and the incremental gains of forcing 128MB extents to
>> > be consecutive are negligible.  If you're defragging a 10GB file, you're
>> > just going to end up doing 80 separate defrag operations.
>>
>> Ok, then the max extent is 128 MB, that's fine. Someone here previously said
>> that it is 2 GB, so he has disinformed me (in order to further his false
>> argument).
>
> If the 128MB limit is removed, you then hit the block group size limit,
> which is some number of GB from 1 to 10 depending on number of disks
> available and raid profile selection (the striping raid profiles cap
> block group sizes at 10 disks, and single/raid1 profiles always use 1GB
> block groups regardless of disk count).  So 2GB is _also_ a valid extent
> size limit, just not the first limit that is relevant for defrag.
>
> A lot of people get confused by 'filefrag -v' output, which coalesces
> physically adjacent but distinct extents.  So if you use that tool,
> it can _seem_ like there is a 2.5GB extent in a file, but it is really
> 20 distinct 128MB extents that start and end at adjacent addresses.
> You can see the true structure in 'btrfs ins dump-tree' output.
>
> That also brings up another reason why 10GB defrags are absurd on btrfs:
> extent addresses are virtual.  There's no guarantee that a pair of extents
> that meet at a block group boundary are physically adjacent, and after
> operations like RAID array reorganization or free space defragmentation,
> they are typically quite far apart physically.
>
>> I didn't ever said that I would force extents larger than 128 MB.
>>
>> If you are defragging a 10 GB file, you'll likely have to do it in 10 steps,
>> because the defrag is usually allowed to only use a limited amount of disk
>> space while in operation. That has nothing to do with the extent size.
>
> Defrag is literally manipulating the extent size.  Fragments and extents
> are the same thing in btrfs.
>
> Currently a 10GB defragment will work in 80 steps, but doesn't necessarily
> commit metadata updates after each step, so more than 128MB of temporary
> space may be used (especially if your disks are fast and empty,
> and you start just after the end of the previous commit interval).
> There are some opportunities to coalsce metadata updates, occupying up
> to a (arbitrary) limit of 512MB of RAM (or when memory pressure forces
> a flush, whichever comes first), but exploiting those opportunities
> requires more space for uncommitted data.
>
> If the filesystem starts to get low on space during a defrag, it can
> inject commits to force metadata updates to happen more often, which
> reduces the amount of temporary space needed (we can't delete the original
> fragmented extents until their replacement extent is committed); however,
> if the filesystem is so low on space that you're worried about running
> out during a defrag, then you probably don't have big enough contiguous
> free areas to relocate data into anyway, i.e. the defrag is just going to
> push data from one fragmented location to a different fragmented location,
> or bail out with "sorry, can't defrag that."

Nope.

Each defrag "cycle" consists of two parts:
      1) move-out part
      2) move-in part

The move-out part select one contiguous area of the disk. Almost any  
area will do, but some smart choices are better. It then moves-out all  
data from that contiguous area into whatever holes there are left  
empty on the disk. The biggest problem is actually updating the  
metadata, since the updates are not localized.
Anyway, this part can even be skipped.

The move-in part now populates the completely free contiguous area  
with defragmented data.

In the case that the move-out part needs to be skipped because the  
defrag estimates that the update to metatada will be too big (like in  
the pathological case of a disk with 156 GB of metadata), it can  
sucessfully defrag by performing only the move-in part. In that case,  
the move-in area is not free of data and "defragmented" data won't be  
fully defragmented. Also, there should be at least 20% free disk space  
in this case in order to avoid defrag turning pathological.

But, these are all some pathological cases. They should be considered  
in some other discussion.



  parent reply index

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-09  2:55 zedlryqc
2019-09-09  3:51 ` Qu Wenruo
2019-09-09 11:25   ` zedlryqc
2019-09-09 12:18     ` Qu Wenruo
2019-09-09 12:28       ` Qu Wenruo
2019-09-09 17:11         ` webmaster
2019-09-10 17:39           ` Andrei Borzenkov
2019-09-10 22:41             ` webmaster
2019-09-09 15:29       ` Graham Cobb
2019-09-09 17:24         ` Remi Gauvin
2019-09-09 19:26         ` webmaster
2019-09-10 19:22           ` Austin S. Hemmelgarn
2019-09-10 23:32             ` webmaster
2019-09-11 12:02               ` Austin S. Hemmelgarn
2019-09-11 16:26                 ` Zygo Blaxell
2019-09-11 17:20                 ` webmaster
2019-09-11 18:19                   ` Austin S. Hemmelgarn
2019-09-11 20:01                     ` webmaster
2019-09-11 21:42                       ` Zygo Blaxell
2019-09-13  1:33                         ` General Zed
2019-09-11 21:37                     ` webmaster
2019-09-12 11:31                       ` Austin S. Hemmelgarn
2019-09-12 19:18                         ` webmaster
2019-09-12 19:44                           ` Chris Murphy
2019-09-12 21:34                             ` General Zed
2019-09-12 22:28                               ` Chris Murphy
2019-09-12 22:57                                 ` General Zed
2019-09-12 23:54                                   ` Zygo Blaxell
2019-09-13  0:26                                     ` General Zed
2019-09-13  3:12                                       ` Zygo Blaxell
2019-09-13  5:05                                         ` General Zed
2019-09-14  0:56                                           ` Zygo Blaxell
2019-09-14  1:50                                             ` General Zed
2019-09-14  4:42                                               ` Zygo Blaxell
2019-09-14  4:53                                                 ` Zygo Blaxell
2019-09-15 17:54                                                 ` General Zed
2019-09-16 22:51                                                   ` Zygo Blaxell
2019-09-17  1:03                                                     ` General Zed
2019-09-17  1:34                                                       ` General Zed
2019-09-17  1:44                                                       ` Chris Murphy
2019-09-17  4:55                                                         ` Zygo Blaxell
2019-09-17  4:19                                                       ` Zygo Blaxell
2019-09-17  3:10                                                     ` General Zed
2019-09-17  4:05                                                       ` General Zed
2019-09-14  1:56                                             ` General Zed
2019-09-13  5:22                                         ` General Zed
2019-09-13  6:16                                         ` General Zed
2019-09-13  6:58                                         ` General Zed [this message]
2019-09-13  9:25                                           ` General Zed
2019-09-13 17:02                                             ` General Zed
2019-09-14  0:59                                             ` Zygo Blaxell
2019-09-14  1:28                                               ` General Zed
2019-09-14  4:28                                                 ` Zygo Blaxell
2019-09-15 18:05                                                   ` General Zed
2019-09-16 23:05                                                     ` Zygo Blaxell
2019-09-13  7:51                                         ` General Zed
2019-09-13 11:04                                     ` Austin S. Hemmelgarn
2019-09-13 20:43                                       ` Zygo Blaxell
2019-09-14  0:20                                         ` General Zed
2019-09-14 18:29                                       ` Chris Murphy
2019-09-14 23:39                                         ` Zygo Blaxell
2019-09-13 11:09                                   ` Austin S. Hemmelgarn
2019-09-13 17:20                                     ` General Zed
2019-09-13 18:20                                       ` General Zed
2019-09-12 19:54                           ` Austin S. Hemmelgarn
2019-09-12 22:21                             ` General Zed
2019-09-13 11:53                               ` Austin S. Hemmelgarn
2019-09-13 16:54                                 ` General Zed
2019-09-13 18:29                                   ` Austin S. Hemmelgarn
2019-09-13 19:40                                     ` General Zed
2019-09-14 15:10                                       ` Jukka Larja
2019-09-12 22:47                             ` General Zed
2019-09-11 21:37                   ` Zygo Blaxell
2019-09-11 23:21                     ` webmaster
2019-09-12  0:10                       ` Remi Gauvin
2019-09-12  3:05                         ` webmaster
2019-09-12  3:30                           ` Remi Gauvin
2019-09-12  3:33                             ` Remi Gauvin
2019-09-12  5:19                       ` Zygo Blaxell
2019-09-12 21:23                         ` General Zed
2019-09-14  4:12                           ` Zygo Blaxell
2019-09-16 11:42                             ` General Zed
2019-09-17  0:49                               ` Zygo Blaxell
2019-09-17  2:30                                 ` General Zed
2019-09-17  5:30                                   ` Zygo Blaxell
2019-09-17 10:07                                     ` General Zed
2019-09-17 23:40                                       ` Zygo Blaxell
2019-09-18  4:37                                         ` General Zed
2019-09-18 18:00                                           ` Zygo Blaxell
2019-09-10 23:58             ` webmaster
2019-09-09 23:24         ` Qu Wenruo
2019-09-09 23:25         ` webmaster
2019-09-09 16:38       ` webmaster
2019-09-09 23:44         ` Qu Wenruo
2019-09-10  0:00           ` Chris Murphy
2019-09-10  0:51             ` Qu Wenruo
2019-09-10  0:06           ` webmaster
2019-09-10  0:48             ` Qu Wenruo
2019-09-10  1:24               ` webmaster
2019-09-10  1:48                 ` Qu Wenruo
2019-09-10  3:32                   ` webmaster
2019-09-10 14:14                     ` Nikolay Borisov
2019-09-10 22:35                       ` webmaster
2019-09-11  6:40                         ` Nikolay Borisov
2019-09-10 22:48                     ` webmaster
2019-09-10 23:14                   ` webmaster
2019-09-11  0:26               ` webmaster
2019-09-11  0:36                 ` webmaster
2019-09-11  1:00                 ` webmaster
2019-09-10 11:12     ` Austin S. Hemmelgarn
2019-09-09  3:12 webmaster

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190913025832.Horde.Bwn_M-5buBYcgGbqhc_wDkU@server53.web-hosting.com \
    --to=general-zed@zedlx.com \
    --cc=ahferroin7@gmail.com \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=lists@colorremedies.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git