Linux-BTRFS Archive on lore.kernel.org
From: General Zed <general-zed@zedlx.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Fri, 13 Sep 2019 20:20:47 -0400
Message-ID: <20190913202047.Horde.AoCq_fiuaZoXuLQ52u9Ljzt@server53.web-hosting.com>
In-Reply-To: <20190913204321.GG22121@hungrycats.org>


Quoting Zygo Blaxell <ce3g8jdj@umail.furryterror.org>:

> On Fri, Sep 13, 2019 at 07:04:28AM -0400, Austin S. Hemmelgarn wrote:
>> On 2019-09-12 19:54, Zygo Blaxell wrote:
>> > On Thu, Sep 12, 2019 at 06:57:26PM -0400, General Zed wrote:
>> > >
>> > > Quoting Chris Murphy <lists@colorremedies.com>:
>> > >
>> > > > On Thu, Sep 12, 2019 at 3:34 PM General Zed <general-zed@zedlx.com> wrote:
>> > > > >
>> > > > >
>> > > > > Quoting Chris Murphy <lists@colorremedies.com>:
>> > > > >
>> > > > > > On Thu, Sep 12, 2019 at 1:18 PM <webmaster@zedlx.com> wrote:
>> > > > > > >
>> > > > > > > It is normal and common for defrag operation to use some disk space
>> > > > > > > while it is running. I estimate that a reasonable limit would be to
>> > > > > > > use up to 1% of total partition size. So, if a partition size is 100
>> > > > > > > GB, the defrag can use 1 GB. Let's call this "defrag operation space".
>> > > > > >
>> > > > > > The simplest case of a file with no shared extents, the minimum free
>> > > > > > space should be set to the potential maximum rewrite of the file, i.e.
>> > > > > > 100% of the file size. Since Btrfs is COW, the entire operation must
>> > > > > > succeed or fail, no possibility of an ambiguous in between state, and
>> > > > > > this does apply to defragment.
>> > > > > >
>> > > > > > So if you're defragging a 10GiB file, you need 10GiB minimum free
>> > > > > > space to COW those extents to a new, mostly contiguous, set of extents,
>> > > > >
>> > > > > False.
>> > > > >
>> > > > > You can defragment just 1 GB of that file, and then just write out to
>> > > > > disk (in new extents) an entire new version of b-trees.
>> > > > > Of course, you don't really need to do all that, as usually only a
>> > > > > small part of the b-trees need to be updated.
>> > > >
>> > > > The `-l` option allows the user to choose a maximum amount to
>> > > > defragment. Setting up a default defragment behavior that has a
>> > > > variable outcome is not idempotent and probably not a good idea.
>> > >
>> > > We are talking about a future, imagined defrag. It has no -l option yet, as
>> > > we haven't discussed it yet.
>> > >
>> > > > As for kernel behavior, it presumably could defragment in portions,
>> > > > but it would have to completely update all affected metadata after
>> > > > each e.g. 1GiB section, translating into 10 separate rewrites of file
>> > > > metadata, all affected nodes, all the way up the tree to the super.
>> > > > There is no such thing as metadata overwrites in Btrfs. You're
>> > > > familiar with the wandering trees problem?
>> > >
>> > > No, but it doesn't matter.
>> > >
>> > > At worst, it just has to completely write-out "all metadata", all the way up
>> > > to the super. It needs to be done just once, because what's the point of
>> > > writing it 10 times over? Then, the super is updated as the final commit.
>> >
>> > This is kind of a silly discussion.  The biggest extent possible on
>> > btrfs is 128MB, and the incremental gains of forcing 128MB extents to
>> > be consecutive are negligible.  If you're defragging a 10GB file, you're
>> > just going to end up doing 80 separate defrag operations.
>
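The figure of 80 in the quoted paragraph above can be checked with a minimal sketch. It assumes "10GB" means 10 GiB and "128MB" means 128 MiB; the constant and function names are only illustrative and are not taken from btrfs.

```python
# Assumption: the 128MB limit quoted above, taken as 128 MiB.
MAX_EXTENT_BYTES = 128 * 1024 * 1024


def extent_count(file_bytes: int, max_extent: int = MAX_EXTENT_BYTES) -> int:
    """Return the minimum number of extents needed to hold file_bytes."""
    if file_bytes < 0:
        raise ValueError("file_bytes must be non-negative")
    # Ceiling division without floating point.
    return -(-file_bytes // max_extent)


if __name__ == "__main__":
    ten_gib = 10 * 1024**3
    print(extent_count(ten_gib))  # 80
```

A file that is not an exact multiple of 128 MiB simply needs one more, smaller, extent for the remainder.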
>> Do you have a source for this claim of a 128MB max extent size?
>
> 	~/linux$ git grep BTRFS.*MAX.*EXTENT
> 	fs/btrfs/ctree.h:#define BTRFS_MAX_EXTENT_SIZE SZ_128M
>
> Plus years of watching bees logs scroll by, which never have an extent
> above 128M in size that contains data.
>
> I think there are a couple of exceptions for non-data-block extent items
> like holes.  A hole extent item doesn't have any physical location on
> disk, so its size field can be any 64-bit integer.  btrfs imposes no
> restriction there.
>
> PREALLOC extents are half hole, half nodatacow extent.  They can be
> larger than 128M when they are empty, but when data is written to them,
> they are replaced only in 128M chunks.
>
>> Because
>> everything I've seen indicates the max extent size is a full data chunk (so
>> 1GB for the common case, potentially up to about 5GB for really big
>> filesystems)
>
> If what you've seen so far is 'filefrag -v' output (or any tool based
> on the FIEMAP ioctl), then you are seeing post-processed extent sizes
> (where adjacent extents where begin[n+1] == end[n] are coalesced for
> human consumption), not true on-disk and in-metadata sizes.  FIEMAP is
> slow and full of lies.
>
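The coalescing described in the quoted paragraph above (adjacent extents where begin[n+1] == end[n] are reported as one) can be sketched as follows. This is only an illustration of that rule, not code from filefrag or the kernel; the `Extent` type and the `coalesce` name are hypothetical, and only physical offsets are modelled.

```python
from typing import List, Tuple

# Hypothetical representation: (begin, end) physical byte offsets, end exclusive.
Extent = Tuple[int, int]


def coalesce(extents: List[Extent]) -> List[Extent]:
    """Merge each extent into its predecessor when it begins exactly
    where the predecessor ends; otherwise start a new entry."""
    merged: List[Extent] = []
    for begin, end in extents:
        if merged and merged[-1][1] == begin:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((begin, end))
    return merged


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Three 128 MiB extents; the first two are physically adjacent.
    on_disk = [(0, 128 * MIB), (128 * MIB, 256 * MIB), (512 * MIB, 640 * MIB)]
    for begin, end in coalesce(on_disk):
        print(begin // MIB, end // MIB)
```

In this example the first two extents come out as a single 256 MiB entry, which is how a reported size can exceed 128M even though no single extent does.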
>> > 128MB is big enough you're going to be seeking in the middle of reading
>> > an extent anyway.  Once you have the file arranged in 128MB contiguous
>> > fragments (or even a tenth of that on medium-fast spinning drives),
>> > the job is done.
>> >
>> > > On my computer the ENTIRE METADATA is 1 GB. That would be very tolerable and
>> > > doable.
>> >
>> > You must have a small filesystem...mine range from 16 to 156GB, a bit too
>> > big to fit in RAM comfortably.
>> >
>> > Don't forget you have to write new checksum and free space tree pages.
>> > In the worst case, you'll need about 1GB of new metadata pages for each
>> > 128MB you defrag (though you get to delete 99.5% of them immediately
>> > after).
>> >
>> > > But that is a very bad case, because usually not much metadata has to be
>> > > updated or written out to disk.
>> > >
>> > > So, there is no problem.
>> > >

Mr. Blaxell, could you be so kind as to help me with this mission of
mine to describe a good defrag algorithm for BTRFS?

To better understand the circumstances, I need to know a few
statistics about BTRFS filesystems. I'm interested in both the
extreme and the common values.

One of the values in question is the total number of reflinks in a BTRFS
filesystem. In fact, I would like to know the following information
about a given btrfs partition: the number of extents, the number of
reflinks, the size of the physical data written on disk, the size of the
logical (by sharing) data written on disk, the total size of the
partition, the size of the metadata, and the number of snapshots.

So, could you please provide me with a few values that you think
are typical of common partitions, and also some of the extreme
values that you have encountered while using btrfs?
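To be concrete about what I am asking for, the figures could be collected in a record like the following sketch. The class name, the field names and the two derived ratios are only my illustration; btrfs does not report anything under these names, and the numbers in the example are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class FsStats:
    """Hypothetical record of the per-partition figures requested above."""
    extents: int          # number of extents
    reflinks: int         # number of reflinks
    physical_bytes: int   # physical data written on disk
    logical_bytes: int    # logical (by sharing) data
    partition_bytes: int  # total size of the partition
    metadata_bytes: int   # size of metadata
    snapshots: int        # number of snapshots

    def reflinks_per_extent(self) -> float:
        """Average number of reflinks per extent (0.0 if there are no extents)."""
        return self.reflinks / self.extents if self.extents else 0.0

    def sharing_ratio(self) -> float:
        """Logical size divided by physical size (0.0 if nothing is stored)."""
        return self.logical_bytes / self.physical_bytes if self.physical_bytes else 0.0


if __name__ == "__main__":
    GIB = 1024**3
    # Placeholder values only, chosen to make the ratios easy to read.
    example = FsStats(
        extents=1000, reflinks=2500,
        physical_bytes=100 * GIB, logical_bytes=150 * GIB,
        partition_bytes=500 * GIB, metadata_bytes=1 * GIB, snapshots=10,
    )
    print(example.reflinks_per_extent(), example.sharing_ratio())
```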

Thanks,

     General Zed




