Linux-BTRFS Archive on lore.kernel.org
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: General Zed <general-zed@zedlx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Tue, 17 Sep 2019 01:30:55 -0400
Message-ID: <20190917053055.GG24379@hungrycats.org>
In-Reply-To: <20190916223039.Horde.HZvhrBkQsN12DF6IDemvio6@server53.web-hosting.com>

On Mon, Sep 16, 2019 at 10:30:39PM -0400, General Zed wrote:
> Quoting Zygo Blaxell <ce3g8jdj@umail.furryterror.org>:
> > and I think that's impossible so I start from designs
> > that make forward progress with a fixed allocation of resources.
> 
> Well, that's not useless, but it's kind of meh. Waste of time. Solve the
> problem like a real man! Shoot with thermonuclear weapons only!

I have thermonuclear weapons:  the metadata trees in my filesystems.  ;)

> > > So I think you should all inform yourselves a little better about various
> > > defrag algorithms and solutions that exist. Apparently, you all lost
> > > sight of the big picture. You can't see the wood for the trees.
> > 
> > I can see the woods, but any solution that starts with "enumerate all
> > the trees" will be met with extreme skepticism, unless it can do that
> > enumeration incrementally.
> 
> I think that I'm close to a solution that only needs to scan the free-space
> tree in its entirety at the start. All other trees can be only partially
> scanned. I mean, at the start. As the defrag progresses, it will go through all
> the trees (except in case of defragging only a part of the partition). If a
> partition is to be only partially defragged, then the trees do not need to
> be read in their entirety. Only the free-space tree needs to be read in its
> entirety at the start (and the virtual-physical address translation trees,
> which are small, I guess).

I doubt that on a 50TB filesystem you need to read the whole tree...are
you going to globally optimize 50TB at once?  That will take a while.
Start with a 100GB sliding window, maybe.
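
As a rough illustration of the windowed approach (a sketch only: the
function name, the byte-address model, and the use of binary units are
assumptions made for the example, not anything btrfs provides), the
address space can be walked one fixed-size window at a time so that
only one window's worth of state is needed at once:

```python
GIB = 1 << 30
TIB = 1 << 40


def sliding_windows(fs_size, window_size):
    """Yield (start, end) byte ranges that cover [0, fs_size) in order.

    Each window is at most window_size bytes; the last one may be shorter.
    """
    start = 0
    while start < fs_size:
        end = min(start + window_size, fs_size)
        yield start, end
        start = end


# Hypothetical numbers from the discussion: a 50 TiB filesystem
# processed in 100 GiB windows.
windows = list(sliding_windows(50 * TIB, 100 * GIB))
print(len(windows), "windows")
```

A real tool would probably overlap or re-visit windows so that extents
straddling a boundary are handled; that is left out here.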

> > This is fairly common on btrfs:  the btrfs words don't mean the same as
> > other words, causing confusion.  How many copies are there in a btrfs
> > 4-disk raid1 array?
> 
> 2 copies of everything, except the superblock which has 2-6 copies.

Good, you can enter the clubhouse.  A lot of new btrfs users are surprised
it's less than 4.
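
To make the arithmetic concrete, here is a small sketch. The copy count
of 2 is from the exchange above; the capacity estimate is a common rule
of thumb and an assumption of this example (it ignores metadata
overhead and chunk granularity), and the function name is hypothetical:

```python
RAID1_COPIES = 2  # btrfs raid1 keeps 2 copies, however many disks there are


def raid1_usable(disk_sizes):
    """Estimate usable space for 2-copy mirroring across the given disks.

    Each copy of a chunk must land on a different disk, so the largest
    disk can only be used as far as the other disks can mirror it.
    """
    total = sum(disk_sizes)
    return min(total // RAID1_COPIES, total - max(disk_sizes))


# Four equal disks: half the raw space, not a quarter.
print(raid1_usable([1000, 1000, 1000, 1000]))
```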

> > > > > This is solved simply by always running defrag before dedupe.
> > > > Defrag and dedupe in separate passes is nonsense on btrfs.
> > > Defrag can be run without dedupe.
> > Yes, but if you're planning to run both on the same filesystem, they
> > had better be aware of each other.
> 
> On-demand defrag doesn't need to be aware of on-demand dedupe. Or, only in
> the sense that dedupe should be shut down while defrag is running.
> 
> Perhaps you were referring to an on-the-fly dedupe. In that case, yes.

My dedupe runs continuously (well, polling with incremental scan).
It doesn't shut down.

> > > Now, how to organize dedupe? I didn't think about it yet. I'll leave it to
> > > you, but it seems to me that defrag should be involved there. And, my defrag
> > > solution would help there very, very much.
> > 
> > I can't see defrag in isolation as anything but counterproductive to
> > dedupe (and vice versa).
> 
> Share-preserving defrag can't be harmful to dedupe.

Sure it can.  Dedupe needs to split extents by content, and btrfs only
supports that by copying.  If defrag is making new extents bigger before
dedupe gets to them, there is more work for dedupe when it needs to make
extents smaller again.
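
A toy model of that cost, under stated assumptions: one duplicate range
is cut out of one extent, the duplicate range becomes a reference to an
existing copy, and every remaining unique byte has to be copied into a
new extent so the original can be freed as a whole. The function name
and the sizes are hypothetical:

```python
KIB = 1 << 10
MIB = 1 << 20


def split_cost(extent_len, dup_start, dup_len):
    """Return (pieces, bytes_copied) for removing one duplicate range.

    pieces is a list of (kind, offset, length) in extent order, where
    kind is "unique" or "duplicate". Only unique pieces are copied.
    """
    dup_end = dup_start + dup_len
    if not 0 <= dup_start < dup_end <= extent_len:
        raise ValueError("duplicate range must lie inside the extent")
    pieces = []
    if dup_start > 0:
        pieces.append(("unique", 0, dup_start))
    pieces.append(("duplicate", dup_start, dup_len))
    if dup_end < extent_len:
        pieces.append(("unique", dup_end, extent_len - dup_end))
    copied = sum(length for kind, _, length in pieces if kind == "unique")
    return pieces, copied


# The same 4 KiB duplicate costs far more to cut out of a large extent.
small = split_cost(128 * KIB, 0, 4 * KIB)[1]
large = split_cost(128 * MIB, 64 * MIB, 4 * KIB)[1]
print(small // KIB, "KiB copied vs", large // KIB, "KiB copied")
```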
 
> I would suggest one of the two following simple solutions:
>    a) the on-demand defrag should be run BEFORE AND AFTER the on-demand
> dedupe.
> or b) the on-demand defrag should be run BEFORE the on-demand dedupe, and
> on-demand dedupe uses defrag functionality to defrag while dedupe is in
> progress.
> 
> So I guess you were thinking about the solution b) all the time when you
> said that dedupe and defrag need to be related.

Well, both would be running continuously in the same process, so
they would negotiate with each other as required.  Dedupe runs first
on new extents to create a plan for increasing extent sharing, then
defrag creates a plan for sufficient logical/physical contiguity of
those extents after dedupe has cut them into content-aligned pieces.
Extents that are entirely duplicate simply disappear and do not form
part of the defrag workload (at least until it is time to defragment
free space...).  Both plans are combined and optimized, then the final
data relocation command sequence is sent to the filesystem.
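
The ordering described above could be sketched roughly as follows.
This is only an illustration of the flow (dedupe plan, then defrag
plan, then one combined command list); every name, the per-block hash
model, and the command tuples are hypothetical and do not correspond
to any existing tool or btrfs interface:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    ident: str
    blocks: tuple  # one content hash per block (stand-in for real data)


def plan_dedupe(new_extents, known_blocks):
    """Cut each new extent into runs of duplicate and unique blocks."""
    plan = []
    for ext in new_extents:
        run_kind, run_start = None, 0
        for i, block_hash in enumerate(ext.blocks):
            kind = "dup" if block_hash in known_blocks else "unique"
            if kind != run_kind:
                if run_kind is not None:
                    plan.append((ext.ident, run_kind, run_start, i - run_start))
                run_kind, run_start = kind, i
        if run_kind is not None:
            plan.append((ext.ident, run_kind, run_start,
                         len(ext.blocks) - run_start))
        known_blocks.update(ext.blocks)  # later extents can match these
    return plan


def plan_defrag(dedupe_plan):
    """Only unique pieces still need a contiguous physical home."""
    return [piece for piece in dedupe_plan if piece[1] == "unique"]


def combine(dedupe_plan, defrag_plan):
    """Merge both plans into one command sequence for the filesystem."""
    commands = [("share", ident, start, length)
                for ident, kind, start, length in dedupe_plan if kind == "dup"]
    commands += [("relocate", ident, start, length)
                 for ident, _, start, length in defrag_plan]
    return commands


known = {"b"}
extents = [Extent("A", ("a", "b", "c")), Extent("B", ("a", "b"))]
dedupe_plan = plan_dedupe(extents, known)
for command in combine(dedupe_plan, plan_defrag(dedupe_plan)):
    print(command)
```

Extent B is entirely duplicate in this example, so it produces a
"share" command and never appears in the relocation part of the plan.
The optimization step that would reorder or merge commands is omitted.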

> > > > Extent splitting in-place is not possible on btrfs, so extent boundary
> > > > changes necessarily involve data copies.  Reference counting is done
> > > > by extent in btrfs, so it is only possible to free complete extents.
> > > 
> > > Great, there is reference counting in btrfs. That helps. Good design.
> > 
> > Well, I say "reference counting" because I'm simplifying for an audience
> > that does not yet all know the low-level details.  The counter, such as
> > it is, gives values "zero" or "more than zero."  You never know exactly
> > how many references there are without doing the work to enumerate them.
> > The "is extent unique" function in btrfs runs the enumeration loop until
> > the second reference is found or the supply of references is exhausted,
> > whichever comes first.  It's a tradeoff to make snapshots fast.
> 
> Well, that's a disappointment.
> 
> > When a reference is created to a new extent, it refers to the entire
> > extent.  References can refer to parts of extents (the reference has an
> > offset and length field), so when an extent is partially overwritten, the
> > extent is not modified.  Only the reference is modified, to make it refer
> > to a subset of the extent (references in other snapshots are not changed,
> > and the extent data itself is immutable).  This makes POSIX fast, but it
> > creates some headaches related to garbage collection, dedupe, defrag, etc.
> 
> Ok, got it. Thanks.
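
For readers following along, the two ideas quoted above can be modelled
in a few lines. This is a sketch with hypothetical names, not the
kernel's code: the uniqueness check stops enumerating as soon as a
second reference turns up, and a partial overwrite shrinks a reference
while leaving the extent itself untouched.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Ref:
    extent: str  # which immutable extent this reference points at
    offset: int  # start of the referenced range within the extent
    length: int  # number of bytes referenced


def is_extent_unique(references):
    """True if the iterable yields fewer than two references.

    Enumeration stops after the second reference, so the cost does not
    grow with the total number of references.
    """
    return len(list(islice(references, 2))) < 2


def keep_head(ref, keep):
    """Model overwriting everything after the first `keep` bytes.

    The extent is not modified; only the reference gets shorter.
    """
    if not 0 <= keep <= ref.length:
        raise ValueError("keep must be within the referenced range")
    return Ref(ref.extent, ref.offset, keep)


def counting(items, counter):
    """Pass items through while counting how many were examined."""
    for item in items:
        counter[0] += 1
        yield item


examined = [0]
many_refs = (Ref("E1", 0, 4096) for _ in range(1_000_000))
print(is_extent_unique(counting(many_refs, examined)), examined[0])
```

The flip side, as the quoted text notes, is that an exact reference
count is never available without running the full enumeration.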
