Linux-BTRFS Archive on lore.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: zedlryqc@server53.web-hosting.com, linux-btrfs@vger.kernel.org
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Mon, 9 Sep 2019 20:18:10 +0800
Message-ID: <fb80b97a-9bcd-5d13-0026-63e11e1a06b5@gmx.com>
In-Reply-To: <20190909072518.Horde.c4SobsfDkO6FUtKo3e_kKu0@server53.web-hosting.com>


On 2019/9/9 7:25 PM, zedlryqc@server53.web-hosting.com wrote:
> 
> Quoting Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>> 1) Full online backup (or copy, whatever you want to call it)
>>> btrfs backup <filesystem name> <partition name> [-f]
>>> - backs up a btrfs filesystem given by <filesystem name> to a partition
>>> <partition name> (with all subvolumes).
>>
>> Why not just btrfs send?
>>
>> Or do you want to keep the whole subvolume structure/layout?
> 
> Yes, I want to keep the whole subvolume structure/layout. I want to
> keep everything. Usually, when I back up a partition, I want to keep
> everything, and I suppose most other people have a similar expectation.
> 
>> I'd say current send/receive is more flexible.
> 
> Um, 'flexibility' has nothing to do with it. Send/receive is a
> completely different use case.
> Each one has some benefits and some drawbacks, but 'send/receive'
> cannot replace 'full online backup'.
> 
> Here is where send/receive is lacking:
>     - too complicated to do if many subvolumes are involved
>     - may require recursive subvolume enumeration in order to emulate
> 'full online backup'
>     - may require extra storage space
>     - is not mountable, not easy to browse the backup contents
>     - not easy to recover just a few selected files from backup
> There are probably more areas where send/receive is lacking, but I think
> I have given a sufficient number of important differences to show that
> send/receive cannot successfully replace the functionality of 'full
> online backup'.
> 
>> And you also need to understand that btrfs also integrates volume
>> management, thus it's not just <partition name>; you also need a RAID
>> level and things like that.
> 
> This is a minor point. So, please, let's not get into too many
> irrelevant details here.
> 
> There can be a sensible default of 'single data, DUP metadata', and a
> way for the user to override this default, but that feature is
> not so important. If the user wants to change the RAID level, he can
> easily do it later by mounting the backup.
> 
>>
>> All of this can already be done by send/receive, although at the
>> subvolume level.
> 
> Yeah, maybe I should manually type it all for all subvolumes, one by
> one. I also must be careful to do it in the correct order if I want it
> not to consume extra space.
> And the backup is not mountable.
> 
> This proposal (workaround) of yours appears to me to be too complicated,
> too error-prone, and missing important features.
> 
> But, I just thought, you can actually emulate 'full online backup' with
> this send/receive. Here is how.
> You write a script which does the following:
>     - makes a temporary snapshot of every subvolume
>     - uses 'btrfs send' to send all the temporary snapshots, on-the-fly
> (maybe via a pipe), in the correct order, to a process running 'btrfs
> receive', which should then immediately write it all to the destination
> partition. All the buffers can stay in memory.
>     - when all the snapshots are received and written to the destination,
> fixes the subvol IDs
>     - deletes the temporary snapshots from the source
> Of course, this script should then be a part of the standard btrfs tools.
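> 
> Roughly, as an untested sketch (the subvolume listing, paths, and the
> final ID-fixing step are only illustrative; 'btrfs send' needs the
> snapshots to be read-only):
> 
>     for sub in $(btrfs subvolume list -o /src | awk '{print $NF}'); do
>         btrfs subvolume snapshot -r "/src/$sub" "/src/$sub.bak"
>         btrfs send "/src/$sub.bak" | btrfs receive /dst
>         btrfs subvolume delete "/src/$sub.bak"
>     done
>     # hypothetical final pass: rewrite the received snapshots' subvolume
>     # IDs/paths so the destination mirrors the source layout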
> 
>> Please check if send/receive is suitable for your use case.
> 
> No. Absolutely not.
> 
> 
>>> 2) Sensible defrag
>>> The defrag is currently a joke.
> 
>>> How to do it:
>>> - The extents must not be unshared, but just shuffled a bit. Unsharing
>>> the extents is, in most situations, not tolerable.
> 
>> I definitely see cases where unsharing extents makes sense, so at least
>> we should let users determine what they want.
> 
> Maybe there are such cases, but I would say that the vast majority of
> users (99.99%) in the vast majority of cases (99.99%) don't want the
> defrag operation to reduce free disk space.
> 
>> What's wrong with the current file-based defrag?
>> If you want to defrag a subvolume, just iterate through all files.
> 
> I repeat: The defrag should not decrease free space. That's the 'normal'
> expectation.

Since you're talking about btrfs, it is going to do CoW for metadata no
matter what: as long as you change anything, btrfs will use extra space.
(The final result may not show extra used disk space, since the freed
space is as large as the newly allocated space, but to maintain CoW the
newly allocated space can't overlap with the old data.)
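
As a rough illustration: defragmenting a 1GiB file first writes about
1GiB of new extents, and the old extents are only released after the
transaction commits (and only if no snapshot still references them), so
free space can transiently drop by roughly the amount of data being
moved, even though the end state is unchanged.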

Furthermore, when talking about snapshots with space wasted by extent
bookkeeping, it's definitely possible that the user wants to break the
shared extents:

Subvol 257, inode 257 has the following file extents:
(257 EXTENT_DATA 0)
disk bytenr X len 16M
offset 0 num_bytes 4K  << Only 4K is referenced in the whole 16M extent.

Subvol 258, inode 257 has the following file extents:
(257 EXTENT_DATA 0)
disk bytenr X len 16M
offset 0 num_bytes 4K  << Shared with the one in subvol 257
(257 EXTENT_DATA 4K)
disk bytenr Y len 16M
offset 0 num_bytes 4K  << Similar case, only 4K of the 16M is used.

In that case, the user definitely wants to defrag the file in subvol 258:
if the extent at bytenr Y can be freed, we free up 16M and allocate a
new 8K extent for subvol 258, ino 257.
(And the user will also want to defrag the extent in subvol 257, ino 257.)
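
A rough illustration of what that looks like from userspace (the paths
are made up, and the resulting extent sizes depend on the defrag target
size):

    btrfs filesystem defragment /mnt/subvol258/file
    btrfs filesystem defragment /mnt/subvol257/file

After both runs each inode points at its own small extent, the last
references to the 16M extents at bytenr X and Y are dropped, and that
space can be reclaimed, at the cost of breaking the sharing between the
two subvolumes.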

That's why knowledge of btrfs technical details can make a difference.
Sometimes you may find an idea brilliant and wonder why btrfs is not
implementing it, but if you understand btrfs to some extent, you will
know the answer yourself.


> 
>>> - I think it would be wrong to use a general deduplication algorithm for
>>> this. Instead, the information about the shared extents should be
>>> analyzed given the starting state of the filesystem, and then the
>>> algorithm should produce an optimal solution based on the currently
>>> shared extents.
>>
>> Please be more specific, like giving an example for it.
> 
> Let's say that there is a file FFF with extents e11, e12, e13, e22, e23,
> e33, e34
> - in subvolA the file FFF consists of e11, e12, e13
> - in subvolB the file FFF consists of e11, e22, e23
> - in subvolC the file FFF consists of e11, e22, e33, e34
> 
> After defrag, where 'selected subvolume' is subvolC, the extents are
> ordered on disk as follows:
> 
> e11,e22,e33,e34 - e23 - e12,e13

File FFF in different subvolumes is represented by different inodes, and
those inodes have no knowledge of inodes in other subvolumes.

If FFF in subvol C is e11, e22, e33, e34, then that's it.
I still don't see the point.

And what are the on-disk bytenrs of all these extents? Which has the
larger bytenr and length?

Please provide a better description, like xfs_io -c "fiemap -v" output
before and after.
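
For reference, that output looks roughly like this (the numbers below
are made up; ranges are in 512-byte blocks, and the 0x2000 flag marks a
shared extent, 0x1 the last extent):

  $ xfs_io -c "fiemap -v" /mnt/subvolC/FFF
  /mnt/subvolC/FFF:
   EXT: FILE-OFFSET      BLOCK-RANGE        TOTAL FLAGS
     0: [0..255]:        1050624..1050879     256 0x2000
     1: [256..511]:      2101248..2101503     256 0x2001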

> 
> In the list above, the comma denotes neighbouring extents, and the dash
> indicates that there can be a gap.
> As you can see in the list, the file FFF is fully defragmented in
> subvolC, since its extents occupy neighbouring disk sectors.
> 
> 
>>> 3) Downgrade to 'single' or 'DUP' (also, general easy way to switch
>>> between RAID levels)
>>>  Currently, as far as I can gather, the user has to do a "btrfs balance
>>> start -dconvert=single -mconvert=single", then delete a drive, which is
>>> a bit of a ridiculous sequence of operations.
> 
>> That's a shortcut for a chunk profile change.
>> My first thought is that it could cause more problems than benefits.
>> (It only benefits profile downgrades, thus it only makes sense for
>> RAID1->SINGLE, DUP->SINGLE, and RAID10->RAID0, nothing else.)
> 
> Those listed cases are exactly the ones I judge to be most important.
> Three important cases.

I'd argue that, being a downgrade, it's not that important, as most users
want to replace the missing/bad device and maintain the RAID profile.

> 
>> I still prefer the safer allocate-new-chunk way to convert chunks, even
>> at the cost of extra IO.
> 
> I don't mind whether it allocates new chunks or not. It is better, in my
> opinion, if new chunks are not allocated, but both ways are essentially OK.
> 
> What I am complaining about is that at one point in time, after issuing
> the command:
>     btrfs balance start -dconvert=single -mconvert=single
> and before issuing the 'btrfs device delete', the system could be in too
> fragile a state, with extents unnecessarily spread out over two drives,
> which is a completely unnecessary operation, and it also seems to me
> that it could be dangerous in some situations involving potentially
> malfunctioning drives.
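> 
> For reference, the full two-step sequence I am criticizing is roughly
> (the mount point and device are only examples; converting metadata to a
> less redundant profile also requires the -f/--force flag):
> 
>     btrfs balance start -f -dconvert=single -mconvert=single /mnt
>     btrfs device delete /dev/sdb /mnt
> 
> Between these two commands the data sits on both drives as 'single'.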

In that case, you just need to replace the malfunctioning device rather
than fall back to SINGLE.
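
Something like (the device names and mount point are only illustrative):

    btrfs replace start /dev/sdb /dev/sdc /mnt

keeps the existing RAID profile the whole time and never goes through an
intermediate SINGLE state.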

Thanks,
Qu

> 
> Please reconsider.
> 
> 


