Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: webmaster@zedlx.com
To: linux-btrfs@vger.kernel.org
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Mon, 09 Sep 2019 12:38:18 -0400
Message-ID: <20190909123818.Horde.dbl-yi_cNi8aKDaW_QYXVij@server53.web-hosting.com> (raw)
In-Reply-To: <fb80b97a-9bcd-5d13-0026-63e11e1a06b5@gmx.com>


Quoting Qu Wenruo <quwenruo.btrfs@gmx.com>:

>>>> 2) Sensible defrag
>>>> The defrag is currently a joke.
>>
>> Maybe there are such cases, but I would say that a vast majority of
>> users (99,99%) in a vast majority of cases (99,99%) don't want the
>> defrag operation to reduce free disk space.
>>
>>> What's wrong with current file based defrag?
>>> If you want to defrag a subvolume, just iterate through all files.
>>
>> I repeat: The defrag should not decrease free space. That's the 'normal'
>> expectation.
>
> Since you're talking about btrfs, it's going to do CoW for metadata not
> matter whatever, as long as you're going to change anything, btrfs will
> cause extra space usage.
> (Although the final result may not cause extra used disk space as freed
> space is as large as newly allocated space, but to maintain CoW, newly
> allocated space can't overlap with old data)

It is OK for defrag to temporarily decrease free space while defrag  
operation is in progress. That's normal.

> Further more, talking about snapshots with space wasted by extent
> booking, it's definitely possible user want to break the shared extents:
>
> Subvol 257, inode 257 has the following file extents:
> (257 EXTENT_DATA 0)
> disk bytenr X len 16M
> offset 0 num_bytes 4k  << Only 4k is referred in the whole 16M extent.
>
> Subvol 258, inode 257 has the following file extents:
> (257 EXTENT_DATA 0)
> disk bytenr X len 16M
> offset 0 num_bytes 4K  << Shared with that one in subv 257
> (257 EXTENT_DATA 4K)
> disk bytenr Y len 16M
> offset 0 num_bytes 4K  << Similar case, only 4K of 16M is used.
>
> In that case, user definitely want to defrag file in subvol 258, as if
> that extent at bytenr Y can be freed, we can free up 16M, and allocate a
> new 8K extent for subvol 258, ino 257.
> (And will also want to defrag the extent in subvol 257 ino 257 too)

You are confusing the actual defrag with a separate concern, let's  
call it 'reserved space optimization'. It is about partially used  
extents. The actual name 'reserved space optimization' doesn't matter,  
I just made it up.

'reserved space optimization' is usually performed as a part of the  
defrag operation, but it doesn't have to be, as the actual defrag is  
something separate.

Yes, 'reserved space optimization' can break up extents.

'reserved space optimization' can either decrease or increase the free  
space. If the algorithm determines that more space should be reserved,  
than free space will decrease. If the algorithm determines that less  
space should be reserved, than free space will increase.

The 'reserved space optimization' can be accomplished such that the  
free space does not decrease, if such behavior is needed.

Also, the defrag operation can join some extents. In my original example,
the extents e33 and e34 can be fused into one.

> That's why knowledge in btrfs tech details can make a difference.
> Sometimes you may find some ideas are brilliant and why btrfs is not
> implementing it, but if you understand btrfs to some extent, you will
> know the answer by yourself.

Yes, it is true, but what you are posting so far are all 'red  
herring'-type arguments. It's just some irrelevant concerns, and you  
just got me explaining thinks like I would to a little baby. I don't  
know whether I stumbled on some rookie member of btrfs project, or you  
are just lazy and you don't want to think or consider my proposals.

When I post an explanation, please try to UNDERSTAND HOW IT CAN WORK,  
fill in the missing gaps, because there are tons of them, because I  
can't explain everything via three e-mail posts. Don't just come up  
with some half-baked, forced, illogical reason why things are better  
as they are.

>>>> - I think it would be wrong to use a general deduplication algorithm for
>>>> this. Instead, the information about the shared extents should be
>>>> analyzed given the starting state of the filesystem, and than the
>>>> algorithm should produce an optimal solution based on the currently
>>>> shared extents.
>>>
>>> Please be more specific, like giving an example for it.
>>
>> Let's say that there is a file FFF with extents e11, e12, e13, e22, e23,
>> e33, e34
>> - in subvolA the file FFF consists of e11, e12, e13
>> - in subvolB the file FFF consists of e11, e22, e23
>> - in subvolC the file FFF consists of e11, e22, e33, e34
>>
>> After defrag, where 'selected subvolume' is subvolC, the extents are
>> ordered on disk as follows:
>>
>> e11,e22,e33,e34 - e23 - e12,e13
>
> Inode FFF in different subvolumes are different inodes. They have no
> knowledge of other inodes in other subvolumes.

You can easily notice that, if necessary, the defrag algorithm can  
work without this knowledge, that is, without knowledge of other  
versions of FFF.

This time I'm leaving it to you to figure out how.

Another red herring.

> If FFF in subvol C is e11, e22, e33, e34, then that's it.
> I didn't see the point still.

Now I need to explain like I would to a baby.

If the extents e11, e22, e33, e34 are stored in neighbouring sectors,  
then the disk data reads are faster because they become sequential, as  
opposed to spread out.

So, while the file FFF in subvolC still has 4 extents like it had  
before defrag, reading of those 4 extents is much faster than before  
because the read can be sequential.

So, the defrag can actually be performed without fusing any extents.  
It would still have a noticeable performance benefit.

As I have already said, the defrag operation can join(fuse) some  
extents. In my original example,
the extents e33 and e34 can be fused into one.

> And what's the on-disk bytenr of all these extents? Which has larger
> bytenr and length?

For the sake of simplicity, let's say that all the extents in the  
example have equal length (so, you can choose ANY size), and are fully  
used.

> Please provide a better description like xfs_io -c "fiemap -v" output
> before and after.

No. My example is simple and clear. Nit-picking, like this you are  
doing, is not helpful. Concentrate, think, try to figure it out.

>>> That's a shortcut for chunk profile change.
>>> My first idea of this is, it could cause more problem than benefit.
>>> (It only benefits profile downgrade, thus only makes sense for
>>> RAID1->SINGLE, DUP->SINGLE, and RAID10->RAID0, nothing else)
>>
>> Those listed cases are exactly the ones I judge to be most important.
>> Three important cases.
>
> I'd argue it's downgrade, not that important. As most users want to
> replace the missing/bad device and maintain the raid profile.
>
>> What I am complaining about is that at one point in time, after issuing
>> the command:
>>     btrfs balance start -dconvert=single -mconvert=single
>> and before issuing the 'btrfs delete', the system could be in a too
>> fragile state, with extents unnecesarily spread out over two drives,
>> which is both a completely unnecessary operation, and it also seems to
>> me that it could be dangerous in some situations involving potentially
>> malfunctioning drives.
>
> In that case, you just need to replace that malfunctioning device other
> than fall back to SINGLE.

You are assuming that user has the time and money to replace the  
malfunctioning drive. In A LOT of cases, this is not true.

What if the drive is failing, but the user has some important work to  
do finish.
He has a presentation to perform. He doesn't want the presentation to  
be interrupted by a failing disk drive.

What if the user doesn't have any spare SATA cables on hand?

What if user doesn't have any space space in the case? What if it is a  
laptop computer?

While a user might want to maintain a RAID1 long-term, in the short  
term he might want to perform a downgrade.





  parent reply index

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-09  2:55 zedlryqc
2019-09-09  3:51 ` Qu Wenruo
2019-09-09 11:25   ` zedlryqc
2019-09-09 12:18     ` Qu Wenruo
2019-09-09 12:28       ` Qu Wenruo
2019-09-09 17:11         ` webmaster
2019-09-10 17:39           ` Andrei Borzenkov
2019-09-10 22:41             ` webmaster
2019-09-09 15:29       ` Graham Cobb
2019-09-09 17:24         ` Remi Gauvin
2019-09-09 19:26         ` webmaster
2019-09-10 19:22           ` Austin S. Hemmelgarn
2019-09-10 23:32             ` webmaster
2019-09-11 12:02               ` Austin S. Hemmelgarn
2019-09-11 16:26                 ` Zygo Blaxell
2019-09-11 17:20                 ` webmaster
2019-09-11 18:19                   ` Austin S. Hemmelgarn
2019-09-11 20:01                     ` webmaster
2019-09-11 21:42                       ` Zygo Blaxell
2019-09-13  1:33                         ` General Zed
2019-09-11 21:37                     ` webmaster
2019-09-12 11:31                       ` Austin S. Hemmelgarn
2019-09-12 19:18                         ` webmaster
2019-09-12 19:44                           ` Chris Murphy
2019-09-12 21:34                             ` General Zed
2019-09-12 22:28                               ` Chris Murphy
2019-09-12 22:57                                 ` General Zed
2019-09-12 23:54                                   ` Zygo Blaxell
2019-09-13  0:26                                     ` General Zed
2019-09-13  3:12                                       ` Zygo Blaxell
2019-09-13  5:05                                         ` General Zed
2019-09-14  0:56                                           ` Zygo Blaxell
2019-09-14  1:50                                             ` General Zed
2019-09-14  4:42                                               ` Zygo Blaxell
2019-09-14  4:53                                                 ` Zygo Blaxell
2019-09-15 17:54                                                 ` General Zed
2019-09-16 22:51                                                   ` Zygo Blaxell
2019-09-17  1:03                                                     ` General Zed
2019-09-17  1:34                                                       ` General Zed
2019-09-17  1:44                                                       ` Chris Murphy
2019-09-17  4:55                                                         ` Zygo Blaxell
2019-09-17  4:19                                                       ` Zygo Blaxell
2019-09-17  3:10                                                     ` General Zed
2019-09-17  4:05                                                       ` General Zed
2019-09-14  1:56                                             ` General Zed
2019-09-13  5:22                                         ` General Zed
2019-09-13  6:16                                         ` General Zed
2019-09-13  6:58                                         ` General Zed
2019-09-13  9:25                                           ` General Zed
2019-09-13 17:02                                             ` General Zed
2019-09-14  0:59                                             ` Zygo Blaxell
2019-09-14  1:28                                               ` General Zed
2019-09-14  4:28                                                 ` Zygo Blaxell
2019-09-15 18:05                                                   ` General Zed
2019-09-16 23:05                                                     ` Zygo Blaxell
2019-09-13  7:51                                         ` General Zed
2019-09-13 11:04                                     ` Austin S. Hemmelgarn
2019-09-13 20:43                                       ` Zygo Blaxell
2019-09-14  0:20                                         ` General Zed
2019-09-14 18:29                                       ` Chris Murphy
2019-09-14 23:39                                         ` Zygo Blaxell
2019-09-13 11:09                                   ` Austin S. Hemmelgarn
2019-09-13 17:20                                     ` General Zed
2019-09-13 18:20                                       ` General Zed
2019-09-12 19:54                           ` Austin S. Hemmelgarn
2019-09-12 22:21                             ` General Zed
2019-09-13 11:53                               ` Austin S. Hemmelgarn
2019-09-13 16:54                                 ` General Zed
2019-09-13 18:29                                   ` Austin S. Hemmelgarn
2019-09-13 19:40                                     ` General Zed
2019-09-14 15:10                                       ` Jukka Larja
2019-09-12 22:47                             ` General Zed
2019-09-11 21:37                   ` Zygo Blaxell
2019-09-11 23:21                     ` webmaster
2019-09-12  0:10                       ` Remi Gauvin
2019-09-12  3:05                         ` webmaster
2019-09-12  3:30                           ` Remi Gauvin
2019-09-12  3:33                             ` Remi Gauvin
2019-09-12  5:19                       ` Zygo Blaxell
2019-09-12 21:23                         ` General Zed
2019-09-14  4:12                           ` Zygo Blaxell
2019-09-16 11:42                             ` General Zed
2019-09-17  0:49                               ` Zygo Blaxell
2019-09-17  2:30                                 ` General Zed
2019-09-17  5:30                                   ` Zygo Blaxell
2019-09-17 10:07                                     ` General Zed
2019-09-17 23:40                                       ` Zygo Blaxell
2019-09-18  4:37                                         ` General Zed
2019-09-18 18:00                                           ` Zygo Blaxell
2019-09-10 23:58             ` webmaster
2019-09-09 23:24         ` Qu Wenruo
2019-09-09 23:25         ` webmaster
2019-09-09 16:38       ` webmaster [this message]
2019-09-09 23:44         ` Qu Wenruo
2019-09-10  0:00           ` Chris Murphy
2019-09-10  0:51             ` Qu Wenruo
2019-09-10  0:06           ` webmaster
2019-09-10  0:48             ` Qu Wenruo
2019-09-10  1:24               ` webmaster
2019-09-10  1:48                 ` Qu Wenruo
2019-09-10  3:32                   ` webmaster
2019-09-10 14:14                     ` Nikolay Borisov
2019-09-10 22:35                       ` webmaster
2019-09-11  6:40                         ` Nikolay Borisov
2019-09-10 22:48                     ` webmaster
2019-09-10 23:14                   ` webmaster
2019-09-11  0:26               ` webmaster
2019-09-11  0:36                 ` webmaster
2019-09-11  1:00                 ` webmaster
2019-09-10 11:12     ` Austin S. Hemmelgarn
2019-09-09  3:12 webmaster

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190909123818.Horde.dbl-yi_cNi8aKDaW_QYXVij@server53.web-hosting.com \
    --to=webmaster@zedlx.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git