Linux-BTRFS Archive on lore.kernel.org
From: General Zed <general-zed@zedlx.com>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Thu, 12 Sep 2019 18:47:19 -0400
Message-ID: <20190912184719.Horde.Hl_ez-nVt2rCMxFVw4Zy7XQ@server53.web-hosting.com>
In-Reply-To: <5e25ea36-0c96-2770-d3b9-56b1b9f4066d@gmail.com>


Quoting "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:

> On 2019-09-12 15:18, webmaster@zedlx.com wrote:
>>
>> Quoting "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:
>>
>>> On 2019-09-11 17:37, webmaster@zedlx.com wrote:
>>>>
>>>> Quoting "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:
>>>>
>>>>> On 2019-09-11 13:20, webmaster@zedlx.com wrote:
>>>>>>
>>>>>> Quoting "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:
>>>>>>
>>>>>>> On 2019-09-10 19:32, webmaster@zedlx.com wrote:
>>>>>>>>
>>>>>>>> Quoting "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:
>>>>>>>>
>>>>

>>>>> * Reflinks can reference partial extents.  This means,  
>>>>> ultimately, that you may end up having to split extents in odd  
>>>>> ways during defrag if you want to preserve reflinks, and might  
>>>>> have to split extents _elsewhere_ that are only tangentially  
>>>>> related to the region being defragmented. See the example in my  
>>>>> previous email for a case like this, maintaining the shared  
>>>>> regions as being shared when you defragment either file to a  
>>>>> single extent will require splitting extents in the other file  
>>>>> (in either case, whichever file you don't defragment to a single  
>>>>> extent will end up having 7 extents if you try to force the one  
>>>>> that's been defragmented to be the canonical version).  Once you  
>>>>> consider that a given extent can have multiple ranges reflinked  
>>>>> from multiple other locations, it gets even more complicated.
>>>>
>>>> I think that this problem can be solved, and that it can be  
>>>> solved perfectly (the result is a perfectly-defragmented file).  
>>>> But if it is that hard to do, just skip those problematic extents  
>>>> in the initial version of defrag.
>>>>
>>>> Ultimately, in the super-duper defrag, those partially-referenced  
>>>> extents should be split up by defrag.
>>>>
>>>>> * If you choose to just not handle the above point by not  
>>>>> letting defrag split extents, you put a hard lower limit on the  
>>>>> amount of fragmentation present in a file if you want to  
>>>>> preserve reflinks.  IOW, you can't defragment files past a  
>>>>> certain point.  If we go this way, neither of the two files in  
>>>>> the example from my previous email could be defragmented any  
>>>>> further than they already are, because doing so would require  
>>>>> splitting extents.
>>>>
>>>> Oh, you're reading my thoughts. That's good.
>>>>
>>>> Initial implementation of defrag might be not-so-perfect. It  
>>>> would still be better than the current defrag.
>>>>
>>>> This is not a one-way street. Handling of partially-used extents  
>>>> can be improved in later versions.
>>>>
>>>>> * Determining all the reflinks to a given region of a given  
>>>>> extent is not a cheap operation, and the information may  
>>>>> immediately be stale (because an operation right after you fetch  
>>>>> the info might change things).  We could work around this by  
>>>>> locking the extent somehow, but doing so would be expensive  
>>>>> because you would have to hold the lock for the entire defrag  
>>>>> operation.
>>>>
>>>> No. DO NOT LOCK TO RETRIEVE REFLINKS.
>>>>
>>>> Instead, you have to create a hook in every function that updates  
>>>> the reflink structure or extents (for example, write-to-file  
>>>> operation). So, when a reflink gets changed, the defrag is  
>>>> immediately notified about this. That way the defrag can keep its  
>>>> data about reflinks in-sync with the filesystem.
>>
>>> This doesn't get around the fact that it's still an expensive  
>>> operation to enumerate all the reflinks for a given region of a  
>>> file or extent.
>>
>> No, you are wrong.
>>
>> In order to enumerate all the reflinks in a region, the defrag  
>> needs to have another array, which is also kept in memory and in  
>> sync with the filesystem. It is easiest to divide the disk into  
>> regions of equal size, where each region is a few MB in size. Let's  
>> call this array the "regions-to-extents" array. This array doesn't  
>> need to be associative; it is a plain array.
>> This in-memory array links regions of disk to extents that are in  
>> the region. The array is initialized when defrag starts.
>>
>> This array makes the operation of finding all extents of a region  
>> extremely fast.
> That has two issues:
>
> * That's going to be a _lot_ of memory.  You still need to be able  
> to defragment big (dozens plus TB) arrays without needing multiple  
> GB of RAM just for the defrag operation, otherwise it's not  
> realistically useful (remember, it was big arrays that had issues  
> with the old reflink-aware defrag too).

> * You still have to populate the array in the first place.  A sane  
> implementation wouldn't be keeping it in memory even when defrag is  
> not running (no way is anybody going to tolerate even dozens of MB  
> of memory overhead for this), so you're not going to get around the  
> need to enumerate all the reflinks for a file at least once (during  
> startup, or when starting to process that file), so you're just  
> moving the overhead around instead of eliminating it.

Nope, I'm not just "moving the overhead around instead of eliminating  
it", I am eliminating it.

The only overhead is at defrag startup, when the entire b-tree  
structure has to be loaded and examined. That happens in a few seconds.

After this point, there is no more "overhead" because the running  
defrag is always notified of any changes to the b-trees (by hooks in  
the b-tree update routines). Whenever there is such a change, the  
regions-to-extents array gets updated. Since this array is in memory,  
the update is so fast that it can be considered zero overhead.
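
A minimal sketch of the regions-to-extents array described above may
make the idea easier to discuss. It is an illustration only, written in
Python rather than kernel C, and every name in it (RegionIndex,
on_extent_added, on_extent_removed, the 4 MiB region size) is a
hypothetical choice, not anything that exists in btrfs. The two on_*
methods stand in for the hooks that the b-tree update routines would
call.

```python
REGION_SIZE = 4 * 1024 * 1024  # assumed region size ("a few MB"); hypothetical


class RegionIndex:
    """In-memory map from fixed-size disk regions to the extents in them."""

    def __init__(self, disk_size, region_size=REGION_SIZE):
        self.region_size = region_size
        n_regions = (disk_size + region_size - 1) // region_size
        # Plain array indexed by region number; each slot is a set of extent ids.
        self.regions = [set() for _ in range(n_regions)]
        # extent id -> (start byte, length), needed to undo an insertion.
        self.extents = {}

    def _span(self, start, length):
        # Region numbers touched by the byte range [start, start + length).
        first = start // self.region_size
        last = (start + length - 1) // self.region_size
        return range(first, last + 1)

    def on_extent_added(self, extent_id, start, length):
        # Would be called from the (hypothetical) hook when an extent appears.
        self.extents[extent_id] = (start, length)
        for r in self._span(start, length):
            self.regions[r].add(extent_id)

    def on_extent_removed(self, extent_id):
        # Would be called from the (hypothetical) hook when an extent is freed.
        start, length = self.extents.pop(extent_id)
        for r in self._span(start, length):
            self.regions[r].discard(extent_id)

    def extents_in_region(self, region):
        # Lookup is a single array index plus a sort of a small set.
        return sorted(self.regions[region])


MIB = 1024 * 1024
idx = RegionIndex(disk_size=64 * MIB)
idx.on_extent_added("A", start=0, length=6 * MIB)    # spans regions 0 and 1
idx.on_extent_added("B", start=5 * MIB, length=MIB)  # lies inside region 1
print(idx.extents_in_region(1))  # ['A', 'B']
idx.on_extent_removed("A")
print(idx.extents_in_region(0))  # []
```

In this sketch each hook call touches only the regions that the extent
overlaps, and a lookup is one array index. Note that each slot holds a
set, so the memory used grows with the number of extents as well as
with the number of regions.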



