From: General Zed <general-zed@zedlx.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Chris Murphy <lists@colorremedies.com>,
"Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Fri, 13 Sep 2019 01:05:52 -0400 [thread overview]
Message-ID: <20190913010552.Horde.cUL303XsYbqREB5g0iiCDKd@server53.web-hosting.com> (raw)
In-Reply-To: <20190913031242.GF22121@hungrycats.org>
Quoting Zygo Blaxell <ce3g8jdj@umail.furryterror.org>:
> On Thu, Sep 12, 2019 at 08:26:04PM -0400, General Zed wrote:
>>
>> Quoting Zygo Blaxell <ce3g8jdj@umail.furryterror.org>:
>>
>> > Don't forget you have to write new checksum and free space tree pages.
>> > In the worst case, you'll need about 1GB of new metadata pages for each
>> > 128MB you defrag (though you get to delete 99.5% of them immediately
>> > after).
>>
>> Yes, here we are debating some worst-case scenaraio which is actually
>> imposible in practice due to various reasons.
>
> No, it's quite possible. A log file written slowly on an active
> filesystem above a few TB will do that accidentally. Every now and then
> I hit that case. It can take several hours to do a logrotate on spinning
> arrays because of all the metadata fetches and updates associated with
> worst-case file delete. Long enough to watch the delete happen, and
> even follow along in the source code.
>
> I guess if I did a proactive defrag every few hours, it might take less
> time to do the logrotate, but that would mean spreading out all the
> seeky IO load during the day instead of getting it all done at night.
> Logrotate does the same job as defrag in this case (replacing a file in
> thousands of fragments spread across the disk with a few large fragments
> close together), except logrotate gets better compression.
>
> To be more accurate, the example I gave above is the worst case you
> can expect from normal user workloads. If I throw in some reflinks
> and snapshots, I can make it arbitrarily worse, until the entire disk
> is consumed by the metadata update of a single extent defrag.
>
I can't believe I am considering this case.
So, we have a 1TB log file "ultralog" split into 256 million 4 KB
extents randomly over the entire disk. We have 512 GB free RAM and 2%
free disk space. The file needs to be defragmented.
In order to do that, defrag needs to be able to copy-move multiple
extents in one batch, and update the metadata.
The metadata has a total of at least 256 million entries, each of some
size, but each one should hold at least a pointer to the extent (8
bytes) and a checksum (8 bytes): In reality, it could be that there is
a lot of other data there per entry.
The metadata is organized as a b-tree. Therefore, nearby nodes should
contain data of consecutive file extents.
The trick, in this case, is to select one part of "ultralog" which is
localized in the metadata, and defragment it. Repeating this step will
ultimately defragment the entire file.
So, the defrag selects some part of metadata which is entirely a
descendant of some b-tree node not far from the bottom of b-tree. It
selects it such that the required update to the metadata is less than,
let's say, 64 MB, and simultaneously the affected "ultralog" file
fragments total less han 512 MB (therefore, less than 128 thousand
metadata leaf entries, each pointing to a 4 KB fragment). Then it
finds all the file extents pointed to by that part of metadata. They
are consecutive (as file fragments), because we have selected such
part of metadata. Now the defrag can safely copy-move those fragments
to a new area and update the metadata.
In order to quickly select that small part of metadata, the defrag
needs a metatdata cache that can hold somewhat more than 128 thousand
localized metadata leaf entries. That fits into 128 MB RAM definitely.
Of course, there are many other small issues there, but this outlines
the general procedure.
Problem solved?
next prev parent reply other threads:[~2019-09-13 5:05 UTC|newest]
Thread overview: 111+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-09 2:55 Feature requests: online backup - defrag - change RAID level zedlryqc
2019-09-09 3:51 ` Qu Wenruo
2019-09-09 11:25 ` zedlryqc
2019-09-09 12:18 ` Qu Wenruo
2019-09-09 12:28 ` Qu Wenruo
2019-09-09 17:11 ` webmaster
2019-09-10 17:39 ` Andrei Borzenkov
2019-09-10 22:41 ` webmaster
2019-09-09 15:29 ` Graham Cobb
2019-09-09 17:24 ` Remi Gauvin
2019-09-09 19:26 ` webmaster
2019-09-10 19:22 ` Austin S. Hemmelgarn
2019-09-10 23:32 ` webmaster
2019-09-11 12:02 ` Austin S. Hemmelgarn
2019-09-11 16:26 ` Zygo Blaxell
2019-09-11 17:20 ` webmaster
2019-09-11 18:19 ` Austin S. Hemmelgarn
2019-09-11 20:01 ` webmaster
2019-09-11 21:42 ` Zygo Blaxell
2019-09-13 1:33 ` General Zed
2019-09-11 21:37 ` webmaster
2019-09-12 11:31 ` Austin S. Hemmelgarn
2019-09-12 19:18 ` webmaster
2019-09-12 19:44 ` Chris Murphy
2019-09-12 21:34 ` General Zed
2019-09-12 22:28 ` Chris Murphy
2019-09-12 22:57 ` General Zed
2019-09-12 23:54 ` Zygo Blaxell
2019-09-13 0:26 ` General Zed
2019-09-13 3:12 ` Zygo Blaxell
2019-09-13 5:05 ` General Zed [this message]
2019-09-14 0:56 ` Zygo Blaxell
2019-09-14 1:50 ` General Zed
2019-09-14 4:42 ` Zygo Blaxell
2019-09-14 4:53 ` Zygo Blaxell
2019-09-15 17:54 ` General Zed
2019-09-16 22:51 ` Zygo Blaxell
2019-09-17 1:03 ` General Zed
2019-09-17 1:34 ` General Zed
2019-09-17 1:44 ` Chris Murphy
2019-09-17 4:55 ` Zygo Blaxell
2019-09-17 4:19 ` Zygo Blaxell
2019-09-17 3:10 ` General Zed
2019-09-17 4:05 ` General Zed
2019-09-14 1:56 ` General Zed
2019-09-13 5:22 ` General Zed
2019-09-13 6:16 ` General Zed
2019-09-13 6:58 ` General Zed
2019-09-13 9:25 ` General Zed
2019-09-13 17:02 ` General Zed
2019-09-14 0:59 ` Zygo Blaxell
2019-09-14 1:28 ` General Zed
2019-09-14 4:28 ` Zygo Blaxell
2019-09-15 18:05 ` General Zed
2019-09-16 23:05 ` Zygo Blaxell
2019-09-13 7:51 ` General Zed
2019-09-13 11:04 ` Austin S. Hemmelgarn
2019-09-13 20:43 ` Zygo Blaxell
2019-09-14 0:20 ` General Zed
2019-09-14 18:29 ` Chris Murphy
2019-09-14 23:39 ` Zygo Blaxell
2019-09-13 11:09 ` Austin S. Hemmelgarn
2019-09-13 17:20 ` General Zed
2019-09-13 18:20 ` General Zed
2019-09-12 19:54 ` Austin S. Hemmelgarn
2019-09-12 22:21 ` General Zed
2019-09-13 11:53 ` Austin S. Hemmelgarn
2019-09-13 16:54 ` General Zed
2019-09-13 18:29 ` Austin S. Hemmelgarn
2019-09-13 19:40 ` General Zed
2019-09-14 15:10 ` Jukka Larja
2019-09-12 22:47 ` General Zed
2019-09-11 21:37 ` Zygo Blaxell
2019-09-11 23:21 ` webmaster
2019-09-12 0:10 ` Remi Gauvin
2019-09-12 3:05 ` webmaster
2019-09-12 3:30 ` Remi Gauvin
2019-09-12 3:33 ` Remi Gauvin
2019-09-12 5:19 ` Zygo Blaxell
2019-09-12 21:23 ` General Zed
2019-09-14 4:12 ` Zygo Blaxell
2019-09-16 11:42 ` General Zed
2019-09-17 0:49 ` Zygo Blaxell
2019-09-17 2:30 ` General Zed
2019-09-17 5:30 ` Zygo Blaxell
2019-09-17 10:07 ` General Zed
2019-09-17 23:40 ` Zygo Blaxell
2019-09-18 4:37 ` General Zed
2019-09-18 18:00 ` Zygo Blaxell
2019-09-10 23:58 ` webmaster
2019-09-09 23:24 ` Qu Wenruo
2019-09-09 23:25 ` webmaster
2019-09-09 16:38 ` webmaster
2019-09-09 23:44 ` Qu Wenruo
2019-09-10 0:00 ` Chris Murphy
2019-09-10 0:51 ` Qu Wenruo
2019-09-10 0:06 ` webmaster
2019-09-10 0:48 ` Qu Wenruo
2019-09-10 1:24 ` webmaster
2019-09-10 1:48 ` Qu Wenruo
2019-09-10 3:32 ` webmaster
2019-09-10 14:14 ` Nikolay Borisov
2019-09-10 22:35 ` webmaster
2019-09-11 6:40 ` Nikolay Borisov
2019-09-10 22:48 ` webmaster
2019-09-10 23:14 ` webmaster
2019-09-11 0:26 ` webmaster
2019-09-11 0:36 ` webmaster
2019-09-11 1:00 ` webmaster
2019-09-10 11:12 ` Austin S. Hemmelgarn
2019-09-09 3:12 webmaster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190913010552.Horde.cUL303XsYbqREB5g0iiCDKd@server53.web-hosting.com \
--to=general-zed@zedlx.com \
--cc=ahferroin7@gmail.com \
--cc=ce3g8jdj@umail.furryterror.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=lists@colorremedies.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).