From: email@example.com
To: "Austin S. Hemmelgarn" <firstname.lastname@example.org>
Cc: email@example.com
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Wed, 11 Sep 2019 16:01:01 -0400
Message-ID: <20190911160101.Horde.mYR8sgLb1dgpIs3fD4D5Cfy@server53.web-hosting.com>
In-Reply-To: <firstname.lastname@example.org>

Quoting "Austin S. Hemmelgarn" <email@example.com>:

> On 2019-09-11 13:20, firstname.lastname@example.org wrote:
>>
>> Quoting "Austin S. Hemmelgarn" <email@example.com>:
>>
>>> On 2019-09-10 19:32, firstname.lastname@example.org wrote:
>>>>
>>>> Quoting "Austin S. Hemmelgarn" <email@example.com>:
>>>>
>>>> === I CHALLENGE you and anyone else on this mailing list: ===
>>>>
>>>> - Show me an example where splitting an extent requires
>>>> unsharing, and this split is needed to defrag.
>>>>
>>>> Make it clear, write it yourself, I don't want any machine-made outputs.
>>>>
>>> Start with the above comment about all writes unsharing the region
>>> being written to.
>>>
>>> Now, extrapolating from there:
>>>
>>> Assume you have two files, A and B, each consisting of 64
>>> filesystem blocks in a single shared extent. Now assume somebody
>>> writes a few bytes to the middle of file B, right around the
>>> boundary between blocks 31 and 32, and that you get similar writes
>>> to file A straddling blocks 14-15 and 47-48.
>>>
>>> After all of that, file A will be 5 extents:
>>>
>>> * A reflink to blocks 0-13 of the original extent.
>>> * A single isolated extent consisting of the new blocks 14-15.
>>> * A reflink to blocks 16-46 of the original extent.
>>> * A single isolated extent consisting of the new blocks 47-48.
>>> * A reflink to blocks 49-63 of the original extent.
>>>
>>> And file B will be 3 extents:
>>>
>>> * A reflink to blocks 0-30 of the original extent.
>>> * A single isolated extent consisting of the new blocks 31-32.
>>> * A reflink to blocks 33-63 of the original extent.
>>>
>>> Note that there are a total of four contiguous sequences of blocks
>>> that are common between both files:
>>>
>>> * 0-13
>>> * 16-30
>>> * 33-46
>>> * 49-63
>>>
>>> There is no way to completely defragment either file without
>>> splitting the original extent (which is still there, just not
>>> fully referenced by either file) unless you rewrite the whole file
>>> to a new single extent (which would, of course, completely unshare
>>> the whole file). In fact, if you want to ensure that those shared
>>> regions stay reflinked, there's no way to defragment either file
>>> without _increasing_ the number of extents in that file (either
>>> file would need 7 extents to properly share only those 4 regions),
>>> and even then only one of the files could be fully defragmented.
>>>
>>> Such a situation generally won't happen if you're just dealing
>>> with read-only snapshots, but it is not unusual when dealing with
>>> regular files that are reflinked (which is not an uncommon
>>> situation on some systems, as a lot of people have `cp` aliased to
>>> reflink things whenever possible).
>>
>> Well, thank you very much for writing this example. Your example is
>> certainly not minimal: it seems to me that one write to file A
>> and one write to file B would be sufficient to prove your point,
>> so the example contains one extra write, but that's OK.
>>
>> Your example proves that I was wrong. I admit: it is impossible to
>> perfectly defrag one subvolume (in the way I imagined it should be
>> done).
>> Why? Because, as in your example, there can be files within a
>> SINGLE subvolume which share their extents with each other. I
>> didn't consider such a case.
>>
>> On the other hand, I judge this issue to be mostly irrelevant. Why?
>> Because most of the file sharing will be between subvolumes, not
>> within a subvolume.
> Not necessarily.
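The bookkeeping in Austin's example can be checked with a toy model. This is my own illustration, not btrfs code: each file is a list of (first_block, last_block, shared) ranges, and a CoW write replaces the written blocks with a private extent while splitting the file's remaining references to the original extent. Note the third shared run starts at block 33, since block 32 was overwritten in B.

```python
# Toy model of CoW extent splitting (an illustration, not btrfs code).
# A file is a list of (first_block, last_block, shared) tuples, where
# shared=True means the range still references the original shared extent.

def cow_write(extents, start, end):
    """Apply a CoW write covering blocks start..end (inclusive)."""
    out = []
    for (lo, hi, shared) in extents:
        if hi < start or lo > end or not shared:
            out.append((lo, hi, shared))     # untouched, or already private
            continue
        if lo < start:
            out.append((lo, start - 1, True))            # left remainder, still shared
        out.append((max(lo, start), min(hi, end), False))  # freshly written, private
        if hi > end:
            out.append((end + 1, hi, True))              # right remainder, still shared
    return out

def shared_runs(fa, fb):
    """Block ranges still shared between both files."""
    runs = []
    for (alo, ahi, ash) in fa:
        for (blo, bhi, bsh) in fb:
            if ash and bsh and alo <= bhi and blo <= ahi:
                runs.append((max(alo, blo), min(ahi, bhi)))
    return sorted(runs)

# Both files start as one fully shared 64-block extent.
file_a = [(0, 63, True)]
file_b = [(0, 63, True)]

for s, e in [(14, 15), (47, 48)]:   # the two writes to file A
    file_a = cow_write(file_a, s, e)
file_b = cow_write(file_b, 31, 32)  # the write to file B

print(len(file_a), file_a)          # 5 extents
print(len(file_b), file_b)          # 3 extents
print(shared_runs(file_a, file_b))  # the four shared runs
```

Running it reproduces the counts above: file A ends up with 5 extents, file B with 3, and exactly four contiguous shared runs survive (0-13, 16-30, 33-46, 49-63).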
> Even ignoring the case of data deduplication (which
> needs to be considered if you care at all about enterprise usage,
> and is part of the whole point of using a CoW filesystem), there are
> existing applications that actively use reflinks, either directly or
> indirectly (via things like the `copy_file_range` system call), and
> the number of such applications is growing.

The same argument applies here: if data deduplication was performed, then the user specifically requested it. Since it was the user's will, defrag has to honor it, and so defrag must not unshare deduplicated extents, because the user wants them shared. This might prevent a perfect defrag, but that is exactly what the user requested, either directly or indirectly, by some policy he has chosen.

If an application actively creates reflinked copies, then we can assume it does so according to the user's will; therefore it is also a command by the user, and defrag should honor it by not unsharing, even at the cost of being imperfect.

Now, you might point out that, in the case of data deduplication, we can have a situation where most sharing is within a subvolume, invalidating my assertion that most sharing will be between subvolumes. But this is an invalid (more precisely, irrelevant) argument. Why? Because the defrag operation has to focus on doing what it can do while honoring the user's will. All within-subvolume sharing is user-requested, therefore it cannot be part of the argument for unsharing. You can't both perfectly defrag and honor deduplication. Therefore, defrag has to do the best possible thing while still honoring the user's will.

<<<!!! So, the fact that deduplication was performed is actually a reason FOR not unsharing, not against it, as you made it look in that paragraph. !!!>>>

If the system unshares automatically after deduplication, then the user will need to run deduplication again. Ridiculous!
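What "the best possible thing while honoring sharing" looks like can also be sketched. The toy planner below is my own illustration (not btrfs's defrag): it lays a file out contiguously but places an extent boundary at every shared/private transition, so each maximal shared run stays its own extent and the reflinks survive. For file A from Austin's example this yields exactly the 7 extents he mentions.

```python
# Toy sharing-preserving defrag planner (an illustration, not btrfs code).
# Given a file's size in blocks and the runs that must remain shared,
# return the extent layout of a contiguous rewrite that keeps every
# maximal shared run as its own (reflinkable) extent.

def plan_extents(n_blocks, shared_runs):
    shared = [False] * n_blocks
    for lo, hi in shared_runs:
        for b in range(lo, hi + 1):
            shared[b] = True
    extents = []
    start = 0
    for b in range(1, n_blocks):
        if shared[b] != shared[b - 1]:      # boundary at every
            extents.append((start, b - 1))  # shared/private transition
            start = b
    extents.append((start, n_blocks - 1))
    return extents

# File A from the example: 64 blocks, four runs shared with file B.
plan = plan_extents(64, [(0, 13), (16, 30), (33, 46), (49, 63)])
print(len(plan), plan)  # 7 extents
```

So a defrag that honors the user's sharing pays for it in extent count (7 instead of 1), which is exactly the trade-off being argued about here.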
>> When a user creates a reflink to a file in the same subvolume, he
>> is willingly denying himself the assurance of a perfect defrag.
>> Because, as your example proves, if there are a few writes to BOTH
>> files, it becomes impossible to defrag perfectly. So, if the user
>> creates such reflinks, it's his own wish and his own fault.
> The same argument can be made about snapshots. It's an invalid
> argument in both cases though because it's not always the user who's
> creating the reflinks or snapshots.

Um, I don't agree.

1) Actually, it is always the user who is creating reflinks, and snapshots, too. Ultimately, it's always the user who does absolutely everything, because a computer is supposed to be under his full control. But in the case of reflink copies this is even more true, because reflinks are not an essential feature for normal OS operation, at least as far as today's OSes go. Every OS has to copy files around; every OS requires the copy operation. No current OS requires the reflink-copy operation in order to function.

2) A user can make any number of snapshots and subvolumes, but he can at any time select one subvolume as the focus of the defrag operation, and that subvolume can be perfectly defragmented without any unsharing (except that internally-reflinked files won't be perfectly defragmented). Therefore, the snapshotting operation can never jeopardize a perfect defrag. The user can make many snapshots without any fear (I'd say a total of 100 snapshots at any point in time is a good and reasonable limit).

>> Such situations will occur only in some specific circumstances:
>> a) when the user is reflinking manually
>> b) when a file is copied from one subvolume into a different file
>> in a different subvolume.
>>
>> Situation a) is unusual in normal use of the filesystem. Even
>> when it occurs, it is an explicit command given by the user, so he
>> should be willing to accept all the consequences, even the bad ones
>> like an imperfect defrag.
>>
>> Situation b) is possible, but as far as I know copies are
>> currently not done that way in btrfs. There should probably be an
>> option to reflink-copy files from another subvolume; that would be
>> good.
>>
>> But anyway, it doesn't matter, because most of the sharing will be
>> between subvolumes, not within a subvolume. So, if there is some
>> in-subvolume sharing, the defrag won't be 100% perfect. That's a
>> minor point. Unimportant.
> You're focusing too much on your own use case here.

It's so easy to say that. But you really don't know. You might be wrong. I might be the objective one, and you might be giving me some groupthink-induced, badly-thought-out conclusions from years ago, which were never rechecked because that's so hard to do. And then everybody just repeats it and it becomes the truth. As Goebbels said, if you repeat anything enough times, it becomes the truth.

> Not everybody uses snapshots, and there are many people who are
> using reflinks very actively within subvolumes, either for
> deduplication or because it saves time and space when dealing with
> multiple copies of mostly identical trees of files.

Yes, I guess there are many such users. It doesn't matter. What you are proposing is that defrag should break all the reflinks and deduplicated data they painstakingly created. Come on!

Or, maybe defrag should unshare to gain performance? Yes, but only WHEN THE USER REQUESTS IT. So defrag can unshare, but only by request. Since this means the user is reversing his previous command not to unshare, it has to be explicitly requested by the user, not part of the default defrag operation.

> As mentioned in the previous email, we actually did have a (mostly)
> working reflink-aware defrag a few years back. It got removed
> because it had serious performance issues.
> Note that we're not
> talking a few seconds of extra time to defrag a full tree here,
> we're talking double-digit _minutes_ of extra time to defrag a
> moderately sized (low triple-digit GB) subvolume with dozens of
> snapshots, _if you were lucky_ (if you weren't, you would be looking
> at potentially multiple _hours_ of runtime for the defrag). The
> performance scaled inversely proportional to the number of reflinks
> involved and the total amount of data in the subvolume being
> defragmented, and was pretty bad even in the case of only a couple
> of snapshots.
>
> Ultimately, there are a couple of issues at play here:

I'll reply to this in another post. This one is getting a bit too long.