From: webmaster@zedlx.com
To: linux-btrfs@vger.kernel.org
Subject: Re: Feature requests: online backup - defrag - change RAID level
Date: Mon, 09 Sep 2019 12:38:18 -0400
Message-ID: <20190909123818.Horde.dbl-yi_cNi8aKDaW_QYXVij@server53.web-hosting.com>
In-Reply-To: <fb80b97a-9bcd-5d13-0026-63e11e1a06b5@gmx.com>

Quoting Qu Wenruo <quwenruo.btrfs@gmx.com>:

>>>> 2) Sensible defrag
>>>> The defrag is currently a joke.
>>
>> Maybe there are such cases, but I would say that a vast majority of
>> users (99.99%) in a vast majority of cases (99.99%) don't want the
>> defrag operation to reduce free disk space.
>>
>>> What's wrong with the current file-based defrag?
>>> If you want to defrag a subvolume, just iterate through all files.
>>
>> I repeat: the defrag should not decrease free space. That's the
>> 'normal' expectation.
>
> Since you're talking about btrfs, it's going to do CoW for metadata
> no matter what; as long as you're going to change anything, btrfs
> will cause extra space usage.
> (Although the final result may not use extra disk space, since the
> freed space is as large as the newly allocated space, to maintain CoW
> the newly allocated space can't overlap with the old data.)

It is OK for defrag to temporarily decrease free space while the
defrag operation is in progress. That's normal.

> Furthermore, talking about snapshots with space wasted by extent
> bookkeeping, it's definitely possible the user wants to break the
> shared extents:
>
> Subvol 257, inode 257 has the following file extents:
> (257 EXTENT_DATA 0)
> disk bytenr X len 16M
> offset 0 num_bytes 4K  << Only 4K is referred to in the whole 16M extent.
>
> Subvol 258, inode 257 has the following file extents:
> (257 EXTENT_DATA 0)
> disk bytenr X len 16M
> offset 0 num_bytes 4K  << Shared with that one in subvol 257
> (257 EXTENT_DATA 4K)
> disk bytenr Y len 16M
> offset 0 num_bytes 4K  << Similar case, only 4K of 16M is used.
>
> In that case, the user definitely wants to defrag the file in subvol
> 258: if the extent at bytenr Y can be freed, we can free up 16M and
> allocate a new 8K extent for subvol 258, inode 257.
> (And the user will also want to defrag the extent in subvol 257,
> inode 257, too.)

You are confusing the actual defrag with a separate concern; let's
call it 'reserved space optimization'. It is about partially used
extents. The actual name 'reserved space optimization' doesn't matter,
I just made it up.

'Reserved space optimization' is usually performed as a part of the
defrag operation, but it doesn't have to be, as the actual defrag is
something separate. Yes, 'reserved space optimization' can break up
extents.

'Reserved space optimization' can either decrease or increase the free
space. If the algorithm determines that more space should be reserved,
then free space will decrease. If the algorithm determines that less
space should be reserved, then free space will increase. The 'reserved
space optimization' can be accomplished such that the free space does
not decrease, if such behavior is needed.

Also, the defrag operation can join some extents. In my original
example, the extents e33 and e34 can be fused into one.
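To make the distinction concrete: with the tooling that exists today,
the closest approximation is forcing a CoW rewrite of the file, which
joins small neighbouring extents but also unshares them from any
snapshots (exactly the behavior I am criticizing). A rough sketch,
with made-up paths:

  # Rewrite any extent smaller than 32M; small neighbours such as
  # e33 and e34 get merged, but snapshot sharing is broken.
  btrfs filesystem defragment -t 32M /mnt/subvolC/FFF

  # Compare shared vs. exclusive usage before and after the rewrite:
  btrfs filesystem du /mnt/subvolC/FFF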
> That's why knowledge of btrfs tech details can make a difference.
> Sometimes you may find some ideas brilliant and wonder why btrfs is
> not implementing them, but if you understand btrfs to some extent,
> you will know the answer by yourself.

Yes, that is true, but what you are posting so far are all 'red
herring'-type arguments. They are just irrelevant concerns, and you
got me explaining things like I would to a little baby. I don't know
whether I have stumbled on some rookie member of the btrfs project, or
whether you are just lazy and don't want to think about or consider my
proposals.

When I post an explanation, please try to UNDERSTAND HOW IT CAN WORK
and fill in the missing gaps, because there are tons of them; I can't
explain everything in three e-mail posts. Don't just come up with some
half-baked, forced, illogical reason why things are better as they
are.

>>>> - I think it would be wrong to use a general deduplication
>>>> algorithm for this. Instead, the information about the shared
>>>> extents should be analyzed given the starting state of the
>>>> filesystem, and then the algorithm should produce an optimal
>>>> solution based on the currently shared extents.
>>>
>>> Please be more specific, like giving an example of it.
>>
>> Let's say that there is a file FFF with extents e11, e12, e13, e22,
>> e23, e33, e34:
>> - in subvolA the file FFF consists of e11, e12, e13
>> - in subvolB the file FFF consists of e11, e22, e23
>> - in subvolC the file FFF consists of e11, e22, e33, e34
>>
>> After defrag, where the 'selected subvolume' is subvolC, the
>> extents are ordered on disk as follows:
>>
>> e11,e22,e33,e34 - e23 - e12,e13
>
> Inodes FFF in different subvolumes are different inodes. They have
> no knowledge of other inodes in other subvolumes.

You can easily notice that, if necessary, the defrag algorithm can
work without this knowledge, that is, without knowledge of the other
versions of FFF. This time I'm leaving it to you to figure out how.
Another red herring.

> If FFF in subvol C is e11, e22, e33, e34, then that's it.
> I still didn't see the point.

Now I need to explain like I would to a baby. If the extents e11, e22,
e33, e34 are stored in neighbouring sectors, then the disk reads are
faster because they become sequential, as opposed to spread out. So,
while the file FFF in subvolC still has 4 extents like it had before
the defrag, reading those 4 extents is much faster than before because
the read can be sequential.

So, the defrag can actually be performed without fusing any extents,
and it would still have a noticeable performance benefit. As I have
already said, the defrag operation can also join (fuse) some extents:
in my original example, the extents e33 and e34 can be fused into one.

> And what's the on-disk bytenr of all these extents? Which has the
> larger bytenr and length?

For the sake of simplicity, let's say that all the extents in the
example have equal length (so you can choose ANY size) and are fully
used.

> Please provide a better description, like xfs_io -c "fiemap -v"
> output before and after.

No. My example is simple and clear. Nit-picking, like you are doing
here, is not helpful. Concentrate, think, try to figure it out.
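Fine, here is the example drawn out schematically instead (disk
addresses are arbitrary; all extents are equal-size and fully used, as
stated above, and the 'before' ordering is just one possible scattered
layout):

  before: | e13 | e33 | e11 | e23 | e34 | e22 | e12 |  (spread out)
  after:  | e11 | e22 | e33 | e34 | e23 | e12 | e13 |  (subvolC first)

Before the defrag, reading FFF in subvolC means seeking all over the
disk; afterwards, its four extents e11, e22, e33, e34 form one
contiguous run, so the read is sequential even though no extents were
fused.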
>>> That's a shortcut for chunk profile change.
>>> My first idea of this is, it could cause more problems than
>>> benefits.
>>> (It only benefits profile downgrade, thus it only makes sense for
>>> RAID1->SINGLE, DUP->SINGLE, and RAID10->RAID0, nothing else.)
>>
>> Those listed cases are exactly the ones I judge to be most
>> important. Three important cases.
>
> I'd argue it's a downgrade, and not that important, as most users
> want to replace the missing/bad device and maintain the RAID profile.
>
>> What I am complaining about is that at one point in time, after
>> issuing the command:
>> btrfs balance start -dconvert=single -mconvert=single
>> and before issuing the 'btrfs device delete', the system could be
>> left in a fragile state, with extents unnecessarily spread out over
>> two drives. That is both a completely unnecessary operation, and it
>> also seems to me that it could be dangerous in some situations
>> involving potentially malfunctioning drives.
>
> In that case, you just need to replace the malfunctioning device
> rather than fall back to SINGLE.

You are assuming that the user has the time and money to replace the
malfunctioning drive. In A LOT of cases, this is not true.

What if the drive is failing, but the user has some important work to
finish? He has a presentation to give, and he doesn't want it to be
interrupted by a failing disk drive. What if the user doesn't have any
spare SATA cables on hand? What if the user doesn't have any spare
space in the case? What if it is a laptop computer?

While a user might want to maintain a RAID1 long-term, in the short
term he might want to perform a downgrade.
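For reference, the downgrade sequence under discussion, as it must be
performed today (device names are invented; depending on the
btrfs-progs version, reducing metadata redundancy may additionally
require --force):

  # Step 1: rebalance all data and metadata into single-copy profiles.
  # During this window extents can be rewritten onto the failing drive
  # as well; this is the unnecessary, fragile part complained about.
  btrfs balance start -dconvert=single -mconvert=single /mnt

  # Step 2: only now can the suspect device be removed from the array.
  btrfs device delete /dev/sdb /mnt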