To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: All free space eaten during defragmenting (3.14)
Date: Tue, 3 Jun 2014 04:46:35 +0000 (UTC)
References: <1703083.hLnNuPsKpY@linux-suse.hu> <538B8F76.9090500@petezilla.co.uk> <538CE4A0.9020105@petezilla.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org

Peter Chant posted on Mon, 02 Jun 2014 21:54:56 +0100 as excerpted:

>> What I /meant/ was "only defragging what you pointed the defrag at",
>> not the other snapshots of the same subvolume.  "Mounted" shouldn't
>> have anything to do with it, except that I didn't consider the
>> possibility of having the other snapshots mounted at the same time, so
>> said "mounted" when I meant the one you pointed defrag at, as I wasn't
>> thinking about having the others mounted too.
>
> Interesting.  I have set autodefrag in fstab.  I _may_ have previously
> tried to defrag the top-level subvolume - faint memory, that is
> pointless, as if a file exists in more than one subvolume and it is
> changed in one or more it cannot be optimally defragged in all subvols
> at once if I understand it correctly - as bits of it are common and
> bits differ?  Or maybe separate whole copies of the file are created?
> So if using snapshots, only defrag the one you are actively using, if
> I understand correctly.

Hmm... that brings up an interesting question.  I know snapshots stop at subvolume boundaries, but I haven't the foggiest how the -r/recursive option to defrag behaves.  Does defrag stop at subvolume boundaries (and thus snapshot boundaries, as they're simply special-case subvolumes that point at the same data as another subvolume as of the time they were taken) too?  If not, what about entirely separate filesystem boundaries, where a second btrfs filesystem happens to be mounted inside the recursively defragged tree?  I simply don't know, tho I strongly suspect it doesn't cross full filesystem boundaries, at least.

Of course if you were using something like find and executing defrag on each found entry, then yes it would recurse, as find would recurse across filesystems and keep going (unless you told it not to using find's -xdev option).
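Just to make the two approaches concrete (the mountpoint below is a made-up example, and of course check the btrfs-progs and find manpages for your versions):

  # btrfs' own recursive defrag -- whether this crosses subvolume
  # boundaries is exactly the open question above
  btrfs filesystem defragment -r /mnt/working

  # find-based per-file defrag; -xdev keeps find from descending into
  # other filesystems mounted below the starting point
  find /mnt/working -xdev -type f \
       -exec btrfs filesystem defragment {} +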
Meanwhile, you mention the autodefrag mount option.  Assuming you have it on all the time, there shouldn't be that much to defrag, *EXCEPT* if the -c/compress option is used as well.  If you aren't also using the compress mount option by default, then you are effectively telling defrag to compress everything as it goes, so it will defrag-and-compress all files.  Which wouldn't be a problem with snapshot-aware-defrag, as it'd compress for all snapshots at the same time too.  But with snapshot-aware-defrag currently disabled, that would effectively force ALL files to be rewritten in order to compress them, thereby breaking the COW link with the other snapshots and duplicating ALL data.  Which would SERIOUSLY increase data usage, doubling it, except that the compression would reduce the size of the new version, so perhaps only a 50% increase in data usage, with the caveat that the effectiveness of the compression, and thus the 50% number, would vary greatly depending on the compressibility of the data in question.

Thus, if the OP were NOT using compression previously, it was the -clzo that /really/ blew up the data usage, as without snapshot-aware-defrag enabled he was effectively duplicating everything that defrag saw in order to compress it!  (If he was using the compress=lzo option before and had always used it, then adding the -clzo to defrag shouldn't have mattered at all, since the compress mount option would have done the same thing during the defrag as the defrag compress option.)

I guess that wasn't quite the intended effect of adding the -clzo flag!  All because of the lack of snapshot-aware-defrag.

>> 2) With snapshot-aware-defrag (ideal but currently disabled due to
>> scaling issues with the current code), defrag would take account of
>> all the snapshots containing the same data, and would change them
>> /all/ to point to the new data location, when defragging a
>> snapshotted file.
>
> This is an issue I'm not really up on, and is one of the things I was
> reading with interest on the list.
>
>> 3) Unfortunately, with the snapshot-awareness disabled, it will only
>> defrag the particular instance of the data (normally the online
>> working instance) you actually pointed defrag at, ignoring the other
>> snapshots still pointing at the old instance, thereby duplicating the
>> data, with all the other instances of the data still pinned by their
>> snapshot to the old location, while only the single instance you
>> pointed defrag at actually gets defragged, thereby breaking the COW
>> link with the other instances and duplicating the defragged data.
>
> So with what I am doing, creating snapshots for 'backup' purposes only,
> this should not be a big issue as this will only affect the 'working
> copy'.  (No, btrfs snapshots are not my backup solution.)

If the data that you're trying to defrag is snapshotted, the defrag will currently break the COW link and double usage.  However, as long as you have the space to spare and are deleting the snapshots in a reasonable time (as it sounds like you are, since it seems you're doing snapshots only to enable a stable backup), once you delete all the snapshots from before the defrag you should get the space back, so it's not a permanent issue.

>> That said, there's a couple reasons one might go to the inconvenience
>> of doing the mount/umount dance, so the snapshots are only available
>> when they're actually being worked with.  The first is that unmounted
>> data is less likely to be accidentally damaged (altho when it's
>> subvolumes/snapshots on the same master filesystem, the separation and
>> protection from damage isn't as great as if they were entirely
>> separate filesystems, but of course you can't snapshot to entirely
>> separate filesystems).
>
> The protection from damage could also, or perhaps better, be enforced
> using read-only snapshots?

Yes.
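(For reference, and with made-up paths: a read-only snapshot is just the -r flag at creation time, and on reasonably new btrfs-progs an existing snapshot can be flipped to read-only afterwards.)

  # create the snapshot read-only from the start
  btrfs subvolume snapshot -r /mnt/top/working \
       /mnt/top/snapshots/working-pre-update

  # or mark an existing snapshot read-only after the fact
  btrfs property set /mnt/top/snapshots/working-pre-update ro true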
But you can put me in the multiple independent btrfs filesystems, each on their own partitions, camp.

My problem in principle with one big filesystem with subvolumes and snapshots is that should something happen to damage that filesystem such that it cannot be fully recovered, all those snapshot and subvolume "data eggs" are in the same filesystem "basket", and if it drops, all those eggs are lost at the same time!

So I still vastly prefer traditional partitioning methods, with several independent filesystems each on their own partition, and in fact backup partitions/filesystems as well, with the primary backups on partitions on the same pair of (mostly btrfs raid1) physical devices.  That way, if one btrfs filesystem goes unrecoverably bad, or even all that were currently mounted go bad at the same time, the damage is limited, and I still have the first-backups on the same device-pair I can boot to.  (FWIW, I have additional backups on other devices, just in case it's the operating device pair that goes bad at the same time, tho I don't necessarily keep them to the same level of currency, as I don't consider the risk of both operating devices going bad at the same time all that high, and accept that level of risk should it actually occur.)

So I'm used to unmounted meaning the whole filesystem is not in use and therefore reasonably safe from damage, while if it's only subvolumes/snapshots on the same master filesystem, the level of safety in keeping them unmounted (or read-only mounted, if mounted at all) isn't really comparable to the entirely-separate-filesystem case.  But certainly, there's still /some/ benefit to it.

But that's why I added the parenthetical caveat, because in the middle of writing that paragraph I realized that the safety element wasn't as big a deal as I had originally thought when I started the paragraph, because I'm used to dealing with the separate-filesystems case and that didn't apply here.

>> The second and arguably more important reason has to do with security,
>> specifically root escalation vulnerabilities.  Consider system updates
>> that include a security update for such a root escalation
>> vulnerability.  Normally, you'd take a snapshot before doing the
>> update, so as to have a chance to roll back to the pre-update snapshot
>> in case something in the update goes wrong.  That's a good policy, but
>> what happens to that security update?  Now the pre-update snapshot
>> still contains the vulnerable version, even while the working copy is
>> patched and is no longer vulnerable.  Now, if you keep those snapshots
>> mounted and some bad guy gets user access to your system, they can
>> access the still-vulnerable copy in the pre-update snapshot to upgrade
>> their user access to root. =:^(
>
> This is an interesting point.  The changes are not too radical, all I
> need to do is add code to my snapshot scripts to mount and unmount my
> toplevel btrfs tree when performing a snapshot.  Not sure if this
> causes any significant time penalty, as in slowing of the system with
> any heavy IO.  Since snapshots are run by cron, the time taken to
> complete is not critical, rather whether the act of mounting and
> unmounting causes any slowing due to heavy IO.

Lest there be any confusion, I should note that idea isn't original to me.  But as I'm reasonably security focused, once I read it on the list it definitely ranked rather high on my "snapshots considerations" list, and you can bet I'll never have the master subvolume routinely mounted here as a result!
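FWIW, in case it helps with those script changes, a minimal sketch of the mount/snapshot/umount dance might look something like the below.  The device, mountpoint, and subvolume names are all made-up placeholders, so adjust for your own layout:

  #!/bin/sh
  # Mount the toplevel subvolume (subvolid=5) only for the duration of
  # the snapshot, so it isn't routinely exposed the rest of the time.
  DEV=/dev/disk/by-label/mybtrfs   # placeholder device
  TOP=/mnt/btrfs-top               # placeholder mountpoint

  mount -o subvolid=5 "$DEV" "$TOP" || exit 1

  # Read-only snapshot of the working subvolume, stamped with the date.
  btrfs subvolume snapshot -r "$TOP/working" \
      "$TOP/snapshots/working-$(date +%Y%m%d-%H%M)"

  umount "$TOP"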
Meanwhile, unless there's something strange going on, mounts shouldn't affect ongoing I/O much at all.  Umounts are slightly different, in that on btrfs there can be some housekeeping that must be done before the filesystem is fully unmounted, which could in theory disrupt ongoing I/O temporarily, but that's limited to writable mounts where some serious write activity occurred.  If you're just mounting to do a snapshot and umounting again, I don't believe that should be a problem, since in the normal case there will be only a bit of metadata to update from the process of doing the snapshot.

FWIW, while I actually don't do much snapshotting here, I have a similar setup for my "packages" filesystem, which is unmounted unless I'm doing system updates or package queries, and for my rootfs, which is mounted read-only, again unless I'm updating it.  My package-tree-update scripts check to see if the packages filesystem is mounted and if not mount it, and remount my rootfs read-write, before syncing the packages tree from remote.  When I'm done, I have another script that umounts the packages tree and remounts the rootfs ro once again.

And you're right, in comparison to the rest of the scripts, the mounting bit is actually quite trivial. =:^)

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman