From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: from torres.zugschlus.de ([85.214.131.164]:39700 "EHLO
	torres.zugschlus.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750892AbcCEO2j (ORCPT );
	Sat, 5 Mar 2016 09:28:39 -0500
Received: from mh by torres.zugschlus.de with local (Exim 4.84)
	(envelope-from ) id 1acDC9-0002Rz-3v
	for linux-btrfs@vger.kernel.org; Sat, 05 Mar 2016 15:28:37 +0100
Date: Sat, 5 Mar 2016 15:28:36 +0100
From: Marc Haber 
To: linux-btrfs@vger.kernel.org
Subject: Re: Again, no space left on device while rebalancing and recipe doesnt work
Message-ID: <20160305142836.GD1902@torres.zugschlus.de>
References: <20160227211450.GS26042@torres.zugschlus.de>
	<56D3A56A.20809@cn.fujitsu.com>
	<20160229153352.GE2334@torres.zugschlus.de>
	<56D4E621.3010604@cn.fujitsu.com>
	<20160301065448.GJ2334@torres.zugschlus.de>
	<56D54393.8060307@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To: 
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

Hi,

I have not seen this message coming back to the mailing list. Was it
again too long? I have pastebinned the log at
http://paste.debian.net/412118/

On Tue, Mar 01, 2016 at 08:51:32PM +0000, Duncan wrote:
> There has been something bothering me about this thread that I wasn't
> quite pinning down, but here it is.
>
> If you look at the btrfs fi df/usage numbers, data chunk total vs. used
> are very close to one another (113 GiB total, 112.77 GiB used, single
> profile, assuming GiB data chunks, that's only a fraction of a single
> data chunk unused), so balance would seem to be getting thru them just
> fine.

Where would you see those numbers? This is what I have, pre-balance:

Mar  2 20:28:01 fan root: Data, single: total=77.00GiB, used=76.35GiB
Mar  2 20:28:01 fan root: System, DUP: total=32.00MiB, used=48.00KiB
Mar  2 20:28:01 fan root: Metadata, DUP: total=86.50GiB, used=2.11GiB
Mar  2 20:28:01 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B

> But there's a /huge/ spread between total vs. used metadata (32 GiB
> total, under 4 GiB used, clearly _many_ empty or nearly empty chunks),
> implying that has not been successfully balanced in quite some time, if
> ever.

This is possible, yes.

> So I'd surmise the problem is in metadata, not in data.
>
> Which would explain why balancing data works fine, but a whole-filesystem
> balance doesn't, because it's getting stuck on the metadata, not the data.
>
> Now the balance metadata filters include system as well, by default, and
> the -mprofiles=dup and -sprofiles=dup balances finished, apparently
> without error, which throws a wrench into my theory.

It also finishes without changing anything, post-balance:

Mar  2 21:55:37 fan root: Data, single: total=77.00GiB, used=76.36GiB
Mar  2 21:55:37 fan root: System, DUP: total=32.00MiB, used=80.00KiB
Mar  2 21:55:37 fan root: Metadata, DUP: total=99.00GiB, used=2.11GiB
Mar  2 21:55:37 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B

Wait, the Metadata total actually _grew_???

> But while we have the btrfs fi df from before the attempt with the
> profiles filters, we don't have the same output from after.

We now have everything. New log attached.
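(Purely as an illustration of what I understand the suggested next step
to be, not something taken from the log above: if the issue really is
lots of nearly-empty metadata chunks, a usage-filtered metadata balance
is the kind of command that should compact them. The mount point and
the 30% threshold below are placeholders, not my actual setup.)

  # show per-chunk-type allocation vs. usage, the same numbers as the logged df
  btrfs filesystem usage /path/to/fs

  # rewrite only metadata chunks that are at most 30% full, so that
  # nearly-empty chunks get merged and the freed space returns to unallocated
  btrfs balance start -musage=30 /path/to/fs

The point of the usage filter is that it skips full chunks entirely, so
it is much cheaper than a full, unfiltered balance.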
> > I'd like to remove unused snapshots and keep the number of them to 4
> > digits, as a workaround.
>
> I'll strongly second that recommendation. Btrfs is known to have
> snapshot scaling issues at 10K snapshots and above. My strong
> recommendation is to limit snapshots per filesystem to 3000 or less, with
> a target of 2000 per filesystem or less if possible, and an ideal of 1000
> per filesystem or less if it's practical to keep it to that, which it
> should be with thinning, if you're only snapshotting 1-2 subvolumes, but
> may not be if you're snapshotting more.

I'm snapshotting /home every 10 minutes; the filesystem I have been
posting logs from has about 400 snapshots, and snapshot cleanup works
fine. The slow snapshot removal is on a different, much bigger
filesystem on the same host, which sits on a rotating-rust HDD.

> By 3000 snapshots per filesystem, you'll be beginning to notice slowdowns
> in some btrfs maintenance commands if you're sensitive to it, tho it's
> still at least practical to work with, and by 10K, it's generally
> noticeable by all, at least once they thin down to 2K or so, as it's
> suddenly faster again! Above 100K, some btrfs maintenance commands slow
> to a crawl and doing that sort of maintenance really becomes impractical
> enough that it's generally easier to backup what you need to and blow
> away the filesystem to start again with a new one, than it is to try to
> recover the existing filesystem to a workable state, given that
> maintenance can at that point take days to weeks.

Ouch. This should not be the case, or btrfs subvolume snapshot should
at least emit a warning. It is not good that it is so easy to get a
filesystem into a state this bad.

> So 5-digits of snapshots on a filesystem is definitely well outside of
> the recommended range, to the point that in some cases, particularly
> approaching 6-digits of snapshots, it'll be more practical to simply
> ditch the filesystem and start over, than to try to work with it any
> longer. Just don't do it; setup your thinning schedule so your peak is
> 3000 snapshots per filesystem or under, and you won't have that problem
> to worry about. =:^)

That needs to be documented prominently. The ZFS fanbois will love that.

> Oh, and btrfs quota management exacerbates the scaling issues
> dramatically. If you're using btrfs quotas

Am not, thankfully.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt  | Fax: *49 6224 1600421
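P.S.: Purely as an illustration of the kind of thinning schedule
suggested above, and not what I actually run: a cron-driven sketch
along these lines would cap the snapshot count. The mount point, the
snapshot directory, the retention count, and the assumption that
snapshot names sort chronologically are all placeholders.

  # count the snapshots currently on the filesystem
  btrfs subvolume list -s /path/to/fs | wc -l

  # keep only the newest 1000 snapshots below a dedicated snapshot directory
  SNAPDIR=/path/to/fs/.snapshots
  ls -1 "$SNAPDIR" | sort | head -n -1000 | while read -r snap; do
          btrfs subvolume delete "$SNAPDIR/$snap"
  done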