To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: Recommendations for balancing as part of regular maintenance?
Date: Wed, 10 Jan 2018 04:38:09 +0000 (UTC)

Graham Cobb posted on Mon, 08 Jan 2018 18:17:13 +0000 as excerpted:

> On 08/01/18 16:34, Austin S. Hemmelgarn wrote:
>> Ideally, I think it should be as generic as reasonably possible,
>> possibly something along the lines of:
>>
>> A: While not strictly necessary, running regular filtered balances
>> (for example `btrfs balance start -dusage=50 -dlimit=2 -musage=50
>> -mlimit=4`, see `man btrfs-balance` for more info on what the options
>> mean) can help keep a volume healthy by mitigating the things that
>> typically cause ENOSPC errors.  Full balances by contrast are long
>> and expensive operations, and should be done only as a last resort.
>
> That recommendation is similar to what I do and it works well for my
> use case. I would recommend it to anyone with my usage, but cannot say
> how well it would work for other uses. In my case, I run balances like
> that once a week: some weeks nothing happens, other weeks 5 or 10
> blocks may get moved.

Why 50% usage, and why the rather low limits?

OK, so it rarely makes sense to go over 50% usage when the intent of
the balance is to return chunks to the unallocated pool, because at 50%
the payback ratio is one chunk freed for two processed, and it gets
worse after that, MUCH worse above ~67-75%, where the ratios drop to
1:3 and 1:4 respectively.  But why so high, especially for a suggested
scheduled/routine command?

I'd suggest a rather lower usage value, say 20/25/34%, for the more
favorable consolidation ratios of 5:1, 4:1, and 3:1.  (At 25%, for
example, the data from four processed chunks packs into one, returning
three chunks to unallocated, while at 50% you process two chunks to
free just one.)  That should be reasonable as a generic recommendation
for scheduled/routine balances.  If that's not enough, people can do
more manually, or increase the values from the generic recommendation
for their specific use-case.

And I'd suggest either no limits or, for kernels that can handle it
(4.4+, which at this point is everything within our recommended support
range of the last two LTS series, thus 4.9 at the earliest, anyway),
range-limits, say 2..20, so balance won't bother if there's not enough
to clear at least one chunk within the usage target (but see the
observed behavior change noted below), yet will do more than the low
2-4 of the above suggested limits if there is.  With the lower usage=
values, processing should take less time per chunk, and if nothing more
fits the usage filter it won't use the higher end of the range anyway,
so the limit can and should be higher.
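Concretely, a scheduled/routine balance along those lines might look
something like this (an untested sketch: the 25% usage and 2..20 limit
range are just my suggested values above, the mountpoint is a
placeholder, the comma-joined filter syntax is per `man btrfs-balance`,
and the limit=min..max range form needs kernel 4.4+):

  btrfs balance start -dusage=25,limit=2..20 \
                      -musage=25,limit=2..20 /mnt/data

Run weekly from cron or a systemd timer, as Graham does, that should
normally finish quickly, since only lightly-used chunks get rewritten.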
Meanwhile, for any recommendation of balance, I'd suggest also
mentioning the negative effect that enabled quotas have on balance
times, probably with a link to a fuller discussion.  There I'd suggest
disabling quotas entirely, due to the scaling issues, if the use-case
doesn't require them; and if that's not possible due to the use-case,
at least consider temporarily disabling quotas before doing a balance
so as to speed it up, re-enabling them afterward (an untested sketch of
the sequence is at [1] below).  (I'm not sure whether a manual quota
rescan is required to update them at that point or not.  I don't use
quotas here, or I'd test.)

And an additional observation...  I'm on ssd here and run many rather
small independent btrfs instead of fewer larger ones, so I'm used to
keeping an eye on usage, tho I've never found the need to schedule
balances, partly because on ssd with relatively small btrfs, balances
are fast enough that they're not a problem to do "while I wait".

And I've definitely noticed an effect since the ssd mount option
stopped using the 2 MiB spreading algorithm in 4.14.  Chunk usage was
generally stable before that, and I only occasionally needed to run
balance to clear out empty chunks.  Now, balance with the usage filter
will apparently actively fill in empty space in existing chunks.
Previously, a usage-filtered balance that rewrote only one chunk didn't
actually free anything, because it simply allocated a new chunk to
replace the one it freed, so at least two chunks needed rewriting to
actually return space to unallocated.  Now, a usage-filtered rewrite of
only a single chunk routinely frees the allocated space, because the
small bit of data from the freed chunk is written into existing free
space in other chunks.

At least I /presume/ that new balance-usage behavior is due to the ssd
changes.  Maybe it's due to other patches.  Either way, it's an
interesting and useful change.  =:^)

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
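[1] Something like the following, as an untested sketch (I don't run
quotas here, the mountpoint is a placeholder, and whether the final
rescan is actually needed is exactly the open question above):

  btrfs quota disable /mnt/data
  btrfs balance start -dusage=25,limit=2..20 \
                      -musage=25,limit=2..20 /mnt/data
  btrfs quota enable /mnt/data
  btrfs quota rescan -w /mnt/data   # -w waits for the rescan to finish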