From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Tomasz Pala <gotar@polanet.pl>, linux-btrfs@vger.kernel.org
Subject: Re: exclusive subvolume space missing
Date: Mon, 11 Dec 2017 08:24:16 +0800
Message-ID: <599e2f5d-8e78-9b59-879c-6ba375510508@gmx.com>
In-Reply-To: <f9d281bb-e8a9-77c3-ab29-6fda9e5228ab@gmx.com>


On 2017-12-11 07:44, Qu Wenruo wrote:
> 
> 
> On 2017-12-10 19:27, Tomasz Pala wrote:
>> On Mon, Dec 04, 2017 at 08:34:28 +0800, Qu Wenruo wrote:
>>
>>>> 1. is there any switch resulting in 'defrag only exclusive data'?
>>>
>>> IIRC, no.
>>
>> I have found a directory - the pam_abl databases - which occupies 10 MB
>> (yes, TEN megabytes) and which released ...8.7 GB (almost NINE gigabytes)
>> after defrag. After defragging, the files were not snapshotted again,
>> yet I lost 3.6 GB again overnight, so this is fully reproducible.
>> There are 7 files, one of which accounts for 99% of the space (10 MB).
>> None of them has nocow set, so they're riding all-btrfs.
>>
>> I could debug something before I clean this up - is there anything you
>> want me to check, or anything about the files you want to know?
> 
> The fiemap result, along with the btrfs dump-tree -t2 result.
> 
> Neither output contains anything related to file names/dir names, only
> some "meaningless" bytenrs, so it should be completely OK to share them.
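>
> For example, a minimal sketch of how to gather both (the file path and
> the device are placeholders for your actual setup):
>
>   # extent map of one of the pam_abl files, via the FIEMAP ioctl
>   filefrag -v /path/to/pam_abl/db-file
>   # dump the extent tree (tree 2); it contains bytenrs only, no names
>   btrfs inspect-internal dump-tree -t 2 /dev/sdX > extent-tree.txt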
> 
>>
>> The fragmentation impact is HUGE here; a 1000x ratio is almost a DoS
>> condition, which a malicious user could trigger within a few hours
>> or faster
> 
> You won't want to hear this:
> the biggest ratio in theory is 128M / 4K = 32768.
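>
> That is, the largest data extent btrfs creates is 128MiB, and a single
> 4KiB overwrite can keep the whole old extent pinned:
>
>   128MiB / 4KiB = (128 * 1024 KiB) / 4 KiB = 32768
>
> so one page of live data can, in the worst case, hold 32768 times its
> own size on disk.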
> 
>> - I've lost 3.6 GB during the night with a reasonably small
>> amount of writes; I guess it might be possible to trash an entire
>> filesystem within 10 minutes if doing this on purpose.
> 
> That's a little complex.
> To get into such a situation, snapshots must be used, and one must know
> which file extents are shared and how they're shared.
> 
> But yes, it's possible.
> 
> On the other hand, XFS, which also supports reflink, handles this
> quite well, so I'm wondering if it's possible for btrfs to follow its
> behavior.
> 
>>
>>>> 3. I guess there aren't, so how could I accomplish my goal, i.e.
>>>>    reclaiming space that was lost due to fragmentation, without breaking
>>>>    snapshotted CoW where it would be not only pointless, but actually harmful?
>>>
>>> What about using old kernel, like v4.13?
>>
>> Unfortunately (I guess you had 3.13 in mind), I need the newer ones and
>> will be pushing towards 4.14.
> 
> No, I really mean v4.13.

My fault, it is v3.13.

What a stupid error...

> 
> From btrfs(5):
> ---
>        Warning
>        Defragmenting with Linux kernel versions < 3.9 or ≥ 3.14-rc2,
>        as well as with Linux stable kernel versions ≥ 3.10.31,
>        ≥ 3.12.12 or ≥ 3.13.4, will break up the ref-links of CoW data
>        (for example files copied with cp --reflink, snapshots or
>        de-duplicated data). This may cause considerable increase of
>        space usage depending on the broken up ref-links.
> ---
> 
>>
>>>> 4. How can I prevent this from happening again? All the files that are
>>>>    written to constantly (a stats collector here, PostgreSQL databases
>>>>    and logs on other machines) are marked with nocow (+C); maybe some
>>>>    new attribute to mark a file as autodefrag? +t?
>>>
>>> Unfortunately, nocow only works if there is no other subvolume/inode
>>> referring to it.
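>>>
>>> (For reference, a sketch of setting it - the path is only an example;
>>> +C must be applied while a file is still empty, e.g. by setting it on
>>> the directory so that new files inherit it:
>>>
>>>   chattr +C /var/lib/pam_abl
>>>   lsattr -d /var/lib/pam_abl   # shows 'C' when the flag is set
>>> )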
>>
>> This shouldn't be the case for me anymore after defrag (== breaking links).
>> I guess there's no easy way to check the refcounts of the blocks?
> 
> No easy way, unfortunately.
> It's either time consuming (the method qgroup uses) or complex (manually
> searching the trees and doing the backref walk by yourself).
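>
> If you can afford the time consuming route, a minimal sketch using
> qgroups (the mount point is a placeholder):
>
>   btrfs quota enable /mnt
>   btrfs quota rescan -w /mnt   # the rescan is the expensive part
>   btrfs qgroup show /mnt       # the 'excl' column is exclusive bytes
>
> That gives per-subvolume exclusive numbers, not per-block refcounts,
> but it's the closest ready-made tool.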
> 
>>
>>> But in my understanding, btrfs is not suitable for such a conflicting
>>> situation, where you want snapshots of frequent partial updates.
>>>
>>> IIRC, btrfs is better for use cases where either updates are less
>>> frequent, or an update replaces the whole file, not just part of it.
>>>
>>> So btrfs is good for a root filesystem like /etc and /usr (and /bin
>>> and /lib, which point to /usr/bin and /usr/lib), but not for /var or /run.
>>
>> That is consistent with my conclusions after 2 years on btrfs;
>> however, I didn't expect a single file to eat 1000 times more space
>> than it should...
>>
>>
>> I wonder how many other filesystems were trashed like this - I'm short
>> ~10 GB on another system, and many other users might be affected by this
>> (telling the Internet stories about btrfs running out of space).
> 
> Firstly, no other filesystem supports snapshots.
> So it's pretty hard to get a baseline.
> 
> But as I mentioned, XFS supports reflink, which means file extents can
> be shared between several inodes.
> 
> From what I heard from the XFS guys, they free any unused space of a
> file extent, so it should handle this quite well.
> 
> But that's quite hard to achieve in btrfs; it needs years of
> development at least.
> 
>>
>> It is not a problem that I need to defrag a file; the problem is that
>> I don't know:
>> 1. whether I need to defrag,
>> 2. *what* I should defrag,
>> nor do I have a tool that would defrag smartly - only the exclusive
>> data or, in general, only the blocks worth defragging, i.e. where the
>> space released from extents is greater than the space lost to
>> inter-snapshot duplication.
>>
>> I can't just defrag the entire filesystem, since that breaks links
>> with snapshots. This change was a real deal-breaker here...
> 
> IIRC it would be better to add an option to make defrag snapshot-aware
> (don't break snapshot sharing, but only defrag exclusive data).
> 
> Thanks,
> Qu
> 
>>
>> Any way to feed the deduplication code with snapshots maybe? The
>> directories and files are in the same layout, so this could be
>> fast-tracked for checking and deduplication.
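>>
>> For instance, could the out-of-band dedupe (the extent-same ioctl) be
>> pointed at two snapshots, something like (paths are placeholders):
>>
>>   duperemove -dr /snapshots/snap-a /snapshots/snap-b
>>
>> or is hashing everything up front already too slow for that?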
>>
> 



Thread overview: 32+ messages
2017-12-01 16:15 exclusive subvolume space missing Tomasz Pala
2017-12-01 21:27 ` Duncan
2017-12-01 21:36 ` Hugo Mills
2017-12-02  0:53   ` Tomasz Pala
2017-12-02  1:05     ` Qu Wenruo
2017-12-02  1:43       ` Tomasz Pala
2017-12-02  2:17         ` Qu Wenruo
2017-12-02  2:56     ` Duncan
2017-12-02 16:28     ` Tomasz Pala
2017-12-02 17:18       ` Tomasz Pala
2017-12-03  1:45         ` Duncan
2017-12-03 10:47           ` Adam Borowski
2017-12-04  5:11             ` Chris Murphy
2017-12-10 10:49           ` Tomasz Pala
2017-12-04  4:58     ` Chris Murphy
2017-12-02  0:27 ` Qu Wenruo
2017-12-02  1:23   ` Tomasz Pala
2017-12-02  1:47     ` Qu Wenruo
2017-12-02  2:21       ` Tomasz Pala
2017-12-02  2:35         ` Qu Wenruo
2017-12-02  9:33           ` Tomasz Pala
2017-12-04  0:34             ` Qu Wenruo
2017-12-10 11:27               ` Tomasz Pala
2017-12-10 15:49                 ` Tomasz Pala
2017-12-10 23:44                 ` Qu Wenruo
2017-12-11  0:24                   ` Qu Wenruo [this message]
2017-12-11 11:40                   ` Tomasz Pala
2017-12-12  0:50                     ` Qu Wenruo
2017-12-15  8:22                       ` Tomasz Pala
2017-12-16  3:21                         ` Duncan
2017-12-05 18:47   ` How exclusive in parent qgroup is computed? (was: Re: exclusive subvolume space missing) Andrei Borzenkov
2017-12-05 23:57     ` How exclusive in parent qgroup is computed? Qu Wenruo
