linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Atemu <atemu.main@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: BUG: btrfs send: Kernel's memory usage rises until OOM kernel panic after sending ~37GiB
Date: Sun, 27 Oct 2019 19:34:03 +0800
Message-ID: <b4673e3b-b9b2-e8e5-2783-4b5eac7f656d@gmx.com>
In-Reply-To: <CAE4GHgmW2A-2SUUw8FzgafRhQ2BoViBx2DsLigwBrrbbp=oOsw@mail.gmail.com>


On 2019/10/27 6:33 PM, Atemu wrote:
>> That's the problem.
>>
>> Deduped files caused heavy overload for backref walk.
>> And send has to do backref walk, and you see the problem...
> 
> Interesting!
> But should it really be able to make btrfs send use up >15GiB of RAM
> and cause a kernel panic because of that? The btrfs doesn't even have
> that much metadata on-disk in total.

This depends on how shared one file extent is.

If one file extent is shared 10,000 times within one subvolume, and you
have 1000 snapshots of that subvolume, the backref walk will really go
crazy.
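
As a back-of-the-envelope sketch of that example: worst_case_refs is a
made-up helper, and it assumes the worst case where every snapshot still
shares every one of those references, so treat the result as an upper
bound rather than what the kernel actually tracks.

```shell
# Hypothetical helper: upper bound on the references a backref walk may
# have to resolve for a single extent.
#   $1: how many times the extent is shared inside one subvolume
#   $2: number of snapshots of that subvolume
worst_case_refs() {
    echo $(( $1 * $2 ))
}

worst_case_refs 10000 1000   # prints 10000000
```

Ten million references for a single extent is the kind of number that
makes the walk expensive, whatever the on-disk metadata size is.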

> 
>> I'm very interested how heavily deduped the file is.
> 
> So am I, how could I get my hands on that information?
> 
> Are that particular file's extents what causes btrfs send's memory
> usage to spiral out of control?

I can't say with 100% certainty. We need more info on that.

An extent tree dump can provide a per-subvolume view of how shared one
extent is.
But as I mentioned, snapshots are another catalyst for this problem.

> 
>> If it's just all 0 pages, hole punching is more effective than dedupe,
>> and causes 0 backref overhead.
> 
> I did punch holes into the disk images I have stored on it by mounting
> and fstrim'ing

That's trim (or discard), not hole punching.
Normally hole punching is done with fallocate(2) and FALLOC_FL_PUNCH_HOLE
(xfs_io calls it "fpunch"). Not sure if duperemove does that too.
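
From the shell, util-linux's fallocate(1) can do this for existing files.
A minimal sketch, assuming the file system supports hole punching;
punch_zero_ranges and the image path are made up for illustration:

```shell
# Hypothetical helper: deallocate the zero-filled ranges of a file in place.
# "fallocate --dig-holes" scans the file for blocks of zeros and punches
# them out; the apparent size and the contents stay the same.
punch_zero_ranges() {
    fallocate --dig-holes "$1"
}

# Example (the path is an assumption):
#   punch_zero_ranges /path/to/disk.img
#   du -h --apparent-size /path/to/disk.img   # logical size, unchanged
#   du -h /path/to/disk.img                   # allocated size, should shrink
```

Unlike fstrim inside the guest, this works on the image file itself, so
the zero ranges end up as holes with no extent behind them.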

> them and the duperemove command I used has a flag that
> ignores all 0 pages (those get compressed down to next to nothing
> anyways) but it's likely that I ran duperemove once or twice before
> I knew about that flag.
> 
> Is there a way to find such extents that could cause the backref walk
> to overload?

It's really hard to determine. You could try the following command:
# btrfs ins dump-tree -t extent --bfs /dev/nvme/btrfs |\
  grep "(.*_ITEM.*)" | awk '{print $4" "$5" "$6" size "$10}'

Then check which key shows up most often, and its size.

If a key's objectid (the first value) shows up multiple times, it's a
fairly heavily shared extent.
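
Counting by hand gets tedious, so here is a rough sketch of a ranking
step. rank_objectids is a made-up name, and it assumes input lines of the
form "(objectid TYPE offset) size N", which is what the awk stage of the
dump-tree pipeline prints:

```shell
# Hypothetical helper: read "(objectid TYPE offset) size N" lines on stdin
# and print "objectid count max_size", most frequently seen objectid first.
rank_objectids() {
    awk '{
        id = $1
        sub(/^\(/, "", id)                       # drop the leading "("
        count[id]++
        if ($NF + 0 > max[id]) max[id] = $NF + 0  # largest item size seen
    }
    END {
        for (id in count) print id, count[id], max[id]
    }' | sort -k2,2nr -k3,3nr
}

# Example: append "| rank_objectids | head -n 20" to the dump-tree pipeline.
```

A high count or an unusually large item size are both hints that the
extent carries a lot of backrefs.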

Then search for that objectid in the full extent tree dump, to find out
how it's shared.
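
For that search, something along these lines might help. It is only a
sketch: show_extent_refs is a made-up helper, and it assumes the dump
keeps its usual "item N key (objectid TYPE offset) ..." layout, with the
backref lines indented below each item:

```shell
# Hypothetical helper: read a full extent tree dump on stdin and print
# every item whose key objectid equals $1, plus the lines that follow it
# (refs, backrefs) up to the next item or tree block header.
show_extent_refs() {
    awk -v id="$1" '
        BEGIN { key = "(" id }
        $1 == "leaf" || $1 == "node" { show = 0 }
        $1 == "item" && $3 == "key"  { show = ($4 == key) }
        show
    '
}

# Example (device path and objectid are assumptions):
#   btrfs ins dump-tree -t extent /dev/nvme/btrfs | show_extent_refs 13631488
```

The "root" values in the printed backref lines tell you which subvolumes
reference the extent.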

You can see it's already complex...

Thanks,
Qu
> 
> Thanks,
> Atemu
> 



