On 2019/10/27 1:46 AM, Atemu wrote:
> Hi linux-btrfs,
> after btrfs sending ~37GiB of a snapshot of one of my subvolumes,
> btrfs send stalls (`pv` (which I'm piping it through) does not report
> any significant throughput anymore) and shortly after, the Kernel's
> memory usage starts to rise until it runs OOM and panics.
>
> Here's the tail of dmesg I saved before such a Kernel panic:
>
> https://gist.githubusercontent.com/Atemu/3af591b9fa02efee10303ccaac3b4a85/raw/f27c0c911f4a9839a6e59ed494ff5066c7754e07/btrfs%2520send%2520OOM%2520log
>
> (I cancelled the first btrfs send in this example FYI, that's not part
> of nor required for this bug.)
>
> And here's a picture of the screen after the Kernel panic:
>
> https://photos.app.goo.gl/cEj5TA9B5V8eRXsy9
>
> (This was recorded a while back but I am able to reproduce the same bug
> on archlinux-2019.10.01-x86_64.iso.)
>
> The snapshot holds ~3.8TiB of data that has been compressed (ZSTD:3)
> and heavily deduplicated down to ~1.9TiB.

That's the problem.

Deduped files cause heavy overhead for the backref walk.
And send has to do a backref walk, so you can see the problem...

I'm very interested in how heavily deduped the file is.
If it's just all-zero pages, hole punching is more effective than dedupe,
and causes zero backref overhead.

Thanks,
Qu

> For deduplication I used `bedup dedup` and `duperemove -x -r -h -A -b
> 32K ---skip-zeroes --dedupe-options=same,fiemap,noblock` and IIRC it
> was mostly done around the time 4.19 and 4.20 were recent.
>
> The Inode that btrfs reports as corrupt towards the end of the dmesg
> is a 37GiB 7z archive (size correlates) and can be read without errors
> on a live system where the bug hasn't been triggered yet. Since it
> happens to be a 7z archive, I can even confirm its integrity with `7z
> t`.
> A scrub and `btrfs check --check-data-csum` don't detect any errors either.
>
> Please tell me what other information I could provide that might be
> useful/necessary for squashing this bug,
> Atemu
>
> PS: I could spin up a VM with device mapper snapshots of the drives,
> destructive troubleshooting is possible if needed.
>
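
To check whether the file in question is mostly all-zero pages (the case where hole punching would beat dedupe), a rough sketch like the following could be used. It is not part of btrfs-progs; the function names `zero_ranges` and `zero_fraction` and the 4 KiB block size are assumptions made for illustration. Note that it reads the whole file, and that holes which are already punched also read back as zeros, so they are counted too.

```python
import os

# Assumed block size for the scan; adjust if the filesystem uses another one.
BLOCK_SIZE = 4096


def zero_ranges(path, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) runs of block-aligned all-zero data."""
    ranges = []
    start = None  # offset where the current zero run began, or None
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            if block.count(0) == len(block):
                # Block is entirely zero bytes: open a run if none is open.
                if start is None:
                    start = offset
            elif start is not None:
                # Non-zero block ends the current run.
                ranges.append((start, offset - start))
                start = None
            offset += len(block)
    if start is not None:
        # File ended while a zero run was still open.
        ranges.append((start, offset - start))
    return ranges


def zero_fraction(path, block_size=BLOCK_SIZE):
    """Return the fraction of the file's bytes that lie in all-zero blocks."""
    size = os.path.getsize(path)
    if size == 0:
        return 0.0
    zeroed = sum(length for _, length in zero_ranges(path, block_size))
    return zeroed / size
```

If `zero_fraction()` comes back close to 1.0 for the deduped ranges, the zero runs are candidates for hole punching, for example with `fallocate --dig-holes` from util-linux (assuming it is installed), ideally tried first on a copy or on the device mapper snapshots mentioned in the PS.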