corrupt leaf, bad key order on kernel 5.0

@ 2019-04-05 19:11 Nazar Mokrynskyi
  2019-04-05 19:32 ` Hugo Mills
From: Nazar Mokrynskyi @ 2019-04-05 19:11 UTC
  To: linux-btrfs

NOTE: I do not need help with recovery; I have fully automated snapshot, backup, and restoration mechanisms. The only purpose of this email is to help developers find the cause of yet another filesystem corruption and, hopefully, fix it.

Yet another corruption of my root BTRFS filesystem happened today.
I didn't bother running scrub, balance, or check; I just created a disk image for future investigation and restored everything from backup.

Here is what the corruption looks like:
[  274.241339] BTRFS info (device dm-0): disk space caching is enabled
[  274.241344] BTRFS info (device dm-0): has skinny extents
[  274.283238] BTRFS info (device dm-0): enabling ssd optimizations
[  310.436672] BTRFS critical (device dm-0): corrupt leaf: root=268 block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current (1240717 76 41451520)
[  310.449304] BTRFS critical (device dm-0): corrupt leaf: root=268 block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current (1240717 76 41451520)
[  310.449309] BTRFS: error (device dm-0) in btrfs_drop_snapshot:9250: errno=-5 IO failure
[  310.449311] BTRFS info (device dm-0): forced readonly
[  311.266789] BTRFS info (device dm-0): delayed_refs has NO entry
[  311.277088] BTRFS error (device dm-0): cleaner transaction attach returned -30
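In the log above, each key is printed as (objectid type offset), and the keys in a leaf must be strictly ascending. A minimal Python sketch of that ordering rule follows, assuming keys compare field by field in exactly that order, as the kernel's tree checker does; `BtrfsKey` and `keys_in_order` are hypothetical names for illustration, not kernel or btrfs-progs APIs.

```python
from typing import NamedTuple


class BtrfsKey(NamedTuple):
    """A btrfs key as printed in the kernel log: (objectid type offset)."""
    objectid: int
    type: int
    offset: int


def keys_in_order(prev: BtrfsKey, current: BtrfsKey) -> bool:
    # Tuple comparison is lexicographic over (objectid, type, offset).
    # Keys in a leaf must be strictly ascending, so equal keys also fail.
    return prev < current


prev = BtrfsKey(1240717, 108, 41447424)
current = BtrfsKey(1240717, 76, 41451520)
# Same objectid, but type 76 sorts before 108, so the order is broken.
print(keys_in_order(prev, current))  # False
```

With the keys from slot 123, the larger offset of the current key does not matter: the comparison is already decided by the type field.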

My system simply froze while I was not looking at it, and this is the state it is in now.
The filesystem survived from March 8 until April 5, one of the fastest corruptions I have experienced.

It looks like this happened while sending an incremental snapshot to another BTRFS filesystem: the last snapshot on that filesystem was not read-only, which it would have been if the receive had completed.
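Since a received snapshot is only marked read-only once `btrfs receive` finishes, a writable snapshot on the target suggests an interrupted transfer. A minimal Python sketch of that check, assuming `btrfs property get -t subvol <path> ro` prints a line of the form `ro=true` or `ro=false`; the helper names are hypothetical.

```python
import subprocess


def parse_ro_property(output: str) -> bool:
    """Parse output such as "ro=true" from `btrfs property get`."""
    for line in output.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name == "ro":
            return value == "true"
    raise ValueError(f"no 'ro' property in output: {output!r}")


def is_read_only_subvolume(path: str) -> bool:
    """Return True if the subvolume at `path` has the read-only flag set."""
    # Requires btrfs-progs and usually root privileges.
    result = subprocess.run(
        ["btrfs", "property", "get", "-t", "subvol", path, "ro"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ro_property(result.stdout)
```

For example, if `is_read_only_subvolume()` returned `False` for the newest snapshot on the backup filesystem (its path is setup-specific), that would point to a receive that never finished.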

I'm on Ubuntu 19.04 with Linux kernel 5.0.5 and btrfs-progs v4.20.2.

My filesystem sits on top of LUKS on an NVMe SSD (Samsung SM961). Every 15 minutes, 3 snapshots are created (one from each of 3 subvolumes), and old snapshots are rotated out, so there can be anywhere from tens to hundreds of snapshots at any given time.
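To give a sense of the snapshot churn, here is a minimal Python sketch counting how many snapshots this schedule creates. The rotation policy is not described here, so the sketch deliberately ignores rotation; `snapshots_created` is a hypothetical helper.

```python
SUBVOLUMES = 3         # one snapshot per subvolume per run
INTERVAL_MINUTES = 15  # snapshot schedule


def snapshots_created(hours: int) -> int:
    """Number of snapshots created over `hours`, ignoring rotation."""
    runs = hours * 60 // INTERVAL_MINUTES
    return SUBVOLUMES * runs


print(snapshots_created(1))   # 12
print(snapshots_created(24))  # 288
```

At 288 new snapshots per day, even a short retention window keeps the count in the range described above.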

Mount options: compress=lzo,noatime,ssd

I have a full disk image of the corrupted filesystem and will create qcow2 snapshots of it, so if you want me to run any experiments to find the cause of the corruption, including potentially destructive ones or custom patches to btrfs-progs, I would be happy to help as much as I can.

P.S. I run the latest stable and rc kernels all the time, and during the last 6 months I have had about as many corruptions of different BTRFS filesystems as during the 3 years before that, which is really worrying if you ask me.

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
