Btrfs filesystem trashed after OOM scenario

From: Nick Bowler @ 2019-09-24 22:03 UTC
  To: linux-btrfs

Hi folks,

So I had an interesting scenario that I thought I'd share in case
anyone wants to investigate before I blow away this filesystem...

Timeline:
- Running Linux 5.2.14, I pushed this system to OOM; the OOM killer
ran and killed some userspace tasks.  At this point many of the
remaining tasks were stuck in uninterruptible sleep.  Not really
worried, I power-cycled the machine to get everything back to
normal.  But I now suspect that everything had already gone horribly
wrong by this point...

- Upon reboot, the system boots OK but now btrfs is throwing zillions
of checksum errors.  After some time the filesystem is remounted
readonly and I lose the ability to interact with the system at all, so
it gets powered off.

- Now the filesystem is unmountable.

I've attached the logs (gzipped) that were captured from that system.
I think they cover syslog from the original boot up to the OOM (but
possibly not right afterwards, since things were hanging), plus the
boot logs from the first reboot up to shortly before the filesystem
went read-only.

Appended is what I get now when attempting to access the filesystem
from a rescue system (running Linux 4.19.34-1-lts, per the trace
below).  Let me know if you need any more info.

Cheers,
  Nick

# mount -o ro /dev/mapper/fucked /mnt/fucked
[  340.787239] Btrfs loaded, crc32c=crc32c-intel
[  340.788390] BTRFS: device label alastor-root devid 1 transid
2616190 /dev/dm-0
[  347.054205] BTRFS info (device dm-0): disk space caching is enabled
[  347.054207] BTRFS info (device dm-0): has skinny extents
[  347.155561] BTRFS info (device dm-0): enabling ssd optimizations
[  347.334218] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.334414] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.453104] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.453318] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.456581] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.456843] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.461251] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.461638] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.462755] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.462957] BTRFS error (device dm-0): parent transid verify failed
on 554858348544 wanted 2616165 found 2616162
[  347.511704] BTRFS error (device dm-0): error loading props for ino
721 (root 1): -5
[  347.551471] BTRFS: error (device dm-0) in
__btrfs_prealloc_file_range:10310: errno=-5 IO failure
[  347.551514] WARNING: CPU: 3 PID: 1143 at
fs/btrfs/extent-tree.c:4277
btrfs_free_reserved_data_space_noquota+0xd0/0xe0 [btrfs]
[  347.551515] Modules linked in: btrfs libcrc32c xor raid6_pq
dm_crypt algif_skcipher af_alg dm_mod ext4 crc32c_generic mbcache jbd2
fscrypto ccm 8021q garp mrp stp llc joydev mousedev rmi_smbus rmi_core
arc4 iwlmvm mac80211 intel_rapl ofpart uvcvideo x86_pkg_temp_thermal
cmdlinepart intel_powerclamp btusb intel_spi_platform coretemp
intel_spi iwlwifi btrtl mei_wdt snd_hda_codec_realtek spi_nor
snd_hda_codec_generic snd_hda_codec_hdmi mtd btbcm snd_hda_intel
btintel kvm_intel iTCO_wdt snd_hda_codec videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 iTCO_vendor_support bluetooth
crct10dif_pclmul thinkpad_acpi videobuf2_common ghash_clmulni_intel
tpm_tis videodev tpm_tis_core intel_cstate cfg80211 pcspkr
snd_hda_core tpm snd_hwdep intel_uncore snd_pcm nvram psmouse
snd_timer input_leds ecdh_generic media
[  347.551557]  intel_rapl_perf mei_me crc16 snd ac battery rng_core
rfkill mei rtsx_pci_ms lpc_ich intel_pch_thermal soundcore memstick
evdev mac_hid wmi_bmof i2c_i801 pcc_cpufreq ip_tables x_tables overlay
squashfs loop isofs sd_mod uas usb_storage i915 kvmgt vfio_mdev mdev
vfio_iommu_type1 vfio ahci kvm libahci crc32_pclmul crc32c_intel
rtsx_pci_sdmmc irqbypass i2c_algo_bit serio_raw mmc_core atkbd libata
drm_kms_helper libps2 aesni_intel syscopyarea sysfillrect aes_x86_64
sysimgblt crypto_simd fb_sys_fops cryptd ehci_pci xhci_pci glue_helper
ehci_hcd scsi_mod rtsx_pci drm e1000e xhci_hcd intel_gtt agpgart wmi
i8042 serio
[  347.551595] CPU: 3 PID: 1143 Comm: mount Not tainted 4.19.34-1-lts #1
[  347.551596] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS
N10ET42W (1.21 ) 02/26/2016
[  347.551610] RIP:
0010:btrfs_free_reserved_data_space_noquota+0xd0/0xe0 [btrfs]
[  347.551612] Code: 6c 55 1b c1 48 8b 7b 08 48 83 c3 18 45 31 c9 4d
89 e8 4c 89 f1 4c 89 fa 4c 89 e6 e8 ca c3 af c0 48 8b 03 48 85 c0 75
dc eb 98 <0f> 0b 31 db eb 89 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 41
[  347.551613] RSP: 0018:ffffaafd41fef758 EFLAGS: 00010287
[  347.551614] RAX: 0000000000000000 RBX: fffffffffffc0000 RCX: 0000000000040000
[  347.551615] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9087abcf5600
[  347.551616] RBP: ffff9087abcf5600 R08: 0000000000000369 R09: 0000000000000004
[  347.551617] R10: ffff9087a02c40d8 R11: ffffffff82861eed R12: ffff90881aa2a000
[  347.551618] R13: 0000000000040000 R14: 0000000000040000 R15: ffff9087b0299ad0
[  347.551620] FS:  00007fa67625b780(0000) GS:ffff908825cc0000(0000)
knlGS:0000000000000000
[  347.551621] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  347.551622] CR2: 00007f1d33580458 CR3: 00000001aedcc006 CR4: 00000000003606e0
[  347.551623] Call Trace:
[  347.551638]  btrfs_free_reserved_data_space+0x4b/0x70 [btrfs]
[  347.551656]  __btrfs_prealloc_file_range+0x388/0x450 [btrfs]
[  347.551670]  cache_save_setup+0x1dd/0x3a0 [btrfs]
[  347.551685]  btrfs_setup_space_cache+0x97/0xc0 [btrfs]
[  347.551700]  commit_cowonly_roots+0xde/0x2b0 [btrfs]
[  347.551718]  ? btrfs_qgroup_account_extents+0xbb/0x1d0 [btrfs]
[  347.551734]  btrfs_commit_transaction+0x2ac/0x890 [btrfs]
[  347.551752]  btrfs_recover_log_trees+0x38a/0x420 [btrfs]
[  347.551771]  ? replay_one_dir_item+0x170/0x170 [btrfs]
[  347.551786]  open_ctree+0x1a21/0x1b60 [btrfs]
[  347.551798]  btrfs_mount_root+0x656/0x720 [btrfs]
[  347.551802]  ? bitmap_find_next_zero_area_off+0x3d/0x90
[  347.551804]  ? cpumask_next+0x16/0x20
[  347.551807]  ? pcpu_alloc+0x1cb/0x640
[  347.551810]  mount_fs+0x3b/0x167
[  347.551813]  vfs_kern_mount.part.11+0x54/0x110
[  347.551825]  btrfs_mount+0x16f/0x860 [btrfs]
[  347.551830]  ? path_lookupat.isra.13+0xa6/0x230
[  347.551832]  ? legitimize_path.isra.9+0x2d/0x60
[  347.551834]  ? bitmap_find_next_zero_area_off+0x3d/0x90
[  347.551836]  ? pcpu_alloc_area+0xe2/0x130
[  347.551838]  ? pcpu_next_unpop+0x37/0x50
[  347.551840]  ? cpumask_next+0x16/0x20
[  347.551842]  ? pcpu_alloc+0x1cb/0x640
[  347.551844]  ? mount_fs+0x3b/0x167
[  347.551845]  mount_fs+0x3b/0x167
[  347.551848]  vfs_kern_mount.part.11+0x54/0x110
[  347.551850]  do_mount+0x1fb/0xc10
[  347.551852]  ? _copy_from_user+0x37/0x60
[  347.551854]  ? memdup_user+0x4b/0x70
[  347.551855]  ksys_mount+0xba/0xd0
[  347.551857]  __x64_sys_mount+0x21/0x30
[  347.551860]  do_syscall_64+0x4e/0x100
[  347.551862]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  347.551864] RIP: 0033:0x7fa6763e568e
[  347.551866] Code: 48 8b 0d d5 17 0c 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a2 17 0c 00 f7 d8 64 89
01 48
[  347.551867] RSP: 002b:00007ffc92d01298 EFLAGS: 00000246 ORIG_RAX:
00000000000000a5
[  347.551868] RAX: ffffffffffffffda RBX: 00005561f6fe4400 RCX: 00007fa6763e568e
[  347.551869] RDX: 00005561f6fec000 RSI: 00005561f6fe5300 RDI: 00005561f6fe4610
[  347.551870] RBP: 00007fa67650b1e4 R08: 0000000000000000 R09: 0000000000000000
[  347.551871] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  347.551872] R13: 0000000000000001 R14: 00005561f6fe4610 R15: 00005561f6fec000
[  347.551874] ---[ end trace 010db75a59ca54bb ]---
[  347.556498] BTRFS warning (device dm-0): Skipping commit of aborted
transaction.
[  347.556501] BTRFS: error (device dm-0) in cleanup_transaction:1846:
errno=-5 IO failure
[  347.557941] BTRFS error (device dm-0): pending csums is 262144
[  347.557946] BTRFS: error (device dm-0) in btrfs_replay_log:2277:
errno=-5 IO failure (Failed to recover log tree)
[  347.790510] BTRFS error (device dm-0): open_ctree failed

# btrfs check --readonly /dev/mapper/fucked
Opening filesystem to check...
Checking filesystem on /dev/mapper/fucked
UUID: 412a90ce-0a07-4072-9219-44bd98eb1be4
[1/7] checking root items
parent transid verify failed on 554858348544 wanted 2616165 found 2616162
parent transid verify failed on 554858348544 wanted 2616165 found 2616162
parent transid verify failed on 554858348544 wanted 2616165 found 2616162
parent transid verify failed on 554858348544 wanted 2616165 found 2616162
Ignoring transid failure
leaf parent key incorrect 554858348544
ERROR: failed to repair root items: Operation not permitted
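
For anyone triaging output like the above, the repeated transid messages
can be collapsed into one line per block.  What follows is only a rough
sketch: the function name summarize_transid_failures is made up for
illustration, and the regex assumes the message format exactly as it
appears in the logs above (including the mail line wrapping between
"failed" and "on").  It is not part of btrfs-progs.

```python
import re
from collections import Counter

# Assumed message format, taken from the logs above. Using \s+ between
# words tolerates lines that were wrapped by the mail client.
PATTERN = re.compile(
    r"parent transid verify failed\s+on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def summarize_transid_failures(log_text):
    """Count each (bytenr, wanted, found) triple found in the log text.

    Hypothetical helper: works on dmesg or `btrfs check` output pasted
    as a single string.
    """
    return Counter(
        (int(bytenr), int(wanted), int(found))
        for bytenr, wanted, found in PATTERN.findall(log_text)
    )


if __name__ == "__main__":
    # Two sample messages copied from this report: one wrapped kernel
    # line and one line from `btrfs check`.
    sample = (
        "[  347.334218] BTRFS error (device dm-0): parent transid verify failed\n"
        "on 554858348544 wanted 2616165 found 2616162\n"
        "parent transid verify failed on 554858348544 wanted 2616165 found 2616162\n"
    )
    for (bytenr, wanted, found), count in summarize_transid_failures(sample).items():
        print(
            f"block {bytenr}: wanted {wanted}, found {found} "
            f"({wanted - found} transactions behind), seen {count}x"
        )
```

In this report every message names the same block (554858348544), and
the generation found on disk is three behind the one its parent expects.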

[-- Attachment #2: alastor-log-merged.log.gz --]
[-- Type: application/x-gzip, Size: 25127 bytes --]
