* kernel oops when mounting btrfs
@ 2019-03-21 18:56 Thorsten Hirsch
  2019-03-21 23:23 ` Fw: " Thorsten Hirsch
  0 siblings, 1 reply; 8+ messages in thread
From: Thorsten Hirsch @ 2019-03-21 18:56 UTC
  To: linux-btrfs

Hi.
 
Yesterday, when I powered off my PC, systemd's umount service ran into a timeout. Today Linux couldn't boot anymore; instead I suddenly found myself in the kernel's rescue shell. It seems that btrfs is broken on my root partition.

# mount -t btrfs -o ro /dev/nvme0n1p3 /mnt
...produces a kernel oops. I attached the dmesg output.

# mount -t btrfs -o ro,recovery /dev/nvme0n1p3 /mnt
Killed

I'm wary of running btrfsck because the btrfs wiki warns that this tool might make the situation worse. Any suggestions on how to proceed?

Thorsten Hirsch



[  223.023202] BTRFS warning (device nvme0n1p3): 'recovery' is deprecated, use 'usebackuproot' instead
[  223.023204] BTRFS info (device nvme0n1p3): trying to use backup root at mount time
[  223.023206] BTRFS info (device nvme0n1p3): disk space caching is enabled
[  223.023207] BTRFS info (device nvme0n1p3): has skinny extents
[  223.052188] BTRFS info (device nvme0n1p3): enabling ssd optimizations
[  223.054170] init_special_inode: bogus i_mode (0) for inode nvme0n1p3:882
[  223.054184] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  223.054187] PGD 0 P4D 0 
[  223.054190] Oops: 0010 [#1] PREEMPT SMP NOPTI
[  223.054193] CPU: 6 PID: 2841 Comm: mount Not tainted 4.19.28-1-MANJARO #1
[  223.054194] Hardware name: Gigabyte Technology Co., Ltd. AB350M-DS3H/AB350M-DS3H-CF, BIOS F24 12/25/2018
[  223.054196] RIP: 0010:          (null)
[  223.054199] Code: Bad RIP value.
[  223.054200] RSP: 0018:ffffb36dcaa570b8 EFLAGS: 00010246
[  223.054202] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  223.054203] RDX: 0000000000000006 RSI: fffff00c0dd2fb40 RDI: 0000000000000000
[  223.054204] RBP: ffffb36dcaa57140 R08: 0000000000000000 R09: fffff00c0dd2fb40
[  223.054205] R10: ffff9ea7ee5846a8 R11: 0000000000000000 R12: ffff9ea7ee584698
[  223.054207] R13: ffffb36dcaa57190 R14: fffff00c0dd2fb48 R15: fffff00c0dd2fb40
[  223.054208] FS:  00007fbe1d015780(0000) GS:ffff9ea88eb80000(0000) knlGS:0000000000000000
[  223.054210] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  223.054211] CR2: ffffffffffffffd6 CR3: 000000038c148000 CR4: 00000000003406e0
[  223.054212] Call Trace:
[  223.054217]  ? read_pages+0x116/0x180
[  223.054221]  ? __do_page_cache_readahead+0x1a5/0x1c0
[  223.054224]  ? ondemand_readahead+0x211/0x2e0
[  223.054227]  ? kmem_cache_alloc_trace+0x176/0x1d0
[  223.054245]  ? __load_free_space_cache+0x1c4/0x5d0 [btrfs]
[  223.054261]  ? load_free_space_cache+0xb6/0x190 [btrfs]
[  223.054271]  ? cache_block_group+0x73/0x3e0 [btrfs]
[  223.054281]  ? cache_block_group+0x1c1/0x3e0 [btrfs]
[  223.054284]  ? wait_woken+0x80/0x80
[  223.054295]  ? find_free_extent+0x87e/0x1110 [btrfs]
[  223.054298]  ? syscall_return_via_sysret+0x14/0x83
[  223.054301]  ? __switch_to_asm+0x34/0x70
[  223.054302]  ? __switch_to_asm+0x40/0x70
[  223.054314]  ? btrfs_reserve_extent+0x9b/0x180 [btrfs]
[  223.054325]  ? btrfs_alloc_tree_block+0x1ab/0x5b0 [btrfs]
[  223.054335]  ? alloc_tree_block_no_bg_flush+0x47/0x50 [btrfs]
[  223.054345]  ? __btrfs_cow_block+0x11b/0x500 [btrfs]
[  223.054355]  ? btrfs_cow_block+0xdc/0x1a0 [btrfs]
[  223.054365]  ? btrfs_search_slot+0x22e/0x9f0 [btrfs]
[  223.054375]  ? btrfs_search_slot+0x858/0x9f0 [btrfs]
[  223.054385]  ? btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[  223.054399]  ? overwrite_item+0xfb/0x5e0 [btrfs]
[  223.054413]  ? replay_one_buffer+0x6af/0x870 [btrfs]
[  223.054427]  ? walk_down_log_tree+0x76/0x3b0 [btrfs]
[  223.054441]  ? walk_log_tree+0xce/0x1d0 [btrfs]
[  223.054453]  ? btrfs_recover_log_trees+0x221/0x420 [btrfs]
[  223.054466]  ? replay_one_dir_item+0x170/0x170 [btrfs]
[  223.054478]  ? open_ctree+0x1a21/0x1b60 [btrfs]
[  223.054489]  ? btrfs_mount_root+0x656/0x720 [btrfs]
[  223.054491]  ? pcpu_block_update_hint_alloc+0x18a/0x1d0
[  223.054495]  ? cpumask_next+0x16/0x20
[  223.054496]  ? pcpu_alloc+0x1cb/0x670
[  223.054499]  ? mount_fs+0x3b/0x167
[  223.054502]  ? vfs_kern_mount.part.11+0x54/0x110
[  223.054512]  ? btrfs_mount+0x16f/0x860 [btrfs]
[  223.054514]  ? path_lookupat.isra.13+0xa6/0x230
[  223.054515]  ? legitimize_path.isra.9+0x2d/0x60
[  223.054518]  ? pcpu_alloc_area+0xe2/0x130
[  223.054519]  ? pcpu_next_unpop+0x37/0x50
[  223.054521]  ? cpumask_next+0x16/0x20
[  223.054523]  ? pcpu_alloc+0x1cb/0x670
[  223.054525]  ? mount_fs+0x3b/0x167
[  223.054526]  ? mount_fs+0x3b/0x167
[  223.054529]  ? vfs_kern_mount.part.11+0x54/0x110
[  223.054531]  ? do_mount+0x1fd/0xc80
[  223.054533]  ? _copy_from_user+0x37/0x60
[  223.054535]  ? kmem_cache_alloc_trace+0x176/0x1d0
[  223.054537]  ? copy_mount_options+0x28/0x210
[  223.054539]  ? ksys_mount+0xba/0xd0
[  223.054540]  ? __x64_sys_mount+0x21/0x30
[  223.054543]  ? do_syscall_64+0x65/0x180
[  223.054545]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  223.054547] Modules linked in: fuse sg st mousedev joydev input_leds amdkfd amd_iommu_v2 amdgpu snd_hda_codec_ca0110 snd_hda_codec_generic chash snd_hda_codec_hdmi gpu_sched i2c_algo_bit snd_hda_intel snd_hda_codec ttm snd_hda_core edac_mce_amd drm_kms_helper sp5100_tco snd_hwdep kvm_amd snd_pcm drm snd_timer i2c_piix4 kvm snd agpgart syscopyarea sysfillrect ccp soundcore sysimgblt irqbypass fb_sys_fops pcc_cpufreq rng_core evdev k10temp pcspkr crct10dif_pclmul ghash_clmulni_intel acpi_cpufreq pinctrl_amd gpio_amdpt mac_hid wmi_bmof uinput crypto_user ip_tables x_tables overlay squashfs isofs sr_mod cdrom sd_mod uas usb_storage btrfs libcrc32c crc32c_generic xor hid_generic usbhid hid serio_raw atkbd crc32_pclmul crc32c_intel libps2 raid6_pq ahci libahci libata aesni_intel aes_x86_64 crypto_simd
[  223.054580]  r8169 xhci_pci cryptd realtek glue_helper scsi_mod xhci_hcd libphy wmi i8042 serio dm_snapshot dm_bufio dm_mod loop
[  223.054588] CR2: 0000000000000000
[  223.054590] ---[ end trace e933e8be13130720 ]---
[  223.054592] RIP: 0010:          (null)
[  223.054594] Code: Bad RIP value.
[  223.054595] RSP: 0018:ffffb36dcaa570b8 EFLAGS: 00010246
[  223.054596] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  223.054598] RDX: 0000000000000006 RSI: fffff00c0dd2fb40 RDI: 0000000000000000
[  223.054599] RBP: ffffb36dcaa57140 R08: 0000000000000000 R09: fffff00c0dd2fb40
[  223.054600] R10: ffff9ea7ee5846a8 R11: 0000000000000000 R12: ffff9ea7ee584698
[  223.054601] R13: ffffb36dcaa57190 R14: fffff00c0dd2fb48 R15: fffff00c0dd2fb40
[  223.054603] FS:  00007fbe1d015780(0000) GS:ffff9ea88eb80000(0000) knlGS:0000000000000000
[  223.054604] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  223.054605] CR2: ffffffffffffffd6 CR3: 000000038c148000 CR4: 00000000003406e0


* Fw: kernel oops when mounting btrfs
  2019-03-21 18:56 kernel oops when mounting btrfs Thorsten Hirsch
@ 2019-03-21 23:23 ` Thorsten Hirsch
  2019-03-21 23:55   ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Thorsten Hirsch @ 2019-03-21 23:23 UTC
  To: linux-btrfs

Meanwhile I upgraded to kernel 5.0, but the problem remained the same. So I zeroed the log with btrfs rescue zero-log. The command succeeded; however, the problem remains more or less the same:

- I still get a kernel oops when mounting.
- The stack trace looks a bit different now, e.g. RIP 0010 is not null anymore, and there's another RIP code: 0033. Maybe "set_page_dirty" is interesting, too. But see for yourself below.

Any ideas?

Thorsten Hirsch

P.S.: btrfs check --readonly tells me that everything's fine. But it already did so before zeroing the log.


[   12.400224] BTRFS: device label Linux devid 1 transid 30104 /dev/nvme0n1p3
[   12.400523] BTRFS info (device nvme0n1p3): disk space caching is enabled
[   12.400524] BTRFS info (device nvme0n1p3): has skinny extents
[   12.427049] BTRFS info (device nvme0n1p3): enabling ssd optimizations
[   12.461362] init_special_inode: bogus i_mode (0) for inode nvme0n1p3:882
[   12.462844] init_special_inode: bogus i_mode (0) for inode nvme0n1p3:802
[   12.515489] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[   12.515491] #PF error: [normal kernel read fault]
[   12.515492] PGD 0 P4D 0
[   12.515495] Oops: 0000 [#1] SMP NOPTI
[   12.515497] CPU: 8 PID: 487 Comm: exe Not tainted 5.0.0-7-generic #8-Ubuntu
[   12.515499] Hardware name: Gigabyte Technology Co., Ltd. AB350M-DS3H/AB350M-DS3H-CF, BIOS F24 12/25/2018
[   12.515504] RIP: 0010:__set_page_dirty_buffers+0x4c/0x100
[   12.515506] Code: 84 00 00 00 49 89 c5 4c 89 e7 e8 df 71 73 00 48 8b 03 f6 c4 20 74 28 48 8b 03 f6 c4 20 0f 84 ae 00 00 00 48 8b 43 28 48 89 c2 <48> 8b 0a 83 e1 02 75 04 f0 80 0a 02 48 8b 52 08 48 39 d0 75 eb 48
[   12.515507] RSP: 0018:ffffa9d2c28e7688 EFLAGS: 00010202
[   12.515509] RAX: 0000000000000001 RBX: ffffe34f45261300 RCX: 0000000000000000
[   12.515510] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8fc1b80b784c
[   12.515512] RBP: ffffa9d2c28e76a0 R08: 0000000000028200 R09: ffffffffc07af588
[   12.515513] R10: ffff8fc1cc594000 R11: ffff8fc1ba605ff0 R12: ffff8fc1b80b784c
[   12.515514] R13: ffff8fc1b80b77c8 R14: ffff8fc1bc255600 R15: 0000000000000040
[   12.515516] FS:  00007f2fafc505c0(0000) GS:ffff8fc1cec00000(0000) knlGS:0000000000000000
[   12.515517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.515518] CR2: 0000000000000001 CR3: 00000003fe150000 CR4: 00000000003406e0
[   12.515520] Call Trace:
[   12.515524]  set_page_dirty+0x5e/0xb0
[   12.515541]  btrfs_dirty_pages+0x11f/0x2a0 [btrfs]
[   12.515562]  ? io_ctl_set_crc+0x5e/0xf0 [btrfs]
[   12.515576]  __btrfs_write_out_cache+0x3aa/0x470 [btrfs]
[   12.515590]  btrfs_write_out_cache+0x9a/0xf0 [btrfs]
[   12.515600]  btrfs_write_dirty_block_groups+0x293/0x3a0 [btrfs]
[   12.515611]  ? btrfs_run_delayed_refs+0xa7/0x190 [btrfs]
[   12.515622]  commit_cowonly_roots+0x21a/0x2c0 [btrfs]
[   12.515634]  btrfs_commit_transaction+0x32e/0x8c0 [btrfs]
[   12.515647]  btrfs_recover_log_trees+0x392/0x430 [btrfs]
[   12.515660]  ? replay_dir_deletes+0x2a0/0x2a0 [btrfs]
[   12.515671]  open_ctree+0x1b2c/0x1bb0 [btrfs]
[   12.515681]  btrfs_mount_root+0x4ff/0x650 [btrfs]
[   12.515683]  ? cpumask_next+0x1b/0x20
[   12.515685]  ? pcpu_alloc+0x1bd/0x610
[   12.515687]  mount_fs+0x51/0x165
[   12.515689]  vfs_kern_mount.part.38+0x5d/0x110
[   12.515691]  vfs_kern_mount+0x13/0x20
[   12.515700]  btrfs_mount+0x16f/0x860 [btrfs]
[   12.515702]  ? pcpu_block_update_hint_alloc+0x1af/0x200
[   12.515703]  ? pcpu_alloc_area+0xed/0x140
[   12.515704]  ? pcpu_next_unpop+0x3c/0x50
[   12.515706]  ? cpumask_next+0x1b/0x20
[   12.515707]  ? pcpu_alloc+0x1bd/0x610
[   12.515708]  mount_fs+0x51/0x165
[   12.515709]  ? mount_fs+0x51/0x165
[   12.515711]  vfs_kern_mount.part.38+0x5d/0x110
[   12.515712]  do_mount+0x22f/0xd50
[   12.515714]  ? __check_object_size+0x166/0x192
[   12.515716]  ? memdup_user+0x4f/0x80
[   12.515717]  ksys_mount+0xb6/0xd0
[   12.515718]  __x64_sys_mount+0x25/0x30
[   12.515720]  do_syscall_64+0x5a/0x110
[   12.515722]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   12.515723] RIP: 0033:0x7f2fafb82aca
[   12.515725] Code: 48 8b 0d c9 53 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 53 0c 00 f7 d8 64 89 01 48
[   12.515726] RSP: 002b:00007ffde16c06c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[   12.515727] RAX: ffffffffffffffda RBX: 0000000000008401 RCX: 00007f2fafb82aca
[   12.515728] RDX: 00007ffde16c2dd5 RSI: 00007ffde16c2df8 RDI: 00007ffde16c2de9
[   12.515728] RBP: 00007f2fafc50540 R08: 0000000000000000 R09: 0000000000000000
[   12.515729] R10: 0000000000008401 R11: 0000000000000202 R12: 0000000000000000
[   12.515730] R13: 0000000000000000 R14: 00007ffde16c0938 R15: 0000000000000000
[   12.515731] Modules linked in: btrfs xor zstd_compress raid6_pq libcrc32c nls_iso8859_1 dm_mirror dm_region_hash dm_log uas usb_storage hid_generic usbhid hid amdgpu chash amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 i2c_piix4 ahci nvme realtek libahci nvme_core wmi gpio_amdpt gpio_generic
[   12.515742] CR2: 0000000000000001
[   12.515744] ---[ end trace 3e026a4bcaf67667 ]---
[   12.515745] RIP: 0010:__set_page_dirty_buffers+0x4c/0x100
[   12.515746] Code: 84 00 00 00 49 89 c5 4c 89 e7 e8 df 71 73 00 48 8b 03 f6 c4 20 74 28 48 8b 03 f6 c4 20 0f 84 ae 00 00 00 48 8b 43 28 48 89 c2 <48> 8b 0a 83 e1 02 75 04 f0 80 0a 02 48 8b 52 08 48 39 d0 75 eb 48
[   12.515747] RSP: 0018:ffffa9d2c28e7688 EFLAGS: 00010202
[   12.515748] RAX: 0000000000000001 RBX: ffffe34f45261300 RCX: 0000000000000000
[   12.515749] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8fc1b80b784c
[   12.515750] RBP: ffffa9d2c28e76a0 R08: 0000000000028200 R09: ffffffffc07af588
[   12.515750] R10: ffff8fc1cc594000 R11: ffff8fc1ba605ff0 R12: ffff8fc1b80b784c
[   12.515751] R13: ffff8fc1b80b77c8 R14: ffff8fc1bc255600 R15: 0000000000000040
[   12.515752] FS:  00007f2fafc505c0(0000) GS:ffff8fc1cec00000(0000) knlGS:0000000000000000
[   12.515753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.515754] CR2: 0000000000000001 CR3: 00000003fe150000 CR4: 00000000003406e0


* Re: Fw: kernel oops when mounting btrfs
  2019-03-21 23:23 ` Fw: " Thorsten Hirsch
@ 2019-03-21 23:55   ` Qu Wenruo
  2019-03-23  8:29     ` Thorsten Hirsch
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2019-03-21 23:55 UTC
  To: Thorsten Hirsch, linux-btrfs





On 2019/3/22 7:23 AM, Thorsten Hirsch wrote:
> Meanwhile I upgraded to kernel 5.0, but the problem remained the same. So I zeroed the log with btrfs rescue zero-log. The command succeeded; however, the problem remains more or less the same:
> 
> - I still get a kernel oops when mounting.
> - The stack trace looks a bit different now, e.g. RIP 0010 is not null anymore, and there's another RIP code: 0033. Maybe "set_page_dirty" is interesting, too. But see for yourself below.
> 
> Any ideas?
> 
> Thorsten Hirsch
> 
> P.S.: btrfs check --readonly tells me that everything's fine. But it already did so before zeroing the log.
> 
> 
> [   12.400224] BTRFS: device label Linux devid 1 transid 30104 /dev/nvme0n1p3
> [   12.400523] BTRFS info (device nvme0n1p3): disk space caching is enabled
> [   12.400524] BTRFS info (device nvme0n1p3): has skinny extents
> [   12.427049] BTRFS info (device nvme0n1p3): enabling ssd optimizations
> [   12.461362] init_special_inode: bogus i_mode (0) for inode nvme0n1p3:882
> [   12.462844] init_special_inode: bogus i_mode (0) for inode nvme0n1p3:802
> [   12.515489] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> [   12.515491] #PF error: [normal kernel read fault]
> [   12.515492] PGD 0 P4D 0
> [   12.515495] Oops: 0000 [#1] SMP NOPTI
> [   12.515497] CPU: 8 PID: 487 Comm: exe Not tainted 5.0.0-7-generic #8-Ubuntu
> [   12.515499] Hardware name: Gigabyte Technology Co., Ltd. AB350M-DS3H/AB350M-DS3H-CF, BIOS F24 12/25/2018
> [   12.515504] RIP: 0010:__set_page_dirty_buffers+0x4c/0x100
> [   12.515506] Code: 84 00 00 00 49 89 c5 4c 89 e7 e8 df 71 73 00 48 8b 03 f6 c4 20 74 28 48 8b 03 f6 c4 20 0f 84 ae 00 00 00 48 8b 43 28 48 89 c2 <48> 8b 0a 83 e1 02 75 04 f0 80 0a 02 48 8b 52 08 48 39 d0 75 eb 48
> [   12.515507] RSP: 0018:ffffa9d2c28e7688 EFLAGS: 00010202
> [   12.515509] RAX: 0000000000000001 RBX: ffffe34f45261300 RCX: 0000000000000000
> [   12.515510] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8fc1b80b784c
> [   12.515512] RBP: ffffa9d2c28e76a0 R08: 0000000000028200 R09: ffffffffc07af588
> [   12.515513] R10: ffff8fc1cc594000 R11: ffff8fc1ba605ff0 R12: ffff8fc1b80b784c
> [   12.515514] R13: ffff8fc1b80b77c8 R14: ffff8fc1bc255600 R15: 0000000000000040
> [   12.515516] FS:  00007f2fafc505c0(0000) GS:ffff8fc1cec00000(0000) knlGS:0000000000000000
> [   12.515517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   12.515518] CR2: 0000000000000001 CR3: 00000003fe150000 CR4: 00000000003406e0
> [   12.515520] Call Trace:
> [   12.515524]  set_page_dirty+0x5e/0xb0
> [   12.515541]  btrfs_dirty_pages+0x11f/0x2a0 [btrfs]

The free space cache write-out is causing the problem.

btrfs check does not report a cache generation mismatch as a problem.

Considering the earlier output about the bogus inode mode, it may be a
bug involving the mode of the free space cache inode.
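
A kernel call trace lists the faulting function first and its callers
after it. Entries prefixed with "?" are addresses that were found on the
stack but that the unwinder could not confirm as part of the call chain.
The Python sketch below is only an illustration and not part of any btrfs
tooling: it filters an excerpt of the 5.0 trace quoted above down to its
reliable frames. The regular expression and the helper name
reliable_frames are assumptions made for this example.

```python
import re

# Excerpt of the 5.0 oops quoted above, copied as printed by dmesg.
TRACE = """\
[   12.515524]  set_page_dirty+0x5e/0xb0
[   12.515541]  btrfs_dirty_pages+0x11f/0x2a0 [btrfs]
[   12.515562]  ? io_ctl_set_crc+0x5e/0xf0 [btrfs]
[   12.515576]  __btrfs_write_out_cache+0x3aa/0x470 [btrfs]
[   12.515590]  btrfs_write_out_cache+0x9a/0xf0 [btrfs]
[   12.515600]  btrfs_write_dirty_block_groups+0x293/0x3a0 [btrfs]
[   12.515611]  ? btrfs_run_delayed_refs+0xa7/0x190 [btrfs]
[   12.515622]  commit_cowonly_roots+0x21a/0x2c0 [btrfs]
[   12.515634]  btrfs_commit_transaction+0x32e/0x8c0 [btrfs]
[   12.515647]  btrfs_recover_log_trees+0x392/0x430 [btrfs]
"""

# Hypothetical pattern: timestamp, optional "? " marker, symbol+offset/size.
FRAME = re.compile(r"^\[\s*[\d.]+\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace):
    """Return symbol names of frames that are not marked with '?'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


frames = reliable_frames(TRACE)
# Print the chain innermost-first, the same order the kernel uses.
print(" <- ".join(frames))
```

The remaining chain runs from btrfs_commit_transaction through
btrfs_write_out_cache down to set_page_dirty, which fits the reading that
the free space cache write-out triggers the oops.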

Would you please dump the root tree (it contains no file names except
subvolume names; feel free to mask them) with the following command:
 # btrfs ins dump-tree -t root /dev/nvme0n1p3

Then you can try to completely wipe the free space cache with the
following command:
 # btrfs check --clear-space-cache v1 /dev/nvme0n1p3

If the above command succeeds, try mounting again; it should work.

Thanks,
Qu


> [   12.515562]  ? io_ctl_set_crc+0x5e/0xf0 [btrfs]
> [   12.515576]  __btrfs_write_out_cache+0x3aa/0x470 [btrfs]
> [   12.515590]  btrfs_write_out_cache+0x9a/0xf0 [btrfs]
> [   12.515600]  btrfs_write_dirty_block_groups+0x293/0x3a0 [btrfs]
> [   12.515611]  ? btrfs_run_delayed_refs+0xa7/0x190 [btrfs]
> [   12.515622]  commit_cowonly_roots+0x21a/0x2c0 [btrfs]
> [   12.515634]  btrfs_commit_transaction+0x32e/0x8c0 [btrfs]
> [   12.515647]  btrfs_recover_log_trees+0x392/0x430 [btrfs]
> [   12.515660]  ? replay_dir_deletes+0x2a0/0x2a0 [btrfs]
> [   12.515671]  open_ctree+0x1b2c/0x1bb0 [btrfs]
> [   12.515681]  btrfs_mount_root+0x4ff/0x650 [btrfs]
> [   12.515683]  ? cpumask_next+0x1b/0x20
> [   12.515685]  ? pcpu_alloc+0x1bd/0x610
> [   12.515687]  mount_fs+0x51/0x165
> [   12.515689]  vfs_kern_mount.part.38+0x5d/0x110
> [   12.515691]  vfs_kern_mount+0x13/0x20
> [   12.515700]  btrfs_mount+0x16f/0x860 [btrfs]
> [   12.515702]  ? pcpu_block_update_hint_alloc+0x1af/0x200
> [   12.515703]  ? pcpu_alloc_area+0xed/0x140
> [   12.515704]  ? pcpu_next_unpop+0x3c/0x50
> [   12.515706]  ? cpumask_next+0x1b/0x20
> [   12.515707]  ? pcpu_alloc+0x1bd/0x610
> [   12.515708]  mount_fs+0x51/0x165
> [   12.515709]  ? mount_fs+0x51/0x165
> [   12.515711]  vfs_kern_mount.part.38+0x5d/0x110
> [   12.515712]  do_mount+0x22f/0xd50
> [   12.515714]  ? __check_object_size+0x166/0x192
> [   12.515716]  ? memdup_user+0x4f/0x80
> [   12.515717]  ksys_mount+0xb6/0xd0
> [   12.515718]  __x64_sys_mount+0x25/0x30
> [   12.515720]  do_syscall_64+0x5a/0x110
> [   12.515722]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   12.515723] RIP: 0033:0x7f2fafb82aca
> [   12.515725] Code: 48 8b 0d c9 53 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 53 0c 00 f7 d8 64 89 01 48
> [   12.515726] RSP: 002b:00007ffde16c06c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> [   12.515727] RAX: ffffffffffffffda RBX: 0000000000008401 RCX: 00007f2fafb82aca
> [   12.515728] RDX: 00007ffde16c2dd5 RSI: 00007ffde16c2df8 RDI: 00007ffde16c2de9
> [   12.515728] RBP: 00007f2fafc50540 R08: 0000000000000000 R09: 0000000000000000
> [   12.515729] R10: 0000000000008401 R11: 0000000000000202 R12: 0000000000000000
> [   12.515730] R13: 0000000000000000 R14: 00007ffde16c0938 R15: 0000000000000000
> [   12.515731] Modules linked in: btrfs xor zstd_compress raid6_pq libcrc32c nls_iso8859_1 dm_mirror dm_region_hash dm_log uas usb_storage hid_generic usbhid hid amdgpu chash amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 i2c_piix4 ahci nvme realtek libahci nvme_core wmi gpio_amdpt gpio_generic
> [   12.515742] CR2: 0000000000000001
> [   12.515744] ---[ end trace 3e026a4bcaf67667 ]---
> [   12.515745] RIP: 0010:__set_page_dirty_buffers+0x4c/0x100
> [   12.515746] Code: 84 00 00 00 49 89 c5 4c 89 e7 e8 df 71 73 00 48 8b 03 f6 c4 20 74 28 48 8b 03 f6 c4 20 0f 84 ae 00 00 00 48 8b 43 28 48 89 c2 <48> 8b 0a 83 e1 02 75 04 f0 80 0a 02 48 8b 52 08 48 39 d0 75 eb 48
> [   12.515747] RSP: 0018:ffffa9d2c28e7688 EFLAGS: 00010202
> [   12.515748] RAX: 0000000000000001 RBX: ffffe34f45261300 RCX: 0000000000000000
> [   12.515749] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8fc1b80b784c
> [   12.515750] RBP: ffffa9d2c28e76a0 R08: 0000000000028200 R09: ffffffffc07af588
> [   12.515750] R10: ffff8fc1cc594000 R11: ffff8fc1ba605ff0 R12: ffff8fc1b80b784c
> [   12.515751] R13: ffff8fc1b80b77c8 R14: ffff8fc1bc255600 R15: 0000000000000040
> [   12.515752] FS:  00007f2fafc505c0(0000) GS:ffff8fc1cec00000(0000) knlGS:0000000000000000
> [   12.515753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   12.515754] CR2: 0000000000000001 CR3: 00000003fe150000 CR4: 00000000003406e0
> 




* Re: Fw: kernel oops when mounting btrfs
  2019-03-21 23:55   ` Qu Wenruo
@ 2019-03-23  8:29     ` Thorsten Hirsch
       [not found]       ` <CAH+WbHyQTxbz5inf31AgmAG1BvVWyTAExn3Nf_qvvEQWzZ0tHA@mail.gmail.com>
  2019-03-24 16:52       ` Chris Murphy
  0 siblings, 2 replies; 8+ messages in thread
From: Thorsten Hirsch @ 2019-03-23  8:29 UTC
  To: linux-btrfs

Hi Qu,

Thank you, but unfortunately that didn't work out so well. The tree
dump was no problem [1], but clearing the space cache resulted in a
core dump. Now btrfs check --readonly reports some errors. The output
of both commands is included below.

Thorsten

[1] https://gist.github.com/thorstenhirsch/65d4308ce54729c902cb09c0d4ad2baf

# btrfs check --clear-space-cache v1 /dev/nvme0n1p3
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p3
UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
Failed to find [544448348160, 168, 16384]
btrfs unable to find ref byte nr 544448364544 parent 0 root 2  owner 0 offset 0
transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
btrfs(+0x3be68)[0x556936269e68]
btrfs(btrfs_commit_transaction+0x12a)[0x55693626a2ec]
btrfs(btrfs_clear_free_space_cache+0x32a)[0x55693625fecf]
btrfs(+0x4be5b)[0x556936279e5b]
btrfs(cmd_check+0x5c2)[0x556936284d86]
btrfs(main+0x1f6)[0x556936241ef6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fb9a7911b6b]
btrfs(_start+0x2a)[0x556936241f3a]
Aborted (core dumped)


# btrfs check --readonly /dev/nvme0n1p3
Opening filesystem to check...
parent transid verify failed on 419860414464 wanted 30188 found 30105
parent transid verify failed on 419860414464 wanted 30188 found 30105
Ignoring transid failure
Checking filesystem on /dev/nvme0n1p3
UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
[1/7] checking root items
[2/7] checking extents
ref mismatch on [544448348160 16384] extent item 1, found 0
backref 544448348160 root 2 not referenced back 0x563ce432f010
incorrect global backref count on 544448348160 found 1 wanted 0
backpointer mismatch on [544448348160 16384]
owner ref check failed [544448348160 16384]
ref mismatch on [544448364544 16384] extent item 0, found 1
tree backref 544448364544 parent 2 root 2 not found in extent tree
backpointer mismatch on [544448364544 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
ERROR: transid errors in file system
found 275464060928 bytes used, error(s) found
total csum bytes: 266882612
total tree bytes: 1513570304
total fs tree bytes: 1135788032
total extent tree bytes: 73220096
btree space waste bytes: 236694654
file data blocks allocated: 1962517999616
 referenced 221128466432
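
The "parent transid verify failed" lines above say that the tree block at
logical address 419860414464 carries generation 30105 on disk, while the
pointer in its parent expects generation 30188. The Python sketch below
pulls these numbers out of such a line. It is an illustration only; the
function name and the dictionary keys are made up for this example.

```python
import re

# One of the lines printed by "btrfs check --readonly" above.
LINE = "parent transid verify failed on 419860414464 wanted 30188 found 30105"

PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_failure(line):
    """Extract block address and generations; return None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    bytenr, wanted, found = (int(value) for value in match.groups())
    return {
        "bytenr": bytenr,          # logical address of the tree block
        "wanted": wanted,          # generation recorded in the parent
        "found": found,            # generation stored in the block itself
        "behind": wanted - found,  # how many generations the block lags
    }


info = parse_transid_failure(LINE)
print(info)
```

A positive "behind" value means the block on disk is older than what the
parent refers to.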


* Re: Fwd: Fw: kernel oops when mounting btrfs
       [not found]         ` <4dc0ed1a-3f90-56a2-b526-c26cb78d7932@gmx.com>
@ 2019-03-24 10:49           ` Thorsten Hirsch
  2019-03-24 11:24             ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Thorsten Hirsch @ 2019-03-24 10:49 UTC
  To: Qu Wenruo, linux-btrfs

Hi Qu,

Thank you once again for your advice. I was indeed able to recover all
my data, even the snapshots Docker had created. Everything's working as
if nothing had ever happened. Here's what I did in the end:

btrfs restore <src> <dest> worked flawlessly, but only recovered some data.
mount -o ro,notreelog,nologreplay was the only way to mount the broken
partition, and it showed me a lot more data than btrfs restore had
recovered. However, when trying to access these additional files I got
input/output errors.

btrfs restore -sxmS <src> <dest> was the magic command that recovered
all my data (which I could "cp -a" back to my device after creating a
new btrfs file system). After reading the help of btrfs-restore it's
obvious that the arguments are required, but the btrfs wiki says
"If you're really lucky, this might be enough"[1] when describing the
command w/o arguments. I think this is misleading. The arguments are
always necessary if you want to recover all your data. Well, at least
I think the wiki page would make more sense if the arguments were
included.
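
For reference, the four flags bundled in -sxmS can be spelled out one by
one. The Python sketch below only assembles the command line and does not
run it. The flag descriptions follow the btrfs-restore help text as I
understand it, the destination /mnt/rescue is a hypothetical path, and
-D (dry run) is added so that a first attempt writes nothing.

```python
import shlex

# Flags from the "-sxmS" bundle, with their meaning per the btrfs-restore help.
RESTORE_FLAGS = {
    "-s": "also restore snapshots",
    "-x": "restore extended attributes",
    "-m": "restore owner, mode and timestamps",
    "-S": "restore symbolic links",
}


def restore_command(source, dest, dry_run=False):
    """Build (but do not execute) a btrfs restore argument list."""
    argv = ["btrfs", "restore", *RESTORE_FLAGS]
    if dry_run:
        argv.append("-D")  # list what would be restored, write nothing
    argv += [source, dest]
    return argv


# /mnt/rescue is a hypothetical destination on a different, healthy disk.
cmd = restore_command("/dev/nvme0n1p3", "/mnt/rescue", dry_run=True)
print(shlex.join(cmd))
```

Passing the list to subprocess.run would execute it; dropping dry_run
gives the equivalent of the -sxmS invocation described above.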

If there's anything I can provide to help you improve btrfs or its
recovery tools please don't hesitate to ask. Although I don't have an
image of the broken partition, at least I still have the core dump of
"btrfs check --clear-space-cache v1".

[1] https://btrfs.wiki.kernel.org/index.php/Restore

Thorsten Hirsch

P.S.: btrfs check --repair was of no use. It crashed almost
immediately. I tried it only after recovering all my data, to see if
it would've helped as well.

On Sat, 23 Mar 2019 at 14:57, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/3/23 6:48 PM, Thorsten Hirsch wrote:
> > Hi Qu,
> >
> > sorry for this direct reply. I've been trying to answer to the mailing
> > list since yesterday, but my mails seem to get dropped. So please see
> > my answer to your mail enclosed.
> >
> > Thorsten
> >
> >
> > ---------- Forwarded message ---------
> > From: Thorsten Hirsch <t.hirsch@web.de>
> > Date: Sat, 23 Mar 2019 at 09:29
> > Subject: Re: Fw: kernel oops when mounting btrfs
> > To: <linux-btrfs@vger.kernel.org>
> >
> >
> > Hi Qu,
> >
> > thank you, but unfortunately that didn't work out so well. The tree
> > dump was no problem [1], but clearing the space cache resulted in a
> > core dump. Now btrfs check --readonly reports some errors. I attached
> > the output of these commands.
> >
> > Thorsten
> >
> > [1] https://gist.github.com/thorstenhirsch/65d4308ce54729c902cb09c0d4ad2baf
>
> This explains why a lot of things don't work correctly.
>
> The inode item of your free space cache tree is wrong.
> According to my experiments with the latest kernel, it looks like some
> older kernel is the culprit.
>
> Your free space cache inode lacks the correct mode.
> Normally the mode should be 0100600, but your fs only has 0, and the
> kernel panics for that reason.
>
> >
> > # btrfs check --clear-space-cache v1 /dev/nvme0n1p3
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p3
> > UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
> > Failed to find [544448348160, 168, 16384]
>
> This means something bad happened in the extent tree.
>
> > btrfs unable to find ref byte nr 544448364544 parent 0 root 2  owner 0 offset 0
> > transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
> > btrfs(+0x3be68)[0x556936269e68]
> > btrfs(btrfs_commit_transaction+0x12a)[0x55693626a2ec]
> > btrfs(btrfs_clear_free_space_cache+0x32a)[0x55693625fecf]
> > btrfs(+0x4be5b)[0x556936279e5b]
> > btrfs(cmd_check+0x5c2)[0x556936284d86]
> > btrfs(main+0x1f6)[0x556936241ef6]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fb9a7911b6b]
> > btrfs(_start+0x2a)[0x556936241f3a]
> > Aborted (core dumped)
> >
> >
> > # btrfs check --readonly /dev/nvme0n1p3
> > Opening filesystem to check...
> > parent transid verify failed on 419860414464 wanted 30188 found 30105
> > parent transid verify failed on 419860414464 wanted 30188 found 30105
>
> So the extent tree got corrupted in that repair attempt, which looks
> pretty strange, as an aborted transaction shouldn't have any impact on
> the existing fs.
>
> I'm afraid you can only try btrfs check --repair.
>
> If that gives no good result, then I'm afraid you have to salvage the
> data; I believe over 99% of your data should be safe.
>
> To salvage the data, either use btrfs-restore or my experimental
> 'skip_bg' kernel patches:
> https://github.com/adam900710/linux/tree/rescue_options
>
> The 'skip_bg' kernel patches introduce a new mount option,
> 'ro,rescue=skip_bg', which can skip the whole (corrupted) extent tree.
> Since all your trees except the extent tree are consistent, you keep
> all the read-only btrfs features, like subvolume listing and csum checks.
> Thanks,
> Qu
>
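
The mode value 0100600 mentioned in the quoted reply is an ordinary Unix
st_mode in octal: the upper bits give the file type and the lower bits
the permissions. The Python sketch below decodes both the expected value
and the 0 found on this filesystem, using only the standard stat module.
The helper name describe and its dictionary keys are made up for this
example.

```python
import stat

EXPECTED_MODE = 0o100600  # value named in the reply
BROKEN_MODE = 0           # value reported for the damaged inode


def describe(mode):
    """Split a Unix mode into file-type bits and permission bits."""
    return {
        "type_bits": oct(stat.S_IFMT(mode)),     # file type part of the mode
        "is_regular": stat.S_ISREG(mode),        # True for a regular file
        "permissions": oct(stat.S_IMODE(mode)),  # rwx bits only
        "ls_style": stat.filemode(mode),         # string as shown by "ls -l"
    }


for label, mode in (("expected", EXPECTED_MODE), ("broken", BROKEN_MODE)):
    print(label, describe(mode))
```

With type bits of 0 the inode matches none of the known file types, which
is consistent with the "bogus i_mode (0)" messages in the kernel log.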


* Re: Fwd: Fw: kernel oops when mounting btrfs
  2019-03-24 10:49           ` Fwd: " Thorsten Hirsch
@ 2019-03-24 11:24             ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2019-03-24 11:24 UTC
  To: Thorsten Hirsch, linux-btrfs





On 2019/3/24 6:49 PM, Thorsten Hirsch wrote:
> Hi Qu,
> 
> Thank you once again for your advice. I was indeed able to recover all
> my data, even the snapshots Docker had created. Everything's working as
> if nothing had ever happened. Here's what I did in the end:
> 
> btrfs restore <src> <dest> worked flawlessly, but only recovered some data.
> mount -o ro,notreelog,nologreplay was the only way to mount the broken
> partition, and it showed me a lot more data than btrfs restore had
> recovered. However, when trying to access these additional files I got
> input/output errors.

This means part of the csum tree got corrupted.

I have seen several reports about csum and extent tree corruption, so
it's quite possible.

> 
> btrfs restore -sxmS <src> <dest> was the magic command that recovered
> all my data (which I could "cp -a" back to my device after creating a
> new btrfs file system). After reading the help of btrfs-restore it's
> obvious that the arguments are required, but the btrfs wiki says
> "If you're really lucky, this might be enough"[1] when describing the
> command w/o arguments. I think this is misleading. The arguments are
> always necessary if you want to recover all your data. Well, at least
> I think the wiki page would make more sense if the arguments were
> included.

After looking into the man page, I strongly believe that restoring file
owner, mode and symlinks should be the default behavior.

At the very least, we should improve either the man page or btrfs-restore.

Thanks,
Qu

> 
> If there's anything I can provide to help you improve btrfs or its
> recovery tools please don't hesitate to ask. Although I don't have an
> image of the broken partition, at least I still have the core dump of
> "btrfs check --clear-space-cache v1".
> 
> [1] https://btrfs.wiki.kernel.org/index.php/Restore
> 
> Thorsten Hirsch
> 
> P.S.: btrfs check --repair was of no use. It crashed almost
> immediately. I tried it only after recovering all my data, to see if
> it would've helped as well.
> 
> On Sat, 23 Mar 2019 at 14:57, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/3/23 6:48 PM, Thorsten Hirsch wrote:
>>> Hi Qu,
>>>
>>> sorry for this direct reply. I've been trying to answer to the mailing
>>> list since yesterday, but my mails seem to get dropped. So please see
>>> my answer to your mail enclosed.
>>>
>>> Thorsten
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Thorsten Hirsch <t.hirsch@web.de>
>>> Date: Sat, Mar 23, 2019 at 09:29
>>> Subject: Re: Fw: kernel oops when mounting btrfs
>>> To: <linux-btrfs@vger.kernel.org>
>>>
>>>
>>> Hi Qu,
>>>
>>> thank you, but unfortunately that didn't work out so well. The tree
>>> dump was no problem [1], but clearing the space cache resulted in a
>>> core dump. Now btrfs check --readonly reports some errors. I attached
>>> the output of these commands.
>>>
>>> Thorsten
>>>
>>> [1] https://gist.github.com/thorstenhirsch/65d4308ce54729c902cb09c0d4ad2baf
>>
>> This explains why a lot of things don't go correctly.
>>
>> The inode item of your free space cache tree is wrong.
>> According to my experiments with the latest kernel, it looks like some
>> older kernel is the culprit.
>>
>> Your free space cache inode lacks the correct mode.
>> Normally the mode should be 0100600. But your fs only has 0, and the
>> kernel panics for that reason.
>>
>>>
>>> # btrfs check --clear-space-cache v1 /dev/nvme0n1p3
>>> Opening filesystem to check...
>>> Checking filesystem on /dev/nvme0n1p3
>>> UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
>>> Failed to find [544448348160, 168, 16384]
>>
>> Then this means something bad happened in the extent tree.
>>
>>> btrfs unable to find ref byte nr 544448364544 parent 0 root 2  owner 0 offset 0
>>> transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
>>> btrfs(+0x3be68)[0x556936269e68]
>>> btrfs(btrfs_commit_transaction+0x12a)[0x55693626a2ec]
>>> btrfs(btrfs_clear_free_space_cache+0x32a)[0x55693625fecf]
>>> btrfs(+0x4be5b)[0x556936279e5b]
>>> btrfs(cmd_check+0x5c2)[0x556936284d86]
>>> btrfs(main+0x1f6)[0x556936241ef6]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fb9a7911b6b]
>>> btrfs(_start+0x2a)[0x556936241f3a]
>>> Aborted (core dumped)
>>>
>>>
>>> # btrfs check --readonly /dev/nvme0n1p3
>>> Opening filesystem to check...
>>> parent transid verify failed on 419860414464 wanted 30188 found 30105
>>> parent transid verify failed on 419860414464 wanted 30188 found 30105
>>
>> So the extent tree got corrupted in that repair attempt, which looks pretty
>> strange, as an aborted transaction shouldn't have any impact on the
>> existing fs.
>>
>> I'm afraid you can only try btrfs check --repair.
>>
>> If that gives no good result, then I'm afraid you have to salvage the data;
>> I believe over 99% of your data should be safe.
>>
>> To salvage the data, either use btrfs-restore, or use my experimental
>> 'skip_bg' kernel patches:
>> https://github.com/adam900710/linux/tree/rescue_options
>>
>> The 'skip_bg' kernel patches introduce a new mount option,
>> 'ro,rescue=skip_bg', which can skip the whole (corrupted) extent tree,
>> and since all your trees except the extent tree are consistent, you keep all
>> the read-only btrfs features, like subvolume listing and csum checks.
>>
>> Thanks,
>> Qu
>>

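
The mode value mentioned in the quoted reply above can be decoded with plain shell arithmetic. A sketch, assuming the standard POSIX file-type mask 0170000 and regular-file type 0100000 (the function name `mode_kind` is hypothetical): 0100600 is a regular file with permissions 0600, while a mode of 0 carries no file type at all, which is consistent with the "bogus i_mode (0)" line in the original dmesg output.

```shell
#!/bin/sh
# Hypothetical helper: classify an octal inode mode by its file-type bits.
mode_kind() {
    mode=$(( $1 ))               # a leading 0 makes the shell read it as octal
    ftype=$(( mode & 0170000 ))  # S_IFMT: mask for the file-type bits
    perm=$(( mode & 07777 ))     # permission bits
    if [ "$ftype" -eq $(( 0100000 )) ]; then
        printf 'regular file, perms %o\n' "$perm"
    elif [ "$ftype" -eq 0 ]; then
        printf 'no file type set, perms %o\n' "$perm"
    else
        printf 'other type %o, perms %o\n' "$ftype" "$perm"
    fi
}

mode_kind 0100600
mode_kind 0
```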

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Fw: kernel oops when mounting btrfs
  2019-03-23  8:29     ` Thorsten Hirsch
       [not found]       ` <CAH+WbHyQTxbz5inf31AgmAG1BvVWyTAExn3Nf_qvvEQWzZ0tHA@mail.gmail.com>
@ 2019-03-24 16:52       ` Chris Murphy
  2019-03-25  4:23         ` Qu Wenruo
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2019-03-24 16:52 UTC (permalink / raw)
  To: Thorsten Hirsch, Qu Wenruo; +Cc: Btrfs BTRFS

On Sat, Mar 23, 2019 at 2:30 AM Thorsten Hirsch <t.hirsch@web.de> wrote:
>
> Hi Qu,
>
> thank you, but unfortunately that didn't work out so well. The tree
> dump was no problem [1], but clearing the space cache resulted in a
> core dump. Now btrfs check --readonly reports some errors. I attached
> the output of these commands.
>
> Thorsten
>
> [1] https://gist.github.com/thorstenhirsch/65d4308ce54729c902cb09c0d4ad2baf
>
> # btrfs check --clear-space-cache v1 /dev/nvme0n1p3
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p3
> UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
> Failed to find [544448348160, 168, 16384]
> btrfs unable to find ref byte nr 544448364544 parent 0 root 2  owner 0 offset 0
> transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
> btrfs(+0x3be68)[0x556936269e68]
> btrfs(btrfs_commit_transaction+0x12a)[0x55693626a2ec]
> btrfs(btrfs_clear_free_space_cache+0x32a)[0x55693625fecf]
> btrfs(+0x4be5b)[0x556936279e5b]
> btrfs(cmd_check+0x5c2)[0x556936284d86]
> btrfs(main+0x1f6)[0x556936241ef6]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fb9a7911b6b]
> btrfs(_start+0x2a)[0x556936241f3a]
> Aborted (core dumped)
>
>
> # btrfs check --readonly /dev/nvme0n1p3
> Opening filesystem to check...
> parent transid verify failed on 419860414464 wanted 30188 found 30105
> parent transid verify failed on 419860414464 wanted 30188 found 30105
> Ignoring transid failure
> Checking filesystem on /dev/nvme0n1p3
> UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
> [1/7] checking root items
> [2/7] checking extents
> ref mismatch on [544448348160 16384] extent item 1, found 0
> backref 544448348160 root 2 not referenced back 0x563ce432f010
> incorrect global backref count on 544448348160 found 1 wanted 0
> backpointer mismatch on [544448348160 16384]
> owner ref check failed [544448348160 16384]
> ref mismatch on [544448364544 16384] extent item 0, found 1
> tree backref 544448364544 parent 2 root 2 not found in extent tree
> backpointer mismatch on [544448364544 16384]
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache
> cache and super generation don't match, space cache will be invalidated
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> ERROR: transid errors in file system
> found 275464060928 bytes used, error(s) found
> total csum bytes: 266882612
> total tree bytes: 1513570304
> total fs tree bytes: 1135788032
> total extent tree bytes: 73220096
> btree space waste bytes: 236694654
> file data blocks allocated: 1962517999616
>  referenced 221128466432

This looks like the same problem I reported earlier this month, and
also filed a bug for at
https://bugzilla.kernel.org/show_bug.cgi?id=202717

In my case I did a scrub and check before clearing space cache v1. No
problems reported. And then clearing space cache v1 crashed. And then
check reports corruption.

Bug 1, for sure btrfs check clear cache crashing is a bug
Bug 2, btrfs check appears to do non-COW overwrite of the extent tree,
which might be fine as long as it doesn't crash but it seems risky
considering how fragile the extent tree is anyway

Clear cache right now is not fail-safe, as near as I can tell. It can make
an error-free file system corrupt.

If there's some problem already, before clearing space cache, that
means more bugs:

Bug 3, btrfs check doesn't find the problem, reports no errors
Bug 4, btrfs kernel code doesn't find the problem during scrub,
reports no errors


-- 
Chris Murphy


* Re: Fw: kernel oops when mounting btrfs
  2019-03-24 16:52       ` Chris Murphy
@ 2019-03-25  4:23         ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2019-03-25  4:23 UTC (permalink / raw)
  To: Chris Murphy, Thorsten Hirsch; +Cc: Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 1348 bytes --]



[snip]
> 
> This looks like the same problem I reported earlier this month, and
> also filed a bug for at
> https://bugzilla.kernel.org/show_bug.cgi?id=202717
> 
> In my case I did a scrub and check before clearing space cache v1. No
> problems reported. And then clearing space cache v1 crashed. And then
> check reports corruption.
> 
> Bug 1, for sure btrfs check clear cache crashing is a bug
> Bug 2, btrfs check appears to do non-COW overwrite of the extent tree,
> which might be fine as long as it doesn't crash but it seems risky
> considering how fragile the extent tree is anyway

In fact I think this is a bigger problem.

This breaks the basic metadata CoW, so I'm going to look into this bug.

We have enough tools to investigate this problem, but I'm afraid it may
need certain images to trigger.

If anyone is about to clear the space cache, please take a binary dump
first and provide it (if small enough).
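
One way to follow this request is to capture a metadata image before touching the space cache. A minimal sketch, assuming `btrfs-image` from btrfs-progs is what is meant by a binary dump (the thread does not name the tool); the helper name `pre_clear_plan` and the output path are hypothetical, and the function only prints the steps rather than running them:

```shell
#!/bin/sh
# Hypothetical helper: print the steps to run, in order, on an UNMOUNTED device.
# $1 = block device, $2 = file that will hold the metadata image.
pre_clear_plan() {
    dev=$1
    img=$2
    # 1. Metadata-only dump (-c9 = highest compression level), taken first
    #    so the state before the clear can be shared with developers.
    echo "btrfs-image -c9 $dev $img"
    # 2. Read-only check to record whether errors exist before the clear.
    echo "btrfs check --readonly $dev"
    # 3. The operation that crashed in this thread.
    echo "btrfs check --clear-space-cache v1 $dev"
}

pre_clear_plan /dev/nvme0n1p3 /tmp/nvme0n1p3.img
```

Taking the image first matters because, as reported above, the clear itself may leave the extent tree in a different state than it found it.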

Thanks,
Qu

> 
> Clear cache right now is not fail safe near as I can tell. It can make
> an error free file system corrupt.
> 
> If there's some problem already, before clearing space cache, that
> means more bugs:
> 
> Bug 3, btrfs check doesn't find the problem, reports no errors
> Bug 4, btrfs kernel code doesn't find the problem during scrub,
> reports no errors
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


end of thread, other threads:[~2019-03-25  4:23 UTC | newest]

Thread overview: 8+ messages
2019-03-21 18:56 kernel oops when mounting btrfs Thorsten Hirsch
2019-03-21 23:23 ` Fw: " Thorsten Hirsch
2019-03-21 23:55   ` Qu Wenruo
2019-03-23  8:29     ` Thorsten Hirsch
     [not found]       ` <CAH+WbHyQTxbz5inf31AgmAG1BvVWyTAExn3Nf_qvvEQWzZ0tHA@mail.gmail.com>
     [not found]         ` <4dc0ed1a-3f90-56a2-b526-c26cb78d7932@gmx.com>
2019-03-24 10:49           ` Fwd: " Thorsten Hirsch
2019-03-24 11:24             ` Qu Wenruo
2019-03-24 16:52       ` Chris Murphy
2019-03-25  4:23         ` Qu Wenruo
