All of lore.kernel.org
 help / color / mirror / Atom feed
* "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
@ 2018-05-12  5:08 james harvey
  2018-05-12  7:51 ` Martin Steigerwald
  2018-05-14  2:08 ` Qu Wenruo
  0 siblings, 2 replies; 14+ messages in thread
From: james harvey @ 2018-05-12  5:08 UTC (permalink / raw)
  To: linux-btrfs

100% reproducible, booting from disk, or even Arch installation ISO.
Kernel 4.16.7.  btrfs-progs v4.16.

Reading one of two journalctl files causes a kernel oops.  Initially
ran into it from "journalctl --list-boots", but cat'ing the file does
it too.  I believe this shows there's compressed data that is invalid,
but its btrfs checksum is invalid.  I've cat'ed every file on the
disk, and luckily have the problems narrowed down to only these 2
files in /var/log/journal.

This volume has always been mounted with lzo compression.

scrub has never found anything, and have ran it since the oops.

Found a user a few years ago who also ran into this, without
resolution, at:
https://www.spinics.net/lists/linux-btrfs/msg52218.html

1. Cat'ing a (non-essential) file shouldn't be able to bring down the system.

2. If this is infact invalid compressed data, there should be a way to
check for that.  Btrfs check and scrub pass.

Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
other files.



[  381.869940] BUG: unable to handle kernel paging request at 0000000000390e50
[  381.870881] BTRFS: decompress failed
[  381.891775] IP: rebalance_domains+0x8a/0x2c0
[  381.891776] PGD 0 P4D 0
[  381.891780] Oops: 0000 [#1] PREEMPT SMP PTI
[  381.891782] Modules linked in:
[  381.891784] BTRFS: decompress failed
[  381.891784]  8021q mrp wl(PO) btrfs dm_thin_pool ast
[  381.891788] BTRFS: decompress failed
[  381.891789]  dm_persistent_data dm_bio_prison dm_bufio libcrc32c
i2c_algo_bit crc32c_generic intel_rapl ttm sb_edac zstd_compress
drm_kms_helper xor x86_pkg_temp_thermal intel_powerclamp drm raid6_pq
raid1 agpgart coretemp md_mod cfg80211 syscopyarea sysfillrect
kvm_intel dm_mod sysimgblt kvm fb_sys_fops joydev irqbypass rfkill
iTCO_wdt iTCO_vendor_support crct10dif_pclmul ghash_clmulni_intel
ipmi_ssif rtc_cmos ipmi_si intel_cstate mei_me ipmi_devintf
intel_uncore ipmi_msghandler shpchp pcspkr mousedev input_leds
led_class psmouse intel_rapl_perf lpc_ich mei i2c_i801 evdev mac_hid
ip_tables x_tables overlay squashfs zstd_decompress xxhash loop isofs
sr_mod cdrom sd_mod
[  381.891835] BTRFS: decompress failed
[  381.891835]  hid_generic usbhid hid uas usb_storage
[  381.891838] BTRFS: decompress failed
[  381.891838]  serio_raw atkbd libps2 crc32_pclmul
[  381.891840] BTRFS: decompress failed
[  381.891841]  crc32c_intel isci ahci aesni_intel
[  381.891843] BTRFS: decompress failed
[  381.891843]  aes_x86_64 libsas libahci crypto_simd
[  381.891845] BTRFS: decompress failed
[  381.891845]  ehci_pci ehci_hcd cryptd glue_helper
[  381.891847] BTRFS: decompress failed
[  381.891847]  libata scsi_transport_sas e1000e mlx4_core usbcore ptp
pps_core scsi_mod usb_common devlink wmi i8042 serio
[  381.891855] CPU: 11 PID: 0 Comm: swapper/11 Tainted: P           O
   4.16.7-1-ARCH #1
[  381.891856] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./EP2C602, BIOS P1.80 12/09/2013
[  381.891858] RIP: 0010:rebalance_domains+0x8a/0x2c0
[  381.891859] RSP: 0018:ffff8e6c5f2c3f08 EFLAGS: 00010206
[  381.891860] RAX: 0000000000000000 RBX: 0000000000390de8 RCX: 0000000000000005
[  381.891861] RDX: 0000000100005ff2 RSI: 000000000000024d RDI: 0000001333333340
[  381.891862] RBP: 0000000100005ff4 R08: 0000000000000000 R09: 0000000000000001
[  381.891863] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  381.891863] R13: 0000000000000000 R14: 0000000000000001 R15: 00bd7801f8e8a9c8
[  381.891865] FS:  0000000000000000(0000) GS:ffff8e6c5f2c0000(0000)
knlGS:0000000000000000
[  381.891865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  381.891866] CR2: 0000000000390e50 CR3: 0000000e6100a004 CR4: 00000000000606e0
[  381.891867] Call Trace:
[  381.891870]  <IRQ>
[  381.891875]  __do_softirq+0xf1/0x2e0
[  381.891880]  irq_exit+0xc9/0xe0
[  381.903429] BTRFS: decompress failed
[  381.916574]  smp_apic_timer_interrupt+0x73/0x160
[  381.916576]  apic_timer_interrupt+0xf/0x20
[  381.916578]  </IRQ>
[  381.916581] RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
[  381.916582] RSP: 0018:ffff939f863fbea8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff12
[  381.916583] RAX: ffff8e6c5f2c0000 RBX: 00000058e9388d6f RCX: 000000000000001f
[  381.916584] RDX: 00000058e9388d6f RSI: ffffffff96e70d54 RDI: ffffffff96e70fb2
[  381.916585] RBP: ffff8e6c5f2ebe00 R08: 000002044b2e9556 R09: 000000000000337b
[  381.916585] R10: 000000000000471b R11: ffff8e6c5f2e07c4 R12: 0000000000000003
[  381.916586] R13: ffffffff970ae338 R14: 00000058e9215560 R15: 0000000000000000
[  381.916591]  ? cpuidle_enter_state+0x94/0x2e0
[  381.916593]  do_idle+0x193/0x1b0
[  381.916595]  cpu_startup_entry+0x6f/0x80
[  381.916599]  start_secondary+0x1a5/0x200
[  381.916602]  secondary_startup_64+0xa5/0xb0
[  381.916603] Code: 46 00 00 48 03 04 d5 40 f4
[  381.924842] BTRFS: decompress failed
[  381.937936] ee 96 48 8b 98 c0 09 00 00 48 85 db 0f 84 32 02 00 00
45 31 ff 45 31 f6 45 31 e4 48 8b 15 e6 aa f4 00 <48> 8b 43 68 48 39 53
70 79 2e 48 89 c2 41 be 01 00 00 00 48 c1
[  381.937956] RIP: rebalance_domains+0x8a/0x2c0 RSP: ffff8e6c5f2c3f08
[  381.937957] CR2: 0000000000390e50
[  381.937976] ---[ end trace 5c92a4d96e28b9b3 ]---
[  381.937977] BUG: unable to handle kernel paging request at 0000000000395370
[  381.937982] IP: load_balance+0x71e/0x9b0
[  381.937982] PGD 0 P4D 0
[  381.937985] Oops: 0000 [#2] PREEMPT SMP PTI
[  381.937986] Modules linked in: 8021q mrp wl(PO) btrfs dm_thin_pool
ast dm_persistent_data dm_bio_prison dm_bufio libcrc32c i2c_algo_bit
crc32c_generic intel_rapl ttm sb_edac zstd_compress drm_kms_helper xor
x86_pkg_temp_thermal intel_powerclamp drm raid6_pq raid1 agpgart
coretemp md_mod cfg80211 syscopyarea sysfillrect kvm_intel dm_mod
sysimgblt kvm fb_sys_fops joydev irqbypass rfkill iTCO_wdt
iTCO_vendor_support crct10dif_pclmul ghash_clmulni_intel ipmi_ssif
rtc_cmos ipmi_si intel_cstate mei_me ipmi_devintf intel_uncore
ipmi_msghandler shpchp pcspkr mousedev input_leds led_class psmouse
intel_rapl_perf lpc_ich mei i2c_i801 evdev mac_hid ip_tables x_tables
overlay squashfs zstd_decompress xxhash loop isofs sr_mod cdrom sd_mod
hid_generic usbhid hid uas usb_storage serio_raw atkbd libps2
crc32_pclmul
[  381.938018]  crc32c_intel isci ahci aesni_intel aes_x86_64 libsas
libahci crypto_simd ehci_pci ehci_hcd cryptd glue_helper libata
scsi_transport_sas e1000e mlx4_core usbcore ptp pps_core scsi_mod
usb_common devlink wmi i8042 serio
[  381.938029] CPU: 10 PID: 0 Comm: swapper/10 Tainted: P      D    O
   4.16.7-1-ARCH #1
[  381.938030] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./EP2C602, BIOS P1.80 12/09/2013
[  381.938032] RIP: 0010:load_balance+0x71e/0x9b0
[  381.938033] RSP: 0018:ffff8e6c5f283e18 EFLAGS: 00010286
[  381.938034] RAX: 0000000000395360 RBX: ffff8e6c5b0e6a20 RCX: 0000000000000000
[  381.938035] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000012
[  381.938035] RBP: 000000000000000a R08: 0000000000000005 R09: 0000000000000001
[  381.938036] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  381.938037] R13: 0000000000000000 R14: ffff8e6c5b10e600 R15: 000000000000000a
[  381.938038] FS:  0000000000000000(0000) GS:ffff8e6c5f280000(0000)
knlGS:0000000000000000
[  381.938039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  381.938040] CR2: 0000000000395370 CR3: 0000000e6100a004 CR4: 00000000000606e0
[  381.938041] Call Trace:
[  381.938042]  <IRQ>
[  381.938045]  ? core_get_scaling+0x10/0x10
[  381.938046]  ? intel_pstate_update_pstate+0x2c/0x30
[  381.938049]  rebalance_domains+0x194/0x2c0
[  381.938052]  __do_softirq+0xf1/0x2e0
[  381.938054]  irq_exit+0xc9/0xe0
[  381.938056]  smp_apic_timer_interrupt+0x73/0x160
[  381.938058]  apic_timer_interrupt+0xf/0x20
[  381.938059]  </IRQ>
[  381.938061] RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
[  381.938061] RSP: 0018:ffff939f863f3ea8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff12
[  381.938063] RAX: ffff8e6c5f280000 RBX: 00000058ea69b446 RCX: 000000000000001f
[  381.938064] RDX: 00000058ea69b446 RSI: ffffffff96e70d54 RDI: ffffffff96e70fb2
[  381.938065] RBP: ffff8e6c5f2abe00 R08: 000002044ea385d2 R09: 0000000000000ef6
[  381.938065] R10: 0000000000001d66 R11: ffff8e6c5f2a07c4 R12: 0000000000000005
[  381.938066] R13: ffffffff970ae3f8 R14: 00000058e9f06f07 R15: 0000000000000000
[  381.938069]  ? cpuidle_enter_state+0x94/0x2e0
[  381.938071]  do_idle+0x193/0x1b0
[  381.938073]  cpu_startup_entry+0x6f/0x80
[  381.938075]  start_secondary+0x1a5/0x200
[  381.938077]  secondary_startup_64+0xa5/0xb0
[  381.938078] Code: 8c 00 00 00 8b 44 24 5c 48 0f a3 02 0f 82 08 fa
ff ff 48 8b 44 24 30 c7 00 00 00 00 00 48 8b 44 24 28 48 85 c0 74 16
48 8b 40 10 <48> 8b 40 10 8b 50 20 85 d2 74 07 c7 40 20 00 00 00 00 66
66 66
[  381.938104] RIP: load_balance+0x71e/0x9b0 RSP: ffff8e6c5f283e18
[  381.938105] CR2: 0000000000395370
[  381.938112] ---[ end trace 5c92a4d96e28b9b4 ]---
[  381.976894] BUG: unable to handle kernel paging request at 0000000000390030
[  381.976915] IP: load_balance+0x71e/0x9b0
[  381.976916] PGD 0 P4D 0
[  381.976918] Oops: 0000 [#3] PREEMPT SMP PTI
[  381.976919] Modules linked in: 8021q mrp wl(PO) btrfs dm_thin_pool
ast dm_persistent_data dm_bio_prison dm_bufio libcrc32c i2c_algo_bit
crc32c_generic intel_rapl ttm sb_edac zstd_compress drm_kms_helper xor
x86_pkg_temp_thermal intel_powerclamp drm raid6_pq raid1 agpgart
coretemp md_mod cfg80211 syscopyarea sysfillrect kvm_intel dm_mod
sysimgblt kvm fb_sys_fops joydev irqbypass rfkill iTCO_wdt
iTCO_vendor_support crct10dif_pclmul ghash_clmulni_intel ipmi_ssif
rtc_cmos ipmi_si intel_cstate mei_me ipmi_devintf intel_uncore
ipmi_msghandler shpchp pcspkr mousedev input_leds led_class psmouse
intel_rapl_perf lpc_ich mei i2c_i801 evdev mac_hid ip_tables x_tables
overlay squashfs zstd_decompress xxhash loop isofs sr_mod cdrom sd_mod
hid_generic usbhid hid uas usb_storage serio_raw atkbd libps2
crc32_pclmul
[  381.976950]  crc32c_intel isci ahci aesni_intel aes_x86_64 libsas
libahci crypto_simd ehci_pci ehci_hcd cryptd glue_helper libata
scsi_transport_sas e1000e mlx4_core usbcore ptp pps_core scsi_mod
usb_common devlink wmi i8042 serio
[  381.976961] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P      D    O
   4.16.7-1-ARCH #1
[  381.976961] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./EP2C602, BIOS P1.80 12/09/2013
[  381.976963] RIP: 0010:load_balance+0x71e/0x9b0
[  381.976964] RSP: 0018:ffff8e6c5f583e18 EFLAGS: 00010286
[  381.976965] RAX: 0000000000390020 RBX: ffff8e6c5b0e68a0 RCX: 0000000000000000
[  381.976966] RDX: ffff8e6c5f3a1880 RSI: ffff8e6c5f596480 RDI: 000000000000000e
[  381.976967] RBP: 000000000000000e R08: 000000000000000e R09: 00000000ff00ff00
[  381.976968] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000001
[  381.976969] R13: 0000000000000000 R14: ffff8e6c5b10dc00 R15: 000000000000001e
[  381.976970] FS:  0000000000000000(0000) GS:ffff8e6c5f580000(0000)
knlGS:0000000000000000
[  381.976971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  381.976972] CR2: 0000000000390030 CR3: 0000000e6100a001 CR4: 00000000000606e0
[  381.976972] Call Trace:
[  381.976974]  <IRQ>
[  381.976976]  ? core_get_scaling+0x10/0x10
[  381.976977]  ? intel_pstate_update_pstate+0x2c/0x30
[  381.976980]  rebalance_domains+0x194/0x2c0
[  381.976982]  __do_softirq+0xf1/0x2e0
[  381.976985]  irq_exit+0xc9/0xe0
[  381.976987]  smp_apic_timer_interrupt+0x73/0x160
[  381.976988]  apic_timer_interrupt+0xf/0x20
[  381.976989]  </IRQ>
[  381.976991] RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
[  381.976992] RSP: 0018:ffff939f86493ea8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff12
[  381.976993] RAX: ffff8e6c5f580000 RBX: 00000058ef8f8464 RCX: 000000000000001f
[  381.976994] RDX: 00000058ef8f8464 RSI: ffffffff96e70d54 RDI: ffffffff96e70fb2
[  381.976995] RBP: ffff8e6c5f5abe00 R08: 000002045d912d7c R09: 0000000000000048
[  381.976996] R10: 0000000000000bc1 R11: ffff8e6c5f5a07c4 R12: 0000000000000005
[  381.976996] R13: ffffffff970ae3f8 R14: 00000058eb0bfe53 R15: 0000000000000000
[  381.976999]  ? cpuidle_enter_state+0x94/0x2e0
[  381.977001]  do_idle+0x193/0x1b0
[  381.977003]  cpu_startup_entry+0x6f/0x80
[  381.977004]  start_secondary+0x1a5/0x200
[  381.977006]  secondary_startup_64+0xa5/0xb0
[  381.977008] Code: 8c 00 00 00 8b 44 24 5c 48 0f a3 02 0f 82 08 fa
ff ff 48 8b 44 24 30 c7 00 00 00 00 00 48 8b 44 24 28 48 85 c0 74 16
48 8b 40 10 <48> 8b 40 10 8b 50 20 85 d2 74 07 c7 40 20 00 00 00 00 66
66 66
[  381.977034] RIP: load_balance+0x71e/0x9b0 RSP: ffff8e6c5f583e18
[  381.977035] CR2: 0000000000390030
[  381.977050] ---[ end trace 5c92a4d96e28b9b5 ]---
[  382.051694] Kernel panic - not syncing: Fatal exception in interrupt
[  383.119561] Shutting down cpus with NMI
[  385.335677] Kernel Offset: 0x15000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  385.490613] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-12  5:08 "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass james harvey
@ 2018-05-12  7:51 ` Martin Steigerwald
  2018-05-13  0:10   ` james harvey
  2018-05-14  2:08 ` Qu Wenruo
  1 sibling, 1 reply; 14+ messages in thread
From: Martin Steigerwald @ 2018-05-12  7:51 UTC (permalink / raw)
  To: james harvey; +Cc: linux-btrfs

Hey James.

james harvey - 12.05.18, 07:08:
> 100% reproducible, booting from disk, or even Arch installation ISO.
> Kernel 4.16.7.  btrfs-progs v4.16.
> 
> Reading one of two journalctl files causes a kernel oops.  Initially
> ran into it from "journalctl --list-boots", but cat'ing the file does
> it too.  I believe this shows there's compressed data that is invalid,
> but its btrfs checksum is invalid.  I've cat'ed every file on the
> disk, and luckily have the problems narrowed down to only these 2
> files in /var/log/journal.
> 
> This volume has always been mounted with lzo compression.
> 
> scrub has never found anything, and have ran it since the oops.
> 
> Found a user a few years ago who also ran into this, without
> resolution, at:
> https://www.spinics.net/lists/linux-btrfs/msg52218.html
> 
> 1. Cat'ing a (non-essential) file shouldn't be able to bring down the
> system.
> 
> 2. If this is infact invalid compressed data, there should be a way to
> check for that.  Btrfs check and scrub pass.

I think systemd-journald sets those files to nocow on BTRFS in order to 
reduce fragmentation: That means no checksums, no snapshots, no nothing. 
I just removed /var/log/journal and thus disabled journalling to disk. 
Its sufficient for me to have the recent state in /run/journal.

Can you confirm nocow being set via lsattr on those files?

Still they should be decompressible just fine.

> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
> other files.
> 
> 
> 
> [  381.869940] BUG: unable to handle kernel paging request at
> 0000000000390e50 [  381.870881] BTRFS: decompress failed
[…]
-- 
Martin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-12  7:51 ` Martin Steigerwald
@ 2018-05-13  0:10   ` james harvey
  2018-05-13  2:09     ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: james harvey @ 2018-05-13  0:10 UTC (permalink / raw)
  To: linux-btrfs

On Sat, May 12, 2018 at 3:51 AM, Martin Steigerwald <martin@lichtvoll.de> wrote:
> Hey James.
>
> james harvey - 12.05.18, 07:08:
>> 100% reproducible, booting from disk, or even Arch installation ISO.
>> Kernel 4.16.7.  btrfs-progs v4.16.
>>
>> Reading one of two journalctl files causes a kernel oops.  Initially
>> ran into it from "journalctl --list-boots", but cat'ing the file does
>> it too.  I believe this shows there's compressed data that is invalid,
>> but its btrfs checksum is invalid.  I've cat'ed every file on the
>> disk, and luckily have the problems narrowed down to only these 2
>> files in /var/log/journal.
>>
>> This volume has always been mounted with lzo compression.
>>
>> scrub has never found anything, and have ran it since the oops.
>>
>> Found a user a few years ago who also ran into this, without
>> resolution, at:
>> https://www.spinics.net/lists/linux-btrfs/msg52218.html
>>
>> 1. Cat'ing a (non-essential) file shouldn't be able to bring down the
>> system.
>>
>> 2. If this is infact invalid compressed data, there should be a way to
>> check for that.  Btrfs check and scrub pass.
>
> I think systemd-journald sets those files to nocow on BTRFS in order to
> reduce fragmentation: That means no checksums, no snapshots, no nothing.
> I just removed /var/log/journal and thus disabled journalling to disk.
> Its sufficient for me to have the recent state in /run/journal.
>
> Can you confirm nocow being set via lsattr on those files?
>
> Still they should be decompressible just fine.
>
>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>> other files.
>>
>>
>>
>> [  381.869940] BUG: unable to handle kernel paging request at
>> 0000000000390e50 [  381.870881] BTRFS: decompress failed
> […]
> --
> Martin
>
>

You're right, everything in /var/log/journal has the NoCOW attribute.

This is on a 3 device btrfs RAID1.  If I mount ro,degraded with disks
1&2 or 1&3, and read the file, I get a crash.  With disks 2&3, it
reads fine

Does this mean that although I've never had a corrupted disk bit
before on COW/checksummed data, one somehow happened on the small
fraction of my storage which is NoCOW?  Seems unlikely, but I don't
know what other explanation there would be.

So, I think this means the corrupted disk bit must be on disk 1.

I'm running with LVM, this a small'ish volume, and I would be happy to
leave a copy of the set of 3 volumes as-is, if anyone wanted to have
me run anything to help diagnose this and/or try a patch.

Does btrfs have a way to do something like scrub, by comparing the
mirrored copies of NoCOW data, and alerting you to a mismatch?  I
realize with the NoCOW, it wouldn't have a checksum to know which is
accurate.  It would at least be good for there to be a way to alert to
the corruption.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-13  0:10   ` james harvey
@ 2018-05-13  2:09     ` Chris Murphy
  2018-05-13  5:28       ` james harvey
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Chris Murphy @ 2018-05-13  2:09 UTC (permalink / raw)
  To: james harvey; +Cc: Btrfs BTRFS

On Sat, May 12, 2018 at 6:10 PM, james harvey <jamespharvey20@gmail.com> wrote:
> On Sat, May 12, 2018 at 3:51 AM, Martin Steigerwald <martin@lichtvoll.de> wrote:
>> Hey James.
>>
>> james harvey - 12.05.18, 07:08:
>>> 100% reproducible, booting from disk, or even Arch installation ISO.
>>> Kernel 4.16.7.  btrfs-progs v4.16.
>>>
>>> Reading one of two journalctl files causes a kernel oops.  Initially
>>> ran into it from "journalctl --list-boots", but cat'ing the file does
>>> it too.  I believe this shows there's compressed data that is invalid,
>>> but its btrfs checksum is invalid.  I've cat'ed every file on the
>>> disk, and luckily have the problems narrowed down to only these 2
>>> files in /var/log/journal.
>>>
>>> This volume has always been mounted with lzo compression.
>>>
>>> scrub has never found anything, and have ran it since the oops.
>>>
>>> Found a user a few years ago who also ran into this, without
>>> resolution, at:
>>> https://www.spinics.net/lists/linux-btrfs/msg52218.html
>>>
>>> 1. Cat'ing a (non-essential) file shouldn't be able to bring down the
>>> system.
>>>
>>> 2. If this is infact invalid compressed data, there should be a way to
>>> check for that.  Btrfs check and scrub pass.
>>
>> I think systemd-journald sets those files to nocow on BTRFS in order to
>> reduce fragmentation: That means no checksums, no snapshots, no nothing.
>> I just removed /var/log/journal and thus disabled journalling to disk.
>> Its sufficient for me to have the recent state in /run/journal.
>>
>> Can you confirm nocow being set via lsattr on those files?
>>
>> Still they should be decompressible just fine.
>>
>>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>>> other files.
>>>
>>>
>>>
>>> [  381.869940] BUG: unable to handle kernel paging request at
>>> 0000000000390e50 [  381.870881] BTRFS: decompress failed
>> […]
>> --
>> Martin
>>
>>
>
> You're right, everything in /var/log/journal has the NoCOW attribute.
>
> This is on a 3 device btrfs RAID1.  If I mount ro,degraded with disks
> 1&2 or 1&3, and read the file, I get a crash.  With disks 2&3, it
> reads fine

Unmounted with all three available, you can use btrfs-map-logical to
extract copy 1 and copy 2 to compare; but it might crash also if one
copy is corrupt. But it's another way to test.


>
> Does this mean that although I've never had a corrupted disk bit
> before on COW/checksummed data, one somehow happened on the small
> fraction of my storage which is NoCOW?  Seems unlikely, but I don't
> know what other explanation there would be.

Usually nocow also means no compression. But in the archives is a
thread where I found that compression can be forced on nocow if the
file is fragment and either the volume is mounted with compression or
the file has inherited chattr +c (I don't remember which or possibly
both). And systemd does submit rotated logs for defragmentation.

But the compression doesn't happen twice. So if it's corruption, it's
corruption in transit. I think you'd come across this more often.


>
> So, I think this means the corrupted disk bit must be on disk 1.
>
> I'm running with LVM, this a small'ish volume, and I would be happy to
> leave a copy of the set of 3 volumes as-is, if anyone wanted to have
> me run anything to help diagnose this and/or try a patch.
>
> Does btrfs have a way to do something like scrub, by comparing the
> mirrored copies of NoCOW data, and alerting you to a mismatch?  I
> realize with the NoCOW, it wouldn't have a checksum to know which is
> accurate.  It would at least be good for there to be a way to alert to
> the corruption.

No csums means the files are ignored.

You've definitely found a bug. A corrupt file shouldn't crash the
kernel. You could do regression testing and see if it happens with
older kernels. I'd probably stick to longterm, easier to find already
built. If these are zstd compressed, then I think you can only go back
to 4.14.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-13  2:09     ` Chris Murphy
@ 2018-05-13  5:28       ` james harvey
  2018-05-13 11:01       ` james harvey
  2018-05-13 21:27       ` Chris Murphy
  2 siblings, 0 replies; 14+ messages in thread
From: james harvey @ 2018-05-13  5:28 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

(Conversation order changed to put program output at bottom)

On Sat, May 12, 2018 at 10:09 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Sat, May 12, 2018 at 6:10 PM, james harvey <jamespharvey20@gmail.com> wrote:
>> Does this mean that although I've never had a corrupted disk bit
>> before on COW/checksummed data, one somehow happened on the small
>> fraction of my storage which is NoCOW?  Seems unlikely, but I don't
>> know what other explanation there would be.
>
> Usually nocow also means no compression. But in the archives is a
> thread where I found that compression can be forced on nocow if the
> file is fragment and either the volume is mounted with compression or
> the file has inherited chattr +c (I don't remember which or possibly
> both). And systemd does submit rotated logs for defragmentation.
>
> But the compression doesn't happen twice. So if it's corruption, it's
> corruption in transit. I think you'd come across this more often.

Ahh, OK.  As filefrag shows above, the file is fragmented.  And,
because on disk it seems to me like the first 128k is being compressed
to within a 4k block, I'm thinking compression is being forced here on
nocow as you mentioned.

I'll also mention I'm sometimes seeing the "BTRFS: decompress failed"
crash, but sometimes seeing a "general protection fault", but it's
still only on reading this one file.  Kernel message here:
https://pastebin.com/SckjTasE

>> So, I think this means the corrupted disk bit must be on disk 1.
>>
>> I'm running with LVM, this a small'ish volume, and I would be happy to
>> leave a copy of the set of 3 volumes as-is, if anyone wanted to have
>> me run anything to help diagnose this and/or try a patch.
>>
>> Does btrfs have a way to do something like scrub, by comparing the
>> mirrored copies of NoCOW data, and alerting you to a mismatch?  I
>> realize with the NoCOW, it wouldn't have a checksum to know which is
>> accurate.  It would at least be good for there to be a way to alert to
>> the corruption.
>
> No csums means the files are ignored.

IMO, it would be a really important feature to add, possibly to scrub,
to compare non-checksummed data across mirrors for differences.
Without a checksum, it couldn't fix anything, but could alert the user
there's a problem.  So, user could determine which is corrupt, restore
that file from backup, etc.

> You've definitely found a bug. A corrupt file shouldn't crash the
> kernel. You could do regression testing and see if it happens with
> older kernels. I'd probably stick to longterm, easier to find already
> built. If these are zstd compressed, then I think you can only go back
> to 4.14.

I booted my April 1, 2016 Arch ISO.  It also crashes on this file.
Linux 4.4.5.  I could download older ISOs and try further back if
requested, but I'm thinking this likely means it's not a regression
but always been there.

>> You're right, everything in /var/log/journal has the NoCOW attribute.
>>
>> This is on a 3 device btrfs RAID1.  If I mount ro,degraded with disks
>> 1&2 or 1&3, and read the file, I get a crash.  With disks 2&3, it
>> reads fine
>
> Unmounted with all three available, you can use btrfs-map-logical to
> extract copy 1 and copy 2 to compare; but it might crash also if one
> copy is corrupt. But it's another way to test.

Glad to do that.  That will confirm the copies are different, and rule
out that one disk went bad on a sector that causes a system crash.  I
did the first fragment of one of the files as a test, and after I get
some guidance, I can do the rest using vim & Excel to make it not as
painful as it looks like it would be.

I'm really confused.

I think I'm seeing a bug in btrfs-map-logical.  It's giving me 4 lines
when I'd expect 2.

And, I think it's showing me the mirrored copies are on disk 2&3.
But, it's only when disk 1 is mounted that it crashes.  Degraded with
disks 2&3 work fine.  Maybe it's disk 2 or 3's data is corrupted, and
which mirror it reads from happens to be different when it's degraded
with 2&3 vs any other way?

First, I can mount degraded with disks 2&3, and xxd the file to see
what its contents should look like.  It starts with ASCII "LPKSHHRH",
which google'ing this shows is how all journal files must start, as
their signature; then it's just binary from there.

I ran "filefrag -v [FILENAME]".

I ran "btrfs-map-logical -l [FILEFRAG'S STARTING PHYSICAL OFFSET
NUMBER * 4096 FOR BLOCKSIZE]".  I'd expect 2 lines of output, but I'm
getting 4 lines!  The first 2 are identical except for mirror number
and device name, and the last 2 are identical except for the same.

I ran "dd if=[BTRFS-MAP-LOGICAL DEVICE NAME] of=/root/[FILENAME] bs=1
count=[FILEFRAG'S LENGTH * 4096 FOR BLOCKSIZE]
skip=[BTRFS-MAP-LOGICAL'S PHYSICAL NUMBER}".  I'd expect these 4 files
(corresponding to each output line of btrfs-map-logical) to be the
first 128K of the file.  (filefrag showing length 32, in size of
blocks of 4096 bytes.)  Well, 2 of these files being that, and maybe 2
of these files being random data since btrfs-map-logical seems like
it's giving 2 invalid output lines.

If I xxd the files and compare them, I see the data from disk 2 and
disk 3 from the first 2 btrfs-map-logical lines (so both at physical
1531469824) do have the LPKSHHRH signature, but starting at offset 9
(so 10th byte).  The 9 bytes before it are "3a0c 0000 6b02 0000 0a".
Googling these 9 bytes (with or without spaces) doesn't pull up
anything.  I don't know if that's a btrfs-lzo compression header
(don't know if there would be one per file, or if it would be per
extent.)

Anyways, after the journal LPKSHHRH signature, there are some areas
that don't match, but some do.  Maybe if lzo compression is happening,
we'd expect to see some bytes match and others not.  Also,
interestingly, at 0x1000 (so 4k in) starts a binary header "ad3c 0000
6807 0000 0f" just before "// Copyright 2013... lest is based on..."
which is another file.  So, the first 128k of the journal file got
compressed to be within 4k (lzop default compression does reduce the
first 128k to 2260 bytes.)

The data from the last 2 btrfs-map-logical lines has the lest
copyright message, so I'm thinking I'd just use the first 2 lines from
each btrfs-map-logical run, and ignore the last 2?  Maybe it's getting
tripped up because of the lzo compression?

And, is there a way to see how large each fragment will be with the
lzo compression, so I know where to read to?




# filefrag -v user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
Filesystem type is: 9123683e
File size of user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
is 8388608 (2048 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:     640902..    640933:     32:
encoded,shared
   1:       32..      63:     641026..    641057:     32:     640934:
encoded,shared
   2:       64..      95:     643303..    643334:     32:     641058:
encoded,shared
   3:       96..     127:     643305..    643336:     32:     643335:
encoded,shared
   4:      128..     159:     643418..    643449:     32:     643337:
encoded,shared
   5:      160..     191:     643600..    643631:     32:     643450:
encoded,shared
<email snip>
  58:     1841..    2047:     662141..    662347:    207:    1446616:
last,unwritten,shared,eof
user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal:
59 extents found

# blockdev --getbsz /dev/lvm/newMain1
4096 {matches filefrag saying "blocks of 4096 bytes"}

# echo $[4096*640902]
2625134592

# btrfs-map-logical -l 2625134592 /dev/lvm/newMain1
mirror 1 logical 2625134592 physical 1531469824 device /dev/mapper/lvm-newMain2
mirror 2 logical 2625134592 physical 1531469824 device /dev/mapper/lvm-newMain3
mirror 1 logical 2625138688 physical 1531473920 device /dev/mapper/lvm-newMain2
mirror 2 logical 2625138688 physical 1531473920 device /dev/mapper/lvm-newMain3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-13  2:09     ` Chris Murphy
  2018-05-13  5:28       ` james harvey
@ 2018-05-13 11:01       ` james harvey
  2018-05-13 11:45         ` james harvey
  2018-05-13 21:27       ` Chris Murphy
  2 siblings, 1 reply; 14+ messages in thread
From: james harvey @ 2018-05-13 11:01 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

*** Disregard previous post.  I read btrfs-map-logical.c, and the
reply below is more sensical than my last. I now understand since I
wasn't specifying a byte size to btrfs-map-logical, it was defaulting
to the nodesize which is 16k.  Filefrag shows the first fragment is
128k, but below I discuss how that's compressed down to less than 4k,
so reading 16k goes into another file and jumps to another logical
area, forcing the extra lines to show the physical locations. ***

(Conversation order changed to put program output at bottom)

On Sat, May 12, 2018 at 10:09 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Sat, May 12, 2018 at 6:10 PM, james harvey <jamespharvey20@gmail.com> wrote:
>> Does this mean that although I've never had a corrupted disk bit
>> before on COW/checksummed data, one somehow happened on the small
>> fraction of my storage which is NoCOW?  Seems unlikely, but I don't
>> know what other explanation there would be.
>
> Usually nocow also means no compression. But in the archives is a
> thread where I found that compression can be forced on nocow if the
> file is fragment and either the volume is mounted with compression or
> the file has inherited chattr +c (I don't remember which or possibly
> both). And systemd does submit rotated logs for defragmentation.
>
> But the compression doesn't happen twice. So if it's corruption, it's
> corruption in transit. I think you'd come across this more often.

Ahh, OK.  As filefrag shows below, the file is fragmented.  And,
because on disk it seems to me like 128k fragments are being
compressed to less than 4k blocks (lzop is able to compress the
filie's first 128k down to 2k'ish, so this is realistic) I'm thinking
compression is being forced here on nocow as you mentioned it could
be.

I'll also mention I'm sometimes seeing the "BTRFS: decompress failed"
crash, but sometimes seeing a "general protection fault", but it's
still only on reading this one file.  GPF style kernel message here:
https://pastebin.com/SckjTasE

>> So, I think this means the corrupted disk bit must be on disk 1.
>>
>> I'm running with LVM, this a small'ish volume, and I would be happy to
>> leave a copy of the set of 3 volumes as-is, if anyone wanted to have
>> me run anything to help diagnose this and/or try a patch.
>>
>> Does btrfs have a way to do something like scrub, by comparing the
>> mirrored copies of NoCOW data, and alerting you to a mismatch?  I
>> realize with the NoCOW, it wouldn't have a checksum to know which is
>> accurate.  It would at least be good for there to be a way to alert to
>> the corruption.
>
> No csums means the files are ignored.

IMO, it would be a really important feature to add, possibly to scrub,
to compare non-checksummed data across mirrors for differences.
Without a checksum, it couldn't fix anything, but could alert the user
there's a problem.  So, user could determine which is corrupt, restore
that file from backup, just know something is wrong, etc.

> You've definitely found a bug. A corrupt file shouldn't crash the
> kernel. You could do regression testing and see if it happens with
> older kernels. I'd probably stick to longterm, easier to find already
> built. If these are zstd compressed, then I think you can only go back
> to 4.14.

I booted my April 1, 2016, Arch ISO.  It also crashes on this file.
Linux 4.4.5.  I could download older ISOs and try further back if
requested, but I'm thinking this likely means it's not a regression
but always been there.

>> You're right, everything in /var/log/journal has the NoCOW attribute.
>>
>> This is on a 3 device btrfs RAID1.  If I mount ro,degraded with disks
>> 1&2 or 1&3, and read the file, I get a crash.  With disks 2&3, it
>> reads fine
>
> Unmounted with all three available, you can use btrfs-map-logical to
> extract copy 1 and copy 2 to compare; but it might crash also if one
> copy is corrupt. But it's another way to test.

Glad to do that.

I started with "filefrag -v [FILENAME]".  It shows 59 fragments.
Except for the last one, maximum length 32, in units of 4096 byte
blocks.

For each fragment, I ran twice (once for each -c copy):
"btrfs-map-logical -l [FILEFRAG'S STARTING PHYSICAL OFFSET NUMBER *
4096 FOR BLOCKSIZE] -b 4096 -o frag[FRAGMENT NUMBER].[COPY NUMBER] -c
[COPY NUMBER] [FILENAME]".

Fragments [0-27], [29-39], and [56-58] (with 58 being a full 207 4k
blocks) match.

Fragments 28, and [40-55] are completely different.

Why reading 4096 for each fragment?  Well, I tried the first fragment,
and found it has an extra 9 byte header the actual file doesn't have.
("3a0c 0000 6b02 0000 0a".)  I'm assuming that's a btrfs-lzo header.
Then, there's ASCII "LPKSHHRH" which happens to be journald's
beginning of file (starting at byte 0) signature.  After the signature
is different binary data than the actual file for about 2k, then
zeros.  If I run lzop on the first 128k of the file, it winds up
2k'ish.  A larger read from btrfs-map-logical, starting at 0x100 (4k)
is a different file, with its own 9 byte header then "//Copyright
2013... lest is based on..." which is definitely another file.  All of
this put together is telling me these fragments are lzo compressed.

(I realize that although I can see the first 128k fragment compresses
to about 2k, other 128k fragments might compress to more than 4k, so
there might be more differences between the mirrors than I've
discovered.)

btrfs-map-logical isn't crashing, because it appears to be giving data
in its compressed form, so isn't tripping up on invalid compressed
data.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-13 11:01       ` james harvey
@ 2018-05-13 11:45         ` james harvey
  0 siblings, 0 replies; 14+ messages in thread
From: james harvey @ 2018-05-13 11:45 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Reported at: https://bugzilla.kernel.org/show_bug.cgi?id=199707

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-13  2:09     ` Chris Murphy
  2018-05-13  5:28       ` james harvey
  2018-05-13 11:01       ` james harvey
@ 2018-05-13 21:27       ` Chris Murphy
  2 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2018-05-13 21:27 UTC (permalink / raw)
  To: Chris Murphy; +Cc: james harvey, Btrfs BTRFS

On Sat, May 12, 2018 at 8:09 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Sat, May 12, 2018 at 6:10 PM, james harvey <jamespharvey20@gmail.com> wrote:
>> On Sat, May 12, 2018 at 3:51 AM, Martin Steigerwald <martin@lichtvoll.de> wrote:
>>> Hey James.
>>>
>>> james harvey - 12.05.18, 07:08:
>>>> 100% reproducible, booting from disk, or even Arch installation ISO.
>>>> Kernel 4.16.7.  btrfs-progs v4.16.
>>>>
>>>> Reading one of two journalctl files causes a kernel oops.  Initially
>>>> ran into it from "journalctl --list-boots", but cat'ing the file does
>>>> it too.  I believe this shows there's compressed data that is invalid,
>>>> but its btrfs checksum is invalid.  I've cat'ed every file on the
>>>> disk, and luckily have the problems narrowed down to only these 2
>>>> files in /var/log/journal.
>>>>
>>>> This volume has always been mounted with lzo compression.
>>>>
>>>> scrub has never found anything, and have ran it since the oops.
>>>>
>>>> Found a user a few years ago who also ran into this, without
>>>> resolution, at:
>>>> https://www.spinics.net/lists/linux-btrfs/msg52218.html
>>>>
>>>> 1. Cat'ing a (non-essential) file shouldn't be able to bring down the
>>>> system.
>>>>
>>>> 2. If this is infact invalid compressed data, there should be a way to
>>>> check for that.  Btrfs check and scrub pass.
>>>
>>> I think systemd-journald sets those files to nocow on BTRFS in order to
>>> reduce fragmentation: That means no checksums, no snapshots, no nothing.
>>> I just removed /var/log/journal and thus disabled journalling to disk.
>>> Its sufficient for me to have the recent state in /run/journal.
>>>
>>> Can you confirm nocow being set via lsattr on those files?
>>>
>>> Still they should be decompressible just fine.
>>>
>>>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>>>> other files.
>>>>
>>>>
>>>>
>>>> [  381.869940] BUG: unable to handle kernel paging request at
>>>> 0000000000390e50 [  381.870881] BTRFS: decompress failed
>>> […]
>>> --
>>> Martin
>>>
>>>
>>
>> You're right, everything in /var/log/journal has the NoCOW attribute.
>>
>> This is on a 3 device btrfs RAID1.  If I mount ro,degraded with disks
>> 1&2 or 1&3, and read the file, I get a crash.  With disks 2&3, it
>> reads fine
>
> Unmounted with all three available, you can use btrfs-map-logical to
> extract copy 1 and copy 2 to compare; but it might crash also if one
> copy is corrupt. But it's another way to test.
>
>
>>
>> Does this mean that although I've never had a corrupted disk bit
>> before on COW/checksummed data, one somehow happened on the small
>> fraction of my storage which is NoCOW?  Seems unlikely, but I don't
>> know what other explanation there would be.
>
> Usually nocow also means no compression. But in the archives is a
> thread where I found that compression can be forced on nocow if the
> file is fragment and either the volume is mounted with compression or
> the file has inherited chattr +c (I don't remember which or possibly
> both).

"file is fragment" should be "file is (submitted for) defragmentation"



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-12  5:08 "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass james harvey
  2018-05-12  7:51 ` Martin Steigerwald
@ 2018-05-14  2:08 ` Qu Wenruo
  2018-05-14  4:41   ` james harvey
  1 sibling, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2018-05-14  2:08 UTC (permalink / raw)
  To: james harvey, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 14890 bytes --]



On 2018年05月12日 13:08, james harvey wrote:
> 100% reproducible, booting from disk, or even Arch installation ISO.
> Kernel 4.16.7.  btrfs-progs v4.16.
> 
> Reading one of two journalctl files causes a kernel oops.  Initially
> ran into it from "journalctl --list-boots", but cat'ing the file does
> it too.  I believe this shows there's compressed data that is invalid,
> but its btrfs checksum is invalid.  I've cat'ed every file on the
> disk, and luckily have the problems narrowed down to only these 2
> files in /var/log/journal.
> 
> This volume has always been mounted with lzo compression.
> 
> scrub has never found anything, and have ran it since the oops.
> 
> Found a user a few years ago who also ran into this, without
> resolution, at:
> https://www.spinics.net/lists/linux-btrfs/msg52218.html
> 
> 1. Cat'ing a (non-essential) file shouldn't be able to bring down the system.
> 
> 2. If this is infact invalid compressed data, there should be a way to
> check for that.  Btrfs check and scrub pass.

Btrfs check ensure no obvious metadata corruption and all its cross
reference should be OK.
(Unless we find new bugs in btrfs check)

And scrub ensure all tree blocks and data extent with csum matches their
csum.

> 
> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
> other files.
> 
> 
> 
> [  381.869940] BUG: unable to handle kernel paging request at 0000000000390e50
> [  381.870881] BTRFS: decompress failed
> [  381.891775] IP: rebalance_domains+0x8a/0x2c0

The interesting part here is, btrfs is not showing up the call trace,
not even lzo code.
(Despite of the "decompress failed" message).
Maybe some corrupted data is screwing up some random kernel memory?

Would you please get the inode number of that corrupted files, and throw
it through btrfs-debug-tree?

# btrfs-debug-tree -t <subvol_id> <device> | grep -A 50 \(<INO>

This is the preferred method as it would provide all the details we
need. But since it could contain sensitive info like filename, please
double check before posting it.

Or fiemap of that file could also help:

# xfs_io -c "fiemap -v" <corrupted_file>

This is completely safe, but I'm not 100% sure about if the info is enough.

Thanks,
Qu

> [  381.891776] PGD 0 P4D 0
> [  381.891780] Oops: 0000 [#1] PREEMPT SMP PTI
> [  381.891782] Modules linked in:
> [  381.891784] BTRFS: decompress failed
> [  381.891784]  8021q mrp wl(PO) btrfs dm_thin_pool ast
> [  381.891788] BTRFS: decompress failed
> [  381.891789]  dm_persistent_data dm_bio_prison dm_bufio libcrc32c
> i2c_algo_bit crc32c_generic intel_rapl ttm sb_edac zstd_compress
> drm_kms_helper xor x86_pkg_temp_thermal intel_powerclamp drm raid6_pq
> raid1 agpgart coretemp md_mod cfg80211 syscopyarea sysfillrect
> kvm_intel dm_mod sysimgblt kvm fb_sys_fops joydev irqbypass rfkill
> iTCO_wdt iTCO_vendor_support crct10dif_pclmul ghash_clmulni_intel
> ipmi_ssif rtc_cmos ipmi_si intel_cstate mei_me ipmi_devintf
> intel_uncore ipmi_msghandler shpchp pcspkr mousedev input_leds
> led_class psmouse intel_rapl_perf lpc_ich mei i2c_i801 evdev mac_hid
> ip_tables x_tables overlay squashfs zstd_decompress xxhash loop isofs
> sr_mod cdrom sd_mod
> [  381.891835] BTRFS: decompress failed
> [  381.891835]  hid_generic usbhid hid uas usb_storage
> [  381.891838] BTRFS: decompress failed
> [  381.891838]  serio_raw atkbd libps2 crc32_pclmul
> [  381.891840] BTRFS: decompress failed
> [  381.891841]  crc32c_intel isci ahci aesni_intel
> [  381.891843] BTRFS: decompress failed
> [  381.891843]  aes_x86_64 libsas libahci crypto_simd
> [  381.891845] BTRFS: decompress failed
> [  381.891845]  ehci_pci ehci_hcd cryptd glue_helper
> [  381.891847] BTRFS: decompress failed
> [  381.891847]  libata scsi_transport_sas e1000e mlx4_core usbcore ptp
> pps_core scsi_mod usb_common devlink wmi i8042 serio
> [  381.891855] CPU: 11 PID: 0 Comm: swapper/11 Tainted: P           O
>    4.16.7-1-ARCH #1
> [  381.891856] Hardware name: To Be Filled By O.E.M. To Be Filled By
> O.E.M./EP2C602, BIOS P1.80 12/09/2013
> [  381.891858] RIP: 0010:rebalance_domains+0x8a/0x2c0
> [  381.891859] RSP: 0018:ffff8e6c5f2c3f08 EFLAGS: 00010206
> [  381.891860] RAX: 0000000000000000 RBX: 0000000000390de8 RCX: 0000000000000005
> [  381.891861] RDX: 0000000100005ff2 RSI: 000000000000024d RDI: 0000001333333340
> [  381.891862] RBP: 0000000100005ff4 R08: 0000000000000000 R09: 0000000000000001
> [  381.891863] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [  381.891863] R13: 0000000000000000 R14: 0000000000000001 R15: 00bd7801f8e8a9c8
> [  381.891865] FS:  0000000000000000(0000) GS:ffff8e6c5f2c0000(0000)
> knlGS:0000000000000000
> [  381.891865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  381.891866] CR2: 0000000000390e50 CR3: 0000000e6100a004 CR4: 00000000000606e0
> [  381.891867] Call Trace:
> [  381.891870]  <IRQ>
> [  381.891875]  __do_softirq+0xf1/0x2e0
> [  381.891880]  irq_exit+0xc9/0xe0
> [  381.903429] BTRFS: decompress failed
> [  381.916574]  smp_apic_timer_interrupt+0x73/0x160
> [  381.916576]  apic_timer_interrupt+0xf/0x20
> [  381.916578]  </IRQ>
> [  381.916581] RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
> [  381.916582] RSP: 0018:ffff939f863fbea8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff12
> [  381.916583] RAX: ffff8e6c5f2c0000 RBX: 00000058e9388d6f RCX: 000000000000001f
> [  381.916584] RDX: 00000058e9388d6f RSI: ffffffff96e70d54 RDI: ffffffff96e70fb2
> [  381.916585] RBP: ffff8e6c5f2ebe00 R08: 000002044b2e9556 R09: 000000000000337b
> [  381.916585] R10: 000000000000471b R11: ffff8e6c5f2e07c4 R12: 0000000000000003
> [  381.916586] R13: ffffffff970ae338 R14: 00000058e9215560 R15: 0000000000000000
> [  381.916591]  ? cpuidle_enter_state+0x94/0x2e0
> [  381.916593]  do_idle+0x193/0x1b0
> [  381.916595]  cpu_startup_entry+0x6f/0x80
> [  381.916599]  start_secondary+0x1a5/0x200
> [  381.916602]  secondary_startup_64+0xa5/0xb0
> [  381.916603] Code: 46 00 00 48 03 04 d5 40 f4
> [  381.924842] BTRFS: decompress failed
> [  381.937936] ee 96 48 8b 98 c0 09 00 00 48 85 db 0f 84 32 02 00 00
> 45 31 ff 45 31 f6 45 31 e4 48 8b 15 e6 aa f4 00 <48> 8b 43 68 48 39 53
> 70 79 2e 48 89 c2 41 be 01 00 00 00 48 c1
> [  381.937956] RIP: rebalance_domains+0x8a/0x2c0 RSP: ffff8e6c5f2c3f08
> [  381.937957] CR2: 0000000000390e50
> [  381.937976] ---[ end trace 5c92a4d96e28b9b3 ]---
> [  381.937977] BUG: unable to handle kernel paging request at 0000000000395370
> [  381.937982] IP: load_balance+0x71e/0x9b0
> [  381.937982] PGD 0 P4D 0
> [  381.937985] Oops: 0000 [#2] PREEMPT SMP PTI
> [  381.937986] Modules linked in: 8021q mrp wl(PO) btrfs dm_thin_pool
> ast dm_persistent_data dm_bio_prison dm_bufio libcrc32c i2c_algo_bit
> crc32c_generic intel_rapl ttm sb_edac zstd_compress drm_kms_helper xor
> x86_pkg_temp_thermal intel_powerclamp drm raid6_pq raid1 agpgart
> coretemp md_mod cfg80211 syscopyarea sysfillrect kvm_intel dm_mod
> sysimgblt kvm fb_sys_fops joydev irqbypass rfkill iTCO_wdt
> iTCO_vendor_support crct10dif_pclmul ghash_clmulni_intel ipmi_ssif
> rtc_cmos ipmi_si intel_cstate mei_me ipmi_devintf intel_uncore
> ipmi_msghandler shpchp pcspkr mousedev input_leds led_class psmouse
> intel_rapl_perf lpc_ich mei i2c_i801 evdev mac_hid ip_tables x_tables
> overlay squashfs zstd_decompress xxhash loop isofs sr_mod cdrom sd_mod
> hid_generic usbhid hid uas usb_storage serio_raw atkbd libps2
> crc32_pclmul
> [  381.938018]  crc32c_intel isci ahci aesni_intel aes_x86_64 libsas
> libahci crypto_simd ehci_pci ehci_hcd cryptd glue_helper libata
> scsi_transport_sas e1000e mlx4_core usbcore ptp pps_core scsi_mod
> usb_common devlink wmi i8042 serio
> [  381.938029] CPU: 10 PID: 0 Comm: swapper/10 Tainted: P      D    O
>    4.16.7-1-ARCH #1
> [  381.938030] Hardware name: To Be Filled By O.E.M. To Be Filled By
> O.E.M./EP2C602, BIOS P1.80 12/09/2013
> [  381.938032] RIP: 0010:load_balance+0x71e/0x9b0
> [  381.938033] RSP: 0018:ffff8e6c5f283e18 EFLAGS: 00010286
> [  381.938034] RAX: 0000000000395360 RBX: ffff8e6c5b0e6a20 RCX: 0000000000000000
> [  381.938035] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000012
> [  381.938035] RBP: 000000000000000a R08: 0000000000000005 R09: 0000000000000001
> [  381.938036] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [  381.938037] R13: 0000000000000000 R14: ffff8e6c5b10e600 R15: 000000000000000a
> [  381.938038] FS:  0000000000000000(0000) GS:ffff8e6c5f280000(0000)
> knlGS:0000000000000000
> [  381.938039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  381.938040] CR2: 0000000000395370 CR3: 0000000e6100a004 CR4: 00000000000606e0
> [  381.938041] Call Trace:
> [  381.938042]  <IRQ>
> [  381.938045]  ? core_get_scaling+0x10/0x10
> [  381.938046]  ? intel_pstate_update_pstate+0x2c/0x30
> [  381.938049]  rebalance_domains+0x194/0x2c0
> [  381.938052]  __do_softirq+0xf1/0x2e0
> [  381.938054]  irq_exit+0xc9/0xe0
> [  381.938056]  smp_apic_timer_interrupt+0x73/0x160
> [  381.938058]  apic_timer_interrupt+0xf/0x20
> [  381.938059]  </IRQ>
> [  381.938061] RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
> [  381.938061] RSP: 0018:ffff939f863f3ea8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff12
> [  381.938063] RAX: ffff8e6c5f280000 RBX: 00000058ea69b446 RCX: 000000000000001f
> [  381.938064] RDX: 00000058ea69b446 RSI: ffffffff96e70d54 RDI: ffffffff96e70fb2
> [  381.938065] RBP: ffff8e6c5f2abe00 R08: 000002044ea385d2 R09: 0000000000000ef6
> [  381.938065] R10: 0000000000001d66 R11: ffff8e6c5f2a07c4 R12: 0000000000000005
> [  381.938066] R13: ffffffff970ae3f8 R14: 00000058e9f06f07 R15: 0000000000000000
> [  381.938069]  ? cpuidle_enter_state+0x94/0x2e0
> [  381.938071]  do_idle+0x193/0x1b0
> [  381.938073]  cpu_startup_entry+0x6f/0x80
> [  381.938075]  start_secondary+0x1a5/0x200
> [  381.938077]  secondary_startup_64+0xa5/0xb0
> [  381.938078] Code: 8c 00 00 00 8b 44 24 5c 48 0f a3 02 0f 82 08 fa
> ff ff 48 8b 44 24 30 c7 00 00 00 00 00 48 8b 44 24 28 48 85 c0 74 16
> 48 8b 40 10 <48> 8b 40 10 8b 50 20 85 d2 74 07 c7 40 20 00 00 00 00 66
> 66 66
> [  381.938104] RIP: load_balance+0x71e/0x9b0 RSP: ffff8e6c5f283e18
> [  381.938105] CR2: 0000000000395370
> [  381.938112] ---[ end trace 5c92a4d96e28b9b4 ]---
> [  381.976894] BUG: unable to handle kernel paging request at 0000000000390030
> [  381.976915] IP: load_balance+0x71e/0x9b0
> [  381.976916] PGD 0 P4D 0
> [  381.976918] Oops: 0000 [#3] PREEMPT SMP PTI
> [  381.976919] Modules linked in: 8021q mrp wl(PO) btrfs dm_thin_pool
> ast dm_persistent_data dm_bio_prison dm_bufio libcrc32c i2c_algo_bit
> crc32c_generic intel_rapl ttm sb_edac zstd_compress drm_kms_helper xor
> x86_pkg_temp_thermal intel_powerclamp drm raid6_pq raid1 agpgart
> coretemp md_mod cfg80211 syscopyarea sysfillrect kvm_intel dm_mod
> sysimgblt kvm fb_sys_fops joydev irqbypass rfkill iTCO_wdt
> iTCO_vendor_support crct10dif_pclmul ghash_clmulni_intel ipmi_ssif
> rtc_cmos ipmi_si intel_cstate mei_me ipmi_devintf intel_uncore
> ipmi_msghandler shpchp pcspkr mousedev input_leds led_class psmouse
> intel_rapl_perf lpc_ich mei i2c_i801 evdev mac_hid ip_tables x_tables
> overlay squashfs zstd_decompress xxhash loop isofs sr_mod cdrom sd_mod
> hid_generic usbhid hid uas usb_storage serio_raw atkbd libps2
> crc32_pclmul
> [  381.976950]  crc32c_intel isci ahci aesni_intel aes_x86_64 libsas
> libahci crypto_simd ehci_pci ehci_hcd cryptd glue_helper libata
> scsi_transport_sas e1000e mlx4_core usbcore ptp pps_core scsi_mod
> usb_common devlink wmi i8042 serio
> [  381.976961] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P      D    O
>    4.16.7-1-ARCH #1
> [  381.976961] Hardware name: To Be Filled By O.E.M. To Be Filled By
> O.E.M./EP2C602, BIOS P1.80 12/09/2013
> [  381.976963] RIP: 0010:load_balance+0x71e/0x9b0
> [  381.976964] RSP: 0018:ffff8e6c5f583e18 EFLAGS: 00010286
> [  381.976965] RAX: 0000000000390020 RBX: ffff8e6c5b0e68a0 RCX: 0000000000000000
> [  381.976966] RDX: ffff8e6c5f3a1880 RSI: ffff8e6c5f596480 RDI: 000000000000000e
> [  381.976967] RBP: 000000000000000e R08: 000000000000000e R09: 00000000ff00ff00
> [  381.976968] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000001
> [  381.976969] R13: 0000000000000000 R14: ffff8e6c5b10dc00 R15: 000000000000001e
> [  381.976970] FS:  0000000000000000(0000) GS:ffff8e6c5f580000(0000)
> knlGS:0000000000000000
> [  381.976971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  381.976972] CR2: 0000000000390030 CR3: 0000000e6100a001 CR4: 00000000000606e0
> [  381.976972] Call Trace:
> [  381.976974]  <IRQ>
> [  381.976976]  ? core_get_scaling+0x10/0x10
> [  381.976977]  ? intel_pstate_update_pstate+0x2c/0x30
> [  381.976980]  rebalance_domains+0x194/0x2c0
> [  381.976982]  __do_softirq+0xf1/0x2e0
> [  381.976985]  irq_exit+0xc9/0xe0
> [  381.976987]  smp_apic_timer_interrupt+0x73/0x160
> [  381.976988]  apic_timer_interrupt+0xf/0x20
> [  381.976989]  </IRQ>
> [  381.976991] RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
> [  381.976992] RSP: 0018:ffff939f86493ea8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff12
> [  381.976993] RAX: ffff8e6c5f580000 RBX: 00000058ef8f8464 RCX: 000000000000001f
> [  381.976994] RDX: 00000058ef8f8464 RSI: ffffffff96e70d54 RDI: ffffffff96e70fb2
> [  381.976995] RBP: ffff8e6c5f5abe00 R08: 000002045d912d7c R09: 0000000000000048
> [  381.976996] R10: 0000000000000bc1 R11: ffff8e6c5f5a07c4 R12: 0000000000000005
> [  381.976996] R13: ffffffff970ae3f8 R14: 00000058eb0bfe53 R15: 0000000000000000
> [  381.976999]  ? cpuidle_enter_state+0x94/0x2e0
> [  381.977001]  do_idle+0x193/0x1b0
> [  381.977003]  cpu_startup_entry+0x6f/0x80
> [  381.977004]  start_secondary+0x1a5/0x200
> [  381.977006]  secondary_startup_64+0xa5/0xb0
> [  381.977008] Code: 8c 00 00 00 8b 44 24 5c 48 0f a3 02 0f 82 08 fa
> ff ff 48 8b 44 24 30 c7 00 00 00 00 00 48 8b 44 24 28 48 85 c0 74 16
> 48 8b 40 10 <48> 8b 40 10 8b 50 20 85 d2 74 07 c7 40 20 00 00 00 00 66
> 66 66
> [  381.977034] RIP: load_balance+0x71e/0x9b0 RSP: ffff8e6c5f583e18
> [  381.977035] CR2: 0000000000390030
> [  381.977050] ---[ end trace 5c92a4d96e28b9b5 ]---
> [  382.051694] Kernel panic - not syncing: Fatal exception in interrupt
> [  383.119561] Shutting down cpus with NMI
> [  385.335677] Kernel Offset: 0x15000000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [  385.490613] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-14  2:08 ` Qu Wenruo
@ 2018-05-14  4:41   ` james harvey
  2018-05-14  5:30     ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: james harvey @ 2018-05-14  4:41 UTC (permalink / raw)
  To: Qu Wenruo, Btrfs BTRFS

On Sun, May 13, 2018 at 10:08 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2018年05月12日 13:08, james harvey wrote:
>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>> other files.
>>
>>
>>
>> [  381.869940] BUG: unable to handle kernel paging request at 0000000000390e50
>> [  381.870881] BTRFS: decompress failed
>> [  381.891775] IP: rebalance_domains+0x8a/0x2c0
>
> The interesting part here is, btrfs is not showing up the call trace,
> not even lzo code.
> (Despite of the "decompress failed" message).
> Maybe some corrupted data is screwing up some random kernel memory?

I've been surprised by this too.  I've seen a few "styles" of crashes from this.

The fuller version of the one I posted in original post:
https://bugzilla.kernel.org/attachment.cgi?id=275949

One that starts with a "general protection fault":
https://bugzilla.kernel.org/attachment.cgi?id=275951

And my most recent version, starts with "BTRFS: decompress failed"
then "BUG: unable to handle kernel NULL pointer dereference at
0000000000000001":
https://bugzilla.kernel.org/attachment.cgi?id=275961

This latest one does have a call trace including btrfs.  The top of
the call trace is "end_compressed_bio_read+0x34e/0x3d0 [btrfs]", and
although it includes the word compressed, I'm not sure that's actually
having to do with lzo compression.  The call stack doesn't scream that
to me.

It seems like when the invalid decompression happens, that code itself
doesn't give any kernel errors, but the rest of the kernel starts
spazzing.

I've replicated this probably about 15 times now.  Only happens on
these files that have inconsistent mirrored data.



> Would you please get the inode number of that corrupted files, and throw
> it through btrfs-debug-tree?
>
> # btrfs-debug-tree -t <subvol_id> <device> | grep -A 50 \(<INO>
>
> This is the preferred method as it would provide all the details we
> need. But since it could contain sensitive info like filename, please
> double check before posting it.

# ls -i system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
291489 system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal

# ls -i user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
72267 user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal

# btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(291489 >
debug.tree.291489
Available at: http://termbin.com/kegj

# btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(72267 >
debug.tree.72267
Available at: http://termbin.com/xhdc



> Or fiemap of that file could also help:
>
> # xfs_io -c "fiemap -v" <corrupted_file>
>
> This is completely safe, but I'm not 100% sure about if the info is enough.

# xfs_io -c "fiemap -v"
system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
Available at: http://termbin.com/nsej

# xfs_io -c "fiemap -v"
system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
Available at: http://termbin.com/4fiz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-14  4:41   ` james harvey
@ 2018-05-14  5:30     ` Qu Wenruo
  2018-05-14  6:36       ` Qu Wenruo
  2018-05-14 10:29       ` james harvey
  0 siblings, 2 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-05-14  5:30 UTC (permalink / raw)
  To: james harvey, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 4837 bytes --]



On 2018年05月14日 12:41, james harvey wrote:
> On Sun, May 13, 2018 at 10:08 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2018年05月12日 13:08, james harvey wrote:
>>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>>> other files.
>>>
>>>
>>>
>>> [  381.869940] BUG: unable to handle kernel paging request at 0000000000390e50
>>> [  381.870881] BTRFS: decompress failed
>>> [  381.891775] IP: rebalance_domains+0x8a/0x2c0
>>
>> The interesting part here is, btrfs is not showing up the call trace,
>> not even lzo code.
>> (Despite of the "decompress failed" message).
>> Maybe some corrupted data is screwing up some random kernel memory?
> 
> I've been surprised by this too.  I've seen a few "styles" of crashes from this.
> 
> The fuller version of the one I posted in original post:
> https://bugzilla.kernel.org/attachment.cgi?id=275949
> 
> One that starts with a "general protection fault":
> https://bugzilla.kernel.org/attachment.cgi?id=275951
> 
> And my most recent version, starts with "BTRFS: decompress failed"
> then "BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000001":
> https://bugzilla.kernel.org/attachment.cgi?id=275961
> 
> This latest one does have a call trace including btrfs.  The top of
> the call trace is "end_compressed_bio_read+0x34e/0x3d0 [btrfs]", and
> although it includes the word compressed, I'm not sure that's actually
> having to do with lzo compression.  The call stack doesn't scream that
> to me.
> 
> It seems like when the invalid decompression happens, that code itself
> doesn't give any kernel errors, but the rest of the kernel starts
> spazzing.

Yep, even the last case it still looks like that it's kernel memory get
corrupted.

> 
> I've replicated this probably about 15 times now.  Only happens on
> these files that have inconsistent mirrored data.

From the thread, since you have already located the corrupted mirror,
would you please provide the corrupted dump along with correct one?

It would help a lot for us to under stand what's going on.

> 
> 
> 
>> Would you please get the inode number of that corrupted files, and throw
>> it through btrfs-debug-tree?
>>
>> # btrfs-debug-tree -t <subvol_id> <device> | grep -A 50 \(<INO>
>>
>> This is the preferred method as it would provide all the details we
>> need. But since it could contain sensitive info like filename, please
>> double check before posting it.
> 
> # ls -i system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
> 291489 system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
> 
> # ls -i user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
> 72267 user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
> 
> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(291489 >
> debug.tree.291489
> Available at: http://termbin.com/kegj
> 
> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(72267 >
> debug.tree.72267
> Available at: http://termbin.com/xhdc

The dump indicates the same conclusion you reached.
The inode has NODATACOW NODATASUM flag, which means it should not has
csum nor has data compressed.
While in fact we have tons of compressed extents.

But the following fiemap result also shows that these extents get
shared. This could happen when there is a snapshot.

So there is something wrong that btrfs allows compressed data to be
generated for such file.
(Could not reproduce the same behavior with 4.16 kernel, could such
problem happens in older kernels? Or just get fixed recently?)

Then some corruption screwed up the compressed data, and when we
decompress, the kernel is screwed up.


To pindown the lzo decompress corruption, kasan would be a nice try.
However this means you need to enable it at compile time, and recompile
a kernel.
Not to mention kasan has a great impact on performance.

But it should provide more info before memory get corrupted.

Thanks,
Qu

> 
> 
> 
>> Or fiemap of that file could also help:
>>
>> # xfs_io -c "fiemap -v" <corrupted_file>
>>
>> This is completely safe, but I'm not 100% sure about if the info is enough.
> 
> # xfs_io -c "fiemap -v"
> system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
> Available at: http://termbin.com/nsej
> 
> # xfs_io -c "fiemap -v"
> system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
> Available at: http://termbin.com/4fiz


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-14  5:30     ` Qu Wenruo
@ 2018-05-14  6:36       ` Qu Wenruo
  2018-05-14 10:29       ` james harvey
  1 sibling, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-05-14  6:36 UTC (permalink / raw)
  To: james harvey, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 5252 bytes --]



On 2018年05月14日 13:30, Qu Wenruo wrote:
> 
> 
> On 2018年05月14日 12:41, james harvey wrote:
>> On Sun, May 13, 2018 at 10:08 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> On 2018年05月12日 13:08, james harvey wrote:
>>>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>>>> other files.
>>>>
>>>>
>>>>
>>>> [  381.869940] BUG: unable to handle kernel paging request at 0000000000390e50
>>>> [  381.870881] BTRFS: decompress failed
>>>> [  381.891775] IP: rebalance_domains+0x8a/0x2c0
>>>
>>> The interesting part here is, btrfs is not showing up the call trace,
>>> not even lzo code.
>>> (Despite of the "decompress failed" message).
>>> Maybe some corrupted data is screwing up some random kernel memory?
>>
>> I've been surprised by this too.  I've seen a few "styles" of crashes from this.
>>
>> The fuller version of the one I posted in original post:
>> https://bugzilla.kernel.org/attachment.cgi?id=275949
>>
>> One that starts with a "general protection fault":
>> https://bugzilla.kernel.org/attachment.cgi?id=275951
>>
>> And my most recent version, starts with "BTRFS: decompress failed"
>> then "BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000001":
>> https://bugzilla.kernel.org/attachment.cgi?id=275961
>>
>> This latest one does have a call trace including btrfs.  The top of
>> the call trace is "end_compressed_bio_read+0x34e/0x3d0 [btrfs]", and
>> although it includes the word compressed, I'm not sure that's actually
>> having to do with lzo compression.  The call stack doesn't scream that
>> to me.
>>
>> It seems like when the invalid decompression happens, that code itself
>> doesn't give any kernel errors, but the rest of the kernel starts
>> spazzing.
> 
> Yep, even the last case it still looks like that it's kernel memory get
> corrupted.
> 
>>
>> I've replicated this probably about 15 times now.  Only happens on
>> these files that have inconsistent mirrored data.
> 
> From the thread, since you have already located the corrupted mirror,
> would you please provide the corrupted dump along with correct one?
> 
> It would help a lot for us to under stand what's going on.
> 
>>
>>
>>
>>> Would you please get the inode number of that corrupted files, and throw
>>> it through btrfs-debug-tree?
>>>
>>> # btrfs-debug-tree -t <subvol_id> <device> | grep -A 50 \(<INO>
>>>
>>> This is the preferred method as it would provide all the details we
>>> need. But since it could contain sensitive info like filename, please
>>> double check before posting it.
>>
>> # ls -i system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
>> 291489 system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
>>
>> # ls -i user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
>> 72267 user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
>>
>> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(291489 >
>> debug.tree.291489
>> Available at: http://termbin.com/kegj
>>
>> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(72267 >
>> debug.tree.72267
>> Available at: http://termbin.com/xhdc
> 
> The dump indicates the same conclusion you reached.
> The inode has NODATACOW NODATASUM flag, which means it should not has
> csum nor has data compressed.
> While in fact we have tons of compressed extents.
> 
> But the following fiemap result also shows that these extents get
> shared. This could happen when there is a snapshot.
> 
> So there is something wrong that btrfs allows compressed data to be
> generated for such file.
> (Could not reproduce the same behavior with 4.16 kernel, could such
> problem happens in older kernels? Or just get fixed recently?)

OK, I could reproduce it now.

Just mount with -o nodatasum, then create a file.
Remount with compress-force=lzo, then write something.

So at least btrfs should disallow such thing.

Thanks,
Qu

> 
> Then some corruption screwed up the compressed data, and when we
> decompress, the kernel is screwed up.
> 
> 
> To pindown the lzo decompress corruption, kasan would be a nice try.
> However this means you need to enable it at compile time, and recompile
> a kernel.
> Not to mention kasan has a great impact on performance.
> 
> But it should provide more info before memory get corrupted.
> 
> Thanks,
> Qu
> 
>>
>>
>>
>>> Or fiemap of that file could also help:
>>>
>>> # xfs_io -c "fiemap -v" <corrupted_file>
>>>
>>> This is completely safe, but I'm not 100% sure about if the info is enough.
>>
>> # xfs_io -c "fiemap -v"
>> system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
>> Available at: http://termbin.com/nsej
>>
>> # xfs_io -c "fiemap -v"
>> system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
>> Available at: http://termbin.com/4fiz
> 
> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-14  5:30     ` Qu Wenruo
  2018-05-14  6:36       ` Qu Wenruo
@ 2018-05-14 10:29       ` james harvey
  2018-05-14 11:05         ` Qu Wenruo
  1 sibling, 1 reply; 14+ messages in thread
From: james harvey @ 2018-05-14 10:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

On Mon, May 14, 2018 at 2:36 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> OK, I could reproduce it now.
>
> Just mount with -o nodatasum, then create a file.
> Remount with compress-force=lzo, then write something.
>
> So at least btrfs should disallow such thing.
>
> Thanks,
> Qu

Would the corrupted dump and correct one of the file, and kernel with
kasan output still help?  Or, with what you reproduced, do you have
what you need?


On Mon, May 14, 2018 at 1:30 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> So there is something wrong that btrfs allows compressed data to be
> generated for such file.
> (Could not reproduce the same behavior with 4.16 kernel, could such
> problem happens in older kernels? Or just get fixed recently?)
>
> Then some corruption screwed up the compressed data, and when we
> decompress, the kernel is screwed up.

In this thread, Chris Murphy noted systemd sets the "C" attribute, and
discussed what sounds to me like what happened here: "Usually nocow
also means no compression. But in the archives is a thread where I
found that compression can be forced on nocow if the file is submitted
for defragmentation and either the volume is mounted with compression
or the file has inherited chattr +c (I don't remember which or
possibly both). And systemd does submit rotated logs for
defragmentation."



(If you don't need the dumps and kernel kasan output, you can skip the
rest of this reply)



> Yep, even the last case it still looks like that it's kernel memory get
> corrupted.
>
> From the thread, since you have already located the corrupted mirror,
> would you please provide the corrupted dump along with correct one?
>
> It would help a lot for us to under stand what's going on.

Absolutely.  I'm not sure how to best get you that, though.

"filefrag -v" on one of the files can be seen here:
https://bugzilla.kernel.org/attachment.cgi?id=275953

It lists 58 fragments.

filefrag lists the ending offsets and length based on the uncompressed
sizes.  filefrag doesn't account for the compression.

So, in this thread, I compared the first 4k of fragments 0-57 on each
disk and found all the corruption was on disk 1.  (And the entire
207*4096 bytes on fragment 58.)  Grabbing more than 4k of each
fragment brings in data from other files.  So, I might have compared
all of the data (fragments 0-57 are 128k uncompressed, and at least
fragment 0 uncompressed does lzop down to about 2k, so perhaps all the
other 128k fragments compress to within 4k, but maybe not) but this
might not have grabbed all the data.

I could give you (56) 128k, (1) 68k, and (1) 828k fragments, but
they'd include extra data not involved, so you'd have to find a way to
use them, and without the metadata saying how many bytes of each
fragment to use, it might not be easy to put together.  (Maybe
chopping off all the trailing 0's in each fragment would do the
trick.)  Maybe the first 9 byte header on each fragment encodes the
length actually used?

If this is useful to you, I'd be happy to provide it, along with the
correct one.

If there's a better way than this, I'd be happy to do that instead.  I
of course can't just copy the file, so have to do something like dd or
"btrfs-map-logical -o".  "btrfs-map-logical -o" can't automatically
grab the proper length, because it needs a size argument, and if not
given, defaults to the 16k nodesize.

> The dump indicates the same conclusion you reached.
> The inode has NODATACOW NODATASUM flag, which means it should not has
> csum nor has data compressed.
> While in fact we have tons of compressed extents.
>
> But the following fiemap result also shows that these extents get
> shared. This could happen when there is a snapshot.

I do run snapper.

> To pindown the lzo decompress corruption, kasan would be a nice try.
> However this means you need to enable it at compile time, and recompile
> a kernel.
> Not to mention kasan has a great impact on performance.
>
> But it should provide more info before memory get corrupted.

Sure, it's compiling.  I'll probably be available to run it and send
results in 14 hours, if needed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
  2018-05-14 10:29       ` james harvey
@ 2018-05-14 11:05         ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-05-14 11:05 UTC (permalink / raw)
  To: james harvey; +Cc: Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 6608 bytes --]



On 2018年05月14日 18:29, james harvey wrote:
> On Mon, May 14, 2018 at 2:36 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> OK, I could reproduce it now.
>>
>> Just mount with -o nodatasum, then create a file.
>> Remount with compress-force=lzo, then write something.
>>
>> So at least btrfs should disallow such thing.
>>
>> Thanks,
>> Qu
> 
> Would the corrupted dump and correct one of the file, and kernel with
> kasan output still help?  Or, with what you reproduced, do you have
> what you need?

The dumps are good enough, kasan will be a little over killed.

For my reproduced case, the data is all good, thus unable to reproduce
the wildly corrupted kernel memory.

I could try corrupt them, but I'm not sure if the same symptom can be
reproduced, thus binary good/bad dump still helps.
The heavy lifting kasan can be done in my environment.

> 
> 
> On Mon, May 14, 2018 at 1:30 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> So there is something wrong that btrfs allows compressed data to be
>> generated for such file.
>> (Could not reproduce the same behavior with 4.16 kernel, could such
>> problem happens in older kernels? Or just get fixed recently?)
>>
>> Then some corruption screwed up the compressed data, and when we
>> decompress, the kernel is screwed up.
> 
> In this thread, Chris Murphy noted systemd sets the "C" attribute, and
> discussed what sounds to me like what happened here: "Usually nocow
> also means no compression. But in the archives is a thread where I
> found that compression can be forced on nocow if the file is submitted
> for defragmentation

Oh, defrag is making things more complex here.

But at least the kernel patch should also address that case.

> and either the volume is mounted with compression
> or the file has inherited chattr +c (I don't remember which or
> possibly both). And systemd does submit rotated logs for
> defragmentation."
> 
> 
> 
> (If you don't need the dumps and kernel kasan output, you can skip the
> rest of this reply)
> 
> 
> 
>> Yep, even the last case it still looks like that it's kernel memory get
>> corrupted.
>>
>> From the thread, since you have already located the corrupted mirror,
>> would you please provide the corrupted dump along with correct one?
>>
>> It would help a lot for us to under stand what's going on.
> 
> Absolutely.  I'm not sure how to best get you that, though.

It's great you'd like to help.
Considering you have experienced btrfs-map-block usage, it should be a
piece of cake.
And considering you're an Arch user, it won't really take you much time.
(Yeah, Arch users rock!)

> 
> "filefrag -v" on one of the files can be seen here:
> https://bugzilla.kernel.org/attachment.cgi?id=275953
> 
> It lists 58 fragments.

That filefrag output is less useful here, the main problem is it only
provides the uncompressed extent size.
That's why I asked for debug-tree dump.

And now those debug tree dumps would shine.

I'll take the 72267 dump as an example.

------
	item 182 key (72267 EXTENT_DATA *0* ) itemoff 6375 itemsize 53
		generation 41083 type 1 (regular)
		extent data disk byte *2625134592* nr *4096*
		extent data offset 0 nr 131072 ram 131072
		extent compression 2 (lzo)
------
Important numbers are surrounded by '*'.

The "0" means the offset inside the file.
"2625134592" means the logical address that compressed extent lies.
"4096" means the compressed size of that extent.

So to dump it, passing all needed numbers to btrfs-map-logical, ins this
example you just need:

# btrfs-map-logical -l 2625134592 -b 4096 <device>

And would get the result like:
mirror 1 logical 1104150528 physical 9437184 device
/dev/mapper/test-scratch2
mirror 2 logical 1104150528 physical 1095761920 device
/dev/mapper/test-scratch1

Then grab them using dd.

> 
> filefrag lists the ending offsets and length based on the uncompressed
> sizes.  filefrag doesn't account for the compression.

Now with debug-tree dump, you have the length you need. :)

> 
> So, in this thread, I compared the first 4k of fragments 0-57 on each
> disk and found all the corruption was on disk 1.  (And the entire
> 207*4096 bytes on fragment 58.)  Grabbing more than 4k of each
> fragment brings in data from other files.

And indeed, they are all 4K sized from the dump.

>  So, I might have compared
> all of the data (fragments 0-57 are 128k uncompressed, and at least
> fragment 0 uncompressed does lzop down to about 2k, so perhaps all the
> other 128k fragments compress to within 4k, but maybe not) but this
> might not have grabbed all the data.

Any bad/good pair is enough.

> 
> I could give you (56) 128k, (1) 68k, and (1) 828k fragments, but
> they'd include extra data not involved, so you'd have to find a way to
> use them, and without the metadata saying how many bytes of each
> fragment to use, it might not be easy to put together.  (Maybe
> chopping off all the trailing 0's in each fragment would do the
> trick.)  Maybe the first 9 byte header on each fragment encodes the
> length actually used?

No need to worry now. :)

> 
> If this is useful to you, I'd be happy to provide it, along with the
> correct one.

Any pair which differs would be sufficient.

> 
> If there's a better way than this, I'd be happy to do that instead.  I
> of course can't just copy the file, so have to do something like dd or
> "btrfs-map-logical -o".  "btrfs-map-logical -o" can't automatically
> grab the proper length, because it needs a size argument, and if not
> given, defaults to the 16k nodesize.
> 
>> The dump indicates the same conclusion you reached.
>> The inode has NODATACOW NODATASUM flag, which means it should not has
>> csum nor has data compressed.
>> While in fact we have tons of compressed extents.
>>
>> But the following fiemap result also shows that these extents get
>> shared. This could happen when there is a snapshot.
> 
> I do run snapper.
> 
>> To pindown the lzo decompress corruption, kasan would be a nice try.
>> However this means you need to enable it at compile time, and recompile
>> a kernel.
>> Not to mention kasan has a great impact on performance.
>>
>> But it should provide more info before memory get corrupted.
> 
> Sure, it's compiling.  I'll probably be available to run it and send
> results in 14 hours, if needed.

No need to bother.

But it's still a pretty nice and fun learning progress of how to hack
the complex linux kernel. :)

Thanks,
Qu

> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-05-14 11:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-12  5:08 "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass james harvey
2018-05-12  7:51 ` Martin Steigerwald
2018-05-13  0:10   ` james harvey
2018-05-13  2:09     ` Chris Murphy
2018-05-13  5:28       ` james harvey
2018-05-13 11:01       ` james harvey
2018-05-13 11:45         ` james harvey
2018-05-13 21:27       ` Chris Murphy
2018-05-14  2:08 ` Qu Wenruo
2018-05-14  4:41   ` james harvey
2018-05-14  5:30     ` Qu Wenruo
2018-05-14  6:36       ` Qu Wenruo
2018-05-14 10:29       ` james harvey
2018-05-14 11:05         ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.