linux-btrfs.vger.kernel.org archive mirror
* kernel panic after upgrading to Linux 5.5
@ 2020-03-16  3:13 Tomasz Chmielewski
  2020-03-16  3:33 ` Tomasz Chmielewski
  2020-03-16  5:06 ` Qu Wenruo
  0 siblings, 2 replies; 7+ messages in thread
From: Tomasz Chmielewski @ 2020-03-16  3:13 UTC (permalink / raw)
  To: Btrfs BTRFS

After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the 
system panics shortly after mounting and starting to use a btrfs 
filesystem. Here is a dmesg - please advise how to deal with it.
It has since crashed several times because of the panic=10 parameter 
(system boots, runs for a while, crashes, boots again, and so on).

Mount options:

noatime,ssd,space_cache=v2,user_subvol_rm_allowed



[   65.777428] BTRFS info (device sda2): enabling ssd optimizations
[   65.777435] BTRFS info (device sda2): using free space tree
[   65.777436] BTRFS info (device sda2): has skinny extents
[   98.225099] BTRFS error (device sda2): parent transid verify failed 
on 19718118866944 wanted 664218442 found 674530371
[   98.225594] BTRFS error (device sda2): parent transid verify failed 
on 19718118866944 wanted 664218442 found 674530371
[   98.225757] BTRFS warning (device sda2): error accounting new delayed 
refs extent (err code: -5), quota inconsistent
[  129.044785] ------------[ cut here ]------------
[  129.044840] WARNING: CPU: 4 PID: 4476 at fs/btrfs/qgroup.c:2523 
btrfs_qgroup_account_extents+0x211/0x250 [btrfs]
[  129.044841] Modules linked in: unix_diag binfmt_misc nf_tables 
nfnetlink ebt_ip ebtable_filter ebtables joydev input_leds hid_generic 
amd64_edac_mod edac_mce_amd kvm_amd kvm ip6table_filter ipmi_ssif 
crct10dif_pclmul ip6table_nat ip6_tables crc32_pclmul 
ghash_clmulni_intel iptable_filter xt_CHECKSUM iptable_mangle 
xt_MASQUERADE xt_comment xt_nat aesni_intel xt_tcpudp iptable_nat 
crypto_simd nf_nat drm_vram_helper cryptd drm_ttm_helper glue_helper 
nf_conntrack ttm nf_defrag_ipv6 nf_defrag_ipv4 drm_kms_helper cec 
bpfilter rc_core drm usbhid hid fb_sys_fops syscopyarea sysfillrect 
sysimgblt k10temp ccp ipmi_si ipmi_devintf ipmi_msghandler mac_hid 
sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq raid1 raid0 multipath linear bnx2x igb 
i2c_algo_bit mdio dca libcrc32c ahci libahci i2c_piix4
[  129.044896] CPU: 4 PID: 4476 Comm: btrfs-transacti Kdump: loaded Not 
tainted 5.6.0-050600rc5-generic #202003082130
[  129.044897] Hardware name: GIGABYTE MZ31-AR0-00/MZ31-AR0-00, BIOS 
F03e 09/13/2017
[  129.044941] RIP: 0010:btrfs_qgroup_account_extents+0x211/0x250 
[btrfs]
[  129.044945] Code: 85 db 74 21 48 8b 03 48 8b 7b 08 48 83 c3 18 4c 89 
f1 4c 89 fa 4c 89 ee e8 7c 5f 6f e5 48 8b 03 48 85 c0 75 e2 e9 67 ff ff 
ff <0f> 0b 49 8b 57 18 45 31 c9 4d 8d 47 38 31 c9 4c 89 ee e8 d8 95 ff
[  129.044947] RSP: 0018:ffffadb1cef7fde8 EFLAGS: 00010246
[  129.044949] RAX: ffff9c4a4b36c300 RBX: ffff9c1a5768a550 RCX: 
0000000000000017
[  129.044951] RDX: 0000000000000001 RSI: 000011edd6f5c000 RDI: 
0000000000000000
[  129.044952] RBP: ffffadb1cef7fe38 R08: ffff9c4a4b36c300 R09: 
ffff9c4a4cb4bc00
[  129.044953] R10: 0000000000004000 R11: 0000000000000010 R12: 
0000000000000000
[  129.044954] R13: ffff9c4a59240000 R14: 0000000000000001 R15: 
ffff9c4a4b36c300
[  129.044957] FS:  0000000000000000(0000) GS:ffff9c1a5eb00000(0000) 
knlGS:0000000000000000
[  129.044958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  129.044959] CR2: 00007f67fab03808 CR3: 0000002abae0a000 CR4: 
00000000003406e0
[  129.044961] Call Trace:
[  129.045004]  btrfs_commit_transaction+0x4dc/0x9e0 [btrfs]
[  129.045011]  ? wait_woken+0x80/0x80
[  129.045045]  transaction_kthread+0x146/0x190 [btrfs]
[  129.045051]  kthread+0x104/0x140
[  129.045083]  ? btrfs_cleanup_transaction+0x5c0/0x5c0 [btrfs]
[  129.045086]  ? kthread_park+0x90/0x90
[  129.045091]  ret_from_fork+0x22/0x40
[  129.045095] ---[ end trace e192cb9f9978caa3 ]---
[  129.094866] BTRFS error (device sda2): parent transid verify failed 
on 19718118866944 wanted 664218442 found 674530371
[  129.095331] BTRFS error (device sda2): parent transid verify failed 
on 19718118866944 wanted 664218442 found 674530371
[  129.095476] ------------[ cut here ]------------
[  129.095478] kernel BUG at mm/slub.c:304!
[  129.095551] invalid opcode: 0000 [#1] SMP NOPTI
[  129.095618] CPU: 28 PID: 4476 Comm: btrfs-transacti Kdump: loaded 
Tainted: G        W         5.6.0-050600rc5-generic #202003082130
[  129.095704] Hardware name: GIGABYTE MZ31-AR0-00/MZ31-AR0-00, BIOS 
F03e 09/13/2017
[  129.095780] RIP: 0010:__slab_free+0x183/0x330
[  129.095842] Code: 00 48 89 c7 fa 66 0f 1f 44 00 00 f0 49 0f ba 2c 24 
00 72 65 4d 3b 6c 24 20 74 11 49 0f ba 34 24 00 57 9d 0f 1f 44 00 00 eb 
9f <0f> 0b 49 3b 5c 24 28 75 e8 48 8b 44 24 28 49 89 4c 24 28 49 89 44
[  129.095953] RSP: 0018:ffffadb1cef7fcf0 EFLAGS: 00010246
[  129.096018] RAX: ffff9c1a47934c00 RBX: 0000000080800076 RCX: 
ffff9c1a47934c00
[  129.096088] RDX: ffff9c1a47934c00 RSI: ffffe1d03f1e4d00 RDI: 
ffff9c1a5e407800
[  129.096157] RBP: ffffadb1cef7fd90 R08: 0000000000000001 R09: 
ffffffffc0305740
[  129.096254] R10: ffff9c1a47934c00 R11: 0000000000000001 R12: 
ffffe1d03f1e4d00
[  129.096366] R13: ffff9c1a47934c00 R14: ffff9c1a5e407800 R15: 
ffff9c4a4b36c300
[  129.096469] FS:  0000000000000000(0000) GS:ffff9c1a5ec80000(0000) 
knlGS:0000000000000000
[  129.096596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  129.096688] CR2: 0000555a48f94988 CR3: 0000002abae0a000 CR4: 
00000000003406e0
[  129.096785] Call Trace:
[  129.096868]  ? kfree+0x22b/0x240
[  129.096971]  ? ulist_free+0x20/0x30 [btrfs]
[  129.097074]  ? btrfs_find_all_roots_safe+0xdd/0x130 [btrfs]
[  129.097182]  ? ulist_free+0x20/0x30 [btrfs]
[  129.097270]  kfree+0x22b/0x240
[  129.097368]  ulist_free+0x20/0x30 [btrfs]
[  129.097469]  btrfs_qgroup_account_extents+0x91/0x250 [btrfs]
[  129.097577]  btrfs_commit_transaction+0x4dc/0x9e0 [btrfs]
[  129.097670]  ? wait_woken+0x80/0x80
[  129.097769]  transaction_kthread+0x146/0x190 [btrfs]
[  129.097860]  kthread+0x104/0x140
[  129.097955]  ? btrfs_cleanup_transaction+0x5c0/0x5c0 [btrfs]
[  129.098047]  ? kthread_park+0x90/0x90
[  129.098133]  ret_from_fork+0x22/0x40
[  129.098218] Modules linked in: unix_diag binfmt_misc nf_tables 
nfnetlink ebt_ip ebtable_filter ebtables joydev input_leds hid_generic 
amd64_edac_mod edac_mce_amd kvm_amd kvm ip6table_filter ipmi_ssif 
crct10dif_pclmul ip6table_nat ip6_tables crc32_pclmul 
ghash_clmulni_intel iptable_filter xt_CHECKSUM iptable_mangle 
xt_MASQUERADE xt_comment xt_nat aesni_intel xt_tcpudp iptable_nat 
crypto_simd nf_nat drm_vram_helper cryptd drm_ttm_helper glue_helper 
nf_conntrack ttm nf_defrag_ipv6 nf_defrag_ipv4 drm_kms_helper cec 
bpfilter rc_core drm usbhid hid fb_sys_fops syscopyarea sysfillrect 
sysimgblt k10temp ccp ipmi_si ipmi_devintf ipmi_msghandler mac_hid 
sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq raid1 raid0 multipath linear bnx2x igb 
i2c_algo_bit mdio dca libcrc32c ahci libahci i2c_piix4


* Re: kernel panic after upgrading to Linux 5.5
  2020-03-16  3:13 kernel panic after upgrading to Linux 5.5 Tomasz Chmielewski
@ 2020-03-16  3:33 ` Tomasz Chmielewski
  2020-03-16  5:06 ` Qu Wenruo
  1 sibling, 0 replies; 7+ messages in thread
From: Tomasz Chmielewski @ 2020-03-16  3:33 UTC (permalink / raw)
  To: Btrfs BTRFS

On 2020-03-16 12:13, Tomasz Chmielewski wrote:
> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the
> system panics shortly after mounting and starting to use a btrfs
> filesystem. Here is a dmesg - please advise how to deal with it.
> It has since crashed several times, because of panic=10 parameter
> (system boots, runs for a while, crashes, boots again, and so on).

Additionally, I also see that btrfs quota was enabled:

> [  129.044896] CPU: 4 PID: 4476 Comm: btrfs-transacti Kdump: loaded
> Not tainted 5.6.0-050600rc5-generic #202003082130
> [  129.044897] Hardware name: GIGABYTE MZ31-AR0-00/MZ31-AR0-00, BIOS
> F03e 09/13/2017
> [  129.044941] RIP: 0010:btrfs_qgroup_account_extents+0x211/0x250 
> [btrfs]

How is that possible? I always make sure to disable btrfs quotas after 
creating a filesystem, and it was also the case here:

# history|grep quota
  4894  btrfs quota disable /data/lxd   # <------ long time ago: history at 4894, now history at >11207
11207  history|grep quota


The server does not seem to crash with quotas disabled (at least it's up 
for 30 mins now).


Now I've checked a couple of other servers, and on some of them quota 
is also enabled (as verified with "btrfs quota rescan /data/lxd", which 
does not exit with an error if quotas are on - is there a better 
check to see if quota is on or off?). It's not very encouraging 
that quota somehow enables itself.
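
On the question of a better check: one possibility, offered as a sketch 
rather than an official recommendation, is to run "btrfs qgroup show", 
which is expected to fail when quotas are not enabled on the filesystem. 
The Python wrapper below is hypothetical: the name quota_enabled and the 
injectable run parameter are assumptions made for illustration. It treats 
any non-zero exit status as "not enabled", which also covers unrelated 
failures such as missing privileges or a path that is not on btrfs.

```python
import subprocess


def quota_enabled(path, run=subprocess.run):
    """Return True if `btrfs qgroup show PATH` exits with status 0.

    Assumption: a non-zero exit status means quotas are not enabled.
    It can also mean the command failed for another reason (not root,
    path not on btrfs), so treat False as "could not confirm".
    """
    result = run(
        ["btrfs", "qgroup", "show", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Example call (needs btrfs-progs and sufficient privileges):
#     quota_enabled("/data/lxd")
```

The run parameter exists only so the function can be exercised without a 
btrfs filesystem at hand; in normal use it is left at its default.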



Tomasz Chmielewski
https://lxadm.com


* Re: kernel panic after upgrading to Linux 5.5
  2020-03-16  3:13 kernel panic after upgrading to Linux 5.5 Tomasz Chmielewski
  2020-03-16  3:33 ` Tomasz Chmielewski
@ 2020-03-16  5:06 ` Qu Wenruo
  2020-03-16  5:19   ` Tomasz Chmielewski
  1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2020-03-16  5:06 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS





On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the
> system panics shortly after mounting and starting to use a btrfs
> filesystem. Here is a dmesg - please advise how to deal with it.
> It has since crashed several times, because of panic=10 parameter
> (system boots, runs for a while, crashes, boots again, and so on).
> 
> Mount options:
> 
> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
> 
> 
> 
> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
> [   65.777435] BTRFS info (device sda2): using free space tree
> [   65.777436] BTRFS info (device sda2): has skinny extents
> [   98.225099] BTRFS error (device sda2): parent transid verify failed
> on 19718118866944 wanted 664218442 found 674530371
> [   98.225594] BTRFS error (device sda2): parent transid verify failed
> on 19718118866944 wanted 664218442 found 674530371

This is the root cause, not quota.

The metadata is already corrupted, and quota is the first to complain
about it.
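
For reference, the two numbers in those messages are generations 
(transaction IDs): "wanted" is what the parent node recorded for the 
block, "found" is what the block read from disk actually carries. A 
minimal sketch for pulling them out of a saved dmesg, assuming the 
message format shown above; the function name is hypothetical:

```python
import re

# Matches the kernel message quoted above. Lines wrapped by a mail
# client are joined before matching.
TRANSID_RE = re.compile(
    r"parent transid verify failed\s+on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_errors(log_text):
    """Return unique (bytenr, wanted, found) tuples from log_text."""
    joined = " ".join(log_text.split())  # undo line wrapping
    seen = []
    for match in TRANSID_RE.finditer(joined):
        item = tuple(int(group) for group in match.groups())
        if item not in seen:
            seen.append(item)
    return seen


sample = """\
[   98.225099] BTRFS error (device sda2): parent transid verify failed
on 19718118866944 wanted 664218442 found 674530371
[   98.225594] BTRFS error (device sda2): parent transid verify failed
on 19718118866944 wanted 664218442 found 674530371
"""

for bytenr, wanted, found in parse_transid_errors(sample):
    # Report the block address and how far apart the generations are.
    print(f"block {bytenr}: wanted {wanted}, found {found}, "
          f"difference {found - wanted}")
```

For the lines in this report the difference is 10311929 generations, and 
every message refers to the same block.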

Thanks,
Qu




* Re: kernel panic after upgrading to Linux 5.5
  2020-03-16  5:06 ` Qu Wenruo
@ 2020-03-16  5:19   ` Tomasz Chmielewski
  2020-03-16 10:26     ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: Tomasz Chmielewski @ 2020-03-16  5:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

On 2020-03-16 14:06, Qu Wenruo wrote:
> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the
>> system panics shortly after mounting and starting to use a btrfs
>> filesystem. Here is a dmesg - please advise how to deal with it.
>> It has since crashed several times, because of panic=10 parameter
>> (system boots, runs for a while, crashes, boots again, and so on).
>> 
>> Mount options:
>> 
>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
>> 
>> 
>> 
>> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
>> [   65.777435] BTRFS info (device sda2): using free space tree
>> [   65.777436] BTRFS info (device sda2): has skinny extents
>> [   98.225099] BTRFS error (device sda2): parent transid verify failed
>> on 19718118866944 wanted 664218442 found 674530371
>> [   98.225594] BTRFS error (device sda2): parent transid verify failed
>> on 19718118866944 wanted 664218442 found 674530371
> 
> This is the root cause, not quota.
> 
> The metadata is already corrupted, and quota is the first to complain
> about it.

Still, should it crash the server, putting it into a cycle of 
crash-boot-crash-boot, possibly breaking the filesystem even more?

Also, how do I fix that corruption?

This server had a drive added, a full balance (to RAID-10 for data and 
metadata) and scrub a few weeks ago, with no errors. Running scrub now 
to see if it shows up anything.

btrfs device stats also shows no errors:

# btrfs device stats /data/lxd
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  0
[/dev/sda2].generation_errs  0
[/dev/sdd2].write_io_errs    0
[/dev/sdd2].read_io_errs     0
[/dev/sdd2].flush_io_errs    0
[/dev/sdd2].corruption_errs  0
[/dev/sdd2].generation_errs  0
[/dev/sdc2].write_io_errs    0
[/dev/sdc2].read_io_errs     0
[/dev/sdc2].flush_io_errs    0
[/dev/sdc2].corruption_errs  0
[/dev/sdc2].generation_errs  0
[/dev/sdb2].write_io_errs    0
[/dev/sdb2].read_io_errs     0
[/dev/sdb2].flush_io_errs    0
[/dev/sdb2].corruption_errs  0
[/dev/sdb2].generation_errs  0
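
With several devices and several servers to check, output like the above 
can be scanned mechanically. A small sketch, assuming the 
"[device].counter value" layout shown here; the function names are 
hypothetical. Note that all-zero counters are not by themselves proof of 
intact metadata, as the dmesg earlier in this thread shows.

```python
def parse_device_stats(text):
    """Map device -> {counter: value} from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # skip prompts, blank lines, anything else
        name, value = line.split()
        device, counter = name[1:].split("].", 1)
        stats.setdefault(device, {})[counter] = int(value)
    return stats


def nonzero_counters(stats):
    """Return (device, counter, value) for every counter above zero."""
    return [
        (device, counter, value)
        for device, counters in stats.items()
        for counter, value in counters.items()
        if value
    ]


sample_stats = """\
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  0
[/dev/sda2].generation_errs  0
"""

problems = nonzero_counters(parse_device_stats(sample_stats))
print("all counters zero" if not problems else problems)
```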


Tomasz Chmielewski
https://lxadm.com


* Re: kernel panic after upgrading to Linux 5.5
  2020-03-16  5:19   ` Tomasz Chmielewski
@ 2020-03-16 10:26     ` Qu Wenruo
  2020-03-16 12:14       ` Tomasz Chmielewski
  0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2020-03-16 10:26 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS





On 2020/3/16 1:19 PM, Tomasz Chmielewski wrote:
> On 2020-03-16 14:06, Qu Wenruo wrote:
>> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the
>>> system panics shortly after mounting and starting to use a btrfs
>>> filesystem. Here is a dmesg - please advise how to deal with it.
>>> It has since crashed several times, because of panic=10 parameter
>>> (system boots, runs for a while, crashes, boots again, and so on).
>>>
>>> Mount options:
>>>
>>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
>>>
>>>
>>>
>>> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
>>> [   65.777435] BTRFS info (device sda2): using free space tree
>>> [   65.777436] BTRFS info (device sda2): has skinny extents
>>> [   98.225099] BTRFS error (device sda2): parent transid verify failed
>>> on 19718118866944 wanted 664218442 found 674530371
>>> [   98.225594] BTRFS error (device sda2): parent transid verify failed
>>> on 19718118866944 wanted 664218442 found 674530371
>>
>> This is the root cause, not quota.
>>
>> The metadata is already corrupted, and quota is the first to complain
>> about it.
> 
> Still, should it crash the server, putting it into a cycle of
> crash-boot-crash-boot, possibly breaking the filesystem even more?

The transid mismatch in the first place is the cause, and I'm not sure
how it happened.

Do you have a history of the kernels used on that server?

One potential source of corruption is kernels v5.2.0 to v5.2.14, which
could leave some tree blocks unwritten to disk.
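
If a list of previously booted kernels is available (from package logs, 
for example), it can be checked against that range. A sketch, assuming 
release strings of the form X.Y.Z with an optional suffix after a hyphen, 
as in the 5.6.0-050600rc5-generic string from the dmesg; the helper names 
are hypothetical:

```python
def parse_version(release):
    """Turn '5.2.9-generic' into (5, 2, 9); missing parts become 0."""
    core = release.split("-", 1)[0]
    parts = [int(part) for part in core.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_suspect_range(release, low=(5, 2, 0), high=(5, 2, 14)):
    """True if release falls within low..high, both ends included."""
    return low <= parse_version(release) <= high


for release in ["4.18", "5.2.9-generic", "5.5.6", "5.6.0-050600rc5-generic"]:
    print(release, in_suspect_range(release))
```

Release candidates of 5.2.0 would also be reported as in range by this 
simple comparison, since the suffix is ignored.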

> 
> Also, how do I fix that corruption?
> 
> This server had a drive added, a full balance (to RAID-10 for data and
> metadata) and scrub a few weeks ago, with no errors. Running scrub now
> to see if it shows up anything.

Then at least at that time, it was not corrupted.

Has there been any sudden power loss in recent days?
Another potential cause is out-of-spec FLUSH/FUA behavior, which means
the hard disk controller does not report FLUSH/FUA completion correctly.

That means that if you use the same disk/controller and deliberately
cause power loss, it would fail after just a few cycles.

Thanks,
Qu




* Re: kernel panic after upgrading to Linux 5.5
  2020-03-16 10:26     ` Qu Wenruo
@ 2020-03-16 12:14       ` Tomasz Chmielewski
  2020-03-16 12:32         ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: Tomasz Chmielewski @ 2020-03-16 12:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

On 2020-03-16 19:26, Qu Wenruo wrote:
> On 2020/3/16 1:19 PM, Tomasz Chmielewski wrote:
>> On 2020-03-16 14:06, Qu Wenruo wrote:
>>> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>>>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), 
>>>> the
>>>> system panics shortly after mounting and starting to use a btrfs
>>>> filesystem. Here is a dmesg - please advise how to deal with it.
>>>> It has since crashed several times, because of panic=10 parameter
>>>> (system boots, runs for a while, crashes, boots again, and so on).
>>>> 
>>>> Mount options:
>>>> 
>>>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
>>>> 
>>>> 
>>>> 
>>>> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
>>>> [   65.777435] BTRFS info (device sda2): using free space tree
>>>> [   65.777436] BTRFS info (device sda2): has skinny extents
>>>> [   98.225099] BTRFS error (device sda2): parent transid verify 
>>>> failed
>>>> on 19718118866944 wanted 664218442 found 674530371
>>>> [   98.225594] BTRFS error (device sda2): parent transid verify 
>>>> failed
>>>> on 19718118866944 wanted 664218442 found 674530371
>>> 
>>> This is the root cause, not quota.
>>> 
>>> The metadata is already corrupted, and quota is the first to complain
>>> about it.
>> 
>> Still, should it crash the server, putting it into a cycle of
>> crash-boot-crash-boot, possibly breaking the filesystem even more?
> 
> The transid mismatch in the first place is the cause, and I'm not sure
> how it happened.
> 
> Do you have a history of the kernels used on that server?
> 
> One potential source of corruption is kernels v5.2.0 to v5.2.14, which
> could leave some tree blocks unwritten to disk.

Yes, it has run a lot of kernels, starting with 4.18 or perhaps even 
earlier.


>> Also, how do I fix that corruption?
>> 
>> This server had a drive added, a full balance (to RAID-10 for data and
>> metadata) and scrub a few weeks ago, with no errors. Running scrub now
>> to see if it shows up anything.
> 
> Then at least at that time, it was not corrupted.
> 
> Has there been any sudden power loss in recent days?
> Another potential cause is out-of-spec FLUSH/FUA behavior, which means
> the hard disk controller does not report FLUSH/FUA completion correctly.
> 
> That means that if you use the same disk/controller and deliberately
> cause power loss, it would fail after just a few cycles.

Powerloss - possibly there was.


Tomasz


* Re: kernel panic after upgrading to Linux 5.5
  2020-03-16 12:14       ` Tomasz Chmielewski
@ 2020-03-16 12:32         ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2020-03-16 12:32 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS





On 2020/3/16 8:14 PM, Tomasz Chmielewski wrote:
> On 2020-03-16 19:26, Qu Wenruo wrote:
>> On 2020/3/16 1:19 PM, Tomasz Chmielewski wrote:
>>> On 2020-03-16 14:06, Qu Wenruo wrote:
>>>> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>>>>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the
>>>>> system panics shortly after mounting and starting to use a btrfs
>>>>> filesystem. Here is a dmesg - please advise how to deal with it.
>>>>> It has since crashed several times, because of panic=10 parameter
>>>>> (system boots, runs for a while, crashes, boots again, and so on).
>>>>>
>>>>> Mount options:
>>>>>
>>>>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
>>>>>
>>>>>
>>>>>
>>>>> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
>>>>> [   65.777435] BTRFS info (device sda2): using free space tree
>>>>> [   65.777436] BTRFS info (device sda2): has skinny extents
>>>>> [   98.225099] BTRFS error (device sda2): parent transid verify failed
>>>>> on 19718118866944 wanted 664218442 found 674530371
>>>>> [   98.225594] BTRFS error (device sda2): parent transid verify failed
>>>>> on 19718118866944 wanted 664218442 found 674530371
>>>>
>>>> This is the root cause, not quota.
>>>>
>>>> The metadata is already corrupted, and quota is the first to complain
>>>> about it.
>>>
>>> Still, should it crash the server, putting it into a cycle of
>>> crash-boot-crash-boot, possibly breaking the filesystem even more?
>>
>> The transid mismatch in the first place is the cause, and I'm not sure
>> how it happened.
>>
>> Do you have a history of the kernels used on that server?
>>
>> One potential source of corruption is kernels v5.2.0 to v5.2.14, which
>> could leave some tree blocks unwritten to disk.
> 
> Yes, it has run a lot of kernels, starting with 4.18 or perhaps even
> earlier.
> 
> 
>>> Also, how do I fix that corruption?
>>>
>>> This server had a drive added, a full balance (to RAID-10 for data and
>>> metadata) and scrub a few weeks ago, with no errors. Running scrub now
>>> to see if it shows up anything.
>>
>> Then at least at that time, it was not corrupted.
>>
>> Has there been any sudden power loss in recent days?
>> Another potential cause is out-of-spec FLUSH/FUA behavior, which means
>> the hard disk controller does not report FLUSH/FUA completion correctly.
>>
>> That means that if you use the same disk/controller and deliberately
>> cause power loss, it would fail after just a few cycles.
> 
> Powerloss - possibly there was.

Don't get me wrong, all modern filesystems should survive unexpected
power loss in theory.

If it has run v5.2.0 to v5.2.14 and a power loss happened, it is quite
likely that one of those kernels is the cause.

If v5.2.0 to v5.2.14 is not involved, and there is no extra layer between
btrfs and the block device, then I would suspect the disk (and maybe do
power-loss tests to confirm it's the disk, not btrfs).

Anyway, to be clear again: if everything works as expected, power loss
shouldn't cause any damage on btrfs.

Thanks,
Qu

> 
> 
> Tomasz


