* "kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!" when deleting device or balancing filesystem.
@ 2014-04-27 16:30 Jaap Pieroen
  2014-04-28  3:26 ` Duncan
  0 siblings, 1 reply; 5+ messages in thread
From: Jaap Pieroen @ 2014-04-27 16:30 UTC
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1859 bytes --]

Hello,

When I try to delete a device from my btrfs filesystem I always get the following kernel bug error:
	[  809.161020] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
	[  809.166761] invalid opcode: 0000 [#3] SMP
See attached log file for more details. I’m trying to delete the device /dev/sdb from my filesystem.

Steps I tried so far are:
1. mount with the clear_cache option
2. balance the filesystem (results in the same kernel error)
3. scrub the filesystem
4. btrfsck --repair

During scrubbing and btrfsck some errors were found and fixed. But I think these were errors caused by system lockups during copying data to the new btrfs filesystem. These lockups were caused by an extraordinary number of hard links, since I was using rsnapshot to create hourly snapshots on my old filesystem that I am migrating towards btrfs. Removing these hard links solved the lockup problems.

Something I also noted was that after the btrfsck run, the command ‘btrfs fi show’ reported “devid    4 size 0.0GiB used 98.00GiB path /dev/sdb” (mind the 0.0GB).

I’m ready to run any diagnostics necessary, but the filesystem is 4.7T so I won’t be able to provide an image.

System details:
	$ uname -a
	Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14 16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
	$ btrfs --version
	Btrfs v3.12
	$ sudo btrfs fi show
	Label: btrfs_storage  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
		Total devices 6 FS bytes used 4.57TiB
		devid    1 size 1.82TiB used 1.32TiB path /dev/sde
		devid    2 size 1.82TiB used 1.32TiB path /dev/sdf
		devid    3 size 1.82TiB used 1.32TiB path /dev/sdg
		devid    4 size 931.51GiB used 88.00GiB path /dev/sdb
		devid    6 size 2.73TiB used 947.03GiB path /dev/sdh
		devid    7 size 2.73TiB used 947.03GiB path /dev/sdi
	
	Btrfs v3.12


[-- Attachment #2: dmesg.log --]
[-- Type: application/octet-stream, Size: 20730 bytes --]




[  422.944579] BTRFS info (device sde): relocating block group 9377568325632 flags 129
[  453.928463] BTRFS info (device sde): found 1355 extents
[  460.766319] BTRFS info (device sde): found 1334 extents
[  462.216853] BTRFS info (device sde): relocating block group 9385118072832 flags 129
[  476.936402] BTRFS info (device sde): found 255 extents
[  482.668960] BTRFS info (device sde): found 255 extents
[  485.991575] BTRFS info (device sde): relocating block group 9366830907392 flags 129
[  519.821215] BTRFS info (device sde): found 2609 extents
[  528.389294] BTRFS info (device sde): found 2609 extents
[  530.482818] BTRFS info (device sde): relocating block group 9361462198272 flags 129
[  570.354624] BTRFS info (device sde): found 4829 extents
[  577.648311] BTRFS info (device sde): found 4829 extents
[  579.649814] BTRFS info (device sde): relocating block group 9356093489152 flags 129
[  612.162145] BTRFS info (device sde): found 2314 extents
[  620.911102] BTRFS info (device sde): found 2314 extents
[  623.604673] BTRFS info (device sde): relocating block group 9350724780032 flags 129
[  661.613872] BTRFS info (device sde): found 2621 extents
[  666.546872] BTRFS info (device sde): found 2621 extents
[  667.538488] BTRFS info (device sde): relocating block group 9345356070912 flags 129
[  699.461698] BTRFS info (device sde): found 1972 extents
[  706.284568] BTRFS info (device sde): found 1972 extents
[  707.447620] BTRFS info (device sde): relocating block group 9339987361792 flags 129
[  735.910785] BTRFS info (device sde): found 321 extents
[  741.392166] BTRFS info (device sde): found 321 extents
[  742.397313] BTRFS info (device sde): relocating block group 9334618652672 flags 129
[  769.054315] BTRFS info (device sde): found 312 extents
[  774.363056] BTRFS info (device sde): found 312 extents
[  775.324702] BTRFS info (device sde): relocating block group 9329249943552 flags 129
[  802.202409] BTRFS info (device sde): found 186 extents
[  807.019172] BTRFS info (device sde): found 186 extents
[  807.969960] BTRFS info (device sde): relocating block group 286718427136 flags 129
[  808.869348] BTRFS info (device sde): csum failed ino 267 off 107151360 csum 2668642085 expected csum 70570527
[  808.869458] BTRFS info (device sde): csum failed ino 267 off 107155456 csum 2021857215 expected csum 70570527
[  808.869499] BTRFS info (device sde): csum failed ino 267 off 107159552 csum 3630881373 expected csum 70570527
[  808.869533] BTRFS info (device sde): csum failed ino 267 off 107163648 csum 1855918840 expected csum 70570527
[  808.869568] BTRFS info (device sde): csum failed ino 267 off 107167744 csum 623206042 expected csum 70570527
[  808.869601] BTRFS info (device sde): csum failed ino 267 off 107171840 csum 3397419927 expected csum 70570527
[  808.869632] BTRFS info (device sde): csum failed ino 267 off 107175936 csum 2086086655 expected csum 70570527
[  808.869664] BTRFS info (device sde): csum failed ino 267 off 107180032 csum 3838267325 expected csum 70570527
[  808.869759] BTRFS info (device sde): csum failed ino 267 off 107184128 csum 84619057 expected csum 70570527
[  808.869791] BTRFS info (device sde): csum failed ino 267 off 107188224 csum 2606067653 expected csum 70570527
[  808.869937] ------------[ cut here ]------------
[  808.870030] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
[  808.870137] invalid opcode: 0000 [#1] SMP
[  808.870214] Modules linked in: cuse deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc dm_crypt fscache ppdev ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 kvm ipt_REJECT xt_comment xt_LOG microcode xt_recent xt_multiport psmouse edac_core serio_raw xt_limit edac_mce_amd k10temp xt_tcpudp xt_addrtype snd_hda_codec_via ipt_MASQUERADE snd_hda_codec_hdmi snd_hda_codec_generic sp5100_tco i2c_piix4 iptable_nat nf_nat_ipv4 ftdi_sio snd_hda_intel usbserial snd_hda_codec nf_conntrack_ipv4 nf_defrag_ipv4 snd_hwdep xt_conntrack rc_tbs_nec(OF) snd_pcm ip6table_filter saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF) ip6_tables tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF) nf_conntrack_netbios_ns ir_lirc_codec(OF) tbs6982se(POF) nf_conntrack_broadcast lirc_dev(OF) tbs6991fe(POF) nf_nat_ftp tbs6618fe(POF) nf_nat ir_mce_kbd_decoder(OF) joydev saa716x_core(OF) nf_conntrack_ftp ir_sony_decoder(OF) tbs6922fe(POF) nf_conntrack ir_jvc_decoder(OF) tbs6928fe(POF) iptable_filter ir_rc6_decoder(OF) ip_tables tbs6991se(POF) x_tables stv090x(OF) ir_rc5_decoder(OF) snd_timer dvb_core(OF) ir_nec_decoder(OF) snd soundcore rc_core(OF) asus_atk0110 shpchp parport_pc mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid usb_storage radeon pata_atiixp i2c_algo_bit r8169 ttm mii drm_kms_helper drm ahci sata_sil24 libahci wmi
[  808.873160] CPU: 1 PID: 1412 Comm: btrfs-endio-3 Tainted: PF          O 3.14.1-031401-generic #201404141220
[  808.873308] Hardware name: System manufacturer System Product Name/M4A78LT-M, BIOS 0802    08/24/2010
[  808.873449] task: ffff88030a360000 ti: ffff88030bb8c000 task.ti: ffff88030bb8c000
[  808.873563] RIP: 0010:[<ffffffffa0313c33>]  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  808.873768] RSP: 0018:ffff88030bb8dcd8  EFLAGS: 00010246
[  808.873851] RAX: 0000000000000000 RBX: ffff88023f7405b8 RCX: 0000000000000000
[  808.873960] RDX: ffff88008aca2d20 RSI: 00000000cccaccc8 RDI: ffff88023f740484
[  808.874069] RBP: ffff88030bb8dd28 R08: 0000000000000000 R09: 0000000000000000
[  808.874178] R10: 0000000000000200 R11: 0000000000000000 R12: ffffea00004bcc00
[  808.874286] R13: ffff88014e7a7880 R14: ffff88023f740400 R15: 0000000006633000
[  808.874397] FS:  00007fa14510d880(0000) GS:ffff88031fc40000(0000) knlGS:0000000000000000
[  808.874520] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  808.874609] CR2: 00007f16b828f000 CR3: 000000008573d000 CR4: 00000000000007e0
[  808.874717] Stack:
[  808.874752]  0000000000006633 ffff8800cf96c000 0000000000000000 ffff88014e7a7880
[  808.874885]  ffffffff81eb1180 ffffea00004bcc00 ffff88015b03cff0 ffff88023f7405b8
[  808.875017]  0000000006633000 0000000000000000 ffff88030bb8ddb8 ffffffffa0313f1b
[  808.875149] Call Trace:
[  808.875248]  [<ffffffffa0313f1b>] end_bio_extent_readpage+0x2db/0x3d0 [btrfs]
[  808.875367]  [<ffffffff8120a013>] bio_endio+0x53/0xa0
[  808.875450]  [<ffffffff8120a072>] bio_endio_nodec+0x12/0x20
[  808.875579]  [<ffffffffa02ece81>] end_workqueue_fn+0x41/0x50 [btrfs]
[  808.875726]  [<ffffffffa03247d0>] worker_loop+0xa0/0x330 [btrfs]
[  808.875867]  [<ffffffffa0324730>] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs]
[  808.875996]  [<ffffffff8108ffa9>] kthread+0xc9/0xe0
[  808.876076]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  808.876175]  [<ffffffff817721bc>] ret_from_fork+0x7c/0xb0
[  808.876262]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  808.876357] Code: 00 00 83 f8 01 0f 8e 49 ff ff ff 49 8b 4d 18 49 8b 55 10 4d 89 e0 45 8b 4d 2c 48 8b 7d b8 4c 89 fe e8 72 fc ff ff e9 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89
[  808.877002] RIP  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  808.877157]  RSP <ffff88030bb8dcd8>
[  808.902677] ------------[ cut here ]------------
[  808.902710] ---[ end trace 65b3947795acb944 ]---
[  808.902845] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
[  808.902953] invalid opcode: 0000 [#2] SMP
[  808.903039] Modules linked in: cuse deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc dm_crypt fscache ppdev ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 kvm ipt_REJECT xt_comment xt_LOG microcode xt_recent xt_multiport psmouse edac_core serio_raw xt_limit edac_mce_amd k10temp xt_tcpudp xt_addrtype snd_hda_codec_via ipt_MASQUERADE snd_hda_codec_hdmi snd_hda_codec_generic sp5100_tco i2c_piix4 iptable_nat nf_nat_ipv4 ftdi_sio snd_hda_intel usbserial snd_hda_codec nf_conntrack_ipv4 nf_defrag_ipv4 snd_hwdep xt_conntrack rc_tbs_nec(OF) snd_pcm ip6table_filter saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF) ip6_tables tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF) nf_conntrack_netbios_ns ir_lirc_codec(OF) tbs6982se(POF) nf_conntrack_broadcast lirc_dev(OF) tbs6991fe(POF) nf_nat_ftp tbs6618fe(POF) nf_nat ir_mce_kbd_decoder(OF) joydev saa716x_core(OF) nf_conntrack_ftp ir_sony_decoder(OF) tbs6922fe(POF) nf_conntrack ir_jvc_decoder(OF) tbs6928fe(POF) iptable_filter ir_rc6_decoder(OF) ip_tables tbs6991se(POF) x_tables stv090x(OF) ir_rc5_decoder(OF) snd_timer dvb_core(OF) ir_nec_decoder(OF) snd soundcore rc_core(OF) asus_atk0110 shpchp parport_pc mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid usb_storage radeon pata_atiixp i2c_algo_bit r8169 ttm mii drm_kms_helper drm ahci sata_sil24 libahci wmi
[  808.914303] CPU: 0 PID: 4097 Comm: btrfs-endio-3 Tainted: PF     D    O 3.14.1-031401-generic #201404141220
[  808.922438] Hardware name: System manufacturer System Product Name/M4A78LT-M, BIOS 0802    08/24/2010
[  808.930654] task: ffff8800a61a63c0 ti: ffff88023ef70000 task.ti: ffff88023ef70000
[  808.938876] RIP: 0010:[<ffffffffa0313c33>]  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  808.947217] RSP: 0018:ffff88023ef71cd8  EFLAGS: 00010246
[  808.955461] RAX: 0000000000000000 RBX: ffff88023f7405b8 RCX: 0000000000000000
[  808.963631] RDX: ffff88008aca2e40 RSI: 00000000cceeccec RDI: ffff88023f740484
[  808.971681] RBP: ffff88023ef71d28 R08: 0000000000000000 R09: 0000000000000000
[  808.979597] R10: 0000000000000200 R11: 0000000000000000 R12: ffffea00004bcb80
[  808.987375] R13: ffff88014e7a7340 R14: ffff88023f740400 R15: 0000000006631000
[  808.995038] FS:  00007f2bb7377740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
[  809.002615] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  809.010059] CR2: 00007f16bc28d000 CR3: 00000002ec785000 CR4: 00000000000007f0
[  809.017412] Stack:
[  809.024589]  0000000000006631 ffff8800cf96c000 0000000000000000 ffff88014e7a7340
[  809.031775]  ffff88030a92d300 ffffea00004bcb80 ffff88015b03c870 ffff88023f7405b8
[  809.038841]  0000000006631000 0000000000000000 ffff88023ef71db8 ffffffffa0313f1b
[  809.045852] Call Trace:
[  809.052796]  [<ffffffffa0313f1b>] end_bio_extent_readpage+0x2db/0x3d0 [btrfs]
[  809.059740]  [<ffffffff8120a013>] bio_endio+0x53/0xa0
[  809.066601]  [<ffffffff8120a072>] bio_endio_nodec+0x12/0x20
[  809.073438]  [<ffffffffa02ece81>] end_workqueue_fn+0x41/0x50 [btrfs]
[  809.080279]  [<ffffffffa03247d0>] worker_loop+0xa0/0x330 [btrfs]
[  809.087098]  [<ffffffffa0324730>] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs]
[  809.093917]  [<ffffffff8108ffa9>] kthread+0xc9/0xe0
[  809.100721]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  809.107532]  [<ffffffff817721bc>] ret_from_fork+0x7c/0xb0
[  809.114294]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  809.121034] Code: 00 00 83 f8 01 0f 8e 49 ff ff ff 49 8b 4d 18 49 8b 55 10 4d 89 e0 45 8b 4d 2c 48 8b 7d b8 4c 89 fe e8 72 fc ff ff e9 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89
[  809.135474] RIP  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  809.142513]  RSP <ffff88023ef71cd8>
[  809.149468] ------------[ cut here ]------------
[  809.149603] ---[ end trace 65b3947795acb945 ]---
[  809.161020] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
[  809.166761] invalid opcode: 0000 [#3] SMP
[  809.172495] Modules linked in: cuse deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc dm_crypt fscache ppdev ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 kvm ipt_REJECT xt_comment xt_LOG microcode xt_recent xt_multiport psmouse edac_core serio_raw xt_limit edac_mce_amd k10temp xt_tcpudp xt_addrtype snd_hda_codec_via ipt_MASQUERADE snd_hda_codec_hdmi snd_hda_codec_generic sp5100_tco i2c_piix4 iptable_nat nf_nat_ipv4 ftdi_sio snd_hda_intel usbserial snd_hda_codec nf_conntrack_ipv4 nf_defrag_ipv4 snd_hwdep xt_conntrack rc_tbs_nec(OF) snd_pcm ip6table_filter saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF) ip6_tables tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF) nf_conntrack_netbios_ns ir_lirc_codec(OF) tbs6982se(POF) nf_conntrack_broadcast lirc_dev(OF) tbs6991fe(POF) nf_nat_ftp tbs6618fe(POF) nf_nat ir_mce_kbd_decoder(OF) joydev saa716x_core(OF) nf_conntrack_ftp ir_sony_decoder(OF) tbs6922fe(POF) nf_conntrack ir_jvc_decoder(OF) tbs6928fe(POF) iptable_filter ir_rc6_decoder(OF) ip_tables tbs6991se(POF) x_tables stv090x(OF) ir_rc5_decoder(OF) snd_timer dvb_core(OF) ir_nec_decoder(OF) snd soundcore rc_core(OF) asus_atk0110 shpchp parport_pc mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid usb_storage radeon pata_atiixp i2c_algo_bit r8169 ttm mii drm_kms_helper drm ahci sata_sil24 libahci wmi
[  809.244598] CPU: 1 PID: 4096 Comm: btrfs-endio-2 Tainted: PF     D    O 3.14.1-031401-generic #201404141220
[  809.251438] Hardware name: System manufacturer System Product Name/M4A78LT-M, BIOS 0802    08/24/2010
[  809.258357] task: ffff8800a61a4ad0 ti: ffff88021e98e000 task.ti: ffff88021e98e000
[  809.265289] RIP: 0010:[<ffffffffa0313c33>]  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  809.272315] RSP: 0018:ffff88021e98fcd8  EFLAGS: 00010246
[  809.279275] RAX: 0000000000000000 RBX: ffff88023f7405b8 RCX: 0000000000000000
[  809.286186] RDX: ffff88008aca2720 RSI: 00000000ccf2ccf0 RDI: ffff88023f740484
[  809.292972] RBP: ffff88021e98fd28 R08: 0000000000000000 R09: 0000000000000000
[  809.299660] R10: 0000000000000200 R11: 0000000000000000 R12: ffffea00004bcd00
[  809.306205] R13: ffff88014e7a7e40 R14: ffff88023f740400 R15: 0000000006637000
[  809.312650] FS:  00007f1faacdc700(0000) GS:ffff88031fc40000(0000) knlGS:0000000000000000
[  809.319040] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  809.325293] CR2: 00007f16b828f000 CR3: 000000030d306000 CR4: 00000000000007e0
[  809.331464] Stack:
[  809.337504]  0000000000006637 ffff8800cf96c000 0000000000000000 ffff88014e7a7e40
[  809.343525]  ffff88030e94c000 ffffea00004bcd00 ffff88015b03d630 ffff88023f7405b8
[  809.349446]  0000000006637000 0000000000000000 ffff88021e98fdb8 ffffffffa0313f1b
[  809.355319] Call Trace:
[  809.361107]  [<ffffffffa0313f1b>] end_bio_extent_readpage+0x2db/0x3d0 [btrfs]
[  809.366914]  [<ffffffff8120a013>] bio_endio+0x53/0xa0
[  809.372653]  [<ffffffff8120a072>] bio_endio_nodec+0x12/0x20
[  809.378364]  [<ffffffffa02ece81>] end_workqueue_fn+0x41/0x50 [btrfs]
[  809.384054]  [<ffffffffa03247d0>] worker_loop+0xa0/0x330 [btrfs]
[  809.389727]  [<ffffffffa0324730>] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs]
[  809.395431]  [<ffffffff8108ffa9>] kthread+0xc9/0xe0
[  809.401123]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  809.406866]  [<ffffffff817721bc>] ret_from_fork+0x7c/0xb0
[  809.412526]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  809.418158] Code: 00 00 83 f8 01 0f 8e 49 ff ff ff 49 8b 4d 18 49 8b 55 10 4d 89 e0 45 8b 4d 2c 48 8b 7d b8 4c 89 fe e8 72 fc ff ff e9 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89
[  809.430145] RIP  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  809.435962]  RSP <ffff88021e98fcd8>
[  809.441833] ---[ end trace 65b3947795acb946 ]---
[  809.447790] ------------[ cut here ]------------
[  809.453488] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
[  809.459195] invalid opcode: 0000 [#4] SMP
[  809.464891] Modules linked in: cuse deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc dm_crypt fscache ppdev ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 kvm ipt_REJECT xt_comment xt_LOG microcode xt_recent xt_multiport psmouse edac_core serio_raw xt_limit edac_mce_amd k10temp xt_tcpudp xt_addrtype snd_hda_codec_via ipt_MASQUERADE snd_hda_codec_hdmi snd_hda_codec_generic sp5100_tco i2c_piix4 iptable_nat nf_nat_ipv4 ftdi_sio snd_hda_intel usbserial snd_hda_codec nf_conntrack_ipv4 nf_defrag_ipv4 snd_hwdep xt_conntrack rc_tbs_nec(OF) snd_pcm ip6table_filter saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF) ip6_tables tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF) nf_conntrack_netbios_ns ir_lirc_codec(OF) tbs6982se(POF) nf_conntrack_broadcast lirc_dev(OF) tbs6991fe(POF) nf_nat_ftp tbs6618fe(POF) nf_nat ir_mce_kbd_decoder(OF) joydev saa716x_core(OF) nf_conntrack_ftp ir_sony_decoder(OF) tbs6922fe(POF) nf_conntrack ir_jvc_decoder(OF) tbs6928fe(POF) iptable_filter ir_rc6_decoder(OF) ip_tables tbs6991se(POF) x_tables stv090x(OF) ir_rc5_decoder(OF) snd_timer dvb_core(OF) ir_nec_decoder(OF) snd soundcore rc_core(OF) asus_atk0110 shpchp parport_pc mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid usb_storage radeon pata_atiixp i2c_algo_bit r8169 ttm mii drm_kms_helper drm ahci sata_sil24 libahci wmi
[  809.536319] CPU: 1 PID: 4098 Comm: btrfs-endio-4 Tainted: PF     D    O 3.14.1-031401-generic #201404141220
[  809.543111] Hardware name: System manufacturer System Product Name/M4A78LT-M, BIOS 0802    08/24/2010
[  809.549942] task: ffff8800a61a31e0 ti: ffff8802a595c000 task.ti: ffff8802a595c000
[  809.556833] RIP: 0010:[<ffffffffa0313c33>]  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  809.563769] RSP: 0018:ffff8802a595dcd8  EFLAGS: 00010246
[  809.570680] RAX: 0000000000000000 RBX: ffff88023f7405b8 RCX: 0000000000000000
[  809.577508] RDX: ffff88008aca2120 RSI: 00000000ccf6ccf4 RDI: ffff88023f740484
[  809.584249] RBP: ffff8802a595dd28 R08: 0000000000000000 R09: 0000000000000000
[  809.590855] R10: 0000000000000200 R11: 0000000000000000 R12: ffffea00004bce00
[  809.597358] R13: ffff88014e7a7e80 R14: ffff88023f740400 R15: 000000000663b000
[  809.603738] FS:  00007f1faacdc700(0000) GS:ffff88031fc40000(0000) knlGS:0000000000000000
[  809.610046] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  809.616250] CR2: 00007f16b828f000 CR3: 000000030d306000 CR4: 00000000000007e0
[  809.622360] Stack:
[  809.628317]  000000000000663b ffff8800cf96c000 0000000000000000 ffff88014e7a7e80
[  809.634275]  ffffffff81eb1180 ffffea00004bce00 ffff88015b03d3b0 ffff88023f7405b8
[  809.640147]  000000000663b000 0000000000000000 ffff8802a595ddb8 ffffffffa0313f1b
[  809.645949] Call Trace:
[  809.651693]  [<ffffffffa0313f1b>] end_bio_extent_readpage+0x2db/0x3d0 [btrfs]
[  809.657451]  [<ffffffff8120a013>] bio_endio+0x53/0xa0
[  809.663138]  [<ffffffff8120a072>] bio_endio_nodec+0x12/0x20
[  809.668771]  [<ffffffffa02ece81>] end_workqueue_fn+0x41/0x50 [btrfs]
[  809.674402]  [<ffffffffa03247d0>] worker_loop+0xa0/0x330 [btrfs]
[  809.680019]  [<ffffffffa0324730>] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs]
[  809.685664]  [<ffffffff8108ffa9>] kthread+0xc9/0xe0
[  809.691304]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  809.696948]  [<ffffffff817721bc>] ret_from_fork+0x7c/0xb0
[  809.702548]  [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0
[  809.708131] Code: 00 00 83 f8 01 0f 8e 49 ff ff ff 49 8b 4d 18 49 8b 55 10 4d 89 e0 45 8b 4d 2c 48 8b 7d b8 4c 89 fe e8 72 fc ff ff e9 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89
[  809.720025] RIP  [<ffffffffa0313c33>] clean_io_failure+0x1a3/0x1b0 [btrfs]
[  809.725918]  RSP <ffff8802a595dcd8>
[  809.731767] ---[ end trace 65b3947795acb947 ]---


* Re: "kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!" when deleting device or balancing filesystem.
  2014-04-27 16:30 "kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!" when deleting device or balancing filesystem Jaap Pieroen
@ 2014-04-28  3:26 ` Duncan
  2014-04-28  8:07   ` Hugo Mills
  2014-04-28 20:30   ` Jaap Pieroen
  0 siblings, 2 replies; 5+ messages in thread
From: Duncan @ 2014-04-28  3:26 UTC
  To: linux-btrfs

Jaap Pieroen posted on Sun, 27 Apr 2014 18:30:19 +0200 as excerpted:

> Hello,
> 
> When I try to delete a device from my btrfs filesystem I always get the
> following kernel bug error:

> kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
> invalid opcode: 0000 [#3] SMP
> See attached log file for more details.

That's a reasonably common, generic error, simply indicating the kernel 
got an invalid/zero opcode instead of what it was supposed to get, but 
not really saying why, tho the log does give some more info.

In the log, it relocates various block groups, but then fails on one, due 
to invalid checksum (csum).  See below for the implications of that.
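To make the failing region in the log easier to see, here is a minimal Python sketch that pulls the "csum failed" lines out of dmesg text and groups them by inode. The function names (`parse_csum_failures`, `summarize`) and the `SAMPLE` excerpt are purely illustrative and not part of any btrfs tool; the regular expression only assumes the line layout visible in the attached dmesg.log.

```python
import re

# Matches the "csum failed" lines in the layout seen in the attached dmesg.log.
CSUM_RE = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\d+) expected csum (?P<expected>\d+)"
)

FIELDS = ("ino", "off", "csum", "expected")


def parse_csum_failures(log_text):
    """Return a list of (inode, offset, found_csum, expected_csum) tuples."""
    return [
        tuple(int(match.group(name)) for name in FIELDS)
        for match in CSUM_RE.finditer(log_text)
    ]


def summarize(failures):
    """Map each inode to (lowest offset, highest offset, number of failures)."""
    by_inode = {}
    for ino, off, _csum, _expected in failures:
        by_inode.setdefault(ino, []).append(off)
    return {ino: (min(offs), max(offs), len(offs)) for ino, offs in by_inode.items()}


# Hypothetical excerpt: the first three csum lines copied from the log.
SAMPLE = """\
[  808.869348] BTRFS info (device sde): csum failed ino 267 off 107151360 csum 2668642085 expected csum 70570527
[  808.869458] BTRFS info (device sde): csum failed ino 267 off 107155456 csum 2021857215 expected csum 70570527
[  808.869499] BTRFS info (device sde): csum failed ino 267 off 107159552 csum 3630881373 expected csum 70570527
"""

if __name__ == "__main__":
    for ino, (low, high, count) in summarize(parse_csum_failures(SAMPLE)).items():
        print(f"inode {ino}: {count} failures, offsets {low}..{high}")
```

Run against the full log, this kind of summary shows that the ten failures all hit inode 267 at consecutive offsets 4096 bytes apart, and that every line reports the same expected csum.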

> I’m trying to delete the device /dev/sdb from my filesystem.
> 
> Steps I tried so far are:
> 1. mount with the clear_cache option
> 2. balance the filesystem (results in the same kernel error)
> 3. scrub the filesystem
> 4. btrfsck --repair

Never use btrfsck (or btrfs check) with the --repair option, unless 
you're about ready to give up on the filesystem and do a mkfs, in which 
case you aren't risking anything anyway, or unless a dev suggests you run 
it.

The reason being, btrfs check --repair knows how to fix some types of 
errors, but among the ones it doesn't know how to fix, it can sometimes 
make the problem worse.  At some point it should know most problems and 
at least not make them worse, but until then, it's not a good risk to 
take unless  you really know what you're doing or it's no risk as the 
next step is blowing away the filesystem anyway.

(btrfs check, without --repair, is fine to run, since it's read-only and 
thus won't make anything worse.  But by the same token, it won't fix 
anything either, it's simply informational.)

> During scrubbing and btrfsck some errors were found and fixed. But I
> think these were errors caused by system lockups during copying data to
> the new btrfs filesystem. These lockups were caused by an extraordinary
> number of hard links, since I was using rsnapshot to create hourly
> snapshots on my old filesystem that I am migrating towards btrfs.
> Removing these hard links solved the lockup problems.
> 
> Something I also noted was that after the btrfsck run, the command
> ‘btrfs fi show’ reported
> “devid    4 size 0.0GiB used 98.00GiB path /dev/sdb” (mind the 0.0GB).
> 
> I’m ready to run any diagnostics necessary, but the filesystem is 4.7T
> so I won’t be able to provide an image.
> 
> System details:
> $ uname -a Linux nasbak 3.14.1-031401-generic

Good, latest stable kernel. =:^)

> $ btrfs --version
> Btrfs v3.12

You're behind on btrfs-tools.  =:^(  The latest version is v3.14.1.

> $ sudo btrfs fi show
> Label: btrfs_storage  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
> 	Total devices 6 FS bytes used 4.57TiB
> 	devid    1 size 1.82TiB used 1.32TiB path /dev/sde
> 	devid    2 size 1.82TiB used 1.32TiB path /dev/sdf
> 	devid    3 size 1.82TiB used 1.32TiB path /dev/sdg
> 	devid    4 size 931.51GiB used 88.00GiB path /dev/sdb
> 	devid    6 size 2.73TiB used 947.03GiB path /dev/sdh
> 	devid    7 size 2.73TiB used 947.03GiB path /dev/sdi
> 	
> Btrfs v3.12

For further reference, whenever you post btrfs fi show, please post btrfs 
fi df as well, as the two provide complementary information, and the 
picture without both of them is incomplete.

If you'd supplied the btrfs fi df output, we could see what raid level 
you're running for data/metadata/system, as well as which type of chunks 
were still left on /dev/sdb.
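Lacking the df output, the "flags 129" field on the "relocating block group" lines in the attached log gives a partial hint. A minimal sketch for decoding it follows, assuming the block group flag bit values as I recall them from the kernel's btrfs headers (worth double-checking against your own kernel source); the helper name `decode_block_group_flags` is illustrative only.

```python
# Block group flag bits, assumed from the btrfs on-disk format definitions
# (BTRFS_BLOCK_GROUP_* in the kernel headers). Verify against your kernel tree.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_block_group_flags(flags):
    """Return the names of all known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items()) if flags & bit]


if __name__ == "__main__":
    # 129 is the value printed on every "relocating block group" line in the log.
    print(decode_block_group_flags(129))  # → ['DATA', 'RAID5']
```

If those bit values hold, the block groups being relocated are data chunks with a raid5 profile, but the btrfs fi df output remains the authoritative way to confirm that.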

For raid1 and raid10 modes (and dup mode on a single device), there's two 
copies of each chunk, thus a second copy to try if the checksum fails.   
Single and raid0 modes only keep a single copy, so there's not much to do 
there but find the corresponding file and delete it, to correct the 
problem.  In normal operation, if such a checksum error is found and 
there is a second copy that passes checksum, the invalid copy is 
rewritten to match.  What scrub does is go thru the entire filesystem 
looking for such errors and rewriting the invalid copy if possible, so 
you don't have to wait until you happen on the problem by accident.

You mentioned that you did try scrub and that it fixed some errors, which 
would be csum errors.  But did it leave any unfixed because there wasn't 
a second, valid copy of the invalid data with which to rewrite it?  If it 
found and fixed all the errors, then you shouldn't be seeing further csum 
errors like those in the log file, unless more are being created, which 
would indicate an ongoing problem (perhaps a device going bad).

Of course the kernel bug is presumably locking up your system, not 
allowing a clean shutdown, in which case you may well have more csum 
errors due to that.  So after rebooting, be sure to run a scrub before 
you try to balance or device delete, and hopefully eliminate the problem.

But... since you didn't post the df output, we don't know what the 
remaining content on the device is, data/metadata/system, nor do we know 
what mode it is, and it could well be that scrub can't remove it due to 
invalid csums if there's no second, valid copy, as will definitely be the 
case if it's single or raid0 mode (with data chunks being single by 
default, tho metadata and system chunks default to raid1 on a multi-
device filesystem and dup on a single-device filesystem).

If there's no valid second copy to rewrite the bad one with, you may 
simply have to figure out what file and/or snapshot(s) it belongs to and 
delete them, fixing the bad csums that way.

Of course that's assuming it's the bad csums causing the problem, not 
something else.

Meanwhile, while I don't claim to be a dev nor to /really/ read code, I 
did see some recent patches go by with comments that described bugs that 
looked to me like they might match the problem you're reporting here, 
specifically, failure to properly device delete under some conditions.  
So I'd suggest updating to a current btrfs-progs v3.14.1 and see if that 
helps.  If not, try a current v3.15-rcX testing kernel, or if you don't 
want to try that, wait a couple stable kernel releases and see if there's 
any btrfs patches applied.

With a bit of luck, between tracking down and eliminating the bad csums, 
and the newer code that I think fixes at least some of the failure to 
device delete issues, the problem will be addressed. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: "kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!" when deleting device or balancing filesystem.
  2014-04-28  3:26 ` Duncan
@ 2014-04-28  8:07   ` Hugo Mills
  2014-04-28 20:30   ` Jaap Pieroen
  1 sibling, 0 replies; 5+ messages in thread
From: Hugo Mills @ 2014-04-28  8:07 UTC
  To: Duncan; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 961 bytes --]

On Mon, Apr 28, 2014 at 03:26:45AM +0000, Duncan wrote:
> Jaap Pieroen posted on Sun, 27 Apr 2014 18:30:19 +0200 as excerpted:
> 
> > Hello,
> > 
> > When I try to delete a device from my btrfs filesystem I always get the
> > following kernel bug error:
> 
> > kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
> > invalid opcode: 0000 [#3] SMP
> > See attached log file for more details.
> 
> That's a reasonably common, generic error, simply indicating the kernel 
> got an invalid/zero opcode instead of what it was supposed to get, but 
> not really saying why, tho the log does give some more info.

   More than that -- the invalid opcode is simply the way that the
BUG() and BUG_ON() macros are implemented.
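
   This can be checked against the "Code:" line of the oops itself. On x86, BUG() compiles to the ud2 instruction, encoded as the bytes 0f 0b, and the oops marks the byte at the faulting instruction pointer with angle brackets. A minimal Python sketch, with the helper name `faulting_bytes` and the shortened `CODE_LINE` excerpt chosen purely for illustration:

```python
UD2 = bytes([0x0F, 0x0B])  # x86 encoding of the ud2 (undefined instruction) opcode


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <xx> in an oops 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            marked = [token.strip("<>")] + tokens[index + 1:index + count]
            return bytes(int(byte, 16) for byte in marked)
    raise ValueError("no marked byte found in Code: line")


# Shortened excerpt of the "Code:" line from the log (leading bytes omitted).
CODE_LINE = "[  808.876357] Code: e8 72 fc ff ff e9 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00"

if __name__ == "__main__":
    print(faulting_bytes(CODE_LINE) == UD2)  # → True
```

   So the "invalid opcode" is the deliberate trap planted by BUG_ON() in clean_io_failure, not a corrupted instruction stream.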

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- People are too unreliable to be replaced by machines. ---      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]


* Re: "kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!" when deleting device or balancing filesystem.
  2014-04-28  3:26 ` Duncan
  2014-04-28  8:07   ` Hugo Mills
@ 2014-04-28 20:30   ` Jaap Pieroen
  2014-04-29  6:28     ` Duncan
  1 sibling, 1 reply; 5+ messages in thread
From: Jaap Pieroen @ 2014-04-28 20:30 UTC
  To: linux-btrfs

Duncan <1i5t5.duncan <at> cox.net> writes:

> 
> Jaap Pieroen posted on Sun, 27 Apr 2014 18:30:19 +0200 as excerpted:
> 
> > ... <snip>
> 
> Never use btrfsck (or btrfs check) with the --repair option, unless 
> you're about ready to give up on the filesystem and do a mkfs, in which 
> case you aren't risking anything anyway, or unless a dev suggests you run 
> it.
> 
> The reason being, btrfs check --repair knows how to fix some types of 
> errors, but among the ones it doesn't know how to fix, it can sometimes 
> make the problem worse.  At some point it should know most problems and 
> at least not make them worse, but until then, it's not a good risk to 
> take unless  you really know what you're doing or it's no risk as the 
> next step is blowing away the filesystem anyway.
> 
> (btrfs check, without --repair, is fine to run, since it's read-only and 
> thus won't make anything worse.  But by the same token, it won't fix 
> anything either, it's simply informational.)

I guess I have been lucky. Unfortunately I was locked out of SSH due to
unmounting my /home folder, so I didn't copy the error message. But if memory
serves me well it only found csum errors which I guess it never corrected.

>  ...
> 
> You're behind on btrfs-tools.  =:^(  The latest version is v3.14.1.

I guess I like to install my packages via apt :). Since the error was a kernel
message I figured it was tool independent. But you are right, I should have
tried the latest tools.

> > $ sudo btrfs fi show
> > Label: btrfs_storage  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
> > 	Total devices 6 FS bytes used 4.57TiB
> > 	devid    1 size 1.82TiB used 1.32TiB path /dev/sde
> > 	devid    2 size 1.82TiB used 1.32TiB path /dev/sdf
> > 	devid    3 size 1.82TiB used 1.32TiB path /dev/sdg
> > 	devid    4 size 931.51GiB used 88.00GiB path /dev/sdb
> > 	devid    6 size 2.73TiB used 947.03GiB path /dev/sdh
> > 	devid    7 size 2.73TiB used 947.03GiB path /dev/sdi
> > 	
> > Btrfs v3.12
> 
> For further reference, whenever you post btrfs fi show, please post btrfs 
> fi df as well, as the two provide complementary information, and the 
> picture without both of them is incomplete.
> 
> If you'd supplied the btrfs fi df output, we could see what raid level 
> you're running for data/metadata/system, as well as which type of chunks 
> were still left on /dev/sdb.

Yep, I dropped the ball here. I did look in the wiki for a list of output
required when asking for support, but I couldn't find any. I'll make sure I add
it to the wiki for the next person.
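
As a rough sketch of what such a list could look like, limited to the
commands that were actually requested or used in this thread (the helper
name support_commands is made up for illustration, and /home is only my
mount point):

```shell
# Print the commands whose output was asked for in this thread, one per line.
# The mount point is a parameter; /home is only the example used here.
support_commands() {
    mnt=$1
    printf '%s\n' \
        'uname -a' \
        'btrfs --version' \
        'btrfs fi show' \
        "btrfs fi df $mnt"
}

# Show the list. To actually collect the output, run each line as root,
# for example: support_commands /home | sudo sh -x
support_commands /home
```

The function only prints the commands, so it is safe to run anywhere;
the kernel log with the BUG message still has to be attached separately.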

Here is the output:

  # btrfs fi df /home/
  Data, RAID5: total=4.57TiB, used=4.57TiB
  System, RAID1: total=32.00MiB, used=352.00KiB
  Metadata, RAID1: total=7.00GiB, used=5.58GiB

Which tells you I've been adventurous and gone for raid5. :)
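
For anyone who wants just the profile per chunk type, here is a small
sketch that reduces the output above to type=profile pairs. It assumes
the "Type, PROFILE: total=..., used=..." layout shown here, which other
btrfs-progs versions may not keep:

```shell
# Sample output copied from the message above.
df_output='Data, RAID5: total=4.57TiB, used=4.57TiB
System, RAID1: total=32.00MiB, used=352.00KiB
Metadata, RAID1: total=7.00GiB, used=5.58GiB'

# Split on ", " and ": " so that $1 is the chunk type and $2 the profile.
profiles=$(printf '%s\n' "$df_output" | awk -F'[,:] ' '{ print $1 "=" $2 }')
echo "$profiles"
```

Here the data chunks are RAID5 while system and metadata are RAID1,
which is the detail the quoted request was after.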

> 
> ...
> 
> You mentioned that you did try scrub and that it fixed some errors, which 
> would be csum errors.  But did it leave any unfixed because there wasn't 
> a second, valid copy of the invalid data with which to rewrite it?  If it 
> found and fixed all the errors, then you shouldn't be seeing further csum 
> errors like those in the log file, unless more are being created, which 
> would indicate an ongoing problem (perhaps a device going bad).

Well, in hindsight I realize the scrub actually did not fix any errors. It
reported 6 csum errors, and none of those could be fixed. Which shouldn't be a
surprise since currently scrub doesn't fix raid5 errors.

My mistake was that I thought these csum errors would be fixed by a balance
(https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg30714.html), but I
guess that only applies when you are balancing over a missing drive? I was also
under the assumption that a csum error shouldn't prevent relocation of block
groups.

> 
> Of course the kernel bug is presumably locking up your system, not 
> allowing a clean shutdown, in which case you may well have more csum 
> errors due to that.  So after rebooting, be sure to run a scrub before 
> you try to balance or device delete, and hopefully eliminate the problem.

Even when I do a 'proper' REISUB reboot?

> 
> But... since you didn't post the df output, we don't know what the 
> remaining content on the device is, data/metadata/system, nor do we know 
> what mode it is, and it could well be that scrub can't remove it due to 
> invalid csums if there's no second, valid copy, as will definitely be the 
> case if it's single or raid0 mode (with data chunks being single by 
> default, tho metadata and system chunks default to raid1 on a multi-
> device filesystem and dup on a single-device filesystem).
> 
> If there's no valid second copy to rewrite the bad one with, you may 
> simply have to figure out what file and/or snapshot(s) it belongs to and 
> delete them, fixing the bad csums that way.
>

This is what I'll do as a workaround. It's unfortunate that a balance didn't
seem to take care of these checksum errors.

> 
> Of course that's assuming it's the bad csums causing the problem, not 
> something else.

I assume it is so. I deleted the files with the wrong csum and reran the device
delete command. It has managed to progress much further than before and, as I'm
writing, is near completion.

> Meanwhile, while I don't claim to be a dev nor to /really/ read code, I 
> did see some recent patches go by with comments that described bugs that 
> looked to me like they might match the problem you're reporting here, 
> specifically, failure to properly device delete under some conditions.  
> So I'd suggest updating to a current btrfs-progs v3.14.1 and see if that 
> helps.  If not, try a current v3.15-rcX testing kernel, or if you don't 
> want to try that, wait a couple stable kernel releases and see if there's 
> any btrfs patches applied.
> 
> With a bit of luck, between tracking down and eliminating the bad csums, 
> and the newer code that I think fixes at least some of the failure to 
> device delete issues, the problem will be addressed. =:^)
> 

The problems were addressed and I gained a lot of new insights. It makes me
glad that I decided to give btrfs a try.

The takeaways are:
- The 'kernel BUG' message is not necessarily alarming; it is a reasonably
  common occurrence (though undocumented and, might I say, not user-friendly.
  Maybe even user-scaring :o) ).
- Unresolved csum errors prevent btrfs from moving block groups around,
  effectively blocking balance and device delete.
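
The sequence that worked for me can be sketched as a dry run. This is
only an outline under assumptions: steps 2 and 3 (finding the inode in
the kernel log and mapping it to a path) are one possible way to locate
the damaged files, not something prescribed in this thread, and the
helper name recovery_steps, INODE and PATH are placeholders:

```shell
# Dry-run sketch: print the sequence instead of running it, because the
# last steps delete files and rewrite the filesystem.
# Device and mount point are parameters; /dev/sdb and /home are my values.
recovery_steps() {
    dev=$1
    mnt=$2
    # 1. Scrub in the foreground and note any uncorrectable csum errors.
    echo "btrfs scrub start -B $mnt"
    # 2. Look for the inode numbers in the csum failure messages.
    echo "dmesg | grep -i 'csum failed'"
    # 3. Map an inode number to a path (INODE is a placeholder).
    echo "btrfs inspect-internal inode-resolve INODE $mnt"
    # 4. Delete, or restore from backup, each damaged file (PATH is a placeholder).
    echo "rm -- PATH"
    # 5. Retry the device removal.
    echo "btrfs device delete $dev $mnt"
}

recovery_steps /dev/sdb /home
```

Nothing is executed; the printed commands are meant to be reviewed and
run by hand, one at a time, as root.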

How a raid5 user should resolve these csum error(s), apart from
trashing the file(s) or yanking out the device containing the file and
rebalancing, is still unclear to me. Maybe raid5 is not ready for users who want
to protect themselves against corruption/bit rot?

Lastly: thanks a lot for this elaborate and insightful response! I realize
you have taken quite some time to give me this good, in-depth answer and I
appreciate it. A lot. It was very helpful.

-- Jaap



* Re: "kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!" when deleting device or balancing filesystem.
  2014-04-28 20:30   ` Jaap Pieroen
@ 2014-04-29  6:28     ` Duncan
  0 siblings, 0 replies; 5+ messages in thread
From: Duncan @ 2014-04-29  6:28 UTC (permalink / raw)
  To: linux-btrfs

Jaap Pieroen posted on Mon, 28 Apr 2014 20:30:55 +0000 as excerpted:

>> For further reference, whenever you post btrfs fi show, please post
>> btrfs fi df as well, as the two provide complementary information, and
>> the picture without both of them is incomplete.
>> 
>> If you'd supplied the btrfs fi df output, we could see what raid level
>> you're running for data/metadata/system, as well as which type of
>> chunks were still left on /dev/sdb.
> 
> Yep, I dropped the ball here. I did look in the wiki for a list of
> output required when asking for support, but I couldn't find any. I'll
> make sure I add it to the wiki for the next person.

Thanks.  That's likely to be quite practically useful to many people over 
time, including me.  =:^)

(FWIW, while I appreciate in principle that wikis are there to be user-
edited, in practice I treat them like I do most of the web, as read-only,
and never actually seem to get around to changing them.  OTOH, I've
always enjoyed the give and take of newsgroups and mailing lists, and
spend quite a bit of time replying to posts, which, judging by the thanks
I get, does help others there.  So I really /can/ thank you for putting
that on the wiki.  It's likely to make my job replying to posts easier,
yet it's something I'd be extremely unlikely to add myself, even on a
wiki like the btrfs wiki, which I highly value and have spent quite some
time reading.  For whatever reason I personally just seem to have an
easier time replying on-list than I do editing wikis.  But I'm sure glad
others get around to editing them. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

