* "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
@ 2019-10-19 19:29 Peter Hjalmarsson
  2019-10-21 9:17 ` Johannes Thumshirn
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Hjalmarsson @ 2019-10-19 19:29 UTC
To: linux-btrfs

Hi,

While trying to reproduce another problem I have seen with BTRFS while
running balance and raid6, I hit an issue resulting in:
BUG: kernel NULL pointer dereference, address: 00000000000002ce

I created a script to pinpoint the problem using zram, and the run goes
like this:

1. Run the script, which sets up a "single" filesystem on two larger
   devices, fills it with an adequate amount of data, and then tries to
   convert it to raid6.
2. The filesystem fails to convert because there is not enough space to
   convert all data to raid6 (not enough space on at least 3 devices at
   the same time), hitting an enospc error.
   * Up to this point everything still seems to work without crashing,
     and the system seems stable, just with two different profiles for
     the data.
3. Issue the umount command, which will be "Killed", and a backtrace
   will be written to dmesg.

This results in a filesystem that cannot be synced or unmounted, and
overall just seems to have crashed.
I have run the script 6 times. It passed once and crashed 5 times, and
each time I had to reboot the computer, since that seems to be the only
way to get rid of the filesystem. So the issue seems pretty
reproducible.

The system is a laptop running Fedora 31 Beta with the latest updates
and the latest kernel (5.3.6-300.fc31.x86_64).

Do you have any input, or any other questions or things you want me to test?

The script looks as follows:
-----
$ cat run-btrfs-test
modprobe -iv zram num_devices=8
udevadm trigger
sync
zramctl /dev/zram0 -s 8G && \
zramctl /dev/zram1 -s 8G && \
zramctl /dev/zram2 -s 4G && \
zramctl /dev/zram3 -s 4G && \
zramctl /dev/zram4 -s 4G && \
zramctl /dev/zram5 -s 2G && \
zramctl /dev/zram6 -s 2G && \
zramctl /dev/zram7 -s 4G && \
mkfs.btrfs /dev/zram0 && \
mkdir -p /mnt/btrfs-test && \
mount /dev/zram0 /mnt/btrfs-test && \
echo "FS Mounted" && \
btrfs dev add /dev/zram1 /mnt/btrfs-test && \
echo "Devices added" && \
for int in {1..500} ; do dd if=/dev/zero of=/mnt/btrfs-test/file${int} bs=32M count=1 && sync ; done
btrfs dev add /dev/zram[2-7] /mnt/btrfs-test && \
btrfs fi sh /mnt/btrfs-test && \
btrfs fi df /mnt/btrfs-test && \
btrfs bal star -mconvert=raid6 /mnt/btrfs-test && \
btrfs bal star -dconvert=raid6 /mnt/btrfs-test
btrfs fi sh /mnt/btrfs-test && \
btrfs fi df /mnt/btrfs-test
=====

The output from the script, trimmed down to what follows the for-loop:
------
Label: none uuid: 87150918-e487-4f59-994b-ccb13ee05446
Total devices 8 FS bytes used 15.64GiB
devid 1 size 8.00GiB used 8.00GiB path /dev/zram0
devid 2 size 8.00GiB used 8.00GiB path /dev/zram1
devid 3 size 4.00GiB used 0.00B path /dev/zram2
devid 4 size 4.00GiB used 0.00B path /dev/zram3
devid 5 size 4.00GiB used 0.00B path /dev/zram4
devid 6 size 2.00GiB used 0.00B path /dev/zram5
devid 7 size 2.00GiB used 0.00B path /dev/zram6
devid 8 size 4.00GiB used 0.00B path /dev/zram7

Data, single: total=15.74GiB, used=15.63GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=264.00MiB, used=17.06MiB
GlobalReserve, single: total=16.70MiB, used=0.00B
Done, had to relocate 3 out of 20 chunks
ERROR: error during balancing '/mnt/btrfs-test': No space left on device
There may be more info in syslog - try dmesg | tail
Label: none uuid: 87150918-e487-4f59-994b-ccb13ee05446
Total devices 8 FS bytes used 15.65GiB
devid 1 size 8.00GiB used 2.99GiB path /dev/zram0
devid 2 size 8.00GiB used 2.00GiB path /dev/zram1
devid 3 size 4.00GiB used 4.00GiB path /dev/zram2
devid 4 size 4.00GiB used 4.00GiB path /dev/zram3
devid 5 size 4.00GiB used 4.00GiB path /dev/zram4
devid 6 size 2.00GiB used 2.00GiB path /dev/zram5
devid 7 size 2.00GiB used 2.00GiB path /dev/zram6
devid 8 size 4.00GiB used 4.00GiB path /dev/zram7

Data, single: total=2.00GiB, used=1.97GiB
Data, RAID6: total=14.41GiB, used=13.66GiB
System, RAID6: total=80.00MiB, used=16.00KiB
Metadata, RAID6: total=512.00MiB, used=17.52MiB
GlobalReserve, single: total=17.16MiB, used=0.00B
===

Issuing the umount:
----
# umount /mnt/btrfs-test && modprobe -rv zram
Killed
===

And last but not least, the output in dmesg:
---
[ 205.960233] BTRFS info (device zram0): 2 enospc errors during balance
[ 205.960235] BTRFS info (device zram0): balance: ended with status: -28
[ 235.774821] BUG: kernel NULL pointer dereference, address: 00000000000002ce
[ 235.774826] #PF: supervisor read access in kernel mode
[ 235.774828] #PF: error_code(0x0000) - not-present page
[ 235.774830] PGD 0 P4D 0
[ 235.774834] Oops: 0000 [#1] SMP PTI
[ 235.774838] CPU: 3 PID: 5421 Comm: umount Not tainted 5.3.6-300.fc31.x86_64 #1
[ 235.774840] Hardware name: LENOVO 80JV/Lenovo U41-70, BIOS BDCN71WW 08/03/2016
[ 235.774847] RIP: 0010:__free_pages+0x5/0x30
[ 235.774850] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82
[ 235.774852] RSP: 0018:ffffc3cf0ffb7db0 EFLAGS: 00010046
[ 235.774854] RAX: ffffa0ad0ffa0118 RBX: 0000000000000045 RCX: 0000000000000000
[ 235.774856] RDX: ffffa0aeceaee2f0 RSI: 0000000000000000 RDI: 000000000000029a
[ 235.774858] RBP: ffffa0ad0ffa0000 R08: fffffb06c23fd108 R09: fffffb06c23fd108
[ 235.774860] R10: 0000000000068879 R11: fffffb06c02cf820 R12: 0000000000000045
[ 235.774861] R13: ffffa0ade5c00010 R14: ffffa0ae0d19e5ac R15: ffffa0ae0d19e578
[ 235.774864] FS: 00007f3b40f6cc80(0000) GS:ffffa0aeceac0000(0000) knlGS:0000000000000000
[ 235.774866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.774867] CR2: 00000000000002ce CR3: 000000004c1f0004 CR4: 00000000003606e0
[ 235.774869] Call Trace:
[ 235.774920] __free_raid_bio+0x72/0xb0 [btrfs]
[ 235.774961] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs]
[ 235.774992] close_ctree+0x1ea/0x2f0 [btrfs]
[ 235.774998] generic_shutdown_super+0x6c/0x100
[ 235.775001] kill_anon_super+0x14/0x30
[ 235.775024] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 235.775029] deactivate_locked_super+0x36/0x70
[ 235.775033] cleanup_mnt+0x104/0x150
[ 235.775038] task_work_run+0x87/0xa0
[ 235.775043] exit_to_usermode_loop+0xda/0x100
[ 235.775047] do_syscall_64+0x183/0x1a0
[ 235.775053] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 235.775056] RIP: 0033:0x7f3b411b767b
[ 235.775060] Code: 08 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 07 0c 00 f7 d8 64 89 01 48
[ 235.775062] RSP: 002b:00007fffb57488e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 235.775065] RAX: 0000000000000000 RBX: 00007f3b412e11e4 RCX: 00007f3b411b767b
[ 235.775066] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055dc3815f730
[ 235.775068] RBP: 000055dc3815f520 R08: 0000000000000000 R09: 00007fffb5747660
[ 235.775070] R10: 000055dc38160750 R11: 0000000000000246 R12: 000055dc3815f730
[ 235.775072] R13: 0000000000000000 R14: 000055dc3815f618 R15: 0000000000000000
[ 235.775074] Modules linked in: zram uinput rfcomm ccm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack tun bridge stp llc ebtable_nat ebtable_broute ip6table_nat ip6_udp_tunnel udp_tunnel ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep cachefiles fscache sunrpc vfat fat iwlmvm intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp mac80211 kvm_intel libarc4 iwlwifi kvm uvcvideo videobuf2_vmalloc mei_hdcp videobuf2_memops videobuf2_v4l2 videobuf2_common cfg80211 snd_hda_codec_realtek irqbypass snd_hda_codec_generic ledtrig_audio intel_cstate snd_hda_codec_hdmi intel_uncore btusb videodev snd_hda_intel btrtl btbcm btintel snd_hda_codec
[ 235.775113] mc bluetooth snd_hda_core snd_hwdep intel_rapl_perf wdat_wdt snd_seq snd_seq_device snd_pcm joydev asus_wmi input_polldev i2c_i801 intel_wmi_thunderbolt intel_pch_thermal wmi_bmof ecdh_generic ideapad_laptop ecc snd_timer lpc_ich sparse_keymap mei_me snd mei soundcore rfkill acpi_pad lz4 lz4_compress binfmt_misc ip_tables btrfs libcrc32c xor zstd_decompress zstd_compress raid6_pq dm_crypt i915 nouveau mxm_wmi ttm crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper drm ghash_clmulni_intel serio_raw r8169 wmi video fuse [last unloaded: zram]
[ 235.775143] CR2: 00000000000002ce
[ 235.775146] ---[ end trace 1ed5f1c3085019fd ]---
[ 235.775151] RIP: 0010:__free_pages+0x5/0x30
[ 235.775154] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82
[ 235.775156] RSP: 0018:ffffc3cf0ffb7db0 EFLAGS: 00010046
[ 235.775158] RAX: ffffa0ad0ffa0118 RBX: 0000000000000045 RCX: 0000000000000000
[ 235.775160] RDX: ffffa0aeceaee2f0 RSI: 0000000000000000 RDI: 000000000000029a
[ 235.775162] RBP: ffffa0ad0ffa0000 R08: fffffb06c23fd108 R09: fffffb06c23fd108
[ 235.775164] R10: 0000000000068879 R11: fffffb06c02cf820 R12: 0000000000000045
[ 235.775165] R13: ffffa0ade5c00010 R14: ffffa0ae0d19e5ac R15: ffffa0ae0d19e578
[ 235.775168] FS: 00007f3b40f6cc80(0000) GS:ffffa0aeceac0000(0000) knlGS:0000000000000000
[ 235.775170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.775172] CR2: 00000000000002ce CR3: 000000004c1f0004 CR4: 00000000003606e0

Best Regards,
Peter Hjalmarsson

* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-19 19:29 "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error Peter Hjalmarsson
@ 2019-10-21 9:17 ` Johannes Thumshirn
  2019-10-21 14:32 ` Johannes Thumshirn
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2019-10-21 9:17 UTC
To: Peter Hjalmarsson, linux-btrfs

On 19/10/2019 21:29, Peter Hjalmarsson wrote:

Hi Peter,

Thanks for the report.

> While trying to reproduce another problem I have seen with BTRFS while
> running balance and raid6 I hit an issue resulting in:
> BUG: kernel NULL pointer dereference, address: 00000000000002ce
>
> I created a script trying to pinpoint the problem utilizing zram, and
> the run goes like this:
>
> 1. run the scripts which sets up a "SINGEL" filesystem on two larger
> devices, fills it with an adequate amount of data, and then tries to
> convert it to raid6
> 2. the filesystem will fail to convert due to not enough space to
> convert all data to raid6 (not enough space on at least 3 devices at
> the same time), hitting an enospc-error
> * Up until this point stuff still seems to work without crashing, and
> the system seems stable, just with two different profiles fot the data
> 3. issue the umount command which will be "Killed", and a backtrace
> will be written to dmesg
>
> This results in a filesystem that cannot be synced, unmounted, and in
> all just seems crashed.
> I have ran the script 6 times. 1 time it passed, 5 times it has
> crashed, and I have rebooted the computer, since that is the only way
> it seems to get rid of the filesysem, so the issue seems pretty
> reproducable.
>
> The system is a laptop running Fedora 31 Beta with the latest updates,
> and the latest kernel (5.3.6-300.fc31.x86_64)
>
> Do you have any input, or any other questions or stuff you want me to test?
>
> The script looks as follow:
> -----
> $ cat run-btrfs-test
> modprobe -iv zram num_devices=8
> udevadm trigger
> sync
> zramctl /dev/zram0 -s 8G && \
> zramctl /dev/zram1 -s 8G && \
> zramctl /dev/zram2 -s 4G && \
> zramctl /dev/zram3 -s 4G && \
> zramctl /dev/zram4 -s 4G && \
> zramctl /dev/zram5 -s 2G && \
> zramctl /dev/zram6 -s 2G && \
> zramctl /dev/zram7 -s 4G && \
> mkfs.btrfs /dev/zram0 && \
> mkdir -p /mnt/btrfs-test && \
> mount /dev/zram0 /mnt/btrfs-test && \
> echo "FS Mounted" && \
> btrfs dev add /dev/zram1 /mnt/btrfs-test && \
> echo "Devices added" && \
> for int in {1..500} ; do dd if=/dev/zero of=/mnt/btrfs-test/file${int}
> bs=32M count=1 && sync ; done
> btrfs dev add /dev/zram[2-7] /mnt/btrfs-test && \
> btrfs fi sh /mnt/btrfs-test && \
> btrfs fi df /mnt/btrfs-test && \
> btrfs bal star -mconvert=raid6 /mnt/btrfs-test && \
> btrfs bal star -dconvert=raid6 /mnt/btrfs-test
> btrfs fi sh /mnt/btrfs-test && \
> btrfs fi df /mnt/btrfs-test
> =====
>
> The output from the script, I will trim the output down to after the for-loop:
> ------
> Label: none uuid: 87150918-e487-4f59-994b-ccb13ee05446
> Total devices 8 FS bytes used 15.64GiB
> devid 1 size 8.00GiB used 8.00GiB path /dev/zram0
> devid 2 size 8.00GiB used 8.00GiB path /dev/zram1
> devid 3 size 4.00GiB used 0.00B path /dev/zram2
> devid 4 size 4.00GiB used 0.00B path /dev/zram3
> devid 5 size 4.00GiB used 0.00B path /dev/zram4
> devid 6 size 2.00GiB used 0.00B path /dev/zram5
> devid 7 size 2.00GiB used 0.00B path /dev/zram6
> devid 8 size 4.00GiB used 0.00B path /dev/zram7
>
> Data, single: total=15.74GiB, used=15.63GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=264.00MiB, used=17.06MiB
> GlobalReserve, single: total=16.70MiB, used=0.00B
> Done, had to relocate 3 out of 20 chunks
> ERROR: error during balancing '/mnt/btrfs-test': No space left on device
> There may be more info in syslog - try dmesg | tail
> Label: none uuid: 87150918-e487-4f59-994b-ccb13ee05446
> Total devices 8 FS bytes used 15.65GiB
> devid 1 size 8.00GiB used 2.99GiB path /dev/zram0
> devid 2 size 8.00GiB used 2.00GiB path /dev/zram1
> devid 3 size 4.00GiB used 4.00GiB path /dev/zram2
> devid 4 size 4.00GiB used 4.00GiB path /dev/zram3
> devid 5 size 4.00GiB used 4.00GiB path /dev/zram4
> devid 6 size 2.00GiB used 2.00GiB path /dev/zram5
> devid 7 size 2.00GiB used 2.00GiB path /dev/zram6
> devid 8 size 4.00GiB used 4.00GiB path /dev/zram7
>
> Data, single: total=2.00GiB, used=1.97GiB
> Data, RAID6: total=14.41GiB, used=13.66GiB
> System, RAID6: total=80.00MiB, used=16.00KiB
> Metadata, RAID6: total=512.00MiB, used=17.52MiB
> GlobalReserve, single: total=17.16MiB, used=0.00B
> ===
>
> Issueing the umount:
> ----
> # umount /mnt/btrfs-test && modprobe -rv zram

OK, I couldn't reproduce it in my environment (5.4-rc3+ based
btrfs-devel/misc-next from David) with this script. I'll dig deeper.

> Killed
> ===
>
> And last but not least: the output in dmesg:
> ---
> [ 205.960233] BTRFS info (device zram0): 2 enospc errors during balance
> [ 205.960235] BTRFS info (device zram0): balance: ended with status: -28

Here balance ended with -ENOSPC (28).

> [ 235.774821] BUG: kernel NULL pointer dereference, address: 00000000000002ce

That's a NULL pointer dereference with an offset of 0x2ce (718).

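The two numbers mentioned so far can be double-checked with a few lines of Python. This is only a convenience sketch: it assumes a Linux host (where errno 28 is ENOSPC), and the variable names are illustrative, not taken from the kernel or from btrfs-progs.

```python
import errno
import os

# "balance: ended with status: -28" from the dmesg output above.
status = -28

# Map the positive errno value back to its symbolic name and message.
name = errno.errorcode[-status]
print(name, "-", os.strerror(-status))

# Faulting address from the BUG line, shown in decimal as well.
fault_address = 0x2CE
print(hex(fault_address), "=", fault_address)
```
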
> [ 235.774826] #PF: supervisor read access in kernel mode
> [ 235.774828] #PF: error_code(0x0000) - not-present page
> [ 235.774830] PGD 0 P4D 0
> [ 235.774834] Oops: 0000 [#1] SMP PTI
> [ 235.774838] CPU: 3 PID: 5421 Comm: umount Not tainted
> 5.3.6-300.fc31.x86_64 #1
> [ 235.774840] Hardware name: LENOVO 80JV/Lenovo U41-70, BIOS BDCN71WW
> 08/03/2016
> [ 235.774847] RIP: 0010:__free_pages+0x5/0x30
> [ 235.774850] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89
> e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f
> 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3
> e9 82
> [ 235.774852] RSP: 0018:ffffc3cf0ffb7db0 EFLAGS: 00010046
> [ 235.774854] RAX: ffffa0ad0ffa0118 RBX: 0000000000000045 RCX: 0000000000000000
> [ 235.774856] RDX: ffffa0aeceaee2f0 RSI: 0000000000000000 RDI: 000000000000029a
> [ 235.774858] RBP: ffffa0ad0ffa0000 R08: fffffb06c23fd108 R09: fffffb06c23fd108
> [ 235.774860] R10: 0000000000068879 R11: fffffb06c02cf820 R12: 0000000000000045
> [ 235.774861] R13: ffffa0ade5c00010 R14: ffffa0ae0d19e5ac R15: ffffa0ae0d19e578
> [ 235.774864] FS: 00007f3b40f6cc80(0000) GS:ffffa0aeceac0000(0000)
> knlGS:0000000000000000
> [ 235.774866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 235.774867] CR2: 00000000000002ce CR3: 000000004c1f0004 CR4: 00000000003606e0

CR2 has it as well, so we're page faulting on an access to 0x2ce.

Let's look at __free_pages():

(gdb) l *(__free_pages+0x5)
0xffffffff81179e45 is in __free_pages (mm/page_alloc.c:4818).
4813			__free_pages_ok(page, order);
4814	}
4815
4816	void __free_pages(struct page *page, unsigned int order)
4817	{
4818		if (put_page_testzero(page))
4819			free_the_page(page, order);
4820	}
4821	EXPORT_SYMBOL(__free_pages);
4822
(gdb)

We're getting called by __free_page() so we know order is 0. So
something must have passed in a NULL page (somehow).

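As a side check on the register dump above, the faulting address can be reproduced from the quoted values. The sketch below assumes that the marked opcode bytes `<8b> 47 34` decode to `mov 0x34(%rdi),%eax`, i.e. a 32-bit load from `RDI + 0x34`; the constant names are illustrative only. Under that assumption the `page` argument held the small non-zero value 0x29a rather than 0, so a plain non-NULL test on the pointer would not have caught it.

```python
# Values copied from the oops quoted above.
RDI = 0x29A           # first argument register, i.e. the `page` pointer
CR2 = 0x2CE           # faulting address reported by the CPU

# Assumed displacement, read off the marked opcode bytes "8b 47 34"
# (mov 0x34(%rdi),%eax). This is an interpretation, not gdb output.
DISPLACEMENT = 0x34

fault_address = RDI + DISPLACEMENT
print("computed:", hex(fault_address))
print("reported:", hex(CR2))
print("match:", fault_address == CR2)
```

The same arithmetic holds for the second x86_64 report later in the thread, where RDI is 0x29d and CR2 is 0x2d1.
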
Looking at __free_raid_bio() one step further up the call-chain I see this:

	for (i = 0; i < rbio->nr_pages; i++) {
		if (rbio->stripe_pages[i]) {
			__free_page(rbio->stripe_pages[i]);
			rbio->stripe_pages[i] = NULL;
		}
	}

> [ 235.774869] Call Trace:
> [ 235.774920] __free_raid_bio+0x72/0xb0 [btrfs]
> [ 235.774961] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs]
> [ 235.774992] close_ctree+0x1ea/0x2f0 [btrfs]
> [ 235.774998] generic_shutdown_super+0x6c/0x100
> [ 235.775001] kill_anon_super+0x14/0x30
> [ 235.775024] btrfs_kill_super+0x12/0xa0 [btrfs]
> [ 235.775029] deactivate_locked_super+0x36/0x70
> [ 235.775033] cleanup_mnt+0x104/0x150
> [ 235.775038] task_work_run+0x87/0xa0
> [ 235.775043] exit_to_usermode_loop+0xda/0x100
> [ 235.775047] do_syscall_64+0x183/0x1a0
> [ 235.775053] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 235.775056] RIP: 0033:0x7f3b411b767b
> [ 235.775060] Code: 08 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 07 0c 00 f7 d8 64 89
> 01 48

This decodes to:

All code
========
   0:	08 0c 00             	or     %cl,(%rax,%rax,1)
   3:	f7 d8                	neg    %eax
   5:	64 89 01             	mov    %eax,%fs:(%rcx)
   8:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
   c:	c3                   	retq
   d:	66 90                	xchg   %ax,%ax
   f:	f3 0f 1e fa          	endbr64
  13:	31 f6                	xor    %esi,%esi
  15:	e9 05 00 00 00       	jmpq   0x1f
  1a:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  1f:	f3 0f 1e fa          	endbr64
  23:	b8 a6 00 00 00       	mov    $0xa6,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	retq
  33:	48 8b 0d dd 07 0c 00 	mov    0xc07dd(%rip),%rcx        # 0xc0817
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	retq
   9:	48 8b 0d dd 07 0c 00 	mov    0xc07dd(%rip),%rcx        # 0xc07ed
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W

This doesn't look like __free_pages

(gdb) disassemble __free_pages
Dump of assembler code for function __free_pages:
   0xffffffff81179e40 <+0>:	lock decl 0x34(%rdi)
   0xffffffff81179e44 <+4>:	jne    0xffffffff81179e51 <__free_pages+17>
   0xffffffff81179e46 <+6>:	test   %esi,%esi
   0xffffffff81179e48 <+8>:	je     0xffffffff81179e4f <__free_pages+15>
   0xffffffff81179e4a <+10>:	jmpq   0xffffffff81178500 <__free_pages_ok>
   0xffffffff81179e4f <+15>:	jmp    0xffffffff81179de0 <free_unref_page>
   0xffffffff81179e51 <+17>:	repz retq
End of assembler dump.
(gdb)

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-21 9:17 ` Johannes Thumshirn
@ 2019-10-21 14:32 ` Johannes Thumshirn
  2019-10-24 20:23 ` Peter Hjalmarsson
  2019-10-27 13:24 ` Su Yue
  0 siblings, 2 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2019-10-21 14:32 UTC
To: Peter Hjalmarsson, linux-btrfs

On 21/10/2019 11:17, Johannes Thumshirn wrote:
[...]
>> -----
>> $ cat run-btrfs-test
>> modprobe -iv zram num_devices=8
>> udevadm trigger
>> sync
>> zramctl /dev/zram0 -s 8G && \
>> zramctl /dev/zram1 -s 8G && \
>> zramctl /dev/zram2 -s 4G && \
>> zramctl /dev/zram3 -s 4G && \
>> zramctl /dev/zram4 -s 4G && \
>> zramctl /dev/zram5 -s 2G && \
>> zramctl /dev/zram6 -s 2G && \
>> zramctl /dev/zram7 -s 4G && \
>> mkfs.btrfs /dev/zram0 && \
>> mkdir -p /mnt/btrfs-test && \
>> mount /dev/zram0 /mnt/btrfs-test && \
>> echo "FS Mounted" && \
>> btrfs dev add /dev/zram1 /mnt/btrfs-test && \
>> echo "Devices added" && \
>> for int in {1..500} ; do dd if=/dev/zero of=/mnt/btrfs-test/file${int}
>> bs=32M count=1 && sync ; done
>> btrfs dev add /dev/zram[2-7] /mnt/btrfs-test && \
>> btrfs fi sh /mnt/btrfs-test && \
>> btrfs fi df /mnt/btrfs-test && \
>> btrfs bal star -mconvert=raid6 /mnt/btrfs-test && \
>> btrfs bal star -dconvert=raid6 /mnt/btrfs-test
>> btrfs fi sh /mnt/btrfs-test && \
>> btrfs fi df /mnt/btrfs-test

I'm sorry. I ran this script in a loop for 35 iterations on 5.3.6 and
couldn't reproduce a single crash.

Is there anything else needed?

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-21 14:32 ` Johannes Thumshirn
@ 2019-10-24 20:23 ` Peter Hjalmarsson
  2019-10-25 8:07 ` Johannes Thumshirn
  2019-10-27 13:24 ` Su Yue
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Hjalmarsson @ 2019-10-24 20:23 UTC
To: Johannes Thumshirn; +Cc: linux-btrfs

Hi,

Sorry for the late answer. I have been out of town for work.

Interesting that you could not reproduce it. What kind of system do you run?
I have now tried on a couple of systems, and they behave slightly
differently, so maybe the architecture determines what kind of crash is
triggered, if any?

Also make sure that the "dd" line in the script is not wrapped the way
it ended up in the mail.

Running the test on a couple of systems, I found the following:

The systems that always crash did so during the first or second run of
the script (so there is probably no need to run it more than three or
four times to verify).

Two systems pass 5 times (32-bit ARM).

One system (Pine64 Rock64, rk3328) crashes during umount with a slightly
different traceback, but essentially the same as previously reported:
[503397.433500] Call trace:
[503397.436338] __free_pages+0x1c/0x80
[503397.440506] __free_raid_bio+0x84/0xf8 [btrfs]
[503397.445713] __remove_rbio_from_cache+0x134/0x1b8 [btrfs]
[503397.451957] btrfs_clear_rbio_cache.isra.0+0x5c/0x98 [btrfs]
[503397.458473] btrfs_free_stripe_hash_table+0x24/0x40 [btrfs]
[503397.464891] close_ctree+0x1b0/0x2c8 [btrfs]
[503397.469851] btrfs_put_super+0x20/0x30 [btrfs]

Two systems (one x86_64 Atom, one RPI3 aarch64) crash during balance
with a dmesg like:
[10282.926420] Call Trace:
[10282.926488] lock_stripe_add+0x292/0x370 [btrfs]
[10282.926560] __raid56_parity_write+0x20/0x40 [btrfs]
[10282.926633] run_plug+0x131/0x150 [btrfs]
[10282.926671] blk_flush_plug_list+0xc2/0x110
[10282.926708] blk_finish_plug+0x21/0x2e
[10282.926769] btrfs_write_and_wait_transaction.isra.0+0x57/0xa0 [btrfs]
[10282.926851] btrfs_commit_transaction+0x72e/0x9a0 [btrfs]

Three systems (all x86_64; the one marked "Tainted" has the nvidia
binary blobs) crash during umount with a dmesg like:
[ 658.646613] Call Trace:
[ 658.646675] __free_raid_bio+0x72/0xb0 [btrfs]
[ 658.646728] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs]
[ 658.646766] close_ctree+0x1ea/0x2f0 [btrfs]
[ 658.646773] generic_shutdown_super+0x6c/0x100
[ 658.646778] kill_anon_super+0x14/0x30
[ 658.646808] btrfs_kill_super+0x12/0xa0 [btrfs]

I have saved the full dmesg output from all systems, in case you want to
look at any of the ones not posted below.

dmesg for the three x86_64 crashes during umount as follows (since
that was what I reported in this thread):
[ 7322.868716] BUG: kernel NULL pointer dereference, address: 00000000000002ce
[ 7322.868720] #PF: supervisor read access in kernel mode
[ 7322.868721] #PF: error_code(0x0000) - not-present page
[ 7322.868722] PGD 0 P4D 0
[ 7322.868725] Oops: 0000 [#1] SMP PTI
[ 7322.868727] CPU: 1 PID: 18329 Comm: umount Tainted: P OE 5.3.6-200.fc30.x86_64 #1
[ 7322.868728] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 3805 05/16/2018
[ 7322.868733] RIP: 0010:__free_pages+0x5/0x30
[ 7322.868735] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82
[ 7322.868737] RSP: 0018:ffffb481d632fdb0 EFLAGS: 00010046
[ 7322.868738] RAX: ffff8932a0b32118 RBX: 0000000000000045 RCX: 0000000000000000
[ 7322.868740] RDX: ffff893366a6e2f0 RSI: 0000000000000000 RDI: 000000000000029a
[ 7322.868741] RBP: ffff8932a0b32000 R08: ffffd62ac0e41b48 R09: ffffd62ac0e41b48
[ 7322.868742] R10: 000000000004f4b1 R11: ffffd62ac055d220 R12: 0000000000000045
[ 7322.868743] R13: ffff893135a20010 R14: ffff89326792e72c R15: ffff89326792e6f8
[ 7322.868744] FS: 00007fc8c95d8080(0000) GS:ffff893366a40000(0000) knlGS:0000000000000000
[ 7322.868746] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7322.868747] CR2: 00000000000002ce CR3: 0000000085bd6001 CR4: 00000000003606e0
[ 7322.868748] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7322.868749] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7322.868750] Call Trace:
[ 7322.868782] __free_raid_bio+0x72/0xb0 [btrfs]
[ 7322.868809] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs]
[ 7322.868827] close_ctree+0x1ea/0x2f0 [btrfs]
[ 7322.868831] generic_shutdown_super+0x6c/0x100
[ 7322.868834] kill_anon_super+0x14/0x30
[ 7322.868848] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 7322.868851] deactivate_locked_super+0x36/0x70
[ 7322.868853] cleanup_mnt+0x104/0x150
[ 7322.868856] task_work_run+0x87/0xa0
[ 7322.868860] exit_to_usermode_loop+0xda/0x100
[ 7322.868862] do_syscall_64+0x183/0x1a0
[ 7322.868866] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7322.868868] RIP: 0033:0x7fc8c982358b
[ 7322.868870] Code: 39 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 38 0c 00 f7 d8 64 89 01 48
[ 7322.868872] RSP: 002b:00007ffd1ba2b9f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 7322.868873] RAX: 0000000000000000 RBX: 00007fc8c994e1c4 RCX: 00007fc8c982358b
[ 7322.868874] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055588393a6f0
[ 7322.868875] RBP: 000055588393a4e0 R08: 0000000000000000 R09: 00007ffd1ba2a7a0
[ 7322.868876] R10: 000055588393b750 R11: 0000000000000246 R12: 000055588393a6f0
[ 7322.868877] R13: 0000000000000000 R14: 000055588393a5d8 R15: 0000000000000000
[ 7322.868879] Modules linked in: zram rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace rfcomm fuse xt_CHECKSUM xt_MASQUERADE tun xt_addrtype br_netfilter bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables overlay cmac bnep nct6775 cachefiles hwmon_vid fscache sunrpc vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm ledtrig_audio irqbypass snd_hda_intel btusb snd_hda_codec btrtl btbcm btintel crct10dif_pclmul snd_hda_core
[ 7322.868904] ucsi_ccg crc32_pclmul typec_ucsi bluetooth snd_hwdep mei_hdcp typec snd_seq eeepc_wmi iTCO_wdt ghash_clmulni_intel iTCO_vendor_support asus_wmi intel_cstate snd_seq_device intel_uncore intel_rapl_perf wmi_bmof sparse_keymap drm_kms_helper snd_pcm ecdh_generic i2c_i801 rfkill mei_me ecc snd_timer drm cp210x mei snd ipmi_devintf soundcore i2c_nvidia_gpu ipmi_msghandler acpi_pad binfmt_misc btrfs libcrc32c xor zstd_decompress zstd_compress raid6_pq mxm_wmi e1000e nvme crc32c_intel nvme_core wmi video [last unloaded: zram]
[ 7322.868923] CR2: 00000000000002ce
[ 7322.868924] ---[ end trace 9c6f3e1ed9db6ba7 ]---
[ 7322.868927] RIP: 0010:__free_pages+0x5/0x30
[ 7322.868929] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82
[ 7322.868930] RSP: 0018:ffffb481d632fdb0 EFLAGS: 00010046
[ 7322.868931] RAX: ffff8932a0b32118 RBX: 0000000000000045 RCX: 0000000000000000
[ 7322.868932] RDX: ffff893366a6e2f0 RSI: 0000000000000000 RDI: 000000000000029a
[ 7322.868933] RBP: ffff8932a0b32000 R08: ffffd62ac0e41b48 R09: ffffd62ac0e41b48
[ 7322.868934] R10: 000000000004f4b1 R11: ffffd62ac055d220 R12: 0000000000000045
[ 7322.868935] R13: ffff893135a20010 R14: ffff89326792e72c R15: ffff89326792e6f8
[ 7322.868937] FS: 00007fc8c95d8080(0000) GS:ffff893366a40000(0000) knlGS:0000000000000000
[ 7322.868938] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7322.868939] CR2: 00000000000002ce CR3: 0000000085bd6001 CR4: 00000000003606e0
[ 7322.868940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7322.868941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[ 3270.915492] BUG: kernel NULL pointer dereference, address: 00000000000002d1
[ 3270.915499] #PF: supervisor read access in kernel mode
[ 3270.915502] #PF: error_code(0x0000) - not-present page
[ 3270.915504] PGD 0 P4D 0
[ 3270.915509] Oops: 0000 [#1] SMP PTI
[ 3270.915514] CPU: 1 PID: 2578 Comm: umount Not tainted 5.3.6-200.fc30.x86_64 #1
[ 3270.915516] Hardware name: System manufacturer System Product Name/P7P55D-E LX, BIOS 1701 09/27/2012
[ 3270.915525] RIP: 0010:__free_pages+0x5/0x30
[ 3270.915529] Code: 90 48 89 c3 fa 66 66 90 66 66 90 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 66 66 90 66 90 5b 5d 41 5c c3 66 66 66 66 90 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82
[ 3270.915532] RSP: 0018:ffffb41a52367db0 EFLAGS: 00010046
[ 3270.915535] RAX: ffffa01a356ed918 RBX: 0000000000000045 RCX: 0000000000000000
[ 3270.915538] RDX: ffffa01ad586e350 RSI: 0000000000000000 RDI: 000000000000029d
[ 3270.915541] RBP: ffffa01a356ed800 R08: fffff43e482f9688 R09: fffff43e482f9688
[ 3270.915543] R10: 000000000021592f R11: fffff43e482b8420 R12: 0000000000000045
[ 3270.915546] R13: ffffa01a63040010 R14: ffffa01a6ae125ec R15: ffffa01a6ae125b8
[ 3270.915549] FS: 00007fbfce760080(0000) GS:ffffa01ad5840000(0000) knlGS:0000000000000000
[ 3270.915552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3270.915554] CR2: 00000000000002d1 CR3: 000000020b64c000 CR4: 00000000000006e0
[ 3270.915557] Call Trace:
[ 3270.915617] __free_raid_bio+0x72/0xb0 [btrfs]
[ 3270.915668] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs]
[ 3270.915709] close_ctree+0x1ea/0x2f0 [btrfs]
[ 3270.915715] generic_shutdown_super+0x6c/0x100
[ 3270.915720] kill_anon_super+0x14/0x30
[ 3270.915751] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 3270.915756] deactivate_locked_super+0x36/0x70
[ 3270.915760] cleanup_mnt+0x104/0x150
[ 3270.915765] task_work_run+0x87/0xa0
[ 3270.915771] exit_to_usermode_loop+0xda/0x100
[ 3270.915776] do_syscall_64+0x183/0x1a0
[ 3270.915782] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3270.915786] RIP: 0033:0x7fbfce9ab58b
[ 3270.915790] Code: 39 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 38 0c 00 f7 d8 64 89 01 48
[ 3270.915792] RSP: 002b:00007ffcf5f6a918 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 3270.915796] RAX: 0000000000000000 RBX: 00007fbfcead61c4 RCX: 00007fbfce9ab58b
[ 3270.915798] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b3dcd106f0
[ 3270.915801] RBP: 000055b3dcd104e0 R08: 0000000000000000 R09: 00007ffcf5f696c0
[ 3270.915803] R10: 000055b3dcd10710 R11: 0000000000000246 R12: 000055b3dcd106f0
[ 3270.915805] R13: 0000000000000000 R14: 000055b3dcd105d8 R15: 0000000000000000
[ 3270.915809] Modules linked in: zram rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc rfcomm bluetooth ecdh_generic rfkill ecc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables intel_powerclamp kvm_intel kvm irqbypass snd_hda_codec_via iTCO_wdt snd_hda_codec_generic iTCO_vendor_support intel_cstate intel_uncore snd_hda_codec_hdmi ledtrig_audio uinput snd_hda_intel snd_hda_codec cachefiles fscache hwmon_vid coretemp snd_hda_core i7core_edac snd_hwdep xpad i2c_i801 snd_seq joydev ff_memless lpc_ich snd_seq_device snd_pcm asus_atk0110 snd_timer snd soundcore acpi_cpufreq amdgpu amd_iommu_v2 gpu_sched btrfs radeon libcrc32c xor zstd_decompress zstd_compress
[ 3270.915860] raid6_pq i2c_algo_bit drm_kms_helper crc32c_intel ttm serio_raw drm r8169
[ 3270.915870] CR2: 00000000000002d1
[ 3270.915874] ---[ end trace 03b4a864514336a0 ]---
[ 3270.915879] RIP: 0010:__free_pages+0x5/0x30
[ 3270.915882] Code: 90 48 89 c3 fa 66 66 90 66 66 90 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 66 66 90 66 90 5b 5d 41 5c c3 66 66 66 66 90 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03
eb 88 c3 e9 82 [ 3270.915885] RSP: 0018:ffffb41a52367db0 EFLAGS: 00010046 [ 3270.915888] RAX: ffffa01a356ed918 RBX: 0000000000000045 RCX: 0000000000000000 [ 3270.915890] RDX: ffffa01ad586e350 RSI: 0000000000000000 RDI: 000000000000029d [ 3270.915893] RBP: ffffa01a356ed800 R08: fffff43e482f9688 R09: fffff43e482f9688 [ 3270.915895] R10: 000000000021592f R11: fffff43e482b8420 R12: 0000000000000045 [ 3270.915898] R13: ffffa01a63040010 R14: ffffa01a6ae125ec R15: ffffa01a6ae125b8 [ 3270.915901] FS: 00007fbfce760080(0000) GS:ffffa01ad5840000(0000) knlGS:0000000000000000 [ 3270.915904] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3270.915906] CR2: 00000000000002d1 CR3: 000000020b64c000 CR4: 00000000000006e0 [ 658.646553] BUG: kernel NULL pointer dereference, address: 00000000000002d1 [ 658.646560] #PF: supervisor read access in kernel mode [ 658.646562] #PF: error_code(0x0000) - not-present page [ 658.646564] PGD 0 P4D 0 [ 658.646569] Oops: 0000 [#1] SMP PTI [ 658.646574] CPU: 0 PID: 6418 Comm: umount Not tainted 5.3.6-300.fc31.x86_64 #1 [ 658.646576] Hardware name: LENOVO 80JV/Lenovo U41-70, BIOS BDCN71WW 08/03/2016 [ 658.646584] RIP: 0010:__free_pages+0x5/0x30 [ 658.646588] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82 [ 658.646591] RSP: 0018:ffffb23bc8d73db0 EFLAGS: 00010046 [ 658.646594] RAX: ffff9d285707c918 RBX: 0000000000000045 RCX: 0000000000000000 [ 658.646597] RDX: ffff9d290ea2e2f0 RSI: 0000000000000000 RDI: 000000000000029d [ 658.646599] RBP: ffff9d285707c800 R08: ffffdc18806c2d48 R09: ffffdc18806c2d48 [ 658.646601] R10: 000000000001b12d R11: ffffdc1888e5a020 R12: 0000000000000045 [ 658.646603] R13: ffff9d2825350010 R14: ffff9d2882cec66c R15: ffff9d2882cec638 [ 658.646606] FS: 00007f28dbf23c80(0000) GS:ffff9d290ea00000(0000) knlGS:0000000000000000 [ 658.646609] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 [ 658.646611] CR2: 00000000000002d1 CR3: 0000000092fd0003 CR4: 00000000003606f0 [ 658.646613] Call Trace: [ 658.646675] __free_raid_bio+0x72/0xb0 [btrfs] [ 658.646728] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs] [ 658.646766] close_ctree+0x1ea/0x2f0 [btrfs] [ 658.646773] generic_shutdown_super+0x6c/0x100 [ 658.646778] kill_anon_super+0x14/0x30 [ 658.646808] btrfs_kill_super+0x12/0xa0 [btrfs] [ 658.646814] deactivate_locked_super+0x36/0x70 [ 658.646819] cleanup_mnt+0x104/0x150 [ 658.646825] task_work_run+0x87/0xa0 [ 658.646831] exit_to_usermode_loop+0xda/0x100 [ 658.646836] do_syscall_64+0x183/0x1a0 [ 658.646843] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 658.646847] RIP: 0033:0x7f28dc16e67b [ 658.646851] Code: 08 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 07 0c 00 f7 d8 64 89 01 48 [ 658.646854] RSP: 002b:00007ffe46290828 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 658.646858] RAX: 0000000000000000 RBX: 00007f28dc2981e4 RCX: 00007f28dc16e67b [ 658.646860] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055d87bf48730 [ 658.646862] RBP: 000055d87bf48520 R08: 0000000000000000 R09: 00007ffe4628f5a0 [ 658.646864] R10: 000055d87bf49750 R11: 0000000000000246 R12: 000055d87bf48730 [ 658.646867] R13: 0000000000000000 R14: 000055d87bf48618 R15: 0000000000000000 [ 658.646870] Modules linked in: zram uinput rfcomm ccm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack tun bridge ebtable_nat stp ebtable_broute ip6table_nat llc ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle ip6_udp_tunnel udp_tunnel iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep 
cachefiles fscache sunrpc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek iwlmvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi mei_hdcp snd_hda_intel mac80211 kvm snd_hda_codec uvcvideo libarc4 snd_hda_core videobuf2_vmalloc iwlwifi btusb snd_hwdep videobuf2_memops irqbypass snd_seq videobuf2_v4l2 btrtl btbcm btintel snd_seq_device [ 658.646916] videobuf2_common intel_cstate intel_uncore snd_pcm videodev mc intel_rapl_perf bluetooth cfg80211 asus_wmi joydev input_polldev wmi_bmof i2c_i801 mei_me intel_wmi_thunderbolt wdat_wdt snd_timer intel_pch_thermal ecdh_generic ecc mei snd soundcore lpc_ich ideapad_laptop sparse_keymap rfkill acpi_pad lz4 lz4_compress binfmt_misc ip_tables btrfs libcrc32c xor zstd_decompress zstd_compress raid6_pq dm_crypt i915 nouveau crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi ttm i2c_algo_bit drm_kms_helper ghash_clmulni_intel drm serio_raw r8169 wmi video fuse [last unloaded: zram] [ 658.646954] CR2: 00000000000002d1 [ 658.646958] ---[ end trace 0e45be4afd3f4e04 ]--- [ 658.646964] RIP: 0010:__free_pages+0x5/0x30 [ 658.646967] Code: 00 48 89 c3 fa 66 0f 1f 44 00 00 48 89 ef 4c 89 e6 e8 2f ef ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d 41 5c c3 0f 1f 44 00 00 <8b> 47 34 85 c0 74 12 f0 ff 4f 34 75 06 85 f6 75 03 eb 88 c3 e9 82 [ 658.646970] RSP: 0018:ffffb23bc8d73db0 EFLAGS: 00010046 [ 658.646973] RAX: ffff9d285707c918 RBX: 0000000000000045 RCX: 0000000000000000 [ 658.646975] RDX: ffff9d290ea2e2f0 RSI: 0000000000000000 RDI: 000000000000029d [ 658.646978] RBP: ffff9d285707c800 R08: ffffdc18806c2d48 R09: ffffdc18806c2d48 [ 658.646980] R10: 000000000001b12d R11: ffffdc1888e5a020 R12: 0000000000000045 [ 658.646982] R13: ffff9d2825350010 R14: ffff9d2882cec66c R15: ffff9d2882cec638 [ 658.646986] FS: 00007f28dbf23c80(0000) GS:ffff9d290ea00000(0000) knlGS:0000000000000000 [ 658.646988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 658.646991] CR2: 
00000000000002d1 CR3: 0000000092fd0003 CR4: 00000000003606f0

BR,
Peter Hjalmarsson

On Mon, 21 Oct 2019 at 16:33, Johannes Thumshirn <jthumshirn@suse.de> wrote:
>
> On 21/10/2019 11:17, Johannes Thumshirn wrote:
> [...]
> >> -----
> >> $ cat run-btrfs-test
> >> modprobe -iv zram num_devices=8
> >> udevadm trigger
> >> sync
> >> zramctl /dev/zram0 -s 8G && \
> >> zramctl /dev/zram1 -s 8G && \
> >> zramctl /dev/zram2 -s 4G && \
> >> zramctl /dev/zram3 -s 4G && \
> >> zramctl /dev/zram4 -s 4G && \
> >> zramctl /dev/zram5 -s 2G && \
> >> zramctl /dev/zram6 -s 2G && \
> >> zramctl /dev/zram7 -s 4G && \
> >> mkfs.btrfs /dev/zram0 && \
> >> mkdir -p /mnt/btrfs-test && \
> >> mount /dev/zram0 /mnt/btrfs-test && \
> >> echo "FS Mounted" && \
> >> btrfs dev add /dev/zram1 /mnt/btrfs-test && \
> >> echo "Devices added" && \
> >> for int in {1..500} ; do dd if=/dev/zero of=/mnt/btrfs-test/file${int}
> >> bs=32M count=1 && sync ; done
> >> btrfs dev add /dev/zram[2-7] /mnt/btrfs-test && \
> >> btrfs fi sh /mnt/btrfs-test && \
> >> btrfs fi df /mnt/btrfs-test && \
> >> btrfs bal star -mconvert=raid6 /mnt/btrfs-test && \
> >> btrfs bal star -dconvert=raid6 /mnt/btrfs-test
> >> btrfs fi sh /mnt/btrfs-test && \
> >> btrfs fi df /mnt/btrfs-test
>
> I'm sorry. I ran this script in a loop for 35 iterations on 5.3.6 and
> couldn't reproduce a single crash.
>
>
> Is there anything else needed?
> --
> Johannes Thumshirn SUSE Labs Filesystems
> jthumshirn@suse.de +49 911 74053 689
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5
> 90409 Nürnberg
> Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-24 20:23 ` Peter Hjalmarsson
@ 2019-10-25 8:07 ` Johannes Thumshirn
  2019-10-25 10:02 ` Johannes Thumshirn
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2019-10-25 8:07 UTC
To: Peter Hjalmarsson; +Cc: linux-btrfs

On 24/10/2019 22:23, Peter Hjalmarsson wrote:
> Hi,
>
> Sorry for late answer. Have been out of town for work.
> Interesting that you could not reproduce. What kind of system do you run?

I ran it up to 64 times, but in a very minimal VM. Let me try to reproduce
the issue on a bare metal system.

> I have tried on a couple of system now, and they behave slightly
> different, so maybe it has something to do with which arch it runs
> what kind of crash it triggers if any?
> Also make sure that the "dd"-line in the script is not wrapped like it
> became in the mail.

Yes, I've fixed that up.

> For me running trough the test in a couple of systems, I found the following:
>
> The systems that always crashes did that during the first or the
> second run of the script (so probably no need to run for longer than
> maybe three to four times to verify).
>
> Two systems passes 5 times (32 bit arm)
> One system crashes during umount with a slightly different traceback
> but essential the same as previously reported Pine64 Rock64 rk3328:
> [503397.433500] Call trace:
> [503397.436338] __free_pages+0x1c/0x80
> [503397.440506] __free_raid_bio+0x84/0xf8 [btrfs]
> [503397.445713] __remove_rbio_from_cache+0x134/0x1b8 [btrfs]
> [503397.451957] btrfs_clear_rbio_cache.isra.0+0x5c/0x98 [btrfs]
> [503397.458473] btrfs_free_stripe_hash_table+0x24/0x40 [btrfs]
> [503397.464891] close_ctree+0x1b0/0x2c8 [btrfs]
> [503397.469851] btrfs_put_super+0x20/0x30 [btrfs]
>
> Two system crashes (one x86_64 Atom, one RPI3 arch64) during balance
> with a dmesg like:
> [10282.926420] Call Trace:
> [10282.926488] lock_stripe_add+0x292/0x370 [btrfs]
> [10282.926560] __raid56_parity_write+0x20/0x40 [btrfs]
> [10282.926633] run_plug+0x131/0x150 [btrfs]
> [10282.926671] blk_flush_plug_list+0xc2/0x110
> [10282.926708] blk_finish_plug+0x21/0x2e
> [10282.926769] btrfs_write_and_wait_transaction.isra.0+0x57/0xa0 [btrfs]
> [10282.926851] btrfs_commit_transaction+0x72e/0x9a0 [btrfs]
>
> Three system crashes (three x86_64, the one "tainted" has nvidia
> binary blobs) during umount with a dmesg like:
> [ 658.646613] Call Trace:
> [ 658.646675] __free_raid_bio+0x72/0xb0 [btrfs]
> [ 658.646728] btrfs_free_stripe_hash_table+0x3d/0x70 [btrfs]
> [ 658.646766] close_ctree+0x1ea/0x2f0 [btrfs]
> [ 658.646773] generic_shutdown_super+0x6c/0x100
> [ 658.646778] kill_anon_super+0x14/0x30
> [ 658.646808] btrfs_kill_super+0x12/0xa0 [btrfs]
>
> All full dmesg saved if you want to look at any of the other not posted below.

This could be handy, but I'm not sure yet.

Byte,
Johannes
--
Johannes Thumshirn SUSE Labs Filesystems
jthumshirn@suse.de +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
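The three crash variants reported in the message above differ in which btrfs frames show up in the call trace. As a rough aid for sorting saved dmesg files, here is a minimal sketch that labels a trace by those frames. The function name and the labels are hypothetical and chosen only for illustration; the frame names are the ones quoted in this thread, and nothing here is part of btrfs-progs or the kernel.

```shell
# Hypothetical helper: label a dmesg excerpt by the btrfs frame it contains.
# The arm64 umount trace contains both __free_raid_bio and
# __remove_rbio_from_cache, so the more specific frame is checked first.
classify_trace() {
    case "$1" in
        *__remove_rbio_from_cache*) echo "umount-rbio-cache" ;;    # Rock64 variant
        *__free_raid_bio*)          echo "umount-free-raid-bio" ;; # x86_64 umount variant
        *lock_stripe_add*)          echo "balance-lock-stripe" ;;  # crash during balance
        *)                          echo "unknown" ;;
    esac
}

# Example use on the running system (needs permission to read the log):
#   classify_trace "$(dmesg)"
```

The labels only say which variant a log resembles; they do not say anything about the root cause.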
* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-25 8:07 ` Johannes Thumshirn
@ 2019-10-25 10:02 ` Johannes Thumshirn
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2019-10-25 10:02 UTC
To: Peter Hjalmarsson; +Cc: linux-btrfs

On 25/10/2019 10:07, Johannes Thumshirn wrote:
> On 24/10/2019 22:23, Peter Hjalmarsson wrote:
>> Hi,
>>
>> Sorry for late answer. Have been out of town for work.
>> Interesting that you could not reproduce. What kind of system do you run?
>
> I ran it up to 64 times but in a very minimal VM. Let me try to reprouce
> the issue on a bare metal system.
>
>> I have tried on a couple of system now, and they behave slightly
>> different, so maybe it has something to do with which arch it runs
>> what kind of crash it triggers if any?
>> Also make sure that the "dd"-line in the script is not wrapped like it
>> became in the mail.
>
> Yes I've fixed that up.
>
>> For me running trough the test in a couple of systems, I found the following:
>>
>> The systems that always crashes did that during the first or the
>> second run of the script (so probably no need to run for longer than
>> maybe three to four times to verify).

Good news. I was able to reproduce this on bare metal and have a kdump core.

--
Johannes Thumshirn SUSE Labs Filesystems
jthumshirn@suse.de +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-21 14:32 ` Johannes Thumshirn
  2019-10-24 20:23 ` Peter Hjalmarsson
@ 2019-10-27 13:24 ` Su Yue
  2019-10-28 7:50 ` Johannes Thumshirn
  1 sibling, 1 reply; 9+ messages in thread
From: Su Yue @ 2019-10-27 13:24 UTC
To: Johannes Thumshirn, Peter Hjalmarsson, linux-btrfs

On 2019/10/21 10:32 PM, Johannes Thumshirn wrote:
> On 21/10/2019 11:17, Johannes Thumshirn wrote:
> [...]
>>> -----
>>> $ cat run-btrfs-test
>>> modprobe -iv zram num_devices=8
>>> udevadm trigger
>>> sync
>>> zramctl /dev/zram0 -s 8G && \
>>> zramctl /dev/zram1 -s 8G && \
>>> zramctl /dev/zram2 -s 4G && \
>>> zramctl /dev/zram3 -s 4G && \
>>> zramctl /dev/zram4 -s 4G && \
>>> zramctl /dev/zram5 -s 2G && \
>>> zramctl /dev/zram6 -s 2G && \
>>> zramctl /dev/zram7 -s 4G && \
>>> mkfs.btrfs /dev/zram0 && \
>>> mkdir -p /mnt/btrfs-test && \
>>> mount /dev/zram0 /mnt/btrfs-test && \
>>> echo "FS Mounted" && \
>>> btrfs dev add /dev/zram1 /mnt/btrfs-test && \
>>> echo "Devices added" && \
>>> for int in {1..500} ; do dd if=/dev/zero of=/mnt/btrfs-test/file${int}
>>> bs=32M count=1 && sync ; done
>>> btrfs dev add /dev/zram[2-7] /mnt/btrfs-test && \
>>> btrfs fi sh /mnt/btrfs-test && \
>>> btrfs fi df /mnt/btrfs-test && \
>>> btrfs bal star -mconvert=raid6 /mnt/btrfs-test && \
>>> btrfs bal star -dconvert=raid6 /mnt/btrfs-test
>>> btrfs fi sh /mnt/btrfs-test && \
>>> btrfs fi df /mnt/btrfs-test
>
> I'm sorry. I ran this script in a loop for 35 iterations on 5.3.6 and
> couldn't reproduce a single crash.
>

I ran into the same interesting behaviour: it is not reproducible in my VM,
but it is on the host (Arch Linux, v5.3.6, same kernel config).

What's more interesting is that v5.3.7 seems to have fixed the bug.
After some bisecting, the commit that fixes it is:

commit 417d26300214f7b593a99c6bc8badb66492ae322
Author: Qu Wenruo <wqu@suse.com>
Date: Mon Sep 23 14:56:14 2019 +0800

    btrfs: relocation: fix use-after-free on dead relocation roots

    commit 1fac4a54374f7ef385938f3c6cf7649c0fe4f6cd upstream.

--
Su
>
> Is there anything else needed?
>
* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-27 13:24 ` Su Yue
@ 2019-10-28 7:50 ` Johannes Thumshirn
  2019-10-28 10:22 ` Peter Hjalmarsson
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2019-10-28 7:50 UTC
To: Su Yue, Peter Hjalmarsson, linux-btrfs

On 27/10/2019 14:24, Su Yue wrote:
[...]
>
> Interesting thing I met too. That's not reproducible on my VM but
> host (Archlinux v5.3.6 same kernel config).
>
> What's more interesting is that v5.3.7 seems to have fixed the bug.
> After some bisect. The commit is
>
> commit 417d26300214f7b593a99c6bc8badb66492ae322
> Author: Qu Wenruo <wqu@suse.com>
> Date: Mon Sep 23 14:56:14 2019 +0800
>
>     btrfs: relocation: fix use-after-free on dead relocation roots
>
>     commit 1fac4a54374f7ef385938f3c6cf7649c0fe4f6cd upstream.
>

Good catch, cherry-picking this commit on top of v5.3.5 resolves the
issue in my setup.

--
Johannes Thumshirn SUSE Labs Filesystems
jthumshirn@suse.de +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* Re: "BUG: kernel NULL pointer dereference," when unmounting filesystem hitted by enospc error
  2019-10-28 7:50 ` Johannes Thumshirn
@ 2019-10-28 10:22 ` Peter Hjalmarsson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Hjalmarsson @ 2019-10-28 10:22 UTC
To: Johannes Thumshirn; +Cc: Su Yue, linux-btrfs

Thanks!

5.3.7 reached my system from the Fedora repos just a day or two after my
last mail, and I spent last evening trying to reproduce both the problem the
script shows and the problem I was trying to track down when I wrote the
script. Both seem to be gone after upgrading on the systems I have been able
to run the test on so far.

Best Regards,
Peter Hjalmarsson

On Mon, 28 Oct 2019 at 08:50, Johannes Thumshirn <jthumshirn@suse.de> wrote:
>
> On 27/10/2019 14:24, Su Yue wrote:
> [...]
> >
> > Interesting thing I met too. That's not reproducible on my VM but
> > host (Archlinux v5.3.6 same kernel config).
> >
> > What's more interesting is that v5.3.7 seems to have fixed the bug.
> > After some bisect. The commit is
> >
> > commit 417d26300214f7b593a99c6bc8badb66492ae322
> > Author: Qu Wenruo <wqu@suse.com>
> > Date: Mon Sep 23 14:56:14 2019 +0800
> >
> >     btrfs: relocation: fix use-after-free on dead relocation roots
> >
> >     commit 1fac4a54374f7ef385938f3c6cf7649c0fe4f6cd upstream.
> >
>
> Good catch, cherry-picking this commit on top of v5.3.5 resolves the
> issue in my setup.
>
> --
> Johannes Thumshirn SUSE Labs Filesystems
> jthumshirn@suse.de +49 911 74053 689
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5
> 90409 Nürnberg
> Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
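Since the thread concludes that the crash no longer reproduces on v5.3.7, a test machine can be checked before re-running the reproduction script. The following is a minimal sketch, not a definitive test: the function names are hypothetical, `sort -V` is a GNU coreutils extension, and the comparison is only meaningful for the 5.3.y stable series discussed here (a distribution kernel with an older version string may still carry the fix as a backport).

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# With "sort -V" the smaller version sorts first, so A >= B exactly when
# B is the first line of the sorted pair.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_base RELEASE: strip the distribution suffix from a release string,
# e.g. 5.3.6-300.fc31.x86_64 -> 5.3.6
kernel_base() {
    printf '%s\n' "${1%%-*}"
}

# Example use on the running system:
#   if version_ge "$(kernel_base "$(uname -r)")" 5.3.7; then
#       echo "running kernel is at or past the release reported as fixed"
#   else
#       echo "running kernel predates the release reported as fixed"
#   fi
```

Comparing version strings is only a first filter; re-running the script from the start of the thread remains the actual check.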