* [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
From: Mikhail Gavrilov @ 2020-10-09 22:10 UTC
To: linux-block, paolo.valente, axboe, Linux List Kernel Mailing

Paolo, Jens I am sorry for the noise.
But today I hit the kernel panic and git blame said that you have
created the file in which happened panic (this I saw from trace)

$ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/vmlinux __bfq_deactivate_entity+0x15a
__bfq_deactivate_entity+0x15a/0x240:
bfq_gt at block/bfq-wf2q.c:20
(inlined by) bfq_insert at block/bfq-wf2q.c:381
(inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
(inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203

https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203

$ head /sys/block/*/queue/scheduler
==> /sys/block/nvme0n1/queue/scheduler <==
[none] mq-deadline kyber bfq

==> /sys/block/sda/queue/scheduler <==
mq-deadline kyber [bfq] none

==> /sys/block/zram0/queue/scheduler <==
none

Trace:
general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: G W --------- --- 5.9.0-0.rc8.28.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
RSP: 0018:ffffadf6c0c6fc00 EFLAGS: 00010002
RAX: 46b1b0f0d8856e4a RBX: ffff8dc2773b5c88 RCX: 46b1b0f0d8856e4a
RDX: ffff8dc7d02ed0a0 RSI: ffff8dc7d02ed0a8 RDI: 0000584e64e96beb
RBP: ffff8dc2773b5c00 R08: ffff8dc9054cb938 R09: 0000000000000000
R10: 0000000000000018 R11: 0000000000000018 R12: ffff8dc904927150
R13: 0000000000000001 R14: ffff8dc904927158 R15: ffff8dc2773b5c88
FS: 0000000000000000(0000) GS:ffff8dc90e0c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003e8ebe4000 CR3: 00000007c2546000 CR4: 0000000000350ee0
Call Trace:
 bfq_deactivate_entity+0x4f/0xc0
 bfq_del_bfqq_busy+0xbf/0x170
 __bfq_bfqq_expire+0x95/0xc0
 bfq_bfqq_expire+0x3c5/0x9a0
 ? bfq_active_extract+0x8e/0x140
 bfq_dispatch_request+0x438/0x1070
 __blk_mq_do_dispatch_sched+0x1c7/0x290
 ? dequeue_entity+0xa4/0x420
 __blk_mq_sched_dispatch_requests+0x129/0x180
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x49/0x110
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
Modules linked in: tun snd_seq_dummy snd_hrtimer uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat mt76x2u snd_hda_codec_realtek mt76x2_common mt76x02_usb snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi mt76_usb mt76x02_lib edac_mce_amd iwlmvm snd_hda_intel mt76 snd_intel_dspcfg kvm_amd mac80211 gspca_zc3xx snd_usb_audio snd_hda_codec gspca_main uvcvideo btusb snd_usbmidi_lib iwlwifi snd_hda_core videobuf2_vmalloc kvm videobuf2_memops btrtl snd_rawmidi videobuf2_v4l2 snd_hwdep btbcm snd_seq btintel videobuf2_common eeepc_wmi irqbypass snd_seq_device asus_wmi xpad bluetooth joydev sparse_keymap libarc4 rapl cfg80211 ff_memless snd_pcm videodev video pcspkr wmi_bmof sp5100_tco snd_timer mc k10temp i2c_piix4 snd ecdh_generic ecc soundcore rfkill acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm ccp igb ghash_clmulni_intel nvme nvme_core dca i2c_algo_bit wmi pinctrl_amd fuse
---[ end trace 09deb55d1b05f40c ]---

Full system log: https://pastebin.com/6cKHZzAi
Full kernel log: https://pastebin.com/316HjHit

Unfortunately, I do not know how to reproduce this bug.
I was not doing anything unusual on the computer when it happened.
I can provide any useful info for further investigation.

--
Best Regards,
Mike Gavrilov.
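A note on the fault type in the report above: on x86-64, a memory access through a non-canonical address raises a general protection fault instead of a page fault, which is why the oops header names the address rather than reporting an unmapped page. The marked bytes in the Code: line appear to decode to a load through RAX, and RAX holds 0x46b1b0f0d8856e4a. The following is a minimal user-space sketch of the canonical-address test, not kernel code; the helper name is_canonical() is hypothetical, and 48-bit virtual addresses (4-level paging) are assumed for the reporter's machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: report whether addr is canonical
 * for a given virtual-address width (48 for 4-level paging, 57 for
 * 5-level paging). An address is canonical when bits 63..(va_bits - 1)
 * are either all zero or all one.
 */
bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 2 && va_bits <= 64);

	unsigned int shift = va_bits - 1;
	uint64_t upper = addr >> shift;           /* bits 63..shift of addr */
	uint64_t all_ones = UINT64_MAX >> shift;  /* same width, every bit set */

	return upper == 0 || upper == all_ones;
}
```

With these assumptions, is_canonical(0x46b1b0f0d8856e4a, 48) is false, while the RBX value from the same register dump, 0xffff8dc2773b5c88, passes the test. A pointer-sized field holding a value like the one in RAX is what one would expect to see if the memory it was read from had been overwritten or reused.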
* Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
From: Ming Lei @ 2021-03-04 8:42 UTC
To: Mikhail Gavrilov
Cc: linux-block, Paolo Valente, Jens Axboe, Linux List Kernel Mailing

On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Paolo, Jens I am sorry for the noise.
> But today I hit the kernel panic and git blame said that you have
> created the file in which happened panic (this I saw from trace)
>
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname -r`/vmlinux
> __bfq_deactivate_entity+0x15a
> __bfq_deactivate_entity+0x15a/0x240:
> bfq_gt at block/bfq-wf2q.c:20
> (inlined by) bfq_insert at block/bfq-wf2q.c:381
> (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
> (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
>
> https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
>
> $ head /sys/block/*/queue/scheduler
> ==> /sys/block/nvme0n1/queue/scheduler <==
> [none] mq-deadline kyber bfq
>
> ==> /sys/block/sda/queue/scheduler <==
> mq-deadline kyber [bfq] none
>
> ==> /sys/block/zram0/queue/scheduler <==
> none
>
> Trace:
> general protection fault, probably for non-canonical address
> 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
> CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: G W
> --------- --- 5.9.0-0.rc8.28.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 2606 08/13/2020
> Workqueue: kblockd blk_mq_run_work_fn
> RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
> Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
> 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
> 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
> RSP: 0018:ffffadf6c0c6fc00 EFLAGS: 00010002
> RAX: 46b1b0f0d8856e4a RBX: ffff8dc2773b5c88 RCX: 46b1b0f0d8856e4a
> RDX: ffff8dc7d02ed0a0 RSI: ffff8dc7d02ed0a8 RDI: 0000584e64e96beb
> RBP: ffff8dc2773b5c00 R08: ffff8dc9054cb938 R09: 0000000000000000
> R10: 0000000000000018 R11: 0000000000000018 R12: ffff8dc904927150
> R13: 0000000000000001 R14: ffff8dc904927158 R15: ffff8dc2773b5c88
> FS: 0000000000000000(0000) GS:ffff8dc90e0c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000003e8ebe4000 CR3: 00000007c2546000 CR4: 0000000000350ee0
> Call Trace:
> bfq_deactivate_entity+0x4f/0xc0

Hello,

The same stack trace was observed in RH internal test too, and kernel
is 5.11.0-0.rc6, but there isn't reproducer yet.

--
Ming Lei
[parent not found: <20210305090022.1863-1-hdanton@sina.com>]
* Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
From: Ming Lei @ 2021-03-05 9:27 UTC
To: Hillf Danton
Cc: Mikhail Gavrilov, linux-block, Paolo Valente, Jens Axboe, LKML

Hello Hillf,

Thanks for the debug patch.

On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote:
> > On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
> > <mikhail.v.gavrilov@gmail.com> wrote:
> > >
> > > Paolo, Jens I am sorry for the noise.
> > > But today I hit the kernel panic and git blame said that you have
> > > created the file in which happened panic (this I saw from trace)
> > >
> > > $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> > > /lib/debug/lib/modules/`uname -r`/vmlinux
> > > __bfq_deactivate_entity+0x15a
> > > __bfq_deactivate_entity+0x15a/0x240:
> > > bfq_gt at block/bfq-wf2q.c:20
> > > (inlined by) bfq_insert at block/bfq-wf2q.c:381
> > > (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
> > > (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
> > >
> > > https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
> > >
> > > $ head /sys/block/*/queue/scheduler
> > > ==> /sys/block/nvme0n1/queue/scheduler <==
> > > [none] mq-deadline kyber bfq
> > >
> > > ==> /sys/block/sda/queue/scheduler <==
> > > mq-deadline kyber [bfq] none
> > >
> > > ==> /sys/block/zram0/queue/scheduler <==
> > > none
> > >
> > > Trace:
> > > general protection fault, probably for non-canonical address
> > > 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
> > > CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: G W
> > > --------- --- 5.9.0-0.rc8.28.fc34.x86_64 #1
> > > Hardware name: System manufacturer System Product Name/ROG STRIX
> > > X570-I GAMING, BIOS 2606 08/13/2020
> > > Workqueue: kblockd blk_mq_run_work_fn
> > > RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
> > > Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
> > > 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
> > > 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
> > > RSP: 0018:ffffadf6c0c6fc00 EFLAGS: 00010002
> > > RAX: 46b1b0f0d8856e4a RBX: ffff8dc2773b5c88 RCX: 46b1b0f0d8856e4a
> > > RDX: ffff8dc7d02ed0a0 RSI: ffff8dc7d02ed0a8 RDI: 0000584e64e96beb
> > > RBP: ffff8dc2773b5c00 R08: ffff8dc9054cb938 R09: 0000000000000000
> > > R10: 0000000000000018 R11: 0000000000000018 R12: ffff8dc904927150
> > > R13: 0000000000000001 R14: ffff8dc904927158 R15: ffff8dc2773b5c88
> > > FS: 0000000000000000(0000) GS:ffff8dc90e0c0000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000003e8ebe4000 CR3: 00000007c2546000 CR4: 0000000000350ee0
> > > Call Trace:
> > > bfq_deactivate_entity+0x4f/0xc0
> >
> > Hello,
> >
> > The same stack trace was observed in RH internal test too, and kernel
> > is 5.11.0-0.rc6,
> > but there isn't reproducer yet.
> >
> >
> > --
> > Ming Lei
>
> Add some debug info.
>
> --- x/block/bfq-wf2q.c
> +++ y/block/bfq-wf2q.c
> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq
>
>  	entity->on_st_or_in_serv = false;
>  	st->wsum -= entity->weight;
> -	if (bfqq && !is_in_service)
> +	if (bfqq && !is_in_service) {
> +		WARN_ON(entity->tree != NULL);
>  		bfq_put_queue(bfqq);
> +	}
>  }
>
>  /**
> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct
>  		 * bfqq gets freed here.
>  		 */
>  		int ref = in_serv_bfqq->ref;
> +		WARN_ON(in_serv_entity->tree != NULL);
>  		bfq_put_queue(in_serv_bfqq);
>  		if (ref == 1)
>  			return true;

This kernel oops isn't easy to reproduce, and we have got another crash
report [1] too, also in __bfq_deactivate_entity() and also not easy to
trigger. Can your debug patch cover report [1]? If not, feel free to add
more debug messages, and I will then try to reproduce the two.

[1] another kernel oops log on __bfq_deactivate_entity

[ 899.790606] systemd-sysv-generator[25205]: SysV service '/etc/rc.d/init.d/anamon' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
[ 901.937047] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 901.944005] #PF: supervisor read access in kernel mode
[ 901.949143] #PF: error_code(0x0000) - not-present page
[ 901.954285] PGD 0 P4D 0
[ 901.956824] Oops: 0000 [#1] SMP NOPTI
[ 901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G I X --------- --- 5.11.0-1.el9.x86_64 #1
[ 901.970829] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS 2.5.4 01/13/2020
[ 901.978480] Workqueue: cgwb_release cgwb_release_workfn
[ 901.983705] RIP: 0010:__bfq_deactivate_entity+0x5b/0x240
[ 901.989016] Code: b8 30 00 00 00 75 18 48 81 ff 88 00 00 00 74 0f 0f b7 47 8a 83 e8 01 48 8d 04 40 48 c1 e0 04 4c 8b 73 68 48 63 73 40 48 89 df <4d> 8b 3e 4d 8d 64 06 10 e8 48 f0 ff ff 49 39 df 0f 84 87 01 00 00
[ 902.007763] RSP: 0018:ffffb77107f0bd98 EFLAGS: 00010002
[ 902.012986] RAX: 0000002fffffffd0 RBX: ffff9853ca9c6098 RCX: 0000000000000046
[ 902.020119] RDX: 0000000000000001 RSI: 00000000474b1168 RDI: ffff9853ca9c6098
[ 902.027253] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff985470c2fed0
[ 902.034383] R10: 0000000000000001 R11: ffff9853c9287d98 R12: ffff9853ca8b8000
[ 902.041515] R13: 00000000000000ff R14: 0000000000000000 R15: ffff985b44308098
[ 902.048647] FS: 0000000000000000(0000) GS:ffff98631f980000(0000) knlGS:0000000000000000
[ 902.056732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 902.062479] CR2: 0000000000000000 CR3: 00000001c0ac2002 CR4: 00000000007706e0
[ 902.069611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 902.076744] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 902.083876] PKRU: 55555554
[ 902.086589] Call Trace:
[ 902.089042] bfq_pd_offline+0x89/0xd0
[ 902.092708] blkg_destroy+0x52/0xf0
[ 902.096200] blkcg_destroy_blkgs+0x46/0xc0
[ 902.100300] cgwb_release_workfn+0xbe/0x150
[ 902.104485] process_one_work+0x1e6/0x380
[ 902.108497] worker_thread+0x53/0x3d0
[ 902.112161] ? process_one_work+0x380/0x380
[ 902.116346] kthread+0x11b/0x140
[ 902.119581] ? kthread_associate_blkcg+0xa0/0xa0
[ 902.124199] ret_from_fork+0x1f/0x30
[ 902.127780] Modules linked in: sunrpc scsi_debug iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rfkill intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass mgag200 rapl i2c_algo_bit iTCO_wdt drm_kms_helper intel_cstate iTCO_vendor_support syscopyarea sysfillrect sysimgblt acpi_ipmi mei_me fb_sys_fops intel_uncore pcspkr dell_smbios dcdbas dell_wmi_descriptor wmi_bmof mei cec i2c_i801 ipmi_si acpi_power_meter lpc_ich i2c_smbus ipmi_devintf ipmi_msghandler drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci megaraid_sas tg3 ghash_clmulni_intel libata wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[ 902.208546] CR2: 0000000000000000
[ 902.211881] ---[ end trace 827b8521dc634ca4 ]---

--
Ming Lei
* Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
From: Paolo Valente @ 2021-03-05 9:32 UTC
To: Ming Lei
Cc: Hillf Danton, Mikhail Gavrilov, linux-block, Jens Axboe, LKML

I'm thinking of a way to debug this too. The symptom may hint at a
use-after-free. Could you enable KASAN in your tests? (On the flip
side, I know this might change timings, thereby making the fault
disappear).

Thanks,
Paolo

> On 5 Mar 2021, at 10:27, Ming Lei <tom.leiming@gmail.com> wrote:
>
> Hello Hillf,
>
> Thanks for the debug patch.
>
> On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton <hdanton@sina.com> wrote:
>>
>> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote:
>>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
>>> <mikhail.v.gavrilov@gmail.com> wrote:
>>>>
>>>> Paolo, Jens I am sorry for the noise.
* Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
From: Ming Lei @ 2021-03-05 10:01 UTC
To: Paolo Valente
Cc: Ming Lei, Hillf Danton, Mikhail Gavrilov, linux-block, Jens Axboe, LKML

On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote:
> I'm thinking of a way to debug this too. The symptom may hint at a
> use-after-free. Could you enable KASAN in your tests? (On the flip
> side, I know this might change timings, thereby making the fault
> disappear).

I have asked our QE to reproduce the issue with debug kernel, which may take a
while. And I can't trigger it in my box.

BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to:

(gdb) l *(__bfq_deactivate_entity+0x5b)
0xffffffff814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181).
1176		 * bfq_group_set_parent has already been invoked for the group
1177		 * represented by entity. Therefore, the field
1178		 * entity->sched_data has been set, and we can safely use it.
1179		 */
1180		st = bfq_entity_service_tree(entity);
1181		is_in_service = entity == sd->in_service_entity;
1182
1183		bfq_calc_finish(entity, entity->service);
1184
1185		if (is_in_service)

Seems entity->sched_data points to NULL.

>
> Thanks,
> Paolo
>
> > On 5 Mar 2021, at 10:27, Ming Lei <tom.leiming@gmail.com> wrote:
> >
> > Hello Hillf,
> >
> > Thanks for the debug patch.
> >
> > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton <hdanton@sina.com> wrote:
> >>
> >> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote:
> >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
> >>> <mikhail.v.gavrilov@gmail.com> wrote:
> >>>>
> >>>> Paolo, Jens I am sorry for the noise.
--
Ming
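The observation that entity->sched_data is NULL in the second oops is consistent with the register dump there: R14 and CR2 are both 0, and the marked instruction appears to be a load through R14, which would match reading the first member of the structure that sd points to. The following is a user-space model of that access pattern only, written as a sketch; the struct and function names are hypothetical stand-ins, not the kernel's definitions, and this is not a proposed fix for BFQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_entity;

/* Hypothetical stand-in: the pointer that is compared sits at offset 0. */
struct model_sched_data {
	struct model_entity *in_service_entity;
};

/* Hypothetical stand-in for an entity that may have lost its sched_data. */
struct model_entity {
	struct model_sched_data *sched_data;
};

/*
 * Models "is_in_service = entity == sd->in_service_entity" with an explicit
 * NULL check. Returns false without touching *in_service when sched_data is
 * NULL (the case that would fault at address 0 in the unguarded form);
 * otherwise stores the comparison result and returns true.
 */
bool model_entity_in_service(const struct model_entity *entity, bool *in_service)
{
	const struct model_sched_data *sd;

	assert(entity != NULL && in_service != NULL);

	sd = entity->sched_data;
	if (sd == NULL)
		return false;

	*in_service = (sd->in_service_entity == entity);
	return true;
}
```

In a debug kernel the equivalent check would more likely be a WARN_ON() placed before the dereference, in the same spirit as the debug patch quoted earlier in the thread, so that the call path leaving sched_data unset is reported rather than hidden.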
* Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity [not found] ` <20210307021524.13260-1-hdanton@sina.com> @ 2021-03-07 7:46 ` Dmitry Vyukov [not found] ` <20210307100900.13768-1-hdanton@sina.com> 2021-05-21 2:50 ` Ming Lei 1 sibling, 1 reply; 8+ messages in thread From: Dmitry Vyukov @ 2021-03-07 7:46 UTC (permalink / raw) To: Hillf Danton Cc: Ming Lei, Paolo Valente, Ming Lei, Mikhail Gavrilov, linux-block, Jens Axboe, LKML, kasan-dev On Sun, Mar 7, 2021 at 3:15 AM Hillf Danton <hdanton@sina.com> wrote: > > On Fri, 5 Mar 2021 18:01:04 +0800 Ming Lei wrote: > > On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote: > > > I'm thinking of a way to debug this too. The symptom may hint at a > > > use-after-free. Could you enable KASAN in your tests? (On the flip > > > side, I know this might change timings, thereby making the fault > > > disappear). > > > > I have asked our QE to reproduce the issue with debug kernel, which may take a > > while. And I can't trigger it in my box. > > > > BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to: > > > > (gdb) l *(__bfq_deactivate_entity+0x5b) > > 0xffffffff814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181). > > 1176 * bfq_group_set_parent has already been invoked for the group > > 1177 * represented by entity. Therefore, the field > > 1178 * entity->sched_data has been set, and we can safely use it. > > 1179 */ > > 1180 st = bfq_entity_service_tree(entity); > > 1181 is_in_service = entity == sd->in_service_entity; > > 1182 > > 1183 bfq_calc_finish(entity, entity->service); > > 1184 > > 1185 if (is_in_service) > > > > Seems entity->sched_data points to NULL. > > Hi Ming, > > Thanks for your report. > > Given the invalid pointer cannot explain line 1180, you are reporting > a different issue from what Mike reported, and we can do nothing now > for both without a reproducer. 
> > Dmitry can you shed some light on the tricks to config kasan to print > Call Trace as the reports with the leading [syzbot] on the subject line do? +kasan-dev Hi Hillf, KASAN prints stack traces always unconditionally. There is nothing you need to do at all. Do you have any reports w/o stack traces? "[syzbot]" is prepend by syzbot code. If you want some prefix, you would need to prepend it manually. > > > Thanks, > > > Paolo > > > > > > > Il giorno 5 mar 2021, alle ore 10:27, Ming Lei <tom.leiming@gmail.com> ha scritto: > > > > > > > > Hello Hillf, > > > > > > > > Thanks for the debug patch. > > > > > > > > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton <hdanton@sina.com> wrote: > > > >> > > > >> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote: > > > >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov > > > >>> <mikhail.v.gavrilov@gmail.com> wrote: > > > >>>> > > > >>>> Paolo, Jens I am sorry for the noise. > > > >>>> But today I hit the kernel panic and git blame said that you have > > > >>>> created the file in which happened panic (this I saw from trace) > > > >>>> > > > >>>> $ /usr/src/kernels/`uname -r`/scripts/faddr2line > > > >>>> /lib/debug/lib/modules/`uname -r`/vmlinux > > > >>>> __bfq_deactivate_entity+0x15a > > > >>>> __bfq_deactivate_entity+0x15a/0x240: > > > >>>> bfq_gt at block/bfq-wf2q.c:20 > > > >>>> (inlined by) bfq_insert at block/bfq-wf2q.c:381 > > > >>>> (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621 > > > >>>> (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203 > > > >>>> > > > >>>> https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203 > > > >>>> > > > >>>> $ head /sys/block/*/queue/scheduler > > > >>>> ==> /sys/block/nvme0n1/queue/scheduler <== > > > >>>> [none] mq-deadline kyber bfq > > > >>>> > > > >>>> ==> /sys/block/sda/queue/scheduler <== > > > >>>> mq-deadline kyber [bfq] none > > > >>>> > > > >>>> ==> /sys/block/zram0/queue/scheduler <== > > > >>>> none > > > >>>> > > > >>>> Trace: > > > >>>> general 
protection fault, probably for non-canonical address > > > >>>> 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI > > > >>>> CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: G W > > > >>>> --------- --- 5.9.0-0.rc8.28.fc34.x86_64 #1 > > > >>>> Hardware name: System manufacturer System Product Name/ROG STRIX > > > >>>> X570-I GAMING, BIOS 2606 08/13/2020 > > > >>>> Workqueue: kblockd blk_mq_run_work_fn > > > >>>> RIP: 0010:__bfq_deactivate_entity+0x15a/0x240 > > > >>>> Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d > > > >>>> 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b > > > >>>> 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6 > > > >>>> RSP: 0018:ffffadf6c0c6fc00 EFLAGS: 00010002 > > > >>>> RAX: 46b1b0f0d8856e4a RBX: ffff8dc2773b5c88 RCX: 46b1b0f0d8856e4a > > > >>>> RDX: ffff8dc7d02ed0a0 RSI: ffff8dc7d02ed0a8 RDI: 0000584e64e96beb > > > >>>> RBP: ffff8dc2773b5c00 R08: ffff8dc9054cb938 R09: 0000000000000000 > > > >>>> R10: 0000000000000018 R11: 0000000000000018 R12: ffff8dc904927150 > > > >>>> R13: 0000000000000001 R14: ffff8dc904927158 R15: ffff8dc2773b5c88 > > > >>>> FS: 0000000000000000(0000) GS:ffff8dc90e0c0000(0000) knlGS:0000000000000000 > > > >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > >>>> CR2: 0000003e8ebe4000 CR3: 00000007c2546000 CR4: 0000000000350ee0 > > > >>>> Call Trace: > > > >>>> bfq_deactivate_entity+0x4f/0xc0 > > > >>> > > > >>> Hello, > > > >>> > > > >>> The same stack trace was observed in RH internal test too, and kernel > > > >>> is 5.11.0-0.rc6, > > > >>> but there isn't reproducer yet. > > > >>> > > > >>> > > > >>> -- > > > >>> Ming Lei > > > >> > > > >> Add some debug info. 
> > > >> > > > >> --- x/block/bfq-wf2q.c > > > >> +++ y/block/bfq-wf2q.c > > > >> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq > > > >> > > > >> entity->on_st_or_in_serv = false; > > > >> st->wsum -= entity->weight; > > > >> - if (bfqq && !is_in_service) > > > >> + if (bfqq && !is_in_service) { > > > >> + WARN_ON(entity->tree != NULL); > > > >> bfq_put_queue(bfqq); > > > >> + } > > > >> } > > > >> > > > >> /** > > > >> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct > > > >> * bfqq gets freed here. > > > >> */ > > > >> int ref = in_serv_bfqq->ref; > > > >> + WARN_ON(in_serv_entity->tree != NULL); > > > >> bfq_put_queue(in_serv_bfqq); > > > >> if (ref == 1) > > > >> return true; > > > > > > > > This kernel oops isn't easy to be reproduced, and we have got another crash > > > > report[1] too, still on __bfq_deactivate_entity(), and not easy to > > > > trigger. Can your > > > > debug patch cover the report[1]? If not, feel free to add more debug messages, > > > > then I will try to reproduce the two. > > > > > > > > [1] another kernel oops log on __bfq_deactivate_entity > > > > > > > > [ 899.790606] systemd-sysv-generator[25205]: SysV service > > > > '/etc/rc.d/init.d/anamon' lacks a native systemd unit file. > > > > Automatically generating a unit file for compatibility. Please update > > > > package to include a native systemd unit file, in order to make it > > > > more safe and robust. > > > > [ 901.937047] BUG: kernel NULL pointer dereference, address: 0000000000000000 > > > > [ 901.944005] #PF: supervisor read access in kernel mode > > > > [ 901.949143] #PF: error_code(0x0000) - not-present page > > > > [ 901.954285] PGD 0 P4D 0 > > > > [ 901.956824] Oops: 0000 [#1] SMP NOPTI > > > > [ 901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G > > > > I X --------- --- 5.11.0-1.el9.x86_64 #1 > > > > [ 901.970829] Hardware name: Dell Inc. 
PowerEdge R740xd/0WXD1Y, BIOS > > > > 2.5.4 01/13/2020 > > > > [ 901.978480] Workqueue: cgwb_release cgwb_release_workfn > > > > [ 901.983705] RIP: 0010:__bfq_deactivate_entity+0x5b/0x240 > > > > [ 901.989016] Code: b8 30 00 00 00 75 18 48 81 ff 88 00 00 00 74 0f > > > > 0f b7 47 8a 83 e8 01 48 8d 04 40 48 c1 e0 04 4c 8b 73 68 48 63 73 40 > > > > 48 89 df <4d> 8b 3e 4d 8d 64 06 10 e8 48 f0 ff ff 49 39 df 0f 84 87 01 > > > > 00 00 > > > > [ 902.007763] RSP: 0018:ffffb77107f0bd98 EFLAGS: 00010002 > > > > [ 902.012986] RAX: 0000002fffffffd0 RBX: ffff9853ca9c6098 RCX: 0000000000000046 > > > > [ 902.020119] RDX: 0000000000000001 RSI: 00000000474b1168 RDI: ffff9853ca9c6098 > > > > [ 902.027253] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff985470c2fed0 > > > > [ 902.034383] R10: 0000000000000001 R11: ffff9853c9287d98 R12: ffff9853ca8b8000 > > > > [ 902.041515] R13: 00000000000000ff R14: 0000000000000000 R15: ffff985b44308098 > > > > [ 902.048647] FS: 0000000000000000(0000) GS:ffff98631f980000(0000) > > > > knlGS:0000000000000000 > > > > [ 902.056732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 902.062479] CR2: 0000000000000000 CR3: 00000001c0ac2002 CR4: 00000000007706e0 > > > > [ 902.069611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > [ 902.076744] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > [ 902.083876] PKRU: 55555554 > > > > [ 902.086589] Call Trace: > > > > [ 902.089042] bfq_pd_offline+0x89/0xd0 > > > > [ 902.092708] blkg_destroy+0x52/0xf0 > > > > [ 902.096200] blkcg_destroy_blkgs+0x46/0xc0 > > > > [ 902.100300] cgwb_release_workfn+0xbe/0x150 > > > > [ 902.104485] process_one_work+0x1e6/0x380 > > > > [ 902.108497] worker_thread+0x53/0x3d0 > > > > [ 902.112161] ? process_one_work+0x380/0x380 > > > > [ 902.116346] kthread+0x11b/0x140 > > > > [ 902.119581] ? 
kthread_associate_blkcg+0xa0/0xa0 > > > > [ 902.124199] ret_from_fork+0x1f/0x30 > > > > [ 902.127780] Modules linked in: sunrpc scsi_debug iscsi_tcp > > > > libiscsi_tcp libiscsi scsi_transport_iscsi nft_reject_inet > > > > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat > > > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink > > > > rfkill intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit > > > > libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm > > > > ipmi_ssif irqbypass mgag200 rapl i2c_algo_bit iTCO_wdt drm_kms_helper > > > > intel_cstate iTCO_vendor_support syscopyarea sysfillrect sysimgblt > > > > acpi_ipmi mei_me fb_sys_fops intel_uncore pcspkr dell_smbios dcdbas > > > > dell_wmi_descriptor wmi_bmof mei cec i2c_i801 ipmi_si acpi_power_meter > > > > lpc_ich i2c_smbus ipmi_devintf ipmi_msghandler drm fuse xfs libcrc32c > > > > sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci > > > > megaraid_sas tg3 ghash_clmulni_intel libata wmi dm_mirror > > > > dm_region_hash dm_log dm_mod [last unloaded: ip_tables] > > > > [ 902.208546] CR2: 0000000000000000 > > > > [ 902.211881] ---[ end trace 827b8521dc634ca4 ]--- > > > > > > > > > > > > -- > > > > Ming Lei > > > > > > > -- > > Ming > > > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity [not found] ` <20210307100900.13768-1-hdanton@sina.com> @ 2021-03-07 10:17 ` Dmitry Vyukov 0 siblings, 0 replies; 8+ messages in thread From: Dmitry Vyukov @ 2021-03-07 10:17 UTC (permalink / raw) To: Hillf Danton Cc: Ming Lei, Paolo Valente, Ming Lei, Mikhail Gavrilov, Palash Oswal, linux-block, Jens Axboe, LKML, kasan-dev On Sun, Mar 7, 2021 at 11:09 AM Hillf Danton <hdanton@sina.com> wrote: > > On Sun, 7 Mar 2021 08:46:19 +0100 Dmitry Vyukov wrote: > > On Sun, Mar 7, 2021 at 3:15 AM Hillf Danton <hdanton@sina.com> wrote: > > > > > > Dmitry can you shed some light on the tricks to config kasan to print > > > Call Trace as the reports with the leading [syzbot] on the subject line do? > > > > +kasan-dev > > > > Hi Hillf, > > > > KASAN prints stack traces always unconditionally. There is nothing you > > need to do at all. > > Got it, thanks. > > > Do you have any reports w/o stack traces? > > No, but I saw different formats in Call Trace prints. > > Below from [1] is the instance without file name and line number printed, > while both info help spot the cause of the reported issue. KASAN always prints stack traces w/o file:line info, like any other kernel bug detection facility. Kernel itself never symbolizes reports. In case of syzkaller, syzkaller will symbolize reports and add file:line info. The main config it requires is CONFIG_DEBUG_INFO. 
You may see syzkaller kernel configuration guide here: https://github.com/google/syzkaller/blob/master/docs/linux/kernel_configs.md Or fragments that are actually used to generate syzbot configs in this dir (the guide above may be out-of-date): https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/base.yml https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/debug.yml https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/kasan.yml Or a complete syzbot config here: https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-apparmor-kasan.config > >>>>>>>>>>>>>>>>>>>>>>>>> > > I was running syzkaller and I found the following issue : > > Head Commit : b1313fe517ca3703119dcc99ef3bbf75ab42bcfb ( v5.10.4 ) > Git Tree : stable > Console Output : > [ 242.769080] INFO: task repro:2639 blocked for more than 120 seconds. > [ 242.769096] Not tainted 5.10.4 #8 > [ 242.769103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 242.769112] task:repro state:D stack: 0 pid: 2639 > ppid: 2638 flags:0x00000004 > [ 242.769126] Call Trace: > [ 242.769148] __schedule+0x28d/0x7e0 > [ 242.769162] ? __percpu_counter_sum+0x75/0x90 > [ 242.769175] schedule+0x4f/0xc0 > [ 242.769187] __io_uring_task_cancel+0xad/0xf0 > [ 242.769198] ? wait_woken+0x80/0x80 > [ 242.769210] bprm_execve+0x67/0x8a0 > [ 242.769223] do_execveat_common+0x1d2/0x220 > [ 242.769235] __x64_sys_execveat+0x5d/0x70 > [ 242.769249] do_syscall_64+0x38/0x90 > [ 242.769260] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [1] https://lore.kernel.org/lkml/CAGyP=7cFM6BJE7X2PN9YUptQgt5uQYwM4aVmOiVayQPJg1pqaA@mail.gmail.com/ ^ permalink raw reply [flat|nested] 8+ messages in thread
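A minimal sketch of the symbolization step described above, assuming a vmlinux built with CONFIG_DEBUG_INFO and the in-tree scripts/faddr2line that was already used at the start of this thread. The extract_frames helper is hypothetical (it is not part of the kernel tree or of syzkaller); it only pulls the symbol+0xoffset tokens out of an oops log so that they can be handed to faddr2line.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the thread: read a kernel log on stdin
# and print each unique "symbol+0xoffset" token, one per line.
# The trailing "/0xsize" is dropped to match the faddr2line invocation
# shown in the original report.
extract_frames() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' |
        sed 's|/0x[0-9a-f]*$||' |
        LC_ALL=C sort -u
}
```

Assuming the oops was saved to a file such as oops.txt (a made-up name), something like `extract_frames < oops.txt | xargs ./scripts/faddr2line vmlinux` should then print file:line information for each frame. Frames that live in a module would presumably need that module's object file instead of vmlinux.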
* Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity [not found] ` <20210307021524.13260-1-hdanton@sina.com> 2021-03-07 7:46 ` [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity Dmitry Vyukov @ 2021-05-21 2:50 ` Ming Lei 1 sibling, 0 replies; 8+ messages in thread From: Ming Lei @ 2021-05-21 2:50 UTC (permalink / raw) To: Hillf Danton Cc: Paolo Valente, Ming Lei, Dmitry Vyukov, Mikhail Gavrilov, linux-block, Jens Axboe, LKML On Sat, Mar 06, 2021 at 07:15:24PM -0700, Hillf Danton wrote: > On Fri, 5 Mar 2021 18:01:04 +0800 Ming Lei wrote: > > On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote: > > > I'm thinking of a way to debug this too. The symptom may hint at a > > > use-after-free. Could you enable KASAN in your tests? (On the flip > > > side, I know this might change timings, thereby making the fault > > > disappear). > > > > I have asked our QE to reproduce the issue with debug kernel, which may take a > > while. And I can't trigger it in my box. > > > > BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to: > > > > (gdb) l *(__bfq_deactivate_entity+0x5b) > > 0xffffffff814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181). > > 1176 * bfq_group_set_parent has already been invoked for the group > > 1177 * represented by entity. Therefore, the field > > 1178 * entity->sched_data has been set, and we can safely use it. > > 1179 */ > > 1180 st = bfq_entity_service_tree(entity); > > 1181 is_in_service = entity == sd->in_service_entity; > > 1182 > > 1183 bfq_calc_finish(entity, entity->service); > > 1184 > > 1185 if (is_in_service) > > > > Seems entity->sched_data points to NULL. > > Hi Ming, > > Thanks for your report. > > Given the invalid pointer cannot explain line 1180, you are reporting > a different issue from what Mike reported, and we can do nothing now > for both without a reproducer. 
BTW, we get this report 2 times on 5.12 kernel, following the kernel log, and this time there is hard LOCKUP.

[ 337.526984] systemd-shutdown[1]: Not all DM devices detached, 1 left.
[ 337.526988] systemd-shutdown[1]: Cannot finalize remaining DM devices, continuing.
[ 337.531043] systemd-shutdown[1]: Successfully changed into root pivot.
[ 337.531046] systemd-shutdown[1]: Returning to initrd...
[ 337.533136] watchdog: watchdog0: watchdog did not stop!
[ 337.569177] dracut Warning: Killing all remaining processes
[ 337.706605] XFS (dm-0): Unmounting Filesystem
[ 351.593888] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[ 351.593890] Modules linked in: dm_multipath rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 dcdbas iTCO_wdt irqbypass i2c_algo_bit iTCO_vendor_support rapl drm_kms_helper intel_cstate syscopyarea sysfillrect sysimgblt fb_sys_fops cec intel_uncore pcspkr ipmi_ssif mei_me mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse ip_tables xfs libcrc32c sd_mod qla2xxx ahci libahci nvme_fc crct10dif_pclmul crc32_pclmul crc32c_intel nvme_fabrics libata ghash_clmulni_intel tg3 nvme_core megaraid_sas t10_pi scsi_transport_fc wmi dm_mirror dm_region_hash dm_log dm_mod
[ 351.593929] CPU: 2 PID: 95 Comm: kworker/2:1 Kdump: loaded Tainted: G X --------- --- 5.12.0-1.el9.x86_64 #1
[ 351.593930] Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.6.2 01/08/2016
[ 351.593931] Workqueue: cgwb_release cgwb_release_workfn
[ 351.593932] RIP: 0010:rb_prev+0x18/0x50
[ 351.593933] Code: 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 8b 17 48 39 d7 74 35 48 8b 47 10 48 85 c0 74 1c 49 89 c0 48 8b 40 08 <48> 85 c0 75 f4 4c 89 c0 c3 48 3b 78 10 75 f6 48 8b 10 48 89 c7 48
[ 351.593934] RSP: 0018:ffffb7280048fd70 EFLAGS: 00000086
[ 351.593935] RAX: ffff98bc30f448a0 RBX: ffff98bc10d1e150 RCX: 0000000000000014
[ 351.593936] RDX: 0000000000000001 RSI: ffff98bc00b39098 RDI: ffff98bc00b39098
[ 351.593937] RBP: ffff98bc00b39098 R08: ffff98bc30f448a0 R09: 0000000000000000
[ 351.593938] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 351.593939] R13: 0000000000000001 R14: ffff98bc10d1e110 R15: 0000000000000000
[ 351.593940] FS: 0000000000000000(0000) GS:ffff98c37fa80000(0000) knlGS:0000000000000000
[ 351.593941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 351.593941] CR2: 00007fffeeceaea0 CR3: 0000000105b40003 CR4: 00000000001706e0
[ 351.593942] Call Trace:
[ 351.593943] bfq_idle_extract+0x98/0xb0
[ 351.593943] __bfq_deactivate_entity+0x224/0x240
[ 351.593944] bfq_pd_offline+0xaa/0xd0
[ 351.593945] blkg_destroy+0x52/0xf0
[ 351.593945] blkcg_destroy_blkgs+0x46/0xc0
[ 351.593946] cgwb_release_workfn+0xbe/0x150
[ 351.593947] process_one_work+0x1e6/0x380
[ 351.593947] worker_thread+0x53/0x3d0
[ 351.593948] ? process_one_work+0x380/0x380
[ 351.593949] kthread+0x11b/0x140
[ 351.593949] ? kthread_associate_blkcg+0xa0/0xa0
[ 351.593950] ret_from_fork+0x22/0x30
[ 351.593950] Kernel panic - not syncing: Hard LOCKUP

Thanks,
Ming

^ permalink raw reply [flat|nested] 8+ messages in thread