From: Paolo Valente <paolo.valente@linaro.org>
To: Ming Lei <tom.leiming@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>,
	Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
	linux-block <linux-block@vger.kernel.org>,
	Jens Axboe <axboe@fb.com>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
Date: Fri, 5 Mar 2021 10:32:04 +0100
Message-ID: <E28250BB-FBFF-4F02-B7A2-9530340E481E@linaro.org>
In-Reply-To: <CACVXFVPp_byzrYVwyo05u0v3zoPP42FKZhfWMb6GMBno1rCZRw@mail.gmail.com>

I'm thinking of a way to debug this too.  The symptom may hint at a
use-after-free.  Could you enable KASAN in your tests?  (On the flip
side, I know this might change timings and thereby make the fault
disappear.)

Thanks,
Paolo
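
A side note on the first trace quoted below: the faulting pointer in RAX,
0x46b1b0f0d8856e4a, is not a canonical x86-64 address, which is why the
CPU reports a general protection fault rather than a page fault.  A
garbage pointer like this fits the use-after-free suspicion but does not
prove it.  The check can be sketched in userspace C as follows, assuming
4-level paging (48-bit virtual addresses); the helper name
is_canonical_48() is hypothetical and is not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 4-level paging, i.e. 48-bit virtual addresses.  An address
 * is canonical when bits 63..47 are all copies of bit 47, so the 17
 * topmost bits must be either all zeros or all ones.
 *
 * is_canonical_48() is a hypothetical helper for illustration only.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t upper = addr >> 47;	/* keep only bits 63..47 */

	return upper == 0 || upper == 0x1ffff;
}
```

With the register values from the trace, is_canonical_48() is false for
RAX (0x46b1b0f0d8856e4a) and true for RBX (0xffff8dc2773b5c88).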

> On 5 Mar 2021, at 10:27, Ming Lei <tom.leiming@gmail.com> wrote:
> 
> Hello Hillf,
> 
> Thanks for the debug patch.
> 
> On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton <hdanton@sina.com> wrote:
>> 
>> On Thu, 4 Mar 2021 16:42:30 +0800  Ming Lei wrote:
>>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
>>> <mikhail.v.gavrilov@gmail.com> wrote:
>>>> 
>>>> Paolo, Jens, I am sorry for the noise.
>>>> But today I hit a kernel panic, and git blame says that you created
>>>> the file in which the panic happened (I saw this from the trace).
>>>> 
>>>> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
>>>> /lib/debug/lib/modules/`uname -r`/vmlinux
>>>> __bfq_deactivate_entity+0x15a
>>>> __bfq_deactivate_entity+0x15a/0x240:
>>>> bfq_gt at block/bfq-wf2q.c:20
>>>> (inlined by) bfq_insert at block/bfq-wf2q.c:381
>>>> (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
>>>> (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
>>>> 
>>>> https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
>>>> 
>>>> $ head /sys/block/*/queue/scheduler
>>>> ==> /sys/block/nvme0n1/queue/scheduler <==
>>>> [none] mq-deadline kyber bfq
>>>> 
>>>> ==> /sys/block/sda/queue/scheduler <==
>>>> mq-deadline kyber [bfq] none
>>>> 
>>>> ==> /sys/block/zram0/queue/scheduler <==
>>>> none
>>>> 
>>>> Trace:
>>>> general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
>>>> CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: G        W --------- ---  5.9.0-0.rc8.28.fc34.x86_64 #1
>>>> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
>>>> Workqueue: kblockd blk_mq_run_work_fn
>>>> RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
>>>> Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
>>>> RSP: 0018:ffffadf6c0c6fc00 EFLAGS: 00010002
>>>> RAX: 46b1b0f0d8856e4a RBX: ffff8dc2773b5c88 RCX: 46b1b0f0d8856e4a
>>>> RDX: ffff8dc7d02ed0a0 RSI: ffff8dc7d02ed0a8 RDI: 0000584e64e96beb
>>>> RBP: ffff8dc2773b5c00 R08: ffff8dc9054cb938 R09: 0000000000000000
>>>> R10: 0000000000000018 R11: 0000000000000018 R12: ffff8dc904927150
>>>> R13: 0000000000000001 R14: ffff8dc904927158 R15: ffff8dc2773b5c88
>>>> FS:  0000000000000000(0000) GS:ffff8dc90e0c0000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000003e8ebe4000 CR3: 00000007c2546000 CR4: 0000000000350ee0
>>>> Call Trace:
>>>> bfq_deactivate_entity+0x4f/0xc0
>>> 
>>> Hello,
>>> 
>>> The same stack trace was observed in RH internal testing too, on
>>> kernel 5.11.0-0.rc6, but there isn't a reproducer yet.
>>> 
>>> 
>>> --
>>> Ming Lei
>> 
>> Add some debug info.
>> 
>> --- x/block/bfq-wf2q.c
>> +++ y/block/bfq-wf2q.c
>> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq
>> 
>>        entity->on_st_or_in_serv = false;
>>        st->wsum -= entity->weight;
>> -       if (bfqq && !is_in_service)
>> +       if (bfqq && !is_in_service) {
>> +               WARN_ON(entity->tree != NULL);
>>                bfq_put_queue(bfqq);
>> +       }
>> }
>> 
>> /**
>> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct
>>                 * bfqq gets freed here.
>>                 */
>>                int ref = in_serv_bfqq->ref;
>> +               WARN_ON(in_serv_entity->tree != NULL);
>>                bfq_put_queue(in_serv_bfqq);
>>                if (ref == 1)
>>                        return true;
> 
> This kernel oops isn't easy to reproduce, and we have got another crash
> report [1] too, also in __bfq_deactivate_entity() and also hard to
> trigger.  Can your debug patch cover report [1] as well?  If not, feel
> free to add more debug messages, and I will then try to reproduce both.
> 
> [1] Another kernel oops log in __bfq_deactivate_entity()
> 
> [  899.790606] systemd-sysv-generator[25205]: SysV service '/etc/rc.d/init.d/anamon' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
> [  901.937047] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  901.944005] #PF: supervisor read access in kernel mode
> [  901.949143] #PF: error_code(0x0000) - not-present page
> [  901.954285] PGD 0 P4D 0
> [  901.956824] Oops: 0000 [#1] SMP NOPTI
> [  901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G  I    X --------- ---  5.11.0-1.el9.x86_64 #1
> [  901.970829] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS 2.5.4 01/13/2020
> [  901.978480] Workqueue: cgwb_release cgwb_release_workfn
> [  901.983705] RIP: 0010:__bfq_deactivate_entity+0x5b/0x240
> [  901.989016] Code: b8 30 00 00 00 75 18 48 81 ff 88 00 00 00 74 0f 0f b7 47 8a 83 e8 01 48 8d 04 40 48 c1 e0 04 4c 8b 73 68 48 63 73 40 48 89 df <4d> 8b 3e 4d 8d 64 06 10 e8 48 f0 ff ff 49 39 df 0f 84 87 01 00 00
> [  902.007763] RSP: 0018:ffffb77107f0bd98 EFLAGS: 00010002
> [  902.012986] RAX: 0000002fffffffd0 RBX: ffff9853ca9c6098 RCX: 0000000000000046
> [  902.020119] RDX: 0000000000000001 RSI: 00000000474b1168 RDI: ffff9853ca9c6098
> [  902.027253] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff985470c2fed0
> [  902.034383] R10: 0000000000000001 R11: ffff9853c9287d98 R12: ffff9853ca8b8000
> [  902.041515] R13: 00000000000000ff R14: 0000000000000000 R15: ffff985b44308098
> [  902.048647] FS:  0000000000000000(0000) GS:ffff98631f980000(0000) knlGS:0000000000000000
> [  902.056732] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  902.062479] CR2: 0000000000000000 CR3: 00000001c0ac2002 CR4: 00000000007706e0
> [  902.069611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  902.076744] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  902.083876] PKRU: 55555554
> [  902.086589] Call Trace:
> [  902.089042]  bfq_pd_offline+0x89/0xd0
> [  902.092708]  blkg_destroy+0x52/0xf0
> [  902.096200]  blkcg_destroy_blkgs+0x46/0xc0
> [  902.100300]  cgwb_release_workfn+0xbe/0x150
> [  902.104485]  process_one_work+0x1e6/0x380
> [  902.108497]  worker_thread+0x53/0x3d0
> [  902.112161]  ? process_one_work+0x380/0x380
> [  902.116346]  kthread+0x11b/0x140
> [  902.119581]  ? kthread_associate_blkcg+0xa0/0xa0
> [  902.124199]  ret_from_fork+0x1f/0x30
> [  902.127780] Modules linked in: sunrpc scsi_debug iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rfkill intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass mgag200 rapl i2c_algo_bit iTCO_wdt drm_kms_helper intel_cstate iTCO_vendor_support syscopyarea sysfillrect sysimgblt acpi_ipmi mei_me fb_sys_fops intel_uncore pcspkr dell_smbios dcdbas dell_wmi_descriptor wmi_bmof mei cec i2c_i801 ipmi_si acpi_power_meter lpc_ich i2c_smbus ipmi_devintf ipmi_msghandler drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci megaraid_sas tg3 ghash_clmulni_intel libata wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
> [  902.208546] CR2: 0000000000000000
> [  902.211881] ---[ end trace 827b8521dc634ca4 ]---
> 
> 
> -- 
> Ming Lei

