From: Dongli Zhang <dongli.zhang@oracle.com>
To: <linux-nvme@lists.infradead.org>
Cc: <linux-block@vger.kernel.org>
Subject: nvme double __blk_mq_complete_request() bugs
Date: Sun, 17 May 2020 21:30:10 -0700 (PDT)
Message-ID: <c77b0998-5112-4d6b-b51c-41d2b901009d@default>

Hi,
This is to report the page fault below, seen with nvme-loop.
[ 235.223975] ==================================================================
[ 235.224878] BUG: KASAN: null-ptr-deref in blk_mq_free_request+0x363/0x510
[ 235.225674] Write of size 4 at addr 0000000000000198 by task swapper/3/0
[ 235.226456]
[ 235.226772] ==================================================================
[ 235.227633] BUG: kernel NULL pointer dereference, address: 0000000000000198
[ 235.228447] #PF: supervisor write access in kernel mode
[ 235.229062] #PF: error_code(0x0002) - not-present page
[ 235.229667] PGD 0 P4D 0
[ 235.229976] Oops: 0002 [#1] SMP KASAN PTI
[ 235.230451] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G B 5.7.0-rc5+ #1
[ 235.231347] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 235.232692] RIP: 0010:blk_mq_free_request+0x367/0x510
[ 235.233286] Code: 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 a2 dc ff ff 48 8d ba 98 01 00 00 be 04 00 00 00 48 89 14 24 e8 fd 69 9a ff 48 8b 14 24 <f0> ff 8a 98 01 00 00 e9 e2 fe ff ff 48 83 c4 08 48 89 ef be 03 00
[ 235.235503] RSP: 0018:ffff8881f7389be0 EFLAGS: 00010046
[ 235.236114] RAX: 0000000000000000 RBX: ffff8881f21b9680 RCX: ffffffff816e05b1
[ 235.236941] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
[ 235.237770] RBP: 0000000000000001 R08: fffffbfff0af5869 R09: fffffbfff0af5869
[ 235.238598] R10: ffffffff857ac347 R11: fffffbfff0af5868 R12: ffff8881f21b969c
[ 235.239424] R13: ffff8881f0f62738 R14: ffffe8ffffd884c0 R15: ffff8881f21b9698
[ 235.240255] FS: 0000000000000000(0000) GS:ffff8881f7380000(0000) knlGS:0000000000000000
[ 235.241192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.241863] CR2: 0000000000000198 CR3: 00000001eb886004 CR4: 0000000000360ee0
[ 235.242695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 235.243523] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 235.244349] Call Trace:
[ 235.244645] <IRQ>
[ 235.244890] blk_mq_complete_request+0x266/0x460
[ 235.245432] nvmet_req_complete+0xd/0xb0
[ 235.245898] iomap_dio_bio_end_io+0x336/0x480
[ 235.246409] blk_update_request+0x320/0x960
[ 235.246904] blk_mq_end_request+0x4e/0x4d0
[ 235.247391] blk_mq_complete_request+0x266/0x460
[ 235.247937] virtblk_done+0x164/0x300
[ 235.248372] ? loop_queue_work.cold.44+0x5e/0x5e
[ 235.248915] ? rcu_accelerate_cbs+0x5d/0x1a70
[ 235.249429] ? virtqueue_get_used_addr+0x140/0x140
[ 235.249990] vring_interrupt+0x16d/0x280
[ 235.250454] __handle_irq_event_percpu+0xdd/0x470
[ 235.251010] handle_irq_event_percpu+0x6e/0x130
[ 235.251544] ? rcu_accelerate_cbs_unlocked+0x110/0x110
[ 235.252145] ? __handle_irq_event_percpu+0x470/0x470
[ 235.252729] ? _raw_spin_lock+0x75/0xd0
[ 235.253181] ? _raw_write_lock+0xd0/0xd0
[ 235.253645] handle_irq_event+0xc2/0x158
[ 235.254107] handle_edge_irq+0x1e9/0x7a0
[ 235.254572] do_IRQ+0x94/0x1e0
[ 235.254936] common_interrupt+0xf/0xf
[ 235.255368] </IRQ>
[ 235.255629] RIP: 0010:native_safe_halt+0xe/0x10
[ 235.256162] Code: e9 f2 fe ff ff 48 89 df e8 2f dc f8 fd eb a4 cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d a4 7d 4a 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 94 7d 4a 00 f4 c3 cc cc 41 56 41 55
[ 235.258314] RSP: 0018:ffff8881f61ffdd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 235.259194] RAX: ffffffff83bc59f0 RBX: dffffc0000000000 RCX: ffffffff83bc6639
[ 235.260022] RDX: 1ffff1103ec398e8 RSI: 0000000000000008 RDI: ffff8881f61cc740
[ 235.260850] RBP: 0000000000000003 R08: ffffed103ec398e9 R09: ffffed103ec398e9
[ 235.261681] R10: ffff8881f61cc747 R11: ffffed103ec398e8 R12: ffffffff84e5c4c0
[ 235.262508] R13: 0000000000000003 R14: 1ffff1103ec3ffc4 R15: 0000000000000000
[ 235.263338] ? __cpuidle_text_start+0x8/0x8
[ 235.263832] ? default_idle_call+0x29/0x60
[ 235.264315] ? tsc_verify_tsc_adjust+0x68/0x1f0
[ 235.264849] default_idle+0x1a/0x2b0
[ 235.265273] do_idle+0x2fd/0x3b0
[ 235.265658] ? arch_cpu_idle_exit+0x40/0x40
[ 235.266151] ? schedule_idle+0x56/0x90
[ 235.266595] cpu_startup_entry+0x14/0x20
[ 235.267057] start_secondary+0x2a6/0x340
[ 235.267522] ? set_cpu_sibling_map+0x1fb0/0x1fb0
[ 235.268066] secondary_startup_64+0xb6/0xc0
[ 235.268561] Modules linked in:
[ 235.268929] CR2: 0000000000000198
[ 235.269330] ---[ end trace f8fa823705a3dbe7 ]---
[ 235.269875] RIP: 0010:blk_mq_free_request+0x367/0x510
[ 235.270466] Code: 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 a2 dc ff ff 48 8d ba 98 01 00 00 be 04 00 00 00 48 89 14 24 e8 fd 69 9a ff 48 8b 14 24 <f0> ff 8a 98 01 00 00 e9 e2 fe ff ff 48 83 c4 08 48 89 ef be 03 00
[ 235.272622] RSP: 0018:ffff8881f7389be0 EFLAGS: 00010046
[ 235.273230] RAX: 0000000000000000 RBX: ffff8881f21b9680 RCX: ffffffff816e05b1
[ 235.274063] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
[ 235.274890] RBP: 0000000000000001 R08: fffffbfff0af5869 R09: fffffbfff0af5869
[ 235.275717] R10: ffffffff857ac347 R11: fffffbfff0af5868 R12: ffff8881f21b969c
[ 235.276545] R13: ffff8881f0f62738 R14: ffffe8ffffd884c0 R15: ffff8881f21b9698
[ 235.277371] FS: 0000000000000000(0000) GS:ffff8881f7380000(0000) knlGS:0000000000000000
[ 235.278307] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.278977] CR2: 0000000000000198 CR3: 00000001eb886004 CR4: 0000000000360ee0
[ 235.279805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 235.280632] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 235.281459] Kernel panic - not syncing: Fatal exception in interrupt
[ 235.282474] Kernel Offset: 0x400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 235.283695] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
This is caused by a double __blk_mq_complete_request() on the same request when resetting nvme-loop.
# echo 1 > /sys/block/nvme1n1/device/nvme1/reset_controller
nvme_loop_queue_response()
-> nvme_end_request()
-> blk_mq_complete_request()
nvme_loop_reset_ctrl_work()
-> nvme_stop_queues(&ctrl->ctrl)
-> blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
-> nvme_cancel_request(): state is not MQ_RQ_COMPLETE
-> blk_mq_complete_request()
-> WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
-> __blk_mq_complete_request()
-> WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
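The nvme-loop interleaving above is a check-then-act race: nvme_cancel_request() tests the request state and then completes the request, while the normal completion path has already entered blk_mq_complete_request() but has not yet stored MQ_RQ_COMPLETE. The sketch below is a hypothetical, single-threaded userspace model of that interleaving, not kernel code: model_rq, complete_body(), simulate_racy(), try_complete() and simulate_cmpxchg() are made-up names. It replays the two paths step by step, then contrasts them with a compare-and-swap on the state, which is one common way to let exactly one completer win. It is only an illustration and is not meant to describe how this bug was or should be fixed.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace model of the race described above. The states
 * loosely mirror blk-mq's MQ_RQ_IN_FLIGHT / MQ_RQ_COMPLETE, but nothing
 * here is kernel code.
 */
enum model_state { MODEL_IN_FLIGHT = 1, MODEL_COMPLETE = 2 };

struct model_rq {
	atomic_int state;
	int completions;	/* times the completion body ran */
};

/* Stand-in for the body of __blk_mq_complete_request(). */
static void complete_body(struct model_rq *rq)
{
	rq->completions++;
}

/*
 * Racy pattern: the canceller checks the state and then completes, and
 * the normal completer marks COMPLETE with a plain store. The steps are
 * replayed one after another in the problematic order, so the result is
 * deterministic.
 */
int simulate_racy(void)
{
	struct model_rq rq = { .state = MODEL_IN_FLIGHT, .completions = 0 };

	/* Path A (queue response) has entered completion, not stored yet. */

	/* Path B (reset/cancel): state is not COMPLETE, so it completes too. */
	if (atomic_load(&rq.state) != MODEL_COMPLETE) {
		atomic_store(&rq.state, MODEL_COMPLETE);
		complete_body(&rq);
	}

	/* Path A resumes: unconditional store, then the body runs again. */
	atomic_store(&rq.state, MODEL_COMPLETE);
	complete_body(&rq);

	return rq.completions;	/* 2: double completion */
}

/*
 * Assumed alternative for illustration: only the caller that moves the
 * state from IN_FLIGHT to COMPLETE with a compare-and-swap runs the body.
 */
static int try_complete(struct model_rq *rq)
{
	int expected = MODEL_IN_FLIGHT;

	if (!atomic_compare_exchange_strong(&rq->state, &expected,
					    MODEL_COMPLETE))
		return 0;	/* someone else already completed it */
	assert(rq->completions == 0);	/* the winner is always the first */
	complete_body(rq);
	return 1;
}

int simulate_cmpxchg(void)
{
	struct model_rq rq = { .state = MODEL_IN_FLIGHT, .completions = 0 };

	/* Same interleaving: path B first, then path A. */
	try_complete(&rq);	/* path B wins the transition */
	try_complete(&rq);	/* path A loses and does nothing */

	return rq.completions;	/* 1 */
}
```

Here simulate_racy() returns 2 because the completion body runs twice, and simulate_cmpxchg() returns 1. Replaying the steps sequentially keeps the outcome deterministic, which a real two-thread reproducer would not be.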
After code analysis, I think nvme-pci is affected as well.
nvme_process_cq()
 -> nvme_handle_cqe()
    -> nvme_end_request()
       -> blk_mq_complete_request()
                                        nvme_reset_work()
                                         -> nvme_dev_disable()
                                            -> nvme_reap_pending_cqes()
                                               -> nvme_process_cq()
                                                  -> nvme_handle_cqe()
                                                     -> nvme_end_request()
                                                        -> blk_mq_complete_request()
                                                           -> __blk_mq_complete_request()
          -> __blk_mq_complete_request()
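The nvme-pci case has the same shape one level lower: if the interrupt-driven nvme_process_cq() and the reset path's nvme_reap_pending_cqes() could consume the same completion queue entry, both would complete its request. The following hypothetical userspace model shows two consumers that read the same head before either advances it; model_cq, handle_and_advance(), claim_cqe() and the simulate_*() helpers are invented for illustration, and the valid[] array merely stands in for the CQE phase bit. The second variant claims an entry with a compare-and-swap on the head so that only one consumer handles it. Whether the two paths can actually overlap in nvme-pci depends on how the reset path quiesces the interrupt first, which this model does not capture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of two nvme_process_cq()-like consumers (the IRQ
 * path and the reset path's reaper) sharing one completion queue.
 * Nothing here is nvme-pci code.
 */
#define MODEL_Q_DEPTH 4

struct model_cq {
	bool valid[MODEL_Q_DEPTH];	/* stands in for the CQE phase bit */
	atomic_uint head;		/* next entry to consume */
	int handled[MODEL_Q_DEPTH];	/* times each entry was handled */
};

/* Racy consumer step: handle the entry it saw, then advance the head. */
static void handle_and_advance(struct model_cq *cq, unsigned int idx)
{
	assert(idx < MODEL_Q_DEPTH);
	cq->handled[idx]++;		/* would complete the request */
	atomic_store(&cq->head, (idx + 1) % MODEL_Q_DEPTH);
}

int simulate_racy_cq(void)
{
	struct model_cq cq = { .valid = { true }, .head = 0 };
	/* Both consumers sample the head before either advances it. */
	unsigned int irq_idx = atomic_load(&cq.head);
	unsigned int reap_idx = atomic_load(&cq.head);

	if (cq.valid[irq_idx])
		handle_and_advance(&cq, irq_idx);
	if (cq.valid[reap_idx])
		handle_and_advance(&cq, reap_idx);
	return cq.handled[0];		/* entry 0 handled twice */
}

/*
 * Claiming variant: a consumer owns entry idx only if it moves the head
 * from idx to idx + 1 itself; a consumer holding a stale idx backs off.
 */
static bool claim_cqe(struct model_cq *cq, unsigned int idx)
{
	unsigned int expected = idx;

	assert(idx < MODEL_Q_DEPTH);
	if (!cq->valid[idx])
		return false;		/* nothing posted in this slot */
	if (!atomic_compare_exchange_strong(&cq->head, &expected,
					    (idx + 1) % MODEL_Q_DEPTH))
		return false;		/* another consumer claimed it */
	cq->handled[idx]++;
	return true;
}

int simulate_claimed_cq(void)
{
	struct model_cq cq = { .valid = { true }, .head = 0 };
	unsigned int irq_idx = atomic_load(&cq.head);
	unsigned int reap_idx = atomic_load(&cq.head);

	claim_cqe(&cq, irq_idx);	/* IRQ path wins entry 0 */
	claim_cqe(&cq, reap_idx);	/* reaper's stale claim fails */
	return cq.handled[0];		/* entry 0 handled once */
}
```

simulate_racy_cq() returns 2 and simulate_claimed_cq() returns 1. Serializing the consumers with a lock, or quiescing the interrupt before reaping, would be alternatives; the model uses a compare-and-swap only to stay self-contained.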
I did not check whether nvme-tcp/nvme-fc/nvme-rdma have a similar issue. Is
there already a fix or discussion for the above issue, that is, resetting the
device while a blk_mq_complete_request() is already in progress?
Thank you very much!
Dongli Zhang
Thread overview: 7+ messages
2020-05-18 4:30 Dongli Zhang [this message]
2020-05-18 7:51 ` nvme double __blk_mq_complete_request() bugs Sagi Grimberg
2020-05-24 14:33 ` Dongli Zhang
2020-05-25 16:45 ` Keith Busch
2020-05-27 1:04 ` Dongli Zhang
2020-05-27 3:36 ` Keith Busch
2020-05-27 4:14 ` Dongli Zhang