From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
To: Yufen Yu <yuyufen@huawei.com>, Ming Lei <ming.lei@redhat.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org, hch@infradead.org,
keith.busch@intel.com, tj@kernel.org, zhangxiaoxu5@huawei.com
Subject: Re: [PATCH] block: fix null pointer dereference in blk_mq_rq_timed_out()
Date: Thu, 12 Sep 2019 10:59:33 +0200
Message-ID: <9a77d54c-78b1-c24d-c8ba-0240c7ae7460@cloud.ionos.com>
In-Reply-To: <b3d7b459-5f31-d473-2508-20048119c1b2@huawei.com>
On 9/12/19 5:29 AM, Yufen Yu wrote:
>
>
> On 2019/9/12 10:46, Ming Lei wrote:
>> On Sat, Sep 07, 2019 at 06:24:50PM +0800, Yufen Yu wrote:
>>> There is a race condition between the timeout check and the completion
>>> of a flush request, as follows:
>>>
>>> timeout_work                    issue flush              issue flush
>>>                                 blk_insert_flush
>>>                                                          blk_insert_flush
>>> blk_mq_timeout_work
>>>                                 blk_kick_flush
>>>
>>> blk_mq_queue_tag_busy_iter
>>> blk_mq_check_expired(flush_rq)
>>>
>>>                                 __blk_mq_end_request
>>>                                 flush_end_io
>>>                                 blk_kick_flush
>>>                                 blk_rq_init(flush_rq)
>>>                                 memset(flush_rq, 0)
>> I don't see a memset(flush_rq, 0) in block/blk-flush.c.
>
> The call path is as follows:
>
> blk_kick_flush
>     blk_rq_init
>         memset(rq, 0, sizeof(*rq));
>
>>> blk_mq_timed_out
>>> BUG_ON flush_rq->q->mq_ops
>> flush_rq->q won't be changed by blk_rq_init(), and both READ and WRITE
>> of a machine-word-sized variable are atomic, so how can the BUG_ON()
>> be triggered? Do you have the actual BUG log?
>>
>> Also, it is now the driver's responsibility to avoid the race between
>> normal completion and timeout.
>
> I have reproduced the bug by adding a time delay to the timeout handler
> and to the memset. The BUG log is as follows:
>
> [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 108.826091] #PF: supervisor read access in kernel mode
> [ 108.826543] #PF: error_code(0x0000) - not-present page
> [ 108.827059] PGD 0 P4D 0
> [ 108.827313] Oops: 0000 [#1] SMP PTI
> [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
> [ 108.828326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> [ 108.829503] Workqueue: kblockd blk_mq_timeout_work
> [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
> [ 108.830439] Code: 01 e9 0a ff ff ff 48 83 05 34 45 dd 02 01 4c 39 63 40 0f 84 8a 00 00 00 0d 00 00 20 00 40 0f b6 f5 41 89 44 24 1c 49 8b 04 24 <48> 8b 40 40 48 8b 40 20 48 85 c0 0f 84 90 00 00 00 48 83 05 2f 44
> [ 108.832246] RSP: 0018:ffffbf7ac18b7db0 EFLAGS: 00010206
> [ 108.832756] RAX: 0000000000000000 RBX: ffffffffb56e0250 RCX: 0000000000000000
> [ 108.833444] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ab7fbb96538
> [ 108.834149] RBP: 0000000000000000 R08: 000000000000024b R09: 0000000000000030
> [ 108.834840] R10: 000000000000004e R11: ffffbf7ac18b7c40 R12: ffff9ab7f756e000
> [ 108.835531] R13: ffffbf7ac18b7e70 R14: 0000000000000017 R15: ffff9ab7f6ead0a0
> [ 108.836228] FS: 0000000000000000(0000) GS:ffff9ab7fbb80000(0000) knlGS:0000000000000000
> [ 108.837026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 108.837588] CR2: 0000000000000040 CR3: 000000013544c000 CR4: 00000000000006e0
> [ 108.838191] Call Trace:
> [ 108.838406] bt_iter+0x74/0x80
> [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
> [ 108.839074] ? __switch_to_asm+0x34/0x70
> [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
> [ 108.840732] blk_mq_timeout_work+0x74/0x200
> [ 108.841151] process_one_work+0x297/0x680
> [ 108.841550] worker_thread+0x29c/0x6f0
> [ 108.841926] ? rescuer_thread+0x580/0x580
> [ 108.842344] kthread+0x16a/0x1a0
> [ 108.842666] ? kthread_flush_work+0x170/0x170
> [ 108.843100] ret_from_fork+0x35/0x40
> [ 108.843455] Modules linked in:
> [ 108.843758] CR2: 0000000000000040
> [ 108.844090] ---[ end trace e0ac552505fa1b95 ]---
>
> blk_mq_rq_timed_out() attempts to read 'req->q->mq_ops->timeout', but
> 'q' is NULL at that point, which triggers the BUG.
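
For illustration only, here is a minimal user-space sketch of the window described above. It is not kernel code: the fake_* types, fake_rq_init() and the observer callback are hypothetical stand-ins that only mimic the req->q->mq_ops->timeout pointer chain, and the ordering (zero the whole request, then restore the queue pointer) is an assumption. The observer simulates a timeout check that happens to run between the two steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins that mimic the chain req->q->mq_ops->timeout. */
struct fake_mq_ops  { int (*timeout)(void); };
struct fake_queue   { const struct fake_mq_ops *mq_ops; };
struct fake_request { struct fake_queue *q; int tag; };

/* Invoked at the point where a concurrent reader could look at rq. */
typedef void (*observe_fn)(const struct fake_request *rq, void *ctx);

/*
 * Assumed shape of a re-initialisation: zero the whole request, then
 * restore the queue pointer. The observer runs inside the window between
 * those two steps, standing in for a racing timeout check.
 */
static void fake_rq_init(struct fake_queue *q, struct fake_request *rq,
                         observe_fn observe, void *ctx)
{
    memset(rq, 0, sizeof(*rq));   /* rq->q is all-zero bits from here ... */
    if (observe)
        observe(rq, ctx);         /* ... simulated concurrent reader ... */
    rq->q = q;                    /* ... until the pointer is restored */
}

/* Records whether rq->q compared equal to NULL when observed. */
static void record_q_is_null(const struct fake_request *rq, void *ctx)
{
    *(int *)ctx = (rq->q == NULL);
}

/*
 * Returns 1 if the reader saw q == NULL inside the window and the pointer
 * was back in place once initialisation finished, 0 otherwise.
 */
static int reader_sees_null_q_in_init_window(void)
{
    static const struct fake_mq_ops ops = { NULL };
    struct fake_queue queue = { &ops };
    struct fake_request rq = { &queue, 7 };
    int saw_null = 0;

    fake_rq_init(&queue, &rq, record_q_is_null, &saw_null);
    return saw_null && rq.q == &queue;
}
```

Each individual load or store of q may well be atomic, and q holds the same value before and after the re-initialisation. A reader that arrives between the memset and the restore can still pick up the zeroed pointer. Calling reader_sees_null_q_in_init_window() from a small main() exercises exactly that case.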
We have seen a similar call trace on an older kernel (4.4.62), though I am
not sure whether it is the same issue.
[32353526.224059] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G O 4.4.62-1-storage #4.4.62-1.3
[32353526.224343] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[...]
[32353526.224840] RIP: 0010:[<ffffffff812df5a1>] [<ffffffff812df5a1>] blk_mq_rq_timed_out+0x11/0x70
[32353526.285015] [<ffffffff812df63d>] blk_mq_check_expired+0x3d/0x60
[32353526.301997] [<ffffffff812e1f74>] bt_for_each+0xd4/0xe0
[32353526.310730] [<ffffffff812df600>] ? blk_mq_rq_timed_out+0x70/0x70
[32353526.319579] [<ffffffff812df600>] ? blk_mq_rq_timed_out+0x70/0x70
[32353526.328329] [<ffffffff812e2753>] blk_mq_queue_tag_busy_iter+0x43/0xc0
[32353526.336995] [<ffffffff812de7c0>] ? blk_mq_bio_to_request+0x40/0x40
[32353526.345566] [<ffffffff812de7f2>] blk_mq_rq_timer+0x32/0xd0
[32353526.354094] [<ffffffff810b2e45>] call_timer_fn+0x35/0x130
[32353526.362542] [<ffffffff812de7c0>] ? blk_mq_bio_to_request+0x40/0x40
[32353526.370894] [<ffffffff810b31b7>] run_timer_softirq+0x157/0x280
Thanks,
Guoqing