From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
To: Yufen Yu <yuyufen@huawei.com>, Ming Lei <ming.lei@redhat.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org, hch@infradead.org,
	keith.busch@intel.com, tj@kernel.org, zhangxiaoxu5@huawei.com
Subject: Re: [PATCH] block: fix null pointer dereference in blk_mq_rq_timed_out()
Date: Thu, 12 Sep 2019 10:59:33 +0200
Message-ID: <9a77d54c-78b1-c24d-c8ba-0240c7ae7460@cloud.ionos.com>
In-Reply-To: <b3d7b459-5f31-d473-2508-20048119c1b2@huawei.com>



On 9/12/19 5:29 AM, Yufen Yu wrote:
>
>
> On 2019/9/12 10:46, Ming Lei wrote:
>> On Sat, Sep 07, 2019 at 06:24:50PM +0800, Yufen Yu wrote:
>>> There is a race condition between the timeout check and the completion
>>> of a flush request, as follows:
>>>
>>> timeout_work    issue flush      issue flush
>>>                  blk_insert_flush
>>>                                   blk_insert_flush
>>> blk_mq_timeout_work
>>>                  blk_kick_flush
>>>
>>> blk_mq_queue_tag_busy_iter
>>> blk_mq_check_expired(flush_rq)
>>>
>>>                  __blk_mq_end_request
>>>                 flush_end_io
>>>                 blk_kick_flush
>>>                 blk_rq_init(flush_rq)
>>>                 memset(flush_rq, 0)
>> I don't see a memset(flush_rq, 0) in block/blk-flush.c.
>
> The call path is as follows:
>
> blk_kick_flush
>     blk_rq_init
>         memset(rq, 0, sizeof(*rq));
>
>>> blk_mq_timed_out
>>> BUG_ON flush_rq->q->mq_ops
>> flush_rq->q won't be changed by blk_rq_init(), and both READ and WRITE
>> of a variable of machine WORD length are atomic, so how can the BUG_ON()
>> be triggered? Do you have the actual BUG log?
>>
>> Also, it is now the driver's responsibility to avoid the race between
>> normal completion and timeout.
>
> I have reproduced the bug by adding a time delay in the timeout handler
> and in the memset path.
> The BUG log is as follows:
>
> [  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [  108.826091] #PF: supervisor read access in kernel mode
> [  108.826543] #PF: error_code(0x0000) - not-present page
> [  108.827059] PGD 0 P4D 0
> [  108.827313] Oops: 0000 [#1] SMP PTI
> [  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
> [  108.828326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> [  108.829503] Workqueue: kblockd blk_mq_timeout_work
> [  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
> [  108.830439] Code: 01 e9 0a ff ff ff 48 83 05 34 45 dd 02 01 4c 39 63 40 0f 84 8a 00 00 00 0d 00 00 20 00 40 0f b6 f5 41 89 44 24 1c 49 8b 04 24 <48> 8b 40 40 48 8b 40 20 48 85 c0 0f 84 90 00 00 00 48 83 05 2f 44
> [  108.832246] RSP: 0018:ffffbf7ac18b7db0 EFLAGS: 00010206
> [  108.832756] RAX: 0000000000000000 RBX: ffffffffb56e0250 RCX: 0000000000000000
> [  108.833444] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ab7fbb96538
> [  108.834149] RBP: 0000000000000000 R08: 000000000000024b R09: 0000000000000030
> [  108.834840] R10: 000000000000004e R11: ffffbf7ac18b7c40 R12: ffff9ab7f756e000
> [  108.835531] R13: ffffbf7ac18b7e70 R14: 0000000000000017 R15: ffff9ab7f6ead0a0
> [  108.836228] FS:  0000000000000000(0000) GS:ffff9ab7fbb80000(0000) knlGS:0000000000000000
> [  108.837026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  108.837588] CR2: 0000000000000040 CR3: 000000013544c000 CR4: 00000000000006e0
> [  108.838191] Call Trace:
> [  108.838406]  bt_iter+0x74/0x80
> [  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
> [  108.839074]  ? __switch_to_asm+0x34/0x70
> [  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
> [  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
> [  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
> [  108.840732]  blk_mq_timeout_work+0x74/0x200
> [  108.841151]  process_one_work+0x297/0x680
> [  108.841550]  worker_thread+0x29c/0x6f0
> [  108.841926]  ? rescuer_thread+0x580/0x580
> [  108.842344]  kthread+0x16a/0x1a0
> [  108.842666]  ? kthread_flush_work+0x170/0x170
> [  108.843100]  ret_from_fork+0x35/0x40
> [  108.843455] Modules linked in:
> [  108.843758] CR2: 0000000000000040
> [  108.844090] ---[ end trace e0ac552505fa1b95 ]---
>
> blk_mq_rq_timed_out() attempts to read 'req->q->mq_ops->timeout', but
> 'q' is NULL at that point, which triggers the oops above.

We have seen a similar call trace on an older kernel (4.4.62), though I am
not sure whether it is the same issue.

[32353526.224059] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G           O    4.4.62-1-storage #4.4.62-1.3
[32353526.224343] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[...]
[32353526.224840] RIP: 0010:[<ffffffff812df5a1>] [<ffffffff812df5a1>] blk_mq_rq_timed_out+0x11/0x70
[32353526.285015]  [<ffffffff812df63d>] blk_mq_check_expired+0x3d/0x60
[32353526.301997]  [<ffffffff812e1f74>] bt_for_each+0xd4/0xe0
[32353526.310730]  [<ffffffff812df600>] ? blk_mq_rq_timed_out+0x70/0x70
[32353526.319579]  [<ffffffff812df600>] ? blk_mq_rq_timed_out+0x70/0x70
[32353526.328329]  [<ffffffff812e2753>] blk_mq_queue_tag_busy_iter+0x43/0xc0
[32353526.336995]  [<ffffffff812de7c0>] ? blk_mq_bio_to_request+0x40/0x40
[32353526.345566]  [<ffffffff812de7f2>] blk_mq_rq_timer+0x32/0xd0
[32353526.354094]  [<ffffffff810b2e45>] call_timer_fn+0x35/0x130
[32353526.362542]  [<ffffffff812de7c0>] ? blk_mq_bio_to_request+0x40/0x40
[32353526.370894]  [<ffffffff810b31b7>] run_timer_softirq+0x157/0x280
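
For what it's worth, the window in the quoted blk_rq_init() call path
(memset first, fields restored afterwards) can be modelled in userspace.
The sketch below is only an illustration under stated assumptions: the
struct and function names (request_model, queue_model, rq_init_model,
the observer callback) are hypothetical stand-ins, not kernel code, and
the callback is invoked inside the window to make the interleaving
deterministic. In the kernel the observer would be another CPU running
the timeout work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mq_ops_model { int (*timeout)(void *rq); };
struct queue_model { const struct mq_ops_model *mq_ops; };
struct request_model {
	struct queue_model *q;
	unsigned long deadline;
};

/* Callback standing in for a concurrent observer (the timeout worker). */
typedef void (*observer_fn)(const struct request_model *rq, void *ctx);

/*
 * Models a memset-then-reassign initialisation. The observer runs
 * between the two steps, i.e. exactly where a racing reader would
 * have to land to see a zeroed request.
 */
static void rq_init_model(struct queue_model *q, struct request_model *rq,
			  observer_fn observe, void *ctx)
{
	assert(rq != NULL);
	memset(rq, 0, sizeof(*rq));	/* rq->q is NULL from here ... */
	if (observe)
		observe(rq, ctx);
	rq->q = q;			/* ... until here */
}

/* Records whether the observer saw rq->q == NULL. */
static void record_null_q(const struct request_model *rq, void *ctx)
{
	*(int *)ctx = (rq->q == NULL);
}

/*
 * Returns 1 if the observer saw a NULL q inside the window and q was
 * restored to its previous value once initialisation finished.
 */
static int window_exposes_null_q(void)
{
	static const struct mq_ops_model ops = { NULL };
	struct queue_model q = { &ops };
	struct request_model rq = { &q, 0 };
	int saw_null = 0;

	rq_init_model(&q, &rq, record_null_q, &saw_null);
	return saw_null && rq.q == &q;
}
```

In this model q ends up with the same value before and after the
re-init, yet a reader that lands between the memset and the assignment
still sees NULL. Whether that is what happened on our 4.4.62 machine is
not something the sketch can show.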

Thanks,
Guoqing

