From: Jens Axboe <axboe@kernel.dk>
To: "jianchao.wang" <jianchao.w.wang@oracle.com>,
Bart Van Assche <bvanassche@acm.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: v4.20-rc6: Sporadic use-after-free in bt_iter()
Date: Thu, 20 Dec 2018 06:02:55 -0700 [thread overview]
Message-ID: <ccf50ee5-ff34-39a0-d7eb-3853fee55a4c@kernel.dk> (raw)
In-Reply-To: <372d2960-ff0c-1135-28f9-23eea8670463@oracle.com>
On 12/19/18 10:03 PM, jianchao.wang wrote:
>
>
> On 12/20/18 12:48 PM, Jens Axboe wrote:
>> On 12/19/18 9:32 PM, jianchao.wang wrote:
>>>
>>>
>>> On 12/20/18 12:19 PM, Jens Axboe wrote:
>>>> On 12/19/18 8:24 PM, jianchao.wang wrote:
>>>>>
>>>>>
>>>>> On 12/20/18 11:17 AM, Jens Axboe wrote:
>>>>>> On 12/19/18 5:16 PM, Bart Van Assche wrote:
>>>>>>> On Wed, 2018-12-19 at 16:27 -0700, Jens Axboe wrote:
>>>>>>>> On 12/19/18 4:24 PM, Bart Van Assche wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> If I run the srp blktests in a loop then I see the below call stack appearing
>>>>>>>>> sporadically. I have not yet had the time to analyze this but I'm reporting
>>>>>>>>> this here in case someone else would already have had a look at this.
>>>>>>>>>
>>>>>>>>> Bart.
>>>>>>>>>
>>>>>>>>> ==================================================================
>>>>>>>>> BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
>>>>>>>>> Read of size 8 at addr ffff88803b335240 by task fio/21412
>>>>>>>>>
>>>>>>>>> CPU: 0 PID: 21412 Comm: fio Tainted: G W 4.20.0-rc6-dbg+ #3
>>>>>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>>>>>>> Call Trace:
>>>>>>>>> dump_stack+0x86/0xca
>>>>>>>>> print_address_description+0x71/0x239
>>>>>>>>> kasan_report.cold.5+0x242/0x301
>>>>>>>>> __asan_load8+0x54/0x90
>>>>>>>>> bt_iter+0x86/0xf0
>>>>>>>>> blk_mq_queue_tag_busy_iter+0x373/0x5e0
>>>>>>>>> blk_mq_in_flight+0x96/0xb0
>>>>>>>>> part_in_flight+0x40/0x140
>>>>>>>>> part_round_stats+0x18e/0x370
>>>>>>>>> blk_account_io_start+0x3d7/0x670
>>>>>>>>> blk_mq_bio_to_request+0x19c/0x3a0
>>>>>>>>> blk_mq_make_request+0x7a9/0xcb0
>>>>>>>>> generic_make_request+0x41d/0x960
>>>>>>>>> submit_bio+0x9b/0x250
>>>>>>>>> do_blockdev_direct_IO+0x435c/0x4c70
>>>>>>>>> __blockdev_direct_IO+0x79/0x88
>>>>>>>>> ext4_direct_IO+0x46c/0xc00
>>>>>>>>> generic_file_direct_write+0x119/0x210
>>>>>>>>> __generic_file_write_iter+0x11c/0x280
>>>>>>>>> ext4_file_write_iter+0x1b8/0x6f0
>>>>>>>>> aio_write+0x204/0x310
>>>>>>>>> io_submit_one+0x9d3/0xe80
>>>>>>>>> __x64_sys_io_submit+0x115/0x340
>>>>>>>>> do_syscall_64+0x71/0x210
>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>>>>> RIP: 0033:0x7f02cf043219
>>>>>>>>
>>>>>>>> I've seen this one before as well, it's not a new thing. As far as I can
>>>>>>>> tell, it's a false positive. There should be no possibility for a
>>>>>>>> use-after-free iterating the static tags/requests.
>>>>>>>
>>>>>>> Are you sure this is a false positive?
>>>>>>
>>>>>> No I'm not, but the few times I have seen it, I haven't been able to
>>>>>> make much sense of it. It goes back quite a bit.
>>>>>>
>>>>>> I have not yet encountered any false
>>>>>>> positive KASAN complaints. According to the following gdb output this complaint
>>>>>>> refers to reading rq->q:
>>>>>>>
>>>>>>> (gdb) list *(bt_iter+0x86)
>>>>>>> 0xffffffff816b9346 is in bt_iter (block/blk-mq-tag.c:237).
>>>>>>> 232
>>>>>>> 233 /*
>>>>>>> 234 * We can hit rq == NULL here, because the tagging functions
>>>>>>> 235 * test and set the bit before assigning ->rqs[].
>>>>>>> 236 */
>>>>>>> 237 if (rq && rq->q == hctx->queue)
>>>>>>> 238 iter_data->fn(hctx, rq, iter_data->data, reserved);
>>>>>>> 239 return true;
>>>>>>> 240 }
>>>>>>> 241
>>>>>>>
>>>>>>> From the disassembly output:
>>>>>>>
>>>>>>> 232
>>>>>>> 233 /*
>>>>>>> 234 * We can hit rq == NULL here, because the tagging functions
>>>>>>> 235 * test and set the bit before assigning ->rqs[].
>>>>>>> 236 */
>>>>>>> 237 if (rq && rq->q == hctx->queue)
>>>>>>> 0xffffffff816b9339 <+121>: test %r12,%r12
>>>>>>> 0xffffffff816b933c <+124>: je 0xffffffff816b935f <bt_iter+159>
>>>>>>> 0xffffffff816b933e <+126>: mov %r12,%rdi
>>>>>>> 0xffffffff816b9341 <+129>: callq 0xffffffff813bd3e0 <__asan_load8>
>>>>>>> 0xffffffff816b9346 <+134>: lea 0x138(%r13),%rdi
>>>>>>> 0xffffffff816b934d <+141>: mov (%r12),%r14
>>>>>>> 0xffffffff816b9351 <+145>: callq 0xffffffff813bd3e0 <__asan_load8>
>>>>>>> 0xffffffff816b9356 <+150>: cmp 0x138(%r13),%r14
>>>>>>> 0xffffffff816b935d <+157>: je 0xffffffff816b936f <bt_iter+175>
>>>>>>>
>>>>>>> BTW, rq may but does not have to refer to tags->static_rqs[...]. It may also
>>>>>>> refer to hctx->fq.flush_rq.
>>>>>>
>>>>>> But even those are persistent for the lifetime of the queue... But since
>>>>>> kasan complains it belongs to a specific page, I'm guessing it's one
>>>>>> of the regular requests since those are out of a chopped up page. Which
>>>>>> means it makes even less sense.
>>>>>>
>>>>>> Is this happening while devices are being actively torn down? And
>>>>>> are you using shared tags? That's the only way I could see this
>>>>>> triggering.
>>>>>>
>>>>>
>>>>> Or could it be caused by the stale request in hctx->tags->rqs[] slot ?
>>>>> We don't clear it after free the requests.
>>>>>
>>>>> And there could be a scenario like,
>>>>> There used to be a io scheduler attached.
>>>>> After some workload, the io scheduler is detached.
>>>>> So there could be rqs allocated by the io scheduler left in hctx->tags->rqs.
>>>>>
>>>>> blk_mq_get_request blk_mq_queue_tag_busy_iter
>>>>> -> blk_mq_get_tag
>>>>> -> bt_for_each
>>>>> -> bt_iter
>>>>> -> rq = taags->rqs[]
>>>>> -> rq->q
>>>>> -> blk_mq_rq_ctx_init
>>>>> -> data->hctx->tags->rqs[rq->tag] = rq;
>>>>>
>>>>> If the scenario is possible, maybe we could fix it as following.
>>>>
>>>> Ah yes, good point, I bet that's what it is. But we just had this exact
>>>> discussion in another thread, and my point there was that we should
>>>> clear these when they go away, not inline. So how about clearing entries
>>>> when the sched tags go away?
>>>>
>>> I guess it should be OK. :)
>>
>> Something like this. Totally untested... And I wonder if there's a more
>> efficient way to do this, not that it matters THAT much. But still.
>>
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 2de972857496..341cb8b9cfb7 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2025,16 +2025,21 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>> {
>> struct page *page;
>>
>> - if (tags->rqs && set->ops->exit_request) {
>> - int i;
>> + if (tags->rqs) {
>> + int i, j;
>>
>> for (i = 0; i < tags->nr_tags; i++) {
>> struct request *rq = tags->static_rqs[i];
>>
>> if (!rq)
>> continue;
>> - set->ops->exit_request(set, rq, hctx_idx);
>> + if (set->ops->exit_request)
>> + set->ops->exit_request(set, rq, hctx_idx);
>> tags->static_rqs[i] = NULL;
>> +
>> + for (j = 0; j < tags->nr_tags; j++)
>> + if (tags->rqs[j] == rq)
>> + tags->rqs[j] = NULL;
>> }
>> }
>>
>>
>
> I'm afraid this cannot work.
>
> The 'tags' here could be the hctx->sched_tags, but what we need to
> clear is hctx->tags->rqs[].
You are right, of course, a bit too quick on the trigger. This one
should work better, and also avoids that silly quadratic loop. I don't
think we need the tag == -1 check, but probably best to be safe.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2de972857496..151891eb6fbd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2025,16 +2025,22 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
{
struct page *page;
- if (tags->rqs && set->ops->exit_request) {
- int i;
+ if (tags->rqs) {
+ int i, j;
for (i = 0; i < tags->nr_tags; i++) {
struct request *rq = tags->static_rqs[i];
if (!rq)
continue;
- set->ops->exit_request(set, rq, hctx_idx);
+ if (set->ops->exit_request)
+ set->ops->exit_request(set, rq, hctx_idx);
tags->static_rqs[i] = NULL;
+
+ if (rq->tag == -1)
+ continue;
+ if (set->tags[hctx_idx]->rqs[rq->tag] == rq)
+ set->tags[hctx_idx]->rqs[rq->tag] = NULL;
}
}
--
Jens Axboe
next prev parent reply other threads:[~2018-12-20 13:03 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-19 23:24 v4.20-rc6: Sporadic use-after-free in bt_iter() Bart Van Assche
2018-12-19 23:27 ` Jens Axboe
2018-12-20 0:16 ` Bart Van Assche
2018-12-20 3:17 ` Jens Axboe
2018-12-20 3:24 ` jianchao.wang
2018-12-20 4:19 ` Jens Axboe
2018-12-20 4:32 ` jianchao.wang
2018-12-20 4:48 ` Jens Axboe
2018-12-20 5:03 ` jianchao.wang
2018-12-20 13:02 ` Jens Axboe [this message]
2018-12-20 13:07 ` Jens Axboe
2018-12-20 18:01 ` Bart Van Assche
2018-12-20 18:21 ` Jens Axboe
2018-12-20 18:33 ` Jens Axboe
2018-12-20 20:56 ` Bart Van Assche
2018-12-20 21:00 ` Jens Axboe
2018-12-20 21:23 ` Bart Van Assche
2018-12-20 21:26 ` Jens Axboe
2018-12-20 21:31 ` Bart Van Assche
2018-12-20 21:34 ` Jens Axboe
2018-12-20 21:40 ` Bart Van Assche
2018-12-20 21:44 ` Jens Axboe
2018-12-20 21:48 ` Jens Axboe
2018-12-20 22:19 ` Bart Van Assche
2018-12-20 22:23 ` Jens Axboe
2018-12-20 22:33 ` Jens Axboe
2018-12-20 22:47 ` Jens Axboe
2018-12-20 22:50 ` Jens Axboe
2019-02-14 23:36 ` Bart Van Assche
2019-02-15 18:29 ` Evan Green
2019-02-19 16:48 ` Bart Van Assche
2019-02-21 20:54 ` Evan Green
2019-02-15 2:57 ` jianchao.wang
2018-12-20 4:06 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ccf50ee5-ff34-39a0-d7eb-3853fee55a4c@kernel.dk \
--to=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=jianchao.w.wang@oracle.com \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).