* Why don't we always grab bfqd->lock for bio_bfqq?
From: Khazhy Kumykov @ 2022-11-01  4:37 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe; +Cc: linux-block, Linux Kernel Mailing List

I'm investigating a NULL deref crash in bfq_add_bfqq_busy(), wherein
bfqq->woken_list_node is hashed, but bfqq->waker_bfqq is NULL - which
seems inconsistent per my reading of the code.

I see that bfq_allow_bio_merge() both accesses and modifies
bfqd->bio_bfqq without holding bfqd->lock, which strikes me as odd.
The calls there to bfq_setup_cooperator() and bfq_merge_bfqqs() in
particular seem wrong to me. I suspect the call to bfq_merge_bfqqs()
can cause the inconsistency seen above, since it's the only place I've
found that modifies bfqq->waker_bfqq without bfqd->lock.

But I'm curious in general - what's special about bio_bfqq? Should we
grab bfqd->lock when touching it? e.g. bfq_request_merge() also
accesses bio_bfqq without grabbing the lock, and there we traverse
bfqq->sort_list - that strikes me as odd as well, though I'm not fully
familiar with the locking conventions here. It feels like, especially
since bfqqs can be merged and bio_bfqq is therefore shared, this
lockless access is wrong.


* Re: Why don't we always grab bfqd->lock for bio_bfqq?
From: Khazhy Kumykov @ 2022-11-01  5:05 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe; +Cc: linux-block, Linux Kernel Mailing List

On Mon, Oct 31, 2022 at 9:37 PM Khazhy Kumykov <khazhy@chromium.org> wrote:
>
> I'm investigating a NULL deref crash in bfq_add_bfqq_busy(), wherein
> bfqq->woken_list_node is hashed, but bfqq->waker_bfqq is NULL - which
> seems inconsistent per my reading of the code.
>
> I see that bfq_allow_bio_merge() both accesses and modifies
> bfqd->bio_bfqq without holding bfqd->lock, which strikes me as odd.
> The calls there to bfq_setup_cooperator() and bfq_merge_bfqqs() in
> particular seem wrong to me. I suspect the call to bfq_merge_bfqqs()
> can cause the inconsistency seen above, since it's the only place
> I've found that modifies bfqq->waker_bfqq without bfqd->lock.
>
> But I'm curious in general - what's special about bio_bfqq? Should
> we grab bfqd->lock when touching it? e.g. bfq_request_merge() also
> accesses bio_bfqq without grabbing the lock, and there we traverse
> bfqq->sort_list - that strikes me as odd as well, though I'm not
> fully familiar with the locking conventions here. It feels like,
> especially since bfqqs can be merged and bio_bfqq is therefore
> shared, this lockless access is wrong.

Something more on this front: it does look like in *some* paths we
call blk_mq_sched_allow_merge()/bfq_allow_bio_merge() with the lock
held, and in some paths we do not - e.g. blk_mq_sched_try_merge()
gets called directly by the schedulers (and bfq calls it under the
lock).

However, blk_attempt_bio_merge() also calls blk_mq_sched_allow_merge(),
and it is called by blk-mq directly on the submission path
(blk_bio_list_merge <- blk_mq_sched_bio_merge <-
blk_mq_attempt_bio_merge <- blk_mq_get_new_requests <-
blk_mq_submit_bio), so in that path only we end up calling
bfq_allow_bio_merge() without bfqd->lock held.

I can also see that bfq_request_merge() gets called under bfqd->lock,
since the only path to ->request_merge() is through
blk_mq_sched_try_merge(), which is called by the schedulers. If I'm
understanding this correctly, and these functions are intended to be
called under the lock, perhaps it would be appropriate to add some
lockdep_assert_held() annotations?
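
Just to make that concrete, a minimal sketch of the kind of annotation
I mean (untested, and assuming bfq_request_merge() really is only ever
reached via blk_mq_sched_try_merge() under bfqd->lock):

static int bfq_request_merge(struct request_queue *q, struct request **req,
                             struct bio *bio)
{
    struct bfq_data *bfqd = q->elevator->elevator_data;

    /* Document/enforce the locking assumption discussed above. */
    lockdep_assert_held(&bfqd->lock);

    /* ... existing merge logic unchanged ... */
}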

Khazhy


* NULL deref crash in bfq_add_bfqq_busy
From: Khazhy Kumykov @ 2022-11-01 23:16 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe; +Cc: linux-block, Linux Kernel Mailing List


Still poking around here, I think my previous emails were leading me
down a dead end, at least w.r.t. unsafe locking. The one code path I
identified below (blk_bio_list_merge() via __blk_mq_sched_bio_merge())
won't actually get hit since we'll instead call bfq_bio_merge().
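
(For reference, my reading of __blk_mq_sched_bio_merge() - paraphrased
from memory, so the exact shape may be off - is that it only falls back
to the plain bio-list merge when the elevator has no ->bio_merge hook,
and bfq does set one:

bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
                              unsigned int nr_segs)
{
    struct elevator_queue *e = q->elevator;

    /*
     * bfq sets ops.bio_merge = bfq_bio_merge, so we take this branch
     * and never reach the blk_bio_list_merge() fallback below.
     */
    if (e && e->type->ops.bio_merge)
        return e->type->ops.bio_merge(q, bio, nr_segs);

    /* ... per-ctx blk_bio_list_merge() fallback ... */
}

so the unlocked path I was worried about isn't reachable for bfq.)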

The crashes I'm seeing have been on the read side (stacks below):

[160595.656560]  bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142]  bfq_add_request+0x6bc/0x980
[160595.666602]  bfq_insert_request+0x8ec/0x1240
[160595.671762]  bfq_insert_requests+0x58/0x9c
[160595.676420]  blk_mq_sched_insert_request+0x11c/0x198
[160595.682107]  blk_mq_submit_bio+0x270/0x62c
[160595.686759]  __submit_bio_noacct_mq+0xec/0x178
[160595.691926]  submit_bio+0x120/0x184
[160595.695990]  ext4_mpage_readpages+0x77c/0x7c8
[160595.701026]  ext4_readpage+0x60/0xb0
[160595.705158]  filemap_read_page+0x54/0x114
[160595.711961]  filemap_fault+0x228/0x5f4
[160595.716272]  do_read_fault+0xe0/0x1f0
[160595.720487]  do_fault+0x40/0x1c8

or

[28497.344552]  bfq_add_bfqq_busy+0x110/0x1ec
[28497.348787]  bfq_add_request+0x6bc/0x980
[28497.352845]  bfq_insert_request+0x8ec/0x1240
[28497.357265]  bfq_insert_requests+0x58/0x9c
[28497.361503]  blk_mq_sched_insert_requests+0x8c/0x180
[28497.366647]  blk_mq_flush_plug_list+0x15c/0x1e0
[28497.371376]  blk_flush_plug_list+0xf0/0x124
[28497.375877]  blk_finish_plug+0x34/0x48
[28497.379846]  read_pages+0x21c/0x288
[28497.383510]  page_cache_ra_unbounded+0x1f0/0x24c
[28497.388346]  do_page_cache_ra+0x48/0x54
[28497.392388]  do_sync_mmap_readahead+0x190/0x1e0
[28497.397150]  filemap_fault+0x150/0x5f4
[28497.401111]  do_read_fault+0xe0/0x1f0
[28497.404948]  do_fault+0x40/0x1c8

In the crashes I'm looking at, it looks like the inconsistent
bfq_queue is oom_bfqq (waker_bfqq is NULL, but woken_list_node is
hashed).

I'm looking at bfq_init_rq(), where we set bfqq->waker_bfqq
(potentially to NULL) and only call hlist_add_head(), never
hlist_del_init(). It looks like the intended logic is that
bfq_get_bfqq_handle_split() returns a freshly allocated bfq_queue.

However, it can also return oom_bfqq, which may already be on a woken
list - then we proceed to set waker_bfqq without deleting
woken_list_node, which can result in the inconsistency seen in the
crash. wdyt?

bfqq = bfq_split_bfqq(bic, bfqq);
split = true;

if (!bfqq) {
    bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
                        true, is_sync,
                        NULL);  /* this can also return oom_bfqq */
    bfqq->waker_bfqq = old_bfqq->waker_bfqq;  /* this can be set to NULL */
    bfqq->tentative_waker_bfqq = NULL;
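
Purely to illustrate what I mean, not a proposed fix: if
bfq_get_bfqq_handle_split() hands back oom_bfqq here, something along
these lines would at least keep waker_bfqq and woken_list_node
consistent (assuming it is safe to simply drop oom_bfqq from whatever
woken_list it is currently on):

    /*
     * oom_bfqq may already be hashed on some waker's woken_list;
     * unhash it before (re)setting waker_bfqq so the two stay
     * consistent.  hlist_del_init() is a no-op on an unhashed node.
     */
    if (unlikely(bfqq == &bfqd->oom_bfqq))
        hlist_del_init(&bfqq->woken_list_node);
    bfqq->waker_bfqq = old_bfqq->waker_bfqq;
    bfqq->tentative_waker_bfqq = NULL;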


An aside, looking at the async case, since I'd like to understand
this code better.
---

The call to bfq_split_bfqq() doesn't make sense to me - for
bfq_get_bfqq_handle_split() et al. we pass the is_sync parameter, but
bfq_split_bfqq() only ever clears bfqq for sync == true. My
understanding of the intent is: we decide to split, we call
bfq_split_bfqq(), which sets bic->bfqq[is_sync] to NULL, and then
bfq_get_bfqq_handle_split() sees that it's NULL and allocates a brand
new queue. But for async, bfq_split_bfqq() returns NULL while
bic->bfqq[0] was never cleared, so bfq_get_bfqq_handle_split() sees
the old pointer and returns the already existing bfqq. We then modify
a bfqq that was not freshly allocated.

Should bfq_split_bfqq() here take an is_sync param and clear the
corresponding bic->bfqq?
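
To make the question concrete, a rough, untested sketch of the variant
I have in mind (the is_sync parameter is my addition; everything else
is meant to stay as-is):

static struct bfq_queue *bfq_split_bfqq(struct bfq_io_cq *bic,
                                        struct bfq_queue *bfqq,
                                        bool is_sync)
{
    /* ... existing "last process ref" handling unchanged ... */

    /*
     * Clear the bic pointer for the direction we are actually
     * splitting, instead of always clearing only the sync one.
     */
    bic_set_bfqq(bic, NULL, is_sync);

    /* ... existing release of the process ref, return NULL ... */
}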
---




On Mon, Oct 31, 2022 at 10:05 PM Khazhy Kumykov <khazhy@chromium.org> wrote:
>
> On Mon, Oct 31, 2022 at 9:37 PM Khazhy Kumykov <khazhy@chromium.org> wrote:
> >
> > I'm investigating a NULL deref crash in bfq_add_bfqq_busy(), wherein
> > bfqq->woken_list_node is hashed, but bfqq->waker_bfqq is NULL - which
> > seems inconsistent per my reading of the code.


