All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yu Kuai <yukuai3@huawei.com>
To: <ming.lei@redhat.com>, <axboe@kernel.dk>
Cc: <linux-block@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<yukuai3@huawei.com>, <yi.zhang@huawei.com>
Subject: [PATCH -next v2] blk-mq: fix panic during blk_mq_run_work_fn()
Date: Fri, 20 May 2022 11:25:42 +0800	[thread overview]
Message-ID: <20220520032542.3331610-1-yukuai3@huawei.com> (raw)

Our test report a following crash:

BUG: kernel NULL pointer dereference, address: 0000000000000018
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 6 PID: 265 Comm: kworker/6:1H Kdump: loaded Tainted: G           O      5.10.0-60.17.0.h43.eulerosv2r11.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220320_160524-szxrtosci10000 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_delay_run_hw_queues+0xb6/0xe0
RSP: 0018:ffffacc6803d3d88 EFLAGS: 00010246
RAX: 0000000000000006 RBX: ffff99e2c3d25008 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff99e2c911ae18
RBP: ffffacc6803d3dd8 R08: 0000000000000000 R09: ffff99e2c0901f6c
R10: 0000000000000018 R11: 0000000000000018 R12: ffff99e2c911ae18
R13: 0000000000000000 R14: 0000000000000003 R15: ffff99e2c911ae18
FS:  0000000000000000(0000) GS:ffff99e6bbf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 000000007460a006 CR4: 00000000003706e0
Call Trace:
 __blk_mq_do_dispatch_sched+0x2a7/0x2c0
 ? newidle_balance+0x23e/0x2f0
 __blk_mq_sched_dispatch_requests+0x13f/0x190
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x47/0xd0
 process_one_work+0x1b0/0x350
 worker_thread+0x49/0x300
 ? rescuer_thread+0x3a0/0x3a0
 kthread+0xfe/0x140
 ? kthread_park+0x90/0x90
 ret_from_fork+0x22/0x30

After digging from vmcore, I found that the queue is cleaned
up(blk_cleanup_queue() is done) and tag set is
freed(blk_mq_free_tag_set() is done).

There are two problems here:

1) blk_mq_delay_run_hw_queues() will only be called from
__blk_mq_do_dispatch_sched() if e->type->ops.has_work() return true.
This seems impossible because blk_cleanup_queue() is done, and there
should be no io. Commit ddc25c86b466 ("block, bfq: make bfq_has_work()
more accurate") fix the problem in bfq. And currently ohter schedulers
don't have such problem.

2) 'hctx->run_work' still exists after blk_cleanup_queue().
blk_mq_cancel_work_sync() is called from blk_cleanup_queue() to cancel
all the 'run_work'. However, there is no guarantee that new 'run_work'
won't be queued after that(and before blk_mq_exit_queue() is done).

The first problem is not the root cause, it will only increase the
probability of the second problem. this patch fix the second problem by
checking the 'QUEUE_FLAG_DEAD' before queuing 'hctx->run_work', and
using 'queue_lock' to synchronize queuing new work and cacelling the
old work.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-core.c |  3 +++
 block/blk-mq.c   | 10 ++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 80fa73c419a9..f3e36d8143ec 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -314,7 +314,10 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 
+	/* New 'hctx->run_work' can't be queued after setting the dead flag */
+	spin_lock_irq(&q->queue_lock);
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
+	spin_unlock_irq(&q->queue_lock);
 
 	blk_sync_queue(q);
 	if (queue_is_mq(q)) {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ed1869a305c4..fb35d335d554 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2093,6 +2093,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
 					unsigned long msecs)
 {
+	unsigned long flags;
+
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
@@ -2107,8 +2109,12 @@ static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
 		put_cpu();
 	}
 
-	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
-				    msecs_to_jiffies(msecs));
+	spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+	if (!blk_queue_dead(hctx->queue))
+		kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+					    &hctx->run_work,
+					    msecs_to_jiffies(msecs));
+	spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
 }
 
 /**
-- 
2.31.1


             reply	other threads:[~2022-05-20  3:12 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-20  3:25 Yu Kuai [this message]
2022-05-20  3:44 ` [PATCH -next v2] blk-mq: fix panic during blk_mq_run_work_fn() Ming Lei
2022-05-20  6:23   ` yukuai (C)
2022-05-20  7:02     ` yukuai (C)
2022-05-20  8:34       ` Ming Lei
2022-05-20  8:49         ` yukuai (C)
2022-05-20  9:53           ` Ming Lei
2022-05-20 10:56             ` yukuai (C)
2022-05-20 11:39               ` Ming Lei
2022-05-20 12:01                 ` yukuai (C)
2022-05-20 13:56                   ` Ming Lei
2022-05-21  3:33                     ` yukuai (C)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220520032542.3331610-1-yukuai3@huawei.com \
    --to=yukuai3@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.