All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Damien Le Moal <damien.lemoal@wdc.com>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>,
	linux-block@vger.kernel.org
Subject: [PATCH AUTOSEL 5.10 18/22] null_blk: fix command timeout completion handling
Date: Mon,  5 Apr 2021 12:04:27 -0400	[thread overview]
Message-ID: <20210405160432.268374-18-sashal@kernel.org> (raw)
In-Reply-To: <20210405160432.268374-1-sashal@kernel.org>

From: Damien Le Moal <damien.lemoal@wdc.com>

[ Upstream commit de3510e52b0a398261271455562458003b8eea62 ]

Memory backed or zoned null block devices may generate actual request
timeout errors due to the submission path being blocked on memory
allocation or zone locking. Unlike fake timeouts or injected timeouts,
the request submission path will call blk_mq_complete_request() or
blk_mq_end_request() for these real timeout errors, causing a double
completion and use after free situation as the block layer timeout
handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in
blk_mq_check_expired(). This problem often triggers a NULL pointer
dereference such as:

BUG: kernel NULL pointer dereference, address: 0000000000000050
RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20
...
Call Trace:
  dd_finish_request+0x56/0x80
  blk_mq_free_request+0x37/0x130
  null_handle_cmd+0xbf/0x250 [null_blk]
  ? null_queue_rq+0x67/0xd0 [null_blk]
  blk_mq_dispatch_rq_list+0x122/0x850
  __blk_mq_do_dispatch_sched+0xbb/0x2c0
  __blk_mq_sched_dispatch_requests+0x13d/0x190
  blk_mq_sched_dispatch_requests+0x30/0x60
  __blk_mq_run_hw_queue+0x49/0x90
  process_one_work+0x26c/0x580
  worker_thread+0x55/0x3c0
  ? process_one_work+0x580/0x580
  kthread+0x134/0x150
  ? kthread_create_worker_on_cpu+0x70/0x70
  ret_from_fork+0x1f/0x30

This problem very often triggers when running the full btrfs xfstests
on a memory-backed zoned null block device in a VM with limited amount
of memory.

Avoid this by executing blk_mq_complete_request() in null_timeout_rq()
only for commands that are marked for a fake timeout completion using
the fake_timeout boolean in struct null_cmd. For timeout errors injected
through debugfs, the timeout handler will execute
blk_mq_complete_request()i as before. This is safe as the submission
path does not execute complete requests in this case.

In null_timeout_rq(), also make sure to set the command error field to
BLK_STS_TIMEOUT and to propagate this error through to the request
completion.

Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/block/null_blk.h      |  1 +
 drivers/block/null_blk_main.c | 26 +++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index c24d9b5ad81a..7de703f28617 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -20,6 +20,7 @@ struct nullb_cmd {
 	blk_status_t error;
 	struct nullb_queue *nq;
 	struct hrtimer timer;
+	bool fake_timeout;
 };
 
 struct nullb_queue {
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 4685ea401d5b..bb3686c3869d 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1367,10 +1367,13 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd, sector_t sector,
 	}
 
 	if (dev->zoned)
-		cmd->error = null_process_zoned_cmd(cmd, op,
-						    sector, nr_sectors);
+		sts = null_process_zoned_cmd(cmd, op, sector, nr_sectors);
 	else
-		cmd->error = null_process_cmd(cmd, op, sector, nr_sectors);
+		sts = null_process_cmd(cmd, op, sector, nr_sectors);
+
+	/* Do not overwrite errors (e.g. timeout errors) */
+	if (cmd->error == BLK_STS_OK)
+		cmd->error = sts;
 
 out:
 	nullb_complete_cmd(cmd);
@@ -1449,8 +1452,20 @@ static bool should_requeue_request(struct request *rq)
 
 static enum blk_eh_timer_return null_timeout_rq(struct request *rq, bool res)
 {
+	struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
+
 	pr_info("rq %p timed out\n", rq);
-	blk_mq_complete_request(rq);
+
+	/*
+	 * If the device is marked as blocking (i.e. memory backed or zoned
+	 * device), the submission path may be blocked waiting for resources
+	 * and cause real timeouts. For these real timeouts, the submission
+	 * path will complete the request using blk_mq_complete_request().
+	 * Only fake timeouts need to execute blk_mq_complete_request() here.
+	 */
+	cmd->error = BLK_STS_TIMEOUT;
+	if (cmd->fake_timeout)
+		blk_mq_complete_request(rq);
 	return BLK_EH_DONE;
 }
 
@@ -1471,6 +1486,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 	cmd->rq = bd->rq;
 	cmd->error = BLK_STS_OK;
 	cmd->nq = nq;
+	cmd->fake_timeout = should_timeout_request(bd->rq);
 
 	blk_mq_start_request(bd->rq);
 
@@ -1487,7 +1503,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 			return BLK_STS_OK;
 		}
 	}
-	if (should_timeout_request(bd->rq))
+	if (cmd->fake_timeout)
 		return BLK_STS_OK;
 
 	return null_handle_cmd(cmd, sector, nr_sectors, req_op(bd->rq));
-- 
2.30.2


  parent reply	other threads:[~2021-04-05 16:05 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05 16:04 [PATCH AUTOSEL 5.10 01/22] interconnect: core: fix error return code of icc_link_destroy() Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 02/22] gfs2: Flag a withdraw if init_threads() fails Sasha Levin
2021-04-05 16:04   ` [Cluster-devel] " Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 03/22] KVM: arm64: Hide system instruction access to Trace registers Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 04/22] KVM: arm64: Disable guest access to trace filter controls Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 05/22] drm/imx: imx-ldb: fix out of bounds array access warning Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 06/22] gfs2: report "already frozen/thawed" errors Sasha Levin
2021-04-05 16:04   ` [Cluster-devel] " Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 07/22] scsi: iscsi: Fix race condition between login and sync thread Sasha Levin
2021-04-06 17:24   ` Mike Christie
2021-04-06 19:22     ` Greg KH
2021-04-14 12:14     ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 08/22] ftrace: Check if pages were allocated before calling free_pages() Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 09/22] tools/kvm_stat: Add restart delay Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 10/22] drm/tegra: dc: Don't set PLL clock to 0Hz Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 11/22] gpu: host1x: Use different lock classes for each client Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 12/22] XArray: Fix splitting to non-zero orders Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 13/22] radix tree test suite: Fix compilation Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 14/22] block: only update parent bi_status when bio fail Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 15/22] radix tree test suite: Register the main thread with the RCU library Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 16/22] idr test suite: Take RCU read lock in idr_find_test_1 Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 17/22] idr test suite: Create anchor before launching throbber Sasha Levin
2021-04-05 16:04 ` Sasha Levin [this message]
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 19/22] io_uring: don't mark S_ISBLK async work as unbounded Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 20/22] riscv: evaluate put_user() arg before enabling user access Sasha Levin
2021-04-05 16:04   ` Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 21/22] riscv,entry: fix misaligned base for excp_vect_table Sasha Levin
2021-04-05 16:04   ` [PATCH AUTOSEL 5.10 21/22] riscv, entry: " Sasha Levin
2021-04-05 16:04 ` [PATCH AUTOSEL 5.10 22/22] block: don't ignore REQ_NOWAIT for direct IO Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210405160432.268374-18-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=damien.lemoal@wdc.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.