* [PATCH v6 0/6] handle unexpected message from server
@ 2021-09-15  8:15 Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

This patch set fixes several ways the client might oops if the nbd
server sends an unexpected message. For example, our syzkaller
reported a uaf in nbd_read_stat():

Call trace:
 dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x144/0x1b4 lib/dump_stack.c:118
 print_address_description+0x68/0x2d0 mm/kasan/report.c:253
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x134/0x2f0 mm/kasan/report.c:409
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
 __read_once_size include/linux/compiler.h:193 [inline]
 blk_mq_rq_state block/blk-mq.h:106 [inline]
 blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
 nbd_read_stat drivers/block/nbd.c:670 [inline]
 recv_work+0x1bc/0x890 drivers/block/nbd.c:749
 process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2147
 worker_thread+0x80/0x9d0 kernel/workqueue.c:2302
 kthread+0x1d8/0x1e0 kernel/kthread.c:255
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174

1) At first, a normal io is submitted and completed through the scheduler:

internal_tag = blk_mq_get_tag -> get tag from sched_tags
 blk_mq_rq_ctx_init
  sched_tags->rq[internal_tag] = sched_tag->static_rq[internal_tag]
...
blk_mq_get_driver_tag
 __blk_mq_get_driver_tag -> get tag from tags
 tags->rq[tag] = sched_tag->static_rq[internal_tag]

So, both tags->rq[tag] and sched_tags->rq[internal_tag] point to the
same request, sched_tags->static_rq[internal_tag], even after the io
is finished.

2) nbd server sends a reply with a random tag directly:

recv_work
 nbd_read_stat
  blk_mq_tag_to_rq(tags, tag)
   rq = tags->rq[tag]

3) if the sched_tags->static_rq is freed:

blk_mq_sched_free_requests
 blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
  -> step 2) access rq before clearing rq mapping
  blk_mq_clear_rq_mapping(set, tags, hctx_idx);
  __free_pages() -> rq is freed here

4) Then, nbd continues to use the freed request in nbd_read_stat()
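
For reference, the lookup in step 2) is essentially a bare array access
with no liveness check. A simplified sketch of blk_mq_tag_to_rq() (not
the exact kernel source; field names abbreviated as in the traces
above):

	struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
					 unsigned int tag)
	{
		if (tag < tags->nr_tags)
			return tags->rq[tag]; /* may point into freed pages */
		return NULL;
	}

Nothing here pins the request, so once step 3) frees the backing pages
the returned pointer is dangling.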

Changes in v6:
 - don't set cmd->status to error if the request is completed before
 nbd_clear_req().
 - get 'q_usage_counter' to prevent accessing a freed request through
 blk_mq_tag_to_rq(), instead of using blk_mq_find_and_get_req().
Changes in v5:
 - move patches 1 & 2 from v4 to the end of the series (patches 4 & 5
 in v5)
 - add some comments in patch 5
Changes in v4:
 - change the name of the patchset, since uaf is not the only problem
 if the server sends an unexpected reply message.
 - instead of adding a new interface, use blk_mq_find_and_get_req().
 - add patch 5 to this series
Changes in v3:
 - v2 couldn't fix the problem thoroughly; add patches 3-4 to this
 series.
 - modify descriptions.
 - patch 5 is just a cleanup
Changes in v2:
 - as Bart suggested, add a new helper function for drivers to get a
 request by tag.

Yu Kuai (6):
  nbd: don't handle response without a corresponding request message
  nbd: make sure request completion won't concurrent
  nbd: check sock index in nbd_read_stat()
  nbd: don't start request if nbd_queue_rq() failed
  nbd: partition nbd_read_stat() into nbd_read_reply() and
    nbd_handle_reply()
  nbd: fix uaf in nbd_handle_reply()

 block/blk-core.c    |   1 +
 drivers/block/nbd.c | 127 ++++++++++++++++++++++++++++++++------------
 2 files changed, 94 insertions(+), 34 deletions(-)

-- 
2.31.1



* [PATCH v6 1/6] nbd: don't handle response without a corresponding request message
  2021-09-15  8:15 [PATCH v6 0/6] handle unexpected message from server Yu Kuai
@ 2021-09-15  8:15 ` Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 2/6] nbd: make sure request completion won't concurrent Yu Kuai
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

While handling a response message from the server, nbd_read_stat()
will try to get the request by tag and then complete it. However,
this is problematic if nbd hasn't sent a corresponding request
message:

t1                      t2
                        submit_bio
                         nbd_queue_rq
                          blk_mq_start_request
recv_work
 nbd_read_stat
  blk_mq_tag_to_rq
 blk_mq_complete_request
                          nbd_send_cmd

Thus add a new cmd flag 'NBD_CMD_INFLIGHT': it is set in
nbd_send_cmd() and checked in nbd_read_stat().

Note that this patch can't fix that blk_mq_tag_to_rq() might
return a freed request; that will be fixed in the following
patches.
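
The intended lifecycle of the flag, condensed into one place (a sketch;
the hunks below are authoritative):

	/* send side, under cmd->lock: mark inflight only once the
	 * request is actually on the wire */
	ret = nbd_send_cmd(nbd, cmd, index);
	if (!ret)
		__set_bit(NBD_CMD_INFLIGHT, &cmd->flags);

	/* completion side, under cmd->lock: only the path that clears
	 * the bit may complete the command */
	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags))
		return -ENOENT; /* no matching request was ever sent */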

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/nbd.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..23ded569e8d3 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -126,6 +126,12 @@ struct nbd_device {
 };
 
 #define NBD_CMD_REQUEUED	1
+/*
+ * This flag will be set if nbd_queue_rq() succeeds, and will be checked and
+ * cleared in completion. Both setting and clearing of the flag are protected
+ * by cmd->lock.
+ */
+#define NBD_CMD_INFLIGHT	2
 
 struct nbd_cmd {
 	struct nbd_device *nbd;
@@ -400,6 +406,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	if (!mutex_trylock(&cmd->lock))
 		return BLK_EH_RESET_TIMER;
 
+	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		cmd->status = BLK_STS_TIMEOUT;
 		mutex_unlock(&cmd->lock);
@@ -729,6 +736,12 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	cmd = blk_mq_rq_to_pdu(req);
 
 	mutex_lock(&cmd->lock);
+	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+		dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d (status %u flags %lu)",
+			tag, cmd->status, cmd->flags);
+		ret = -ENOENT;
+		goto out;
+	}
 	if (cmd->cmd_cookie != nbd_handle_to_cookie(handle)) {
 		dev_err(disk_to_dev(nbd->disk), "Double reply on req %p, cmd_cookie %u, handle cookie %u\n",
 			req, cmd->cmd_cookie, nbd_handle_to_cookie(handle));
@@ -828,6 +841,7 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
 		return true;
 
 	mutex_lock(&cmd->lock);
+	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
 	cmd->status = BLK_STS_IOERR;
 	mutex_unlock(&cmd->lock);
 
@@ -964,7 +978,13 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	 * returns EAGAIN can be retried on a different socket.
 	 */
 	ret = nbd_send_cmd(nbd, cmd, index);
-	if (ret == -EAGAIN) {
+	/*
+	 * Access to this flag is protected by cmd->lock, thus it's safe to set
+	 * the flag after nbd_send_cmd() succeeds in sending it to the server.
+	 */
+	if (!ret)
+		__set_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+	else if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Request send failed, requeueing\n");
 		nbd_mark_nsock_dead(nbd, nsock, 1);
-- 
2.31.1



* [PATCH v6 2/6] nbd: make sure request completion won't concurrent
  2021-09-15  8:15 [PATCH v6 0/6] handle unexpected message from server Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
@ 2021-09-15  8:15 ` Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 3/6] nbd: check sock index in nbd_read_stat() Yu Kuai
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

commit cddce0116058 ("nbd: Aovid double completion of a request")
tried to fix the race where nbd_clear_que() and recv_work() can
complete a request concurrently. However, the problem still exists:

t1                    t2                     t3

nbd_disconnect_and_put
 flush_workqueue
                      recv_work
                       blk_mq_complete_request
                        blk_mq_complete_request_remote -> this is true
                         WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
                          blk_mq_raise_softirq
                                             blk_done_softirq
                                              blk_complete_reqs
                                               nbd_complete_rq
                                                blk_mq_end_request
                                                 blk_mq_free_request
                                                  WRITE_ONCE(rq->state, MQ_RQ_IDLE)
  nbd_clear_que
   blk_mq_tagset_busy_iter
    nbd_clear_req
                                                   __blk_mq_free_request
                                                    blk_mq_put_tag
     blk_mq_complete_request -> complete again

There are three places where a request can be completed in nbd:
recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they
all hold cmd->lock before completing the request, it's easy to
avoid the problem by setting and checking a cmd flag.
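
All three paths can then share one exactly-once pattern; a condensed
sketch (the helper name is hypothetical, the hunks below open-code it
at each call site):

	static bool nbd_complete_once(struct nbd_cmd *cmd, blk_status_t status)
	{
		bool owned;

		mutex_lock(&cmd->lock);
		owned = __test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
		if (owned)
			cmd->status = status;
		mutex_unlock(&cmd->lock);
		/* only the caller that cleared the bit may complete */
		return owned;
	}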

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 23ded569e8d3..614c6ab2b8fe 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -406,7 +406,11 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	if (!mutex_trylock(&cmd->lock))
 		return BLK_EH_RESET_TIMER;
 
-	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+		mutex_unlock(&cmd->lock);
+		return BLK_EH_DONE;
+	}
+
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		cmd->status = BLK_STS_TIMEOUT;
 		mutex_unlock(&cmd->lock);
@@ -841,7 +845,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
 		return true;
 
 	mutex_lock(&cmd->lock);
-	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+		mutex_unlock(&cmd->lock);
+		return true;
+	}
 	cmd->status = BLK_STS_IOERR;
 	mutex_unlock(&cmd->lock);
 
-- 
2.31.1



* [PATCH v6 3/6] nbd: check sock index in nbd_read_stat()
  2021-09-15  8:15 [PATCH v6 0/6] handle unexpected message from server Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 2/6] nbd: make sure request completion won't concurrent Yu Kuai
@ 2021-09-15  8:15 ` Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 4/6] nbd: don't start request if nbd_queue_rq() failed Yu Kuai
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

The sock on which the client sends a request in nbd_send_cmd() and
the sock on which it receives the reply in nbd_read_stat() should be
the same.
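
For context, cmd->index is recorded at send time, so the receive side
can verify that the reply arrived on the same sock. A minimal sketch of
the relationship (not exact source):

	/* send path */
	cmd->index = index;		/* remember which sock was used */

	/* receive path, added below */
	if (cmd->index != index)	/* reply came back on another sock */
		dev_err(disk_to_dev(nbd->disk), "Unexpected reply\n");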

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 614c6ab2b8fe..c724a5bd7fa4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -746,6 +746,10 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		ret = -ENOENT;
 		goto out;
 	}
+	if (cmd->index != index) {
+		dev_err(disk_to_dev(nbd->disk), "Unexpected reply %d from different sock %d (expected %d)",
+			tag, index, cmd->index);
+	}
 	if (cmd->cmd_cookie != nbd_handle_to_cookie(handle)) {
 		dev_err(disk_to_dev(nbd->disk), "Double reply on req %p, cmd_cookie %u, handle cookie %u\n",
 			req, cmd->cmd_cookie, nbd_handle_to_cookie(handle));
-- 
2.31.1



* [PATCH v6 4/6] nbd: don't start request if nbd_queue_rq() failed
  2021-09-15  8:15 [PATCH v6 0/6] handle unexpected message from server Yu Kuai
                   ` (2 preceding siblings ...)
  2021-09-15  8:15 ` [PATCH v6 3/6] nbd: check sock index in nbd_read_stat() Yu Kuai
@ 2021-09-15  8:15 ` Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 5/6] nbd: partition nbd_read_stat() into nbd_read_reply() and nbd_handle_reply() Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply() Yu Kuai
  5 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

Currently, blk_mq_end_request() will be called if nbd_queue_rq()
fails, so starting the request in that situation is useless.
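
For context: when ->queue_rq() returns an error status, the block core
ends the request itself, whether or not it was started. Roughly (a
sketch of the dispatch path, not exact source):

	ret = q->mq_ops->queue_rq(hctx, &bd);
	switch (ret) {
	case BLK_STS_OK:
		break;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		/* request is requeued and dispatched again later */
		break;
	default:
		blk_mq_end_request(rq, ret); /* ended, started or not */
	}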

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/nbd.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c724a5bd7fa4..22c91d8901f6 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -934,7 +934,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Socks array is empty\n");
-		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	config = nbd->config;
@@ -943,7 +942,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		nbd_config_put(nbd);
-		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	cmd->status = BLK_STS_OK;
@@ -967,7 +965,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 			 */
 			sock_shutdown(nbd);
 			nbd_config_put(nbd);
-			blk_mq_start_request(req);
 			return -EIO;
 		}
 		goto again;
-- 
2.31.1



* [PATCH v6 5/6] nbd: partition nbd_read_stat() into nbd_read_reply() and nbd_handle_reply()
  2021-09-15  8:15 [PATCH v6 0/6] handle unexpected message from server Yu Kuai
                   ` (3 preceding siblings ...)
  2021-09-15  8:15 ` [PATCH v6 4/6] nbd: don't start request if nbd_queue_rq() failed Yu Kuai
@ 2021-09-15  8:15 ` Yu Kuai
  2021-09-15  8:15 ` [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply() Yu Kuai
  5 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

Prepare to fix the uaf in nbd_read_stat(); no functional changes.
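
The resulting shape of the receive loop (a sketch; the full diff
follows) separates reading a reply off the wire from mapping it to a
request, so that a reference protecting the tag lookup can later be
taken between the two steps:

	while (1) {
		struct nbd_reply reply;

		if (nbd_read_reply(nbd, args->index, &reply))
			break;
		/* the uaf fix will take q_usage_counter here */
		cmd = nbd_handle_reply(nbd, args->index, &reply);
		if (IS_ERR(cmd))
			break;
		blk_mq_complete_request(blk_mq_rq_from_pdu(cmd));
	}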

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 76 +++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 22c91d8901f6..9a7bbf8ebe74 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -694,38 +694,45 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
-/* NULL returned = something went wrong, inform userspace */
-static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
+static int nbd_read_reply(struct nbd_device *nbd, int index,
+			  struct nbd_reply *reply)
 {
-	struct nbd_config *config = nbd->config;
-	int result;
-	struct nbd_reply reply;
-	struct nbd_cmd *cmd;
-	struct request *req = NULL;
-	u64 handle;
-	u16 hwq;
-	u32 tag;
-	struct kvec iov = {.iov_base = &reply, .iov_len = sizeof(reply)};
+	struct kvec iov = {.iov_base = reply, .iov_len = sizeof(*reply)};
 	struct iov_iter to;
-	int ret = 0;
+	int result;
 
-	reply.magic = 0;
-	iov_iter_kvec(&to, READ, &iov, 1, sizeof(reply));
+	reply->magic = 0;
+	iov_iter_kvec(&to, READ, &iov, 1, sizeof(*reply));
 	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
-	if (result <= 0) {
-		if (!nbd_disconnected(config))
+	if (result < 0) {
+		if (!nbd_disconnected(nbd->config))
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive control failed (result %d)\n", result);
-		return ERR_PTR(result);
+		return result;
 	}
 
-	if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
+	if (ntohl(reply->magic) != NBD_REPLY_MAGIC) {
 		dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n",
-				(unsigned long)ntohl(reply.magic));
-		return ERR_PTR(-EPROTO);
+				(unsigned long)ntohl(reply->magic));
+		return -EPROTO;
 	}
 
-	memcpy(&handle, reply.handle, sizeof(handle));
+	return 0;
+}
+
+/* NULL returned = something went wrong, inform userspace */
+static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
+					struct nbd_reply *reply)
+{
+	int result;
+	struct nbd_cmd *cmd;
+	struct request *req = NULL;
+	u64 handle;
+	u16 hwq;
+	u32 tag;
+	int ret = 0;
+
+	memcpy(&handle, reply->handle, sizeof(handle));
 	tag = nbd_handle_to_tag(handle);
 	hwq = blk_mq_unique_tag_to_hwq(tag);
 	if (hwq < nbd->tag_set.nr_hw_queues)
@@ -768,9 +775,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		ret = -ENOENT;
 		goto out;
 	}
-	if (ntohl(reply.error)) {
+	if (ntohl(reply->error)) {
 		dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n",
-			ntohl(reply.error));
+			ntohl(reply->error));
 		cmd->status = BLK_STS_IOERR;
 		goto out;
 	}
@@ -779,6 +786,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	if (rq_data_dir(req) != WRITE) {
 		struct req_iterator iter;
 		struct bio_vec bvec;
+		struct iov_iter to;
 
 		rq_for_each_segment(bvec, req, iter) {
 			iov_iter_bvec(&to, READ, &bvec, 1, bvec.bv_len);
@@ -792,7 +800,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 				 * and let the timeout stuff handle resubmitting
 				 * this request onto another connection.
 				 */
-				if (nbd_disconnected(config)) {
+				if (nbd_disconnected(nbd->config)) {
 					cmd->status = BLK_STS_IOERR;
 					goto out;
 				}
@@ -816,24 +824,30 @@ static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct nbd_sock *nsock;
 	struct nbd_cmd *cmd;
 	struct request *rq;
 
 	while (1) {
-		cmd = nbd_read_stat(nbd, args->index);
-		if (IS_ERR(cmd)) {
-			struct nbd_sock *nsock = config->socks[args->index];
+		struct nbd_reply reply;
 
-			mutex_lock(&nsock->tx_lock);
-			nbd_mark_nsock_dead(nbd, nsock, 1);
-			mutex_unlock(&nsock->tx_lock);
+		if (nbd_read_reply(nbd, args->index, &reply))
+			break;
+
+		cmd = nbd_handle_reply(nbd, args->index, &reply);
+		if (IS_ERR(cmd))
 			break;
-		}
 
 		rq = blk_mq_rq_from_pdu(cmd);
 		if (likely(!blk_should_fake_timeout(rq->q)))
 			blk_mq_complete_request(rq);
 	}
+
+	nsock = config->socks[args->index];
+	mutex_lock(&nsock->tx_lock);
+	nbd_mark_nsock_dead(nbd, nsock, 1);
+	mutex_unlock(&nsock->tx_lock);
+
 	nbd_config_put(nbd);
 	atomic_dec(&config->recv_threads);
 	wake_up(&config->recv_wq);
-- 
2.31.1



* [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply()
  2021-09-15  8:15 [PATCH v6 0/6] handle unexpected message from server Yu Kuai
                   ` (4 preceding siblings ...)
  2021-09-15  8:15 ` [PATCH v6 5/6] nbd: partition nbd_read_stat() into nbd_read_reply() and nbd_handle_reply() Yu Kuai
@ 2021-09-15  8:15 ` Yu Kuai
  2021-09-15  8:20   ` Christoph Hellwig
  2021-09-15  8:28   ` Ming Lei
  5 siblings, 2 replies; 11+ messages in thread
From: Yu Kuai @ 2021-09-15  8:15 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

There is a problem that nbd_handle_reply() might access a freed request:

1) At first, a normal io is submitted and completed through the scheduler:

internal_tag = blk_mq_get_tag -> get tag from sched_tags
 blk_mq_rq_ctx_init
  sched_tags->rq[internal_tag] = sched_tag->static_rq[internal_tag]
...
blk_mq_get_driver_tag
 __blk_mq_get_driver_tag -> get tag from tags
 tags->rq[tag] = sched_tag->static_rq[internal_tag]

So, both tags->rq[tag] and sched_tags->rq[internal_tag] point to the
same request, sched_tags->static_rq[internal_tag], even after the io
is finished.

2) nbd server sends a reply with a random tag directly:

recv_work
 nbd_handle_reply
  blk_mq_tag_to_rq(tags, tag)
   rq = tags->rq[tag]

3) if the sched_tags->static_rq is freed:

blk_mq_sched_free_requests
 blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
  -> step 2) access rq before clearing rq mapping
  blk_mq_clear_rq_mapping(set, tags, hctx_idx);
  __free_pages() -> rq is freed here

4) Then, nbd continues to use the freed request in nbd_handle_reply()

Fix the problem by taking a reference on 'q_usage_counter' before
blk_mq_tag_to_rq(): while the reference is held the queue cannot be
frozen, so the request cannot be freed.
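
The guarantee relied on, sketched below (handle() stands in for the
real completion work): while the reference is held, the queue cannot be
frozen, so blk_mq_sched_free_requests() cannot run and the static_rq
pages stay allocated:

	if (percpu_ref_tryget(&q->q_usage_counter)) {
		struct request *rq = blk_mq_tag_to_rq(tags, tag);

		handle(rq);		/* safe: rq cannot be freed here */
		blk_queue_exit(q);	/* drops q_usage_counter */
	}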

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-core.c    |  1 +
 drivers/block/nbd.c | 19 ++++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5454db2fa263..2008e6903166 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -489,6 +489,7 @@ void blk_queue_exit(struct request_queue *q)
 {
 	percpu_ref_put(&q->q_usage_counter);
 }
+EXPORT_SYMBOL(blk_queue_exit);
 
 static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 {
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9a7bbf8ebe74..f065afcc7586 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -824,6 +824,7 @@ static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct request_queue *q = nbd->disk->queue;
 	struct nbd_sock *nsock;
 	struct nbd_cmd *cmd;
 	struct request *rq;
@@ -834,13 +835,29 @@ static void recv_work(struct work_struct *work)
 		if (nbd_read_reply(nbd, args->index, &reply))
 			break;
 
+		/*
+		 * Grabbing q_usage_counter prevents accessing a freed request
+		 * through blk_mq_tag_to_rq() in nbd_handle_reply(). If
+		 * q_usage_counter is zero, then no request is inflight, which
+		 * means something is wrong since we expect to find a request to
+		 * complete here.
+		 */
+		if (!percpu_ref_tryget(&q->q_usage_counter)) {
+			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
+				__func__);
+			break;
+		}
+
 		cmd = nbd_handle_reply(nbd, args->index, &reply);
-		if (IS_ERR(cmd))
+		if (IS_ERR(cmd)) {
+			blk_queue_exit(q);
 			break;
+		}
 
 		rq = blk_mq_rq_from_pdu(cmd);
 		if (likely(!blk_should_fake_timeout(rq->q)))
 			blk_mq_complete_request(rq);
+		blk_queue_exit(q);
 	}
 
 	nsock = config->socks[args->index];
-- 
2.31.1



* Re: [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply()
  2021-09-15  8:15 ` [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply() Yu Kuai
@ 2021-09-15  8:20   ` Christoph Hellwig
  2021-09-15  8:44     ` yukuai (C)
  2021-09-15  8:28   ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-09-15  8:20 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, josef, ming.lei, hch, linux-block, linux-kernel, nbd, yi.zhang

On Wed, Sep 15, 2021 at 04:15:37PM +0800, Yu Kuai wrote:
> +++ b/block/blk-core.c
> @@ -489,6 +489,7 @@ void blk_queue_exit(struct request_queue *q)
>  {
>  	percpu_ref_put(&q->q_usage_counter);
>  }
> +EXPORT_SYMBOL(blk_queue_exit);

This needs to be an EXPORT_SYMBOL_GPL.  But more importantly it
needs to be a separate, properly documented patch, and this function
needs to grow a kerneldoc comment as well.

> +		/*
> +		 * Grabbing q_usage_counter prevents accessing a freed request
> +		 * through blk_mq_tag_to_rq() in nbd_handle_reply(). If
> +		 * q_usage_counter is zero, then no request is inflight, which
> +		 * means something is wrong since we expect to find a request to
> +		 * complete here.
> +		 */
> +		if (!percpu_ref_tryget(&q->q_usage_counter)) {
> +			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
> +				__func__);
> +			break;
> +		}

And this needs a properly documented wrapper as well.

> +
>  		cmd = nbd_handle_reply(nbd, args->index, &reply);
> -		if (IS_ERR(cmd))
> +		if (IS_ERR(cmd)) {
> +			blk_queue_exit(q);
>  			break;
> +		}
>  
>  		rq = blk_mq_rq_from_pdu(cmd);
>  		if (likely(!blk_should_fake_timeout(rq->q)))
>  			blk_mq_complete_request(rq);
> +		blk_queue_exit(q);

That being said, I can't say I like how this exposes block layer
internals.  We don't really need a reference to the queue anywhere
here; you just use it as a dumb debug check.  If we really want to
reuse (abuse?) q_usage_counter, a helper to just grab a reference and
immediately drop it might be a better fit.


* Re: [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply()
  2021-09-15  8:15 ` [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply() Yu Kuai
  2021-09-15  8:20   ` Christoph Hellwig
@ 2021-09-15  8:28   ` Ming Lei
  2021-09-15  8:41     ` yukuai (C)
  1 sibling, 1 reply; 11+ messages in thread
From: Ming Lei @ 2021-09-15  8:28 UTC (permalink / raw)
  To: Yu Kuai; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Wed, Sep 15, 2021 at 04:15:37PM +0800, Yu Kuai wrote:
> There is a problem that nbd_handle_reply() might access a freed request:
> 
> 1) At first, a normal io is submitted and completed through the scheduler:
> 
> internal_tag = blk_mq_get_tag -> get tag from sched_tags
>  blk_mq_rq_ctx_init
>   sched_tags->rq[internal_tag] = sched_tag->static_rq[internal_tag]
> ...
> blk_mq_get_driver_tag
>  __blk_mq_get_driver_tag -> get tag from tags
>  tags->rq[tag] = sched_tag->static_rq[internal_tag]
> 
> So, both tags->rq[tag] and sched_tags->rq[internal_tag] point to the
> same request, sched_tags->static_rq[internal_tag], even after the io
> is finished.
> 
> 2) nbd server sends a reply with a random tag directly:
> 
> recv_work
>  nbd_handle_reply
>   blk_mq_tag_to_rq(tags, tag)
>    rq = tags->rq[tag]
> 
> 3) if the sched_tags->static_rq is freed:
> 
> blk_mq_sched_free_requests
>  blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
>   -> step 2) access rq before clearing rq mapping
>   blk_mq_clear_rq_mapping(set, tags, hctx_idx);
>   __free_pages() -> rq is freed here
> 
> 4) Then, nbd continues to use the freed request in nbd_handle_reply()
> 
> Fix the problem by taking a reference on 'q_usage_counter' before
> blk_mq_tag_to_rq(): while the reference is held the queue cannot be
> frozen, so the request cannot be freed.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/blk-core.c    |  1 +
>  drivers/block/nbd.c | 19 ++++++++++++++++++-
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5454db2fa263..2008e6903166 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -489,6 +489,7 @@ void blk_queue_exit(struct request_queue *q)
>  {
>  	percpu_ref_put(&q->q_usage_counter);
>  }
> +EXPORT_SYMBOL(blk_queue_exit);
>  
>  static void blk_queue_usage_counter_release(struct percpu_ref *ref)
>  {
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 9a7bbf8ebe74..f065afcc7586 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -824,6 +824,7 @@ static void recv_work(struct work_struct *work)
>  						     work);
>  	struct nbd_device *nbd = args->nbd;
>  	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
>  	struct nbd_sock *nsock;
>  	struct nbd_cmd *cmd;
>  	struct request *rq;
> @@ -834,13 +835,29 @@ static void recv_work(struct work_struct *work)
>  		if (nbd_read_reply(nbd, args->index, &reply))
>  			break;
>  
> +		/*
> +		 * Grabbing q_usage_counter prevents accessing a freed request
> +		 * through blk_mq_tag_to_rq() in nbd_handle_reply(). If
> +		 * q_usage_counter is zero, then no request is inflight, which
> +		 * means something is wrong since we expect to find a request to
> +		 * complete here.
> +		 */
> +		if (!percpu_ref_tryget(&q->q_usage_counter)) {
> +			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
> +				__func__);
> +			break;
> +		}
> +
>  		cmd = nbd_handle_reply(nbd, args->index, &reply);
> -		if (IS_ERR(cmd))
> +		if (IS_ERR(cmd)) {
> +			blk_queue_exit(q);
>  			break;
> +		}
>  
>  		rq = blk_mq_rq_from_pdu(cmd);
>  		if (likely(!blk_should_fake_timeout(rq->q)))
>  			blk_mq_complete_request(rq);
> +		blk_queue_exit(q);

You can simply call percpu_ref_put() directly, just like
scsi_end_request() does.
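
For reference, scsi_end_request() drops the same counter with a bare

	percpu_ref_put(&q->q_usage_counter);

which would avoid exporting blk_queue_exit() at all.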

-- 
Ming



* Re: [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply()
  2021-09-15  8:28   ` Ming Lei
@ 2021-09-15  8:41     ` yukuai (C)
  0 siblings, 0 replies; 11+ messages in thread
From: yukuai (C) @ 2021-09-15  8:41 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/15 16:28, Ming Lei wrote:
> On Wed, Sep 15, 2021 at 04:15:37PM +0800, Yu Kuai wrote:
>> There is a problem that nbd_handle_reply() might access a freed request:
>>
>> 1) At first, a normal io is submitted and completed through the scheduler:
>>
>> internal_tag = blk_mq_get_tag -> get tag from sched_tags
>>   blk_mq_rq_ctx_init
>>    sched_tags->rq[internal_tag] = sched_tag->static_rq[internal_tag]
>> ...
>> blk_mq_get_driver_tag
>>   __blk_mq_get_driver_tag -> get tag from tags
>>   tags->rq[tag] = sched_tag->static_rq[internal_tag]
>>
>> So, both tags->rq[tag] and sched_tags->rq[internal_tag] point to the
>> same request, sched_tags->static_rq[internal_tag], even after the io
>> is finished.
>>
>> 2) nbd server sends a reply with a random tag directly:
>>
>> recv_work
>>   nbd_handle_reply
>>    blk_mq_tag_to_rq(tags, tag)
>>     rq = tags->rq[tag]
>>
>> 3) if the sched_tags->static_rq is freed:
>>
>> blk_mq_sched_free_requests
>>   blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
>>    -> step 2) access rq before clearing rq mapping
>>    blk_mq_clear_rq_mapping(set, tags, hctx_idx);
>>    __free_pages() -> rq is freed here
>>
>> 4) Then, nbd continues to use the freed request in nbd_handle_reply()
>>
>> Fix the problem by taking a reference on 'q_usage_counter' before
>> blk_mq_tag_to_rq(): while the reference is held the queue cannot be
>> frozen, so the request cannot be freed.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   block/blk-core.c    |  1 +
>>   drivers/block/nbd.c | 19 ++++++++++++++++++-
>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 5454db2fa263..2008e6903166 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -489,6 +489,7 @@ void blk_queue_exit(struct request_queue *q)
>>   {
>>   	percpu_ref_put(&q->q_usage_counter);
>>   }
>> +EXPORT_SYMBOL(blk_queue_exit);
>>   
>>   static void blk_queue_usage_counter_release(struct percpu_ref *ref)
>>   {
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index 9a7bbf8ebe74..f065afcc7586 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -824,6 +824,7 @@ static void recv_work(struct work_struct *work)
>>   						     work);
>>   	struct nbd_device *nbd = args->nbd;
>>   	struct nbd_config *config = nbd->config;
>> +	struct request_queue *q = nbd->disk->queue;
>>   	struct nbd_sock *nsock;
>>   	struct nbd_cmd *cmd;
>>   	struct request *rq;
>> @@ -834,13 +835,29 @@ static void recv_work(struct work_struct *work)
>>   		if (nbd_read_reply(nbd, args->index, &reply))
>>   			break;
>>   
>> +		/*
>> +		 * Grabbing q_usage_counter prevents accessing a freed request
>> +		 * through blk_mq_tag_to_rq() in nbd_handle_reply(). If
>> +		 * q_usage_counter is zero, then no request is inflight, which
>> +		 * means something is wrong since we expect to find a request to
>> +		 * complete here.
>> +		 */
>> +		if (!percpu_ref_tryget(&q->q_usage_counter)) {
>> +			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
>> +				__func__);
>> +			break;
>> +		}
>> +
>>   		cmd = nbd_handle_reply(nbd, args->index, &reply);
>> -		if (IS_ERR(cmd))
>> +		if (IS_ERR(cmd)) {
>> +			blk_queue_exit(q);
>>   			break;
>> +		}
>>   
>>   		rq = blk_mq_rq_from_pdu(cmd);
>>   		if (likely(!blk_should_fake_timeout(rq->q)))
>>   			blk_mq_complete_request(rq);
>> +		blk_queue_exit(q);
> 
> You can simply call percpu_ref_put() directly, just like
> scsi_end_request() does.
> 

Thanks for the advice, will do that in the next iteration (hopefully
the last).

Best regards,
Kuai


* Re: [PATCH v6 6/6] nbd: fix uaf in nbd_handle_reply()
  2021-09-15  8:20   ` Christoph Hellwig
@ 2021-09-15  8:44     ` yukuai (C)
  0 siblings, 0 replies; 11+ messages in thread
From: yukuai (C) @ 2021-09-15  8:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, josef, ming.lei, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/15 16:20, Christoph Hellwig wrote:
> On Wed, Sep 15, 2021 at 04:15:37PM +0800, Yu Kuai wrote:
>> +++ b/block/blk-core.c
>> @@ -489,6 +489,7 @@ void blk_queue_exit(struct request_queue *q)
>>   {
>>   	percpu_ref_put(&q->q_usage_counter);
>>   }
>> +EXPORT_SYMBOL(blk_queue_exit);
> 
> These needs to be an EXPORT_SYMBOL_GPL.  But more importantly it
> needs to be a separate properly documented patch, and this function
> needs to grow a kerneldoc comment as well.
> 
>> +		/*
>> +		 * Get q_usage_counter can prevent accessing freed request
>> +		 * through blk_mq_tag_to_rq() in nbd_handle_reply(). If
>> +		 * q_usage_counter is zero, then no request is inflight, which
>> +		 * means something is wrong since we expect to find a request to
>> +		 * complete here.
>> +		 */
>> +		if (!percpu_ref_tryget(&q->q_usage_counter)) {
>> +			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
>> +				__func__);
>> +			break;
>> +		}
> 
> And this needs a properly documented wrapper as well.
> 
>> +
>>   		cmd = nbd_handle_reply(nbd, args->index, &reply);
>> -		if (IS_ERR(cmd))
>> +		if (IS_ERR(cmd)) {
>> +			blk_queue_exit(q);
>>   			break;
>> +		}
>>   
>>   		rq = blk_mq_rq_from_pdu(cmd);
>>   		if (likely(!blk_should_fake_timeout(rq->q)))
>>   			blk_mq_complete_request(rq);
>> +		blk_queue_exit(q);
> 
> That being said I can't say I like how this exposed block layer
> internals.  We don't really need a reference to the queue here
> anywhere, you just use it as a dumb debug check.  If we really want to
> reuse (abuse?) q_usage_counter a helper to just grab a reference and
> immediately drop it might be a better fit.
> .

Hi,

The uaf happens because blk_mq_sched_free_requests() can run
concurrently with nbd_read_stat(), and holding the ref across
nbd_read_stat() is what breaks that concurrency, so the ref can't be
dropped immediately.

I'll use percpu_ref_put() directly as Ming suggested.

Thanks,
Kuai

