From d9ccd0baf15779d3dcdd488577eda8077bb7cc21 Mon Sep 17 00:00:00 2001
From: Sagi Grimberg
Date: Wed, 21 Oct 2020 01:45:34 -0700
Subject: [PATCH 1/2] nvme-tcp: fix possible double completion for timed out
 requests

If error recovery ran before the timeout handler did and fully completed
the timed out request, the request will appear as not started (rq state
is MQ_RQ_IDLE). However, the timeout handler only verifies that the
request was not completed (rq state is MQ_RQ_COMPLETE). Check and make
sure that the request is both started and did not yet complete.

Example trace:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 45 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:refcount_warn_saturate+0xa6/0xf0
RSP: 0018:ffffb71b80837dc8 EFLAGS: 00010292
RAX: 0000000000000026 RBX: 0000000000000000 RCX: ffff93f37dcd8d08
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93f37dcd8d00
RBP: ffff93f37a812400 R08: 00000203c5221fce R09: ffffffffa7fffbc4
R10: 0000000000000477 R11: 000000000002835c R12: ffff93f37a8124e8
R13: ffff93f37a2b0000 R14: ffffb71b80837e70 R15: ffff93f37a2b0000
FS: 0000000000000000(0000) GS:ffff93f37dcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005637b4137028 CR3: 000000007c1be000 CR4: 00000000000006e0
Call Trace:
 blk_mq_check_expired+0x109/0x1b0
 blk_mq_queue_tag_busy_iter+0x1b8/0x330
 ? blk_poll+0x300/0x300
 blk_mq_timeout_work+0x44/0xe0
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
---[ end trace 7d137e36e23c0d19 ]---

Reported-by: Yi Zhang
Signed-off-by: Sagi Grimberg
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 629b025685d1..46428ff0b0fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
-	if (!blk_mq_request_completed(rq)) {
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request_sync(rq);
 	}
-- 
2.25.1
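
Note on the state check: the difference between the old and new condition
can be illustrated with a small userspace model. This is only a sketch, not
kernel code. It assumes that blk_mq_request_started() is true for any state
other than MQ_RQ_IDLE and that blk_mq_request_completed() is true only for
MQ_RQ_COMPLETE. The enum and all model_*, old_check and new_check names are
hypothetical stand-ins introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the block layer request states (assumption). */
enum mq_rq_state {
	MQ_RQ_IDLE,      /* not started, or fully completed and released */
	MQ_RQ_IN_FLIGHT, /* started, not yet completed */
	MQ_RQ_COMPLETE,  /* completed */
};

/* Hypothetical model of blk_mq_request_started(). */
static bool model_request_started(enum mq_rq_state state)
{
	return state != MQ_RQ_IDLE;
}

/* Hypothetical model of blk_mq_request_completed(). */
static bool model_request_completed(enum mq_rq_state state)
{
	return state == MQ_RQ_COMPLETE;
}

/* Condition before the patch: only rules out MQ_RQ_COMPLETE. */
static bool old_check(enum mq_rq_state state)
{
	return !model_request_completed(state);
}

/* Condition after the patch: also rules out MQ_RQ_IDLE. */
static bool new_check(enum mq_rq_state state)
{
	return model_request_started(state) && !model_request_completed(state);
}

/* Walks the three states and asserts which ones each check lets through. */
static void model_self_check(void)
{
	assert(old_check(MQ_RQ_IDLE));       /* old check would complete again */
	assert(!new_check(MQ_RQ_IDLE));      /* new check skips the request */
	assert(new_check(MQ_RQ_IN_FLIGHT));  /* in-flight request is still aborted */
	assert(!new_check(MQ_RQ_COMPLETE));  /* completed request is left alone */
}
```

In this model the only state where the two conditions disagree is MQ_RQ_IDLE,
which is the case described in the commit message: error recovery has already
completed the request, so the timeout handler must not complete it again.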