From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62061C43600 for ; Thu, 29 Apr 2021 19:18:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E63461476 for ; Thu, 29 Apr 2021 19:18:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233585AbhD2TRx (ORCPT ); Thu, 29 Apr 2021 15:17:53 -0400 Received: from mx0a-0016f401.pphosted.com ([67.231.148.174]:48192 "EHLO mx0b-0016f401.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S239534AbhD2TLZ (ORCPT ); Thu, 29 Apr 2021 15:11:25 -0400 Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 13TJ6cp3019681; Thu, 29 Apr 2021 12:10:29 -0700 Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0a-0016f401.pphosted.com with ESMTP id 387erumw2d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 12:10:28 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 29 Apr 2021 12:10:27 -0700 Received: from lbtlvb-pcie154.il.qlogic.org (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 29 Apr 2021 12:10:24 -0700 From: Shai Malin To: , , , , , CC: =?UTF-8?q?David=20S=20=2E=20Miller=20davem=20=40=20davemloft=20=2E=20net=20=C2=A0--cc=3DJakub=20Kicinski?= , , , , , , , Arie Gershberg Subject: [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation Date: Thu, 29 Apr 2021 22:09:11 +0300 Message-ID: <20210429190926.5086-13-smalin@marvell.com> X-Mailer: git-send-email 2.16.6 In-Reply-To: <20210429190926.5086-1-smalin@marvell.com> References: <20210429190926.5086-1-smalin@marvell.com> MIME-Version: 1.0 Content-Type: text/plain X-Proofpoint-GUID: C2Lc2DjUg1Ll8oceg4F1qEN4rHUWrsmj X-Proofpoint-ORIG-GUID: C2Lc2DjUg1Ll8oceg4F1qEN4rHUWrsmj X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391,18.0.761 definitions=2021-04-29_10:2021-04-28,2021-04-29 signatures=0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Arie Gershberg In this patch, we implement controller level error handling and recovery. Upon an error discovered by the ULP or reset controller initiated by the nvme-core (using reset_ctrl workqueue), the ULP will initiate a controller recovery which includes teardown and re-connect of all queues. Acked-by: Igor Russkikh Signed-off-by: Arie Gershberg Signed-off-by: Prabhakar Kushwaha Signed-off-by: Omkar Kulkarni Signed-off-by: Michal Kalderon Signed-off-by: Ariel Elior Signed-off-by: Shai Malin --- drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++- drivers/nvme/host/tcp-offload.h | 1 + 2 files changed, 137 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c index 59e1955e02ec..9082b11c133f 100644 --- a/drivers/nvme/host/tcp-offload.c +++ b/drivers/nvme/host/tcp-offload.c @@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev) } EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev); +/** + * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery. + * function. + * @nctrl: NVMe controller instance to change to resetting. + * + * API function that change the controller state to resseting. + * Part of the overall controller reset sequence. + */ +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl) +{ + if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING)) + return; + + queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work); +} +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery); + /** * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event * callback function. Pointed to by nvme_tcp_ofld_queue->report_err. @@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev); */ int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue) { - /* Placeholder - invoke error recovery flow */ + pr_err("nvme-tcp-offload queue error\n"); + nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl); return 0; } @@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new) return rc; } +static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl) +{ + /* If we are resetting/deleting then do nothing */ + if (nctrl->state != NVME_CTRL_CONNECTING) { + WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW || + nctrl->state == NVME_CTRL_LIVE); + + return; + } + + if (nvmf_should_reconnect(nctrl)) { + dev_info(nctrl->device, "Reconnecting in %d seconds...\n", + nctrl->opts->reconnect_delay); + queue_delayed_work(nvme_wq, + &to_tcp_ofld_ctrl(nctrl)->connect_work, + nctrl->opts->reconnect_delay * HZ); + } else { + dev_info(nctrl->device, "Removing controller...\n"); + nvme_delete_ctrl(nctrl); + } +} + static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new) { struct nvmf_ctrl_options *opts = nctrl->opts; @@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove) /* Placeholder - teardown_io_queues */ } +static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work) +{ + struct nvme_tcp_ofld_ctrl *ctrl = + container_of(to_delayed_work(work), + struct nvme_tcp_ofld_ctrl, + connect_work); + struct nvme_ctrl *nctrl = &ctrl->nctrl; + + ++nctrl->nr_reconnects; + + if (ctrl->dev->ops->setup_ctrl(ctrl, false)) + goto requeue; + + if (nvme_tcp_ofld_setup_ctrl(nctrl, false)) + goto release_and_requeue; + + dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n", + nctrl->nr_reconnects); + + nctrl->nr_reconnects = 0; + + return; + +release_and_requeue: + ctrl->dev->ops->release_ctrl(ctrl); +requeue: + dev_info(nctrl->device, "Failed reconnect attempt %d\n", + nctrl->nr_reconnects); + nvme_tcp_ofld_reconnect_or_remove(nctrl); +} + +static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work) +{ + struct nvme_tcp_ofld_ctrl *ctrl = + container_of(work, struct nvme_tcp_ofld_ctrl, err_work); + struct nvme_ctrl *nctrl = &ctrl->nctrl; + + nvme_stop_keep_alive(nctrl); + nvme_tcp_ofld_teardown_io_queues(nctrl, false); + /* unquiesce to fail fast pending requests */ + nvme_start_queues(nctrl); + nvme_tcp_ofld_teardown_admin_queue(nctrl, false); + blk_mq_unquiesce_queue(nctrl->admin_q); + + if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) { + /* state change failure is ok if we started nctrl delete */ + WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING && + nctrl->state != NVME_CTRL_DELETING_NOIO); + + return; + } + + nvme_tcp_ofld_reconnect_or_remove(nctrl); +} + static void nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown) { - /* Placeholder - err_work and connect_work */ + struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl); + + cancel_work_sync(&ctrl->err_work); + cancel_delayed_work_sync(&ctrl->connect_work); nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown); blk_mq_quiesce_queue(nctrl->admin_q); if (shutdown) @@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl) nvme_tcp_ofld_teardown_ctrl(nctrl, true); } +static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work) +{ + struct nvme_ctrl *nctrl = + container_of(work, struct nvme_ctrl, reset_work); + struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl); + + nvme_stop_ctrl(nctrl); + nvme_tcp_ofld_teardown_ctrl(nctrl, false); + + if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) { + /* state change failure is ok if we started ctrl delete */ + WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING && + nctrl->state != NVME_CTRL_DELETING_NOIO); + + return; + } + + if (ctrl->dev->ops->setup_ctrl(ctrl, false)) + goto out_fail; + + if (nvme_tcp_ofld_setup_ctrl(nctrl, false)) + goto release_ctrl; + + return; + +release_ctrl: + ctrl->dev->ops->release_ctrl(ctrl); +out_fail: + ++nctrl->nr_reconnects; + nvme_tcp_ofld_reconnect_or_remove(nctrl); +} + static int nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set, struct request *rq, @@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts) opts->nr_poll_queues + 1; nctrl->sqsize = opts->queue_size - 1; nctrl->kato = opts->kato; + INIT_DELAYED_WORK(&ctrl->connect_work, + nvme_tcp_ofld_reconnect_ctrl_work); + INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work); + INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work); if (!(opts->mask & NVMF_OPT_TRSVCID)) { opts->trsvcid = kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL); diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h index 9fd270240eaa..b23b1d7ea6fa 100644 --- a/drivers/nvme/host/tcp-offload.h +++ b/drivers/nvme/host/tcp-offload.h @@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops { /* Exported functions for lower vendor specific offload drivers */ int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev); void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev); +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl); -- 2.22.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E2D2C433ED for ; Thu, 29 Apr 2021 19:19:31 +0000 (UTC) Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E249861434 for ; Thu, 29 Apr 2021 19:19:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E249861434 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=marvell.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding :Content-Type:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-ID:Date: Subject:CC:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=4boXLNMaPHBrkKuWPlpBECTKsKDOp6+VfmufIip5x1g=; b=gqt2dr+I5Nd24DaqE6emguRr8 syYvbphWzRJtt/sVIGh+ry/93cy8ZDhbdljoS2DfKPNUXlRnp13M2XQcEME6CbJDn2JoWgMxueCLP KoeW2Egz9TmJYCGcV/bqEARpbV51GSGCQHrztDvztmvWPs3OajX0mMqoQHbYZ9O5R823g17ZQzdsq QbYBqg/85y0d5kqua/lSD3a5kWi0TFL+tacYh7MywnRu/jzVun+vgbSUQ8LbqXdnlaFmf4I9jgjkI ie4Xidg1R+iacicw8JdtsveBNO8pjxIAJbIqW5Z3Dc+25+37KXvZZ+2ojkWKEBIZuQHC3xAZcEFSx ttvxzr1LA==; Received: from localhost ([::1] helo=desiato.infradead.org) by desiato.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1lcCBw-006H2T-Jf; Thu, 29 Apr 2021 19:19:16 +0000 Received: from bombadil.infradead.org ([2607:7c80:54:e::133]) by desiato.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lcC3X-006FQt-HS for linux-nvme@desiato.infradead.org; Thu, 29 Apr 2021 19:10:37 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:CC:To:From:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=xkUWr2MifaEzZP96kIKX6KpgeAcdhziBHn7j3fCQhCk=; b=EDPI9IlvMWPqYP2VAmkvJW7n2k OsElkSbylaGnCBVPUX/57rYQAuIstqoMWlCPix8oj0azywtRJpFc9DdNBsthd2velTAaxok3baJtB AUD+BmNsAGqsKMmKnuSu8dZ+e9WWR49/cY0kX0vsusM4Irhge8JkLFFh/IwyjvvOe55OlHP51rEwr WBpFM7h3jr2JwdIMX9XNv7N41tUJEL8fIVZFf+7baJoqGS7YgmeIh2l4BJ1ub//e6NW+auZwloSqQ Rk0t9Wqj93X4Fqcj/wTiS/X3H2Xjr/9XdoAaMrIu08tBGsyN+0zXdHc+pC9UmAvF9yjx5s8EzdCbI 4FUUr5fg==; Received: from mx0a-0016f401.pphosted.com ([67.231.148.174] helo=mx0b-0016f401.pphosted.com) by bombadil.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lcC3U-000sdz-NP for linux-nvme@lists.infradead.org; Thu, 29 Apr 2021 19:10:34 +0000 Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 13TJ6cp3019681; Thu, 29 Apr 2021 12:10:29 -0700 Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0a-0016f401.pphosted.com with ESMTP id 387erumw2d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 12:10:28 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 29 Apr 2021 12:10:27 -0700 Received: from lbtlvb-pcie154.il.qlogic.org (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 29 Apr 2021 12:10:24 -0700 From: Shai Malin To: , , , , , CC: =?UTF-8?q?David=20S=20=2E=20Miller=20davem=20=40=20davemloft=20=2E=20net=20=C2=A0--cc=3DJakub=20Kicinski?= , , , , , , , Arie Gershberg Subject: [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation Date: Thu, 29 Apr 2021 22:09:11 +0300 Message-ID: <20210429190926.5086-13-smalin@marvell.com> X-Mailer: git-send-email 2.16.6 In-Reply-To: <20210429190926.5086-1-smalin@marvell.com> References: <20210429190926.5086-1-smalin@marvell.com> MIME-Version: 1.0 X-Proofpoint-GUID: C2Lc2DjUg1Ll8oceg4F1qEN4rHUWrsmj X-Proofpoint-ORIG-GUID: C2Lc2DjUg1Ll8oceg4F1qEN4rHUWrsmj X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-04-29_10:2021-04-28, 2021-04-29 signatures=0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210429_121032_796166_DDF3D8B5 X-CRM114-Status: GOOD ( 17.68 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org From: Arie Gershberg In this patch, we implement controller level error handling and recovery. Upon an error discovered by the ULP or reset controller initiated by the nvme-core (using reset_ctrl workqueue), the ULP will initiate a controller recovery which includes teardown and re-connect of all queues. Acked-by: Igor Russkikh Signed-off-by: Arie Gershberg Signed-off-by: Prabhakar Kushwaha Signed-off-by: Omkar Kulkarni Signed-off-by: Michal Kalderon Signed-off-by: Ariel Elior Signed-off-by: Shai Malin --- drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++- drivers/nvme/host/tcp-offload.h | 1 + 2 files changed, 137 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c index 59e1955e02ec..9082b11c133f 100644 --- a/drivers/nvme/host/tcp-offload.c +++ b/drivers/nvme/host/tcp-offload.c @@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev) } EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev); +/** + * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery. + * function. + * @nctrl: NVMe controller instance to change to resetting. + * + * API function that change the controller state to resseting. + * Part of the overall controller reset sequence. + */ +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl) +{ + if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING)) + return; + + queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work); +} +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery); + /** * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event * callback function. Pointed to by nvme_tcp_ofld_queue->report_err. @@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev); */ int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue) { - /* Placeholder - invoke error recovery flow */ + pr_err("nvme-tcp-offload queue error\n"); + nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl); return 0; } @@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new) return rc; } +static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl) +{ + /* If we are resetting/deleting then do nothing */ + if (nctrl->state != NVME_CTRL_CONNECTING) { + WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW || + nctrl->state == NVME_CTRL_LIVE); + + return; + } + + if (nvmf_should_reconnect(nctrl)) { + dev_info(nctrl->device, "Reconnecting in %d seconds...\n", + nctrl->opts->reconnect_delay); + queue_delayed_work(nvme_wq, + &to_tcp_ofld_ctrl(nctrl)->connect_work, + nctrl->opts->reconnect_delay * HZ); + } else { + dev_info(nctrl->device, "Removing controller...\n"); + nvme_delete_ctrl(nctrl); + } +} + static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new) { struct nvmf_ctrl_options *opts = nctrl->opts; @@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove) /* Placeholder - teardown_io_queues */ } +static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work) +{ + struct nvme_tcp_ofld_ctrl *ctrl = + container_of(to_delayed_work(work), + struct nvme_tcp_ofld_ctrl, + connect_work); + struct nvme_ctrl *nctrl = &ctrl->nctrl; + + ++nctrl->nr_reconnects; + + if (ctrl->dev->ops->setup_ctrl(ctrl, false)) + goto requeue; + + if (nvme_tcp_ofld_setup_ctrl(nctrl, false)) + goto release_and_requeue; + + dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n", + nctrl->nr_reconnects); + + nctrl->nr_reconnects = 0; + + return; + +release_and_requeue: + ctrl->dev->ops->release_ctrl(ctrl); +requeue: + dev_info(nctrl->device, "Failed reconnect attempt %d\n", + nctrl->nr_reconnects); + nvme_tcp_ofld_reconnect_or_remove(nctrl); +} + +static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work) +{ + struct nvme_tcp_ofld_ctrl *ctrl = + container_of(work, struct nvme_tcp_ofld_ctrl, err_work); + struct nvme_ctrl *nctrl = &ctrl->nctrl; + + nvme_stop_keep_alive(nctrl); + nvme_tcp_ofld_teardown_io_queues(nctrl, false); + /* unquiesce to fail fast pending requests */ + nvme_start_queues(nctrl); + nvme_tcp_ofld_teardown_admin_queue(nctrl, false); + blk_mq_unquiesce_queue(nctrl->admin_q); + + if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) { + /* state change failure is ok if we started nctrl delete */ + WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING && + nctrl->state != NVME_CTRL_DELETING_NOIO); + + return; + } + + nvme_tcp_ofld_reconnect_or_remove(nctrl); +} + static void nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown) { - /* Placeholder - err_work and connect_work */ + struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl); + + cancel_work_sync(&ctrl->err_work); + cancel_delayed_work_sync(&ctrl->connect_work); nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown); blk_mq_quiesce_queue(nctrl->admin_q); if (shutdown) @@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl) nvme_tcp_ofld_teardown_ctrl(nctrl, true); } +static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work) +{ + struct nvme_ctrl *nctrl = + container_of(work, struct nvme_ctrl, reset_work); + struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl); + + nvme_stop_ctrl(nctrl); + nvme_tcp_ofld_teardown_ctrl(nctrl, false); + + if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) { + /* state change failure is ok if we started ctrl delete */ + WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING && + nctrl->state != NVME_CTRL_DELETING_NOIO); + + return; + } + + if (ctrl->dev->ops->setup_ctrl(ctrl, false)) + goto out_fail; + + if (nvme_tcp_ofld_setup_ctrl(nctrl, false)) + goto release_ctrl; + + return; + +release_ctrl: + ctrl->dev->ops->release_ctrl(ctrl); +out_fail: + ++nctrl->nr_reconnects; + nvme_tcp_ofld_reconnect_or_remove(nctrl); +} + static int nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set, struct request *rq, @@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts) opts->nr_poll_queues + 1; nctrl->sqsize = opts->queue_size - 1; nctrl->kato = opts->kato; + INIT_DELAYED_WORK(&ctrl->connect_work, + nvme_tcp_ofld_reconnect_ctrl_work); + INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work); + INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work); if (!(opts->mask & NVMF_OPT_TRSVCID)) { opts->trsvcid = kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL); diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h index 9fd270240eaa..b23b1d7ea6fa 100644 --- a/drivers/nvme/host/tcp-offload.h +++ b/drivers/nvme/host/tcp-offload.h @@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops { /* Exported functions for lower vendor specific offload drivers */ int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev); void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev); +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl); -- 2.22.0 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme