Subject: Re: [PATCH V2 3/4] nvme: tcp: complete non-IO requests atomically
From: Chao Leng
To: Ming Lei
Cc: Jens Axboe, Yi Zhang, Sagi Grimberg, linux-nvme@lists.infradead.org,
 linux-block@vger.kernel.org, Keith Busch, Christoph Hellwig
Date: Wed, 21 Oct 2020 11:14:37 +0800
In-Reply-To: <20201021025534.GD1571548@T590>
References: <20201020085301.1553959-1-ming.lei@redhat.com>
 <20201020085301.1553959-4-ming.lei@redhat.com>
 <20201021012241.GC1571548@T590>
 <7a3cdbdb-8e57-484c-fcb3-e8e72dfe8d13@huawei.com>
 <20201021025534.GD1571548@T590>
On 2020/10/21 10:55, Ming Lei wrote:
> On Wed, Oct 21, 2020 at 10:20:11AM +0800, Chao Leng wrote:
>> On 2020/10/21 9:22, Ming Lei wrote:
>>> On Tue, Oct 20, 2020 at 05:04:29PM +0800, Chao Leng wrote:
>>>> On 2020/10/20 16:53, Ming Lei wrote:
>>>>> During the controller's CONNECTING state, admin/fabrics/connect
>>>>> requests are submitted to recover the controller, and we allow such
>>>>> a request to be aborted directly in the timeout handler so that it
>>>>> does not block the setup procedure.
>>>>>
>>>>> So a timeout vs. normal completion race exists on these requests,
>>>>> since the admin/fabrics/connect queues won't be shut down before
>>>>> timeouts are handled during the CONNECTING state.
>>>>>
>>>>> Add atomic completion for requests from the connect/fabrics/admin
>>>>> queues to avoid the race.
>>>>>
>>>>> CC: Chao Leng
>>>>> Cc: Sagi Grimberg
>>>>> Reported-by: Yi Zhang
>>>>> Tested-by: Yi Zhang
>>>>> Signed-off-by: Ming Lei
>>>>> ---
>>>>>   drivers/nvme/host/tcp.c | 40 +++++++++++++++++++++++++++++++++++++---
>>>>>   1 file changed, 37 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>>> index d6a3e1487354..7e85bd4a8d1b 100644
>>>>> --- a/drivers/nvme/host/tcp.c
>>>>> +++ b/drivers/nvme/host/tcp.c
>>>>> @@ -30,6 +30,8 @@ static int so_priority;
>>>>>   module_param(so_priority, int, 0644);
>>>>>   MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
>>>>>
>>>>> +#define REQ_STATE_COMPLETE	0
>>>>> +
>>>>>   enum nvme_tcp_send_state {
>>>>>   	NVME_TCP_SEND_CMD_PDU = 0,
>>>>>   	NVME_TCP_SEND_H2C_PDU,
>>>>> @@ -56,6 +58,8 @@ struct nvme_tcp_request {
>>>>>   	size_t offset;
>>>>>   	size_t data_sent;
>>>>>   	enum nvme_tcp_send_state state;
>>>>> +
>>>>> +	unsigned long comp_state;
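[Only the struct and #define changes are quoted above. As a minimal
sketch, assuming a made-up helper name rather than the patch's actual
trimmed hunks, the once-only completion such a flag enables would look
roughly like this:]

/*
 * Sketch only: nvme_tcp_complete_request() is a hypothetical helper,
 * not necessarily what the patch itself adds. Both the normal
 * completion path and the timeout handler would funnel through it;
 * test_and_set_bit() returns the old bit value, so only the first
 * caller actually completes the request.
 */
static void nvme_tcp_complete_request(struct request *rq,
		__le16 status, union nvme_result result)
{
	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);

	/* the loser of the race simply returns: the rq is already done */
	if (test_and_set_bit(REQ_STATE_COMPLETE, &req->comp_state))
		return;

	/* same completion calls the driver's normal path already uses */
	if (!nvme_try_complete_req(rq, status, result))
		nvme_complete_rq(rq);
}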
>>>> I do not think adding state is a good idea.
>>>> It is similar to rq->state.
>>>> In the teardown process, deleting the timer and canceling the
>>>> timeout work after the queues are quiesced may be a better option.
>>>> I will send the patch later.
>>>> The patch has already been tested with RoCE for more than one week.
>>>
>>> Actually there isn't a race between timeout and teardown, and patch 1
>>> and patch 2 are enough to fix the issue reported by Yi.
>>>
>>> It is just that rq->state is updated to IDLE in its ->complete(), so
>>> either one of the code paths may think that this rq isn't completed,
>>> and patch 2 has addressed this issue.
>>>
>>> In short, the teardown lock is enough to cover the race.
>> The race may cause the following abnormalities:
>> 1. The crash reported by Yi Zhang
>> detail: https://lore.kernel.org/linux-nvme/1934331639.3314730.1602152202454.JavaMail.zimbra@redhat.com/
>> 2. BUG_ON in blk_mq_requeue_request
>> Error recovery and timeout may complete the request repeatedly.
>> First, error recovery cancels the request in the teardown process;
>> the request will be retried in completion, and rq->state will be
>> changed to IDLE.
>
> Right.
>
>> And then the timeout will complete the request again and likewise
>> retry the request; the BUG_ON will trigger in
>> blk_mq_requeue_request.
>
> With patch 2 in this patchset, the timeout handler won't complete the
> request any more.
>
>> 3. Abnormal link disconnection
>> First, error recovery cancels all requests; after the reconnect
>> succeeds, the requests will be restarted. And then the timeout will
>> complete the requests again, and the queue will be stopped in
>> nvme_rdma(tcp)_complete_timed_out; an abnormal link disconnection
>> will happen. This requires the timeout handling to be delayed for a
>> long time for some reason, such as a hardware interrupt, so the
>> probability is low.
>
> OK, the timeout handler may just get the chance to run after recovery
> is done, and it can be fixed by calling nvme_sync_queues() after
> updating to CONNECTING, or before updating to LIVE, together with
> patches 1 & 2.
>
>> teardown_lock just serializes the race, and checking rq->state can
>> avoid scenarios 1 and 2, but scenario 3 cannot be fixed.
>
> I didn't think of scenario 3, which seems not to be triggered in Yi's
> test.
Scenario 3 is unlikely to be triggered in normal tests. The trigger
conditions are harsh, so it will only happen in some extreme
situations. Without scenario 3, Sagi's patch can work well.
>
> thanks,
> Ming
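[For completeness, a rough sketch of the nvme_sync_queues() option
Ming mentions above; the wrapper function below is hypothetical, and
in the real driver this logic would sit in the tcp reconnect/error
recovery work:]

/*
 * Sketch only. nvme_sync_queues() runs blk_sync_queue() on the admin
 * and I/O queues, which del_timer_sync()s the queue timers and
 * cancel_work_sync()s the timeout work, so a stale timeout handler
 * cannot fire after the requests have been restarted.
 */
static int nvme_tcp_recover_ctrl_sketch(struct nvme_ctrl *ctrl)
{
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING))
		return -EAGAIN;

	/* flush any timeout handler that is still in flight */
	nvme_sync_queues(ctrl);

	/* re-establish the connection; requests restart on success */
	return nvme_tcp_setup_ctrl(ctrl, false);
}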