From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 1 Jul 2021 17:13:49 +0800
From: Ming Lei
To: Hannes Reinecke
Cc: Sagi Grimberg, Jens Axboe, linux-block@vger.kernel.org,
 linux-nvme@lists.infradead.org, Christoph Hellwig, Daniel Wagner,
 Wen Xiong, John Garry
Subject: Re: [PATCH 0/2] blk-mq: fix blk_mq_alloc_request_hctx
References: <20210629074951.1981284-1-ming.lei@redhat.com>
 <5f304121-38ce-034b-2d17-93d136c77fe6@suse.de>
 <89081624-fedd-aa94-1ba2-9a137708a1f1@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailing-List: linux-block@vger.kernel.org

On Thu, Jul 01, 2021 at 10:00:27AM +0200, Hannes Reinecke wrote:
> On 7/1/21 1:59 AM, Ming Lei wrote:
> > On Wed, Jun 30, 2021 at 09:46:35PM +0200, Hannes Reinecke wrote:
> > > On 6/30/21 8:59 PM, Sagi Grimberg wrote:
> > > > > > > > Shouldn't we rather modify the tagset to only refer to
> > > > > > > > the current online CPUs _only_, thereby never submitting
> > > > > > > > a connect request for an hctx with only offline CPUs?
> > > > > > >
> > > > > > > Then you may end up with very few I/O queues, and performance
> > > > > > > may suffer even though lots of CPUs come online later.
> > > > > > >
> > > > > > Only if we stay with the reduced number of I/O queues. Which is
> > > > > > not what I'm proposing; I'd rather prefer to connect and
> > > > > > disconnect queues from the cpu hotplug handler. For starters we
> > > > > > could even trigger a reset once the first cpu within a hctx is
> > > > > > onlined.
> > > > >
> > > > > Yeah, but that needs one big/complicated patchset, and I don't
> > > > > see any advantages over this simple approach.
> > > >
> > > > I tend to agree with Ming here.
> > >
> > > Actually, Daniel and I came up with a slightly different idea: use a
> > > cpu hotplug notifier. Thing is, blk-mq already has a cpu hotplug
> > > notifier, which should ensure that no I/O is pending during cpu
> > > hotplug.
> >
> > Why should we ensure that for non-managed irqs?
>
> While not strictly necessary, it does align the hctx layout with the
> current CPU topology. As such we won't have any 'stale' CPUs or queues
> in the hctx layout, and thereby avoid any issues we would otherwise hit
> due to inactive CPUs in the cpumask.

We know exactly how non-managed interrupts behave here, so can you share
any concrete issue you want to avoid?

> > > If we now add an nvme cpu hotplug notifier which essentially kicks
> > > off a reset once all cpus in a hctx are offline, the reset logic
> > > will rearrange the queues to match the current cpu layout. And when
> > > the cpus are getting onlined we'll do another reset.
> > >
> > > Daniel is currently preparing a patch; let's see how it goes.
> >
> > What is the advantage of that big change over this simple way?
>
> With the simple way we might (and, as the results show, do) run
> nvme_ctrl_reset() in parallel with CPU hotplug. This leads to quite some
> complexity and, as we've seen, triggers quite a few race conditions.

CPU hotplug isn't related to nvme reset at all; reset is only for
recovering the nvme controller so that it works again. How can taking a
cpu offline affect the nvme controller?

For NVMe PCI, blk-mq has already drained any inflight requests before the
hctx becomes inactive.
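
For reference, the blk-mq side does roughly the following; this is a
simplified sketch of blk_mq_hctx_notify_offline() in block/blk-mq.c (the
last-cpu helper is inlined and details are elided), so please check the
tree for the exact code:

/*
 * Runs as the CPUHP_AP_BLK_MQ_ONLINE teardown callback, before the CPU
 * is actually taken down.
 */
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
        struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
                        struct blk_mq_hw_ctx, cpuhp_online);

        /* only act if @cpu is the last online CPU mapped to this hctx */
        if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
            cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu ||
            cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids)
                return 0;

        /* prevent new requests from being allocated on this hctx */
        set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
        smp_mb__after_atomic();

        /* wait for all inflight requests on this hctx to complete */
        if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
                while (blk_mq_hctx_has_requests(hctx))
                        msleep(5);
                percpu_ref_put(&hctx->queue->q_usage_counter);
        }

        return 0;
}

So by the time the CPU is dead, nothing can be in flight on a hctx whose
managed interrupt is about to be shut down.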

For other NVMe controllers, which don't use managed irqs, inflight
requests can still be completed via the original interrupt or the polling
task, both of which can be migrated to a new online cpu.
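
For such controllers the interrupt itself is moved by the genirq core
when the CPU goes down, along these lines; a simplified sketch of
migrate_one_irq() in kernel/irq/cpuhotplug.c, with several corner cases
(per-cpu irqs, pending moves, chip masking) elided:

/* invoked for each irq that targets the CPU going offline */
static bool migrate_one_irq(struct irq_desc *desc)
{
        struct irq_data *d = irq_desc_get_irq_data(desc);
        const struct cpumask *affinity = irq_data_get_affinity_mask(d);

        /* is no CPU of the affinity mask staying online? */
        if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
                /*
                 * Managed interrupts are shut down when the last CPU in
                 * their affinity mask goes away, which is exactly why
                 * blk-mq has to drain such hctxs beforehand.
                 */
                if (irqd_affinity_is_managed(d)) {
                        irqd_set_managed_shutdown(d);
                        irq_shutdown_and_deactivate(desc);
                        return false;
                }
                /* non-managed: fall back to any online CPU */
                affinity = cpu_online_mask;
        }

        /* move the interrupt, so completions keep being delivered */
        irq_do_set_affinity(d, affinity, false);
        return true;
}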

I don't know what exact race conditions you are talking about. Please
explain them in detail; otherwise this just wastes our time.

> Hence I do think we need to synchronize nvme_ctrl_reset() with CPU
> hotplug, to ensure that the reset handler has completed before the cpu
> is completely torn down.

No, you didn't provide any proof of the necessity of adding that
synchronization.

Thanks,
Ming