From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FF9BC04A6B for ; Fri, 10 May 2019 15:13:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1143B2177B for ; Fri, 10 May 2019 15:13:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727692AbfEJPNk convert rfc822-to-8bit (ORCPT ); Fri, 10 May 2019 11:13:40 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:45864 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727346AbfEJPNk (ORCPT ); Fri, 10 May 2019 11:13:40 -0400 Received: from mail-pg1-f199.google.com ([209.85.215.199]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1hP7DN-0006cG-Uz for linux-kernel@vger.kernel.org; Fri, 10 May 2019 15:13:38 +0000 Received: by mail-pg1-f199.google.com with SMTP id 63so4221985pga.18 for ; Fri, 10 May 2019 08:13:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=NmWASUs3KbA0AaXEm2yCaVhhAGbxqPp+fvrTyXt64uk=; b=DBhhnneK8mut4s61QabWWwKrASB5ScfJ4PK1qSPgrf4Pr0HalteYjPla1wHNpuZVqe /eZ5ynGixXv4VY7lAlhspAMwKN5rkKFYtAvYO6GMW5zaPCL5siWJLq2O6/I/mAd68ZhA N+vhCiCNMPR/ri94yT3/44G4E+bGnjugKA49JGEbi0LixGYjgx7HyQuIREqEjRkzhbe/ ZDTMglhLw8pOQkAXVmmj8PBlJhRVfqr24uPn/8hcbXty1fS6coB037gWeqpqIInVkFol 9EVlULPK4aIEfdAeCvEtlIuUWD3Y9sH6hrbetnQ6+xFPx2aclBMKyt9hFOh/W3dvxmim H/Wg== X-Gm-Message-State: APjAAAXlB6qU+RR7JaWRScOfcNy/6MsKvyayKPUMPsw1F5L5/HTfJQhp zU5eMsWQAieoVxrpFPdXpk4NwTnckYbFNJsNGqZDESulO+g35HFudetolFP72zyHUju2oXsIV9+ 243SbjcCO2+6rl+lQsWlbuQzNavF5joAwTifLSUYaSw== X-Received: by 2002:aa7:9242:: with SMTP id 2mr15164965pfp.230.1557501216535; Fri, 10 May 2019 08:13:36 -0700 (PDT) X-Google-Smtp-Source: APXvYqxw5zxsqhG5LJGMPKVvE2BsNwPWG6uw9NXt/AOlH6TmHdYZv9RPBiVsLYlez6zOdIpxiCnFlQ== X-Received: by 2002:aa7:9242:: with SMTP id 2mr15164896pfp.230.1557501216170; Fri, 10 May 2019 08:13:36 -0700 (PDT) Received: from [192.168.1.220] (220-133-187-190.HINET-IP.hinet.net. [220.133.187.190]) by smtp.gmail.com with ESMTPSA id 2sm11743129pgc.49.2019.05.10.08.13.31 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 10 May 2019 08:13:33 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: Re: [PATCH] nvme-pci: Use non-operational power state instead of D3 on Suspend-to-Idle From: Kai Heng Feng In-Reply-To: Date: Fri, 10 May 2019 23:15:05 +0800 Cc: Mario Limonciello , Keith Busch , Christoph Hellwig , Jens Axboe , Sagi Grimberg , Linux PM , Rafael Wysocki , Linux Kernel Mailing List , linux-nvme , Keith Busch Content-Transfer-Encoding: 8BIT Message-Id: <54E4999C-DBE8-4FC1-8E82-17FEDFDA9CA6@canonical.com> References: <064701C3-2BD4-4D93-891D-B7FBB5040FC4@canonical.com> <20190509095601.GA19041@lst.de> <225CF4F7-C8E1-4C66-B362-97E84596A54E@canonical.com> <20190509103142.GA19550@lst.de> <31b7d7959bf94c15a04bab0ced518444@AUSX13MPC101.AMER.DELL.COM> <20190509192807.GB9675@localhost.localdomain> <7a002851c435481593f8629ec9193e40@AUSX13MPC101.AMER.DELL.COM> <20190509215409.GD9675@localhost.localdomain> <495d76c66aec41a8bfbbf527820f8eb9@AUSX13MPC101.AMER.DELL.COM> To: "Rafael J. Wysocki" X-Mailer: Apple Mail (2.3445.104.8) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On May 10, 2019, at 4:23 PM, Rafael J. Wysocki wrote: > > On Fri, May 10, 2019 at 8:08 AM Kai-Heng Feng > wrote: >> >> at 06:19, wrote: >> >>>> -----Original Message----- >>>> From: Keith Busch >>>> Sent: Thursday, May 9, 2019 4:54 PM >>>> To: Limonciello, Mario >>>> Cc: kai.heng.feng@canonical.com; hch@lst.de; axboe@fb.com; >>>> sagi@grimberg.me; rafael@kernel.org; linux-pm@vger.kernel.org; >>>> rafael.j.wysocki@intel.com; linux-kernel@vger.kernel.org; linux- >>>> nvme@lists.infradead.org; keith.busch@intel.com >>>> Subject: Re: [PATCH] nvme-pci: Use non-operational power state instead >>>> of D3 on >>>> Suspend-to-Idle >>>> >>>> >>>> [EXTERNAL EMAIL] >>>> >>>> On Thu, May 09, 2019 at 09:37:58PM +0000, Mario.Limonciello@dell.com >>>> wrote: >>>>>> +int nvme_set_power(struct nvme_ctrl *ctrl, unsigned npss) >>>>>> +{ >>>>>> + int ret; >>>>>> + >>>>>> + mutex_lock(&ctrl->scan_lock); >>>>>> + nvme_start_freeze(ctrl); >>>>>> + nvme_wait_freeze(ctrl); >>>>>> + ret = nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, npss, NULL, 0, >>>>>> + NULL); >>>>>> + nvme_unfreeze(ctrl); >>>>>> + mutex_unlock(&ctrl->scan_lock); >>>>>> + >>>>>> + return ret; >>>>>> +} >>>>>> +EXPORT_SYMBOL_GPL(nvme_set_power); >>>>> >>>>> I believe without memory barriers at the end disks with HMB this will >>>>> still kernel panic (Such as Toshiba BG3). >>>> >>>> Well, the mutex has an implied memory barrier, but your HMB explanation >>>> doesn't make much sense to me anyway. The "mb()" in this thread's original >>>> patch is a CPU memory barrier, and the CPU had better not be accessing >>>> HMB memory. Is there something else going on here? >>> >>> Kai Heng will need to speak up a bit in his time zone as he has this disk >>> on hand, >>> but what I recall from our discussion was that DMA operation MemRd after >>> resume was the source of the hang. >> >> Yes, that’ what I was told by the NVMe vendor, so all I know is to impose a >> memory barrier. >> If mb() shouldn’t be used here, what’s the correct variant to use in this >> context? >> >>> >>>>> This still allows D3 which we found at least failed to go into deepest >>>>> state and >>>> blocked >>>>> platform s0ix for the following SSDs (maybe others): >>>>> Hynix PC601 >>>>> LiteOn CL1 >>>> >>>> We usually write features to spec first, then quirk non-compliant >>>> devices after. >>> >>> NVME spec doesn't talk about a relationship between SetFeatures w/ >>> NVME_FEAT_POWER_MGMGT and D3 support, nor order of events. >>> >>> This is why we opened a dialog with storage vendors, including >>> contrasting the behavior >>> of Microsoft Windows inbox NVME driver and Intel's Windows RST driver. >>> >>> Those two I mention that come to mind immediately because they were most >>> recently >>> tested to fail. Our discussion with storage vendors overwhelmingly >>> requested >>> that we don't use D3 under S2I because their current firmware >>> architecture won't >>> support it. >>> >>> For example one vendor told us with current implementation that receiving >>> D3hot >>> after NVME shutdown will prevent being able to enter L1.2. D3hot entry >>> was supported >>> by an IRQ handler that isn't serviced in NVME shutdown state. >>> >>> Another vendor told us that with current implementation it's impossible >>> to transition >>> to PS4 (at least via APST) while L1.2 D3hot is active. >> >> I tested the patch from Keith and it has two issues just as simply skipping >> nvme_dev_disable(): >> 1) It consumes more power in S2I >> 2) System freeze after resume > > Well, the Keith's patch doesn't prevent pci_pm_suspend_noirq() from > asking for D3 and both of the symptoms above may be consequences of > that in principle. Sorry, I should mention that I use a slightly modified drivers/nvme/host/pci.c: diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 3e4fb891a95a..ece428ce6876 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -2833,6 +2834,11 @@ static int nvme_suspend(struct device *dev) struct pci_dev *pdev = to_pci_dev(dev); struct nvme_dev *ndev = pci_get_drvdata(pdev); + if (!pm_suspend_via_firmware()) { + nvme_set_power(&ndev->ctrl, ndev->ctrl.npss); + pci_save_state(pdev); + } + nvme_dev_disable(ndev, true); return 0; } @@ -2842,6 +2848,10 @@ static int nvme_resume(struct device *dev) struct pci_dev *pdev = to_pci_dev(dev); struct nvme_dev *ndev = pci_get_drvdata(pdev); + if (!pm_resume_via_firmware()) { + return nvme_set_power(&ndev->ctrl, 0); + } + nvme_reset_ctrl(&ndev->ctrl); return 0; } Does pci_save_state() here enough to prevent the device enter to D3? Kai-Heng