Date: Fri, 20 Mar 2020 17:19:31 -0500
From: Bjorn Helgaas
To: Karol Herbst
Cc: linux-kernel@vger.kernel.org, Lyude Paul, "Rafael J. Wysocki",
    Mika Westerberg, linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
    dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
    Mika Westerberg
Subject: Re: [PATCH v7] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
Message-ID: <20200320221931.GA23783@google.com>
In-Reply-To: <20200310192627.437947-1-kherbst@redhat.com>

On Tue, Mar 10, 2020 at 08:26:27PM +0100, Karol Herbst wrote:
> Fixes the infamous 'runtime PM' bug many users are facing on laptops with
> Nvidia Pascal GPUs by skipping said PCI power state changes on the GPU.
>
> Depending on the kernel used, there might be messages like these in dmesg:
>
> "nouveau 0000:01:00.0: Refused to change power state, currently in D3"
> "nouveau 0000:01:00.0: can't change power state from D3cold to D0 (config
> space inaccessible)"
> followed by backtraces of kernel crashes or timeouts within nouveau.
>
> It's still unknown why this issue exists, but this is a reliable workaround
> and solves a very annoying issue for users having to choose between a
> crashing kernel or higher power consumption of their laptops.

Thanks for the bugzilla link. The bugzilla mentions lots of mailing
list discussion. Can you include links to some of that?

IIUC this basically just turns off PCI power management for the GPU.
Can you do that with something like the following? I don't know
anything about DRM, so I don't know where you could save the pm_cap,
but I'm sure the driver could keep it somewhere.

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index b65ae817eabf..2ad825e8891c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -618,6 +618,23 @@ nouveau_drm_device_fini(struct drm_device *dev)
 	kfree(drm);
 }
 
+static void quirk_broken_nv_runpm(struct drm_device *drm_dev)
+{
+	struct pci_dev *pdev = drm_dev->pdev;
+	struct pci_dev *bridge = pci_upstream_bridge(pdev);
+
+	if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
+		return;
+
+	switch (bridge->device) {
+	case 0x1901:
+		STASH->pm_cap = pdev->pm_cap;
+		pdev->pm_cap = 0;
+		NV_INFO(drm_dev, "Disabling PCI power management to avoid bug\n");
+		break;
+	}
+}
+
 static int nouveau_drm_probe(struct pci_dev *pdev,
 			     const struct pci_device_id *pent)
 {
@@ -699,6 +716,7 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 	if (ret)
 		goto fail_drm_dev_init;
 
+	quirk_broken_nv_runpm(drm_dev);
 	return 0;
 
 fail_drm_dev_init:
@@ -735,6 +753,9 @@ nouveau_drm_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 
+	/* If we disabled PCI power management, restore it */
+	if (STASH->pm_cap)
+		pdev->pm_cap = STASH->pm_cap;
 	nouveau_drm_device_remove(dev);
 	pci_disable_device(pdev);
 }
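The stash-and-restore idea in the suggested patch can be modeled in a few lines of userspace C. This is a sketch, not nouveau or PCI core code. Every name in it is hypothetical: fake_pci_dev, fake_drm, saved_pm_cap (standing in for STASH->pm_cap), fake_set_power_state(), fake_quirk_broken_nv_runpm() and fake_remove(). The power-state helper assumes that a power-state change is refused when pm_cap is 0. The suggestion relies on this behavior in the PCI core.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_VENDOR_ID_INTEL 0x8086

/* Hypothetical stand-in for struct pci_dev: only the fields the quirk uses. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint8_t pm_cap;              /* PM capability offset, 0 = "no PM" */
	struct fake_pci_dev *bridge; /* upstream bridge, NULL if none */
	int power_state;             /* 0 = D0, 3 = D3 */
};

/* Hypothetical driver-private data: the place "STASH" refers to. */
struct fake_drm {
	struct fake_pci_dev *pdev;
	uint8_t saved_pm_cap;
};

/* Assumed model of a power-state change: without a PM capability the
 * transition is refused, so zeroing pm_cap disables PCI PM for the device. */
static int fake_set_power_state(struct fake_pci_dev *pdev, int state)
{
	assert(pdev != NULL);
	if (!pdev->pm_cap)
		return -EIO;
	pdev->power_state = state;
	return 0;
}

/* Mirrors quirk_broken_nv_runpm(): hide pm_cap behind an Intel 0x1901 bridge. */
static void fake_quirk_broken_nv_runpm(struct fake_drm *drm)
{
	struct fake_pci_dev *pdev = drm->pdev;
	struct fake_pci_dev *bridge = pdev->bridge;

	if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
		return;

	switch (bridge->device) {
	case 0x1901:
		drm->saved_pm_cap = pdev->pm_cap;
		pdev->pm_cap = 0;
		printf("Disabling PCI power management to avoid bug\n");
		break;
	}
}

/* Mirrors the nouveau_drm_remove() hunk: restore pm_cap if it was hidden. */
static void fake_remove(struct fake_drm *drm)
{
	if (drm->saved_pm_cap)
		drm->pdev->pm_cap = drm->saved_pm_cap;
}
```

In this model, running fake_quirk_broken_nv_runpm() on a GPU below an Intel 0x1901 bridge zeroes pm_cap. After that, fake_set_power_state() returns -EIO until fake_remove() restores the saved offset. A GPU behind any other bridge keeps its pm_cap untouched. The saved value lives in driver-private data and not in the PCI device itself. That way the quirk can be undone entirely from the driver's own remove path.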