From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Rafael J. Wysocki"
Subject: Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
Date: Fri, 22 Nov 2019 12:54:26 +0100
References: <20191121112821.GU11621@lahna.fi.intel.com>
 <20191121114610.GW11621@lahna.fi.intel.com>
 <20191121125236.GX11621@lahna.fi.intel.com>
 <20191121194942.GY11621@lahna.fi.intel.com>
 <20191122103637.GA11621@lahna.fi.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
To: Karol Herbst
Cc: "Rafael J. Wysocki", Mika Westerberg, Bjorn Helgaas, LKML, Lyude Paul, "Rafael J . Wysocki", Linux PCI, Linux PM, dri-devel, nouveau, Dave Airlie, Mario Limonciello
List-Id: dri-devel@lists.freedesktop.org

On Fri, Nov 22, 2019 at 12:34 PM Karol Herbst wrote:
>
> On Fri, Nov 22, 2019 at 12:30 PM Rafael J. Wysocki wrote:
> >

[cut]

> >
>
> the issue is not AML related at all as I am able to reproduce this
> issue without having to invoke any of that at all, I just need to poke
> into the PCI register directly to cut the power.

Since the register is not documented, you don't actually know what
exactly happens when it is written to.

You basically are saying something like "if I write a specific value
to an undocumented register, that makes things fail". And yes,
writing things to undocumented registers is likely to cause failure to
happen, in general.

The point is that the kernel will never write into this register by itself.

> The register is not documented, but effectively what the AML code is writing to as well.

So that AML code is problematic. It expects the write to do something
useful, but that's not the case. Without the AML, the register would
not have been written to at all.

> Of course it might also be that the code I was testing it was doing
> things in a non conformant way and I just hit a different issue as
> well, but in the end I don't think that the AML code is the root cause
> of all of that.

If AML is not involved at all, things work. You've just said so in
another message in this thread, quoting verbatim:

"yes. In my previous testing I was poking into the PCI registers of the
bridge controller and the GPU directly and that never caused any
issues as long as I limited it to putting the devices into D3hot."

You cannot claim a hardware bug just because a write to an
undocumented register from AML causes things to break.

First, that may be a bug in the AML (which is not unheard of).
Second, and that is more likely, the expectations of the AML code may
not be met at the time it is run.

Assuming the latter, the root cause is really that the kernel executes
the AML in a hardware configuration in which the expectations of that
AML are not met.

We are now trying to understand what those expectations may be and so
how to cause them to be met.

Your observation that the issue can be avoided if the GPU is not put
into D3hot by a PMCSR write is a step in that direction and it is a
good finding. The information from Mika based on the ASL analysis is
helpful too. Let's not jump to premature conclusions too quickly,
though.
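For reference, a minimal sketch of what "putting the GPU into D3hot by a PMCSR write" amounts to at the register level. This is illustrative user-space C, not kernel code: the enum and helper names are hypothetical, and the only assumption is the standard PCI Power Management capability layout (PMCSR at offset 4 of the capability, the PowerState field in bits 1:0, the write-1-to-clear PME_Status bit in bit 15). In the kernel this transition goes through pci_set_power_state(). The undocumented register written by the AML is deliberately not modeled.

```c
#include <stdint.h>

/* Offset of PMCSR within the PCI PM capability (PCI_PM_CTRL in Linux). */
#define PMCSR_OFFSET      0x04
/* PMCSR bits 1:0: the PowerState field. */
#define PMCSR_STATE_MASK  0x0003u
/* PMCSR bit 15: PME_Status, write-1-to-clear. */
#define PMCSR_PME_STATUS  0x8000u

/* Hypothetical names for the PowerState encodings. */
enum pci_pm_state { PM_D0 = 0, PM_D1 = 1, PM_D2 = 2, PM_D3HOT = 3 };

/*
 * Given the current PMCSR value, compute the value to write back so the
 * function enters new_state. Other bits are preserved, except PME_Status,
 * which is masked so the write-back does not clear a pending PME.
 */
static uint16_t pmcsr_with_state(uint16_t pmcsr, enum pci_pm_state new_state)
{
    uint16_t v = pmcsr;

    v &= (uint16_t)~PMCSR_STATE_MASK;  /* drop the old PowerState */
    v &= (uint16_t)~PMCSR_PME_STATUS;  /* never write 1 to the RW1C bit */
    v |= (uint16_t)((uint16_t)new_state & PMCSR_STATE_MASK);
    return v;
}

/* Decode the PowerState field of a PMCSR value. */
static enum pci_pm_state pmcsr_state(uint16_t pmcsr)
{
    return (enum pci_pm_state)(pmcsr & PMCSR_STATE_MASK);
}
```

Masking PME_Status is a choice made in this sketch: writing a 1 back to that bit would clear a pending PME, which a plain state change has no reason to do. D3cold, where power is actually removed, cannot be reached by a PMCSR write at all; that is the part handled by platform firmware (the AML) in this thread.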